hadoop - How do s3n/s3a manage files?
I've been using services like Kafka Connect and Secor to persist Parquet files to S3. I'm not very familiar with HDFS or Hadoop, but it seems these services typically write temporary files to either local memory or disk before writing to S3 in bulk. Do the s3n/s3a file systems virtualize an HDFS-style file system locally and push at configured intervals, or is there a one-to-one correspondence between a write to s3n/s3a and a write to S3?
I'm not entirely sure I'm asking the right question here. Any guidance would be appreciated.
s3a/s3n implement the Hadoop FileSystem APIs against the remote object store, including pretending that it has directories you can rename and delete.
Historically they have saved all the data you write to local disk until you close() the output stream, at which point the upload takes place (which can be slow). This means you must have enough temporary space for the biggest object you plan to create.
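For reference, the local directory used for that buffering is configurable. A minimal sketch of the relevant `core-site.xml` entry, assuming defaults otherwise (the value shown is illustrative):

```xml
<property>
  <!-- Local directory where s3a buffers data before uploading on close() -->
  <name>fs.s3a.buffer.dir</name>
  <value>${hadoop.tmp.dir}/s3a</value>
</property>
```

If you point this at a small disk or tmpfs, large writes will fail once the buffer fills, which is exactly the temporary-space constraint described above.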
Hadoop 2.8 has a fast upload stream which uploads the file in 5+ MB blocks as it gets written, then in the final close() makes it visible in the object store. This is measurably faster when generating lots of data in a single stream, and it avoids needing that much disk space.
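As a sketch, the fast upload path can be enabled with settings like the following in `core-site.xml` (property names are the Hadoop 2.8 s3a fast-upload options; the values chosen here are illustrative, not recommendations):

```xml
<configuration>
  <property>
    <!-- Switch from buffer-whole-file-then-upload to incremental block upload -->
    <name>fs.s3a.fast.upload</name>
    <value>true</value>
  </property>
  <property>
    <!-- Where pending blocks are held: "disk", "array" (on-heap), or "bytebuffer" (off-heap) -->
    <name>fs.s3a.fast.upload.buffer</name>
    <value>disk</value>
  </property>
  <property>
    <!-- Size of each multipart block; S3's multipart API requires at least 5 MB -->
    <name>fs.s3a.multipart.size</name>
    <value>16M</value>
  </property>
</configuration>
```

The `disk` buffer still uses local storage, but only for the blocks currently queued for upload rather than the whole object; the in-memory buffers avoid disk entirely at the cost of heap/off-heap memory.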