Is a gzipped Parquet file splittable in HDFS for Spark?
I get conflicting information when searching and reading answers on the internet about this subject. Can anyone share their experience? I know for a fact that a gzipped CSV file is not splittable, but maybe the internal structure of Parquet files makes the case for Parquet totally different from CSV?
Recommended answer
Parquet files with GZIP compression are in fact splittable. This follows from the internal layout of Parquet files: they are always splittable, regardless of the compression algorithm used.
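This is mainly due to how a Parquet file is laid out: the file is divided into row groups, each row group holds one column chunk per column, and each column chunk is made up of pages. Compression is applied to the individual pages rather than to the file as a whole, so a reader can start at any row group without decompressing what comes before it.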
You can find a more detailed explanation here: github/apache/parquet-format#file-format
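To make the idea concrete, here is a minimal sketch in Python using only the standard library. It is a toy model, not the real Parquet format: the helper names (write_toy_file, read_index, read_row_group) and the JSON footer are assumptions made purely for illustration. It shows that when each block is gzip-compressed on its own and a footer records where every block starts, any block can be read independently of the others.

```python
import gzip
import json
import struct


def write_toy_file(row_groups):
    """Compress each row group separately and append a footer index.

    Toy stand-in for a Parquet-like layout (hypothetical, for illustration only).
    """
    body = b""
    index = []
    for rows in row_groups:
        chunk = gzip.compress("\n".join(rows).encode("utf-8"))
        index.append((len(body), len(chunk)))  # (offset, length) of this block
        body += chunk
    footer = json.dumps(index).encode("utf-8")
    # The last 4 bytes hold the footer length so a reader can locate the index.
    return body + footer + struct.pack("<I", len(footer))


def read_index(blob):
    """Read the (offset, length) index back from the end of the file."""
    (footer_len,) = struct.unpack("<I", blob[-4:])
    return json.loads(blob[-4 - footer_len:-4])


def read_row_group(blob, i):
    """Decompress only block i, without touching the blocks before it."""
    offset, length = read_index(blob)[i]
    return gzip.decompress(blob[offset:offset + length]).decode("utf-8").split("\n")


blob = write_toy_file([["a,1", "b,2"], ["c,3", "d,4"], ["e,5"]])
print(read_row_group(blob, 1))  # reads the middle block directly
```

A CSV file gzipped as a single stream has no such index and no independent blocks, which is why a reader has to start decompressing from the beginning of the file.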