According to the gzip specification, the uncompressed file size is stored in the last 4 bytes of a .gz file.
I created two files:
    dd if=/dev/urandom of=500M bs=1024 count=500000
    dd if=/dev/urandom of=5G bs=1024 count=5000000
I compressed them:
    gzip 500M 5G
I checked the last 4 bytes with:
    tail -c4 500M.gz | od -I    (returns 512000000, as expected)
    tail -c4 5G.gz | od -I      (returns 825032704, not as expected)
It seems that hitting the invisible 32-bit barrier makes the value written into ISIZE complete nonsense, which is more annoying than if they had used some error bit instead.
Does anyone know of a way to get the uncompressed size of a .gz file without extracting it?
Thanks
Spec: www.gzip/zlib/rfc-gzip.html
Edit: if anyone wants to try it out, you can use /dev/zero instead of /dev/urandom.
Recommended answer:
No.
The only way to get the exact size of a compressed stream is to actually decompress it (even if you write everything to /dev/null and just count the bytes).
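A minimal sketch of that decompress-and-count approach, assuming a POSIX shell with gzip and wc available. The function name gz_exact_size is made up for illustration; nothing is written to disk, but the whole stream still has to be decompressed, so it takes time proportional to the uncompressed size.

```shell
#!/bin/sh
# Hypothetical helper (the name is an assumption, not part of gzip):
# print the exact uncompressed size of a .gz file by decompressing it
# to stdout and counting the bytes. No extracted file is created.
gz_exact_size() {
    # tr strips the leading spaces that some wc implementations emit
    gzip -dc -- "$1" | wc -c | tr -d ' '
}
```

With the files from the question, `gz_exact_size 5G.gz` should report the full size rather than the wrapped ISIZE value.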
It's worth noting that the gzip RFC defines ISIZE as
    ISIZE (Input SIZE) This contains the size of the original (uncompressed) input data modulo 2^32.
so it isn't actually breaking at the 32-bit barrier; what you're seeing is expected behavior.
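The two values from the question can be checked against that definition. The sketch below assumes a shell with 64-bit arithmetic and, for the second helper, a little-endian host and an od that supports -An and -tu4. Both function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: the ISIZE value the RFC definition predicts for
# a given uncompressed size, i.e. the size modulo 2^32 (4294967296).
isize_for() {
    echo $(( $1 % 4294967296 ))
}

# Hypothetical helper: read the stored ISIZE from the last 4 bytes of a
# .gz file as an unsigned 32-bit integer. od uses the host byte order,
# so this matches the little-endian field only on little-endian machines.
gz_isize() {
    tail -c4 -- "$1" | od -An -tu4 | tr -d ' \t'
}

isize_for 512000000     # 500000 * 1024: below 2^32, prints 512000000
isize_for 5120000000    # 5000000 * 1024: wraps around, prints 825032704
```

So 825032704 is exactly 5120000000 - 4294967296. ISIZE alone cannot distinguish a file of that size from one that is a multiple of 4 GiB larger.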