Greetings,
I've taken over from a prior team and am writing ETL jobs which process csv files. I use a combination of shell scripts and perl on ubuntu. The csv files are huge; they arrive as zipped archives. Unzipped, many are more than 30Gb - yes, that's a G
The legacy process is a batch job running on cron that unzips each file entirely, reads and copies the first line of it into a config file, then re-zips the entire file. Some days this takes many many hours of processing time, for no benefit.
Can you suggest a method to only extract the first line (or first few lines) from each file inside a zipped archive, without fully unpacking the archives?
Recommended answer
The unzip command line utility has a -p option which dumps a file to standard out. Just pipe that into head and it'll not bother extracting the whole file to disk.
Alternatively, in Perl, from perldoc Archive::Zip (the code below uses Archive::Zip's member API - memberNamed, readChunk, AZ_OK - not IO::Compress::Zip):
my ($status, $bufferRef);
my $member = $zip->memberNamed( 'xyz.txt' );
$member->desiredCompressionMethod( COMPRESSION_STORED );
$status = $member->rewindData();
die "error $status" unless $status == AZ_OK;
while ( ! $member->readIsDone() ) {
    ( $bufferRef, $status ) = $member->readChunk();
    die "error $status" if $status != AZ_OK && $status != AZ_STREAM_END;
    # do something with $bufferRef:
    print $$bufferRef;
}
$member->endRead();
Modify to suit, e.g. iterate over the file list returned by $zip->memberNames() and read only the first few lines of each member.