Running "head" on a text file inside a zipped archive without unpacking the archive

Updated: 2024-10-24 20:14:53
Question

I've taken over from a prior team and am writing ETL jobs which process csv files. I use a combination of shell scripts and perl on ubuntu. The csv files are huge; they arrive as zipped archives. Unzipped, many are more than 30 GB - yes, that's a G.

The legacy process is a batch job run from cron that unzips each file entirely, reads the first line and copies it into a config file, then re-zips the entire file. Some days this takes many hours of processing time, for no benefit.

Can you suggest a method to only extract the first line (or first few lines) from each file inside a zipped archive, without fully unpacking the archives?

Recommended answer

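The unzip command-line utility has a -p option that writes a file from the archive to standard output. Pipe that into head: nothing is extracted to disk, and once head has read the lines it needs and exits, the closed pipe stops unzip as well, so the rest of the file is never decompressed.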

Alternatively, from perldoc Archive::Zip (the calls below belong to Archive::Zip, not IO::Compress::Zip):

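    use strict;
    use warnings;
    use Archive::Zip qw( :ERROR_CODES :CONSTANTS );

    # 'archive.zip' and 'xyz.txt' are placeholder names.
    my $zip = Archive::Zip->new('archive.zip') or die "cannot read archive";

    my ($status, $bufferRef);
    my $member = $zip->memberNamed('xyz.txt');
    # Request uncompressed output so readChunk() returns decompressed data.
    $member->desiredCompressionMethod(COMPRESSION_STORED);
    $status = $member->rewindData();
    die "error $status" unless $status == AZ_OK;
    while ( !$member->readIsDone() ) {
        ( $bufferRef, $status ) = $member->readChunk();
        die "error $status" if $status != AZ_OK && $status != AZ_STREAM_END;
        # do something with $bufferRef:
        print $$bufferRef;
    }
    $member->endRead();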

Modify to suit, e.g. by iterating over the member list returned by $zip->memberNames() and leaving the read loop as soon as the first few lines have been read.

Published: 2023-11-16 20:53:11
Source: https://www.elefans.com/category/jswz/34/1607420.html