Splitting a big file into smaller ones based on a line [closed]
I have a pretty big file (more than 20GB) and I'd like to split it into smaller ones, for example multiple files of about 2GB each.
The one constraint is that each split has to happen just before a specific line (the Recno:: lines in the sample below).
I'm using Python, but if there is another solution, in shell for example, I'm up for it.
This is what the big file looks like:

bigfile.txt (20GB)

    Recno:: 0
    some data...
    Recno:: 1
    some data...
    Recno:: 2
    some data...
    Recno:: 3
    some data...
    Recno:: 4
    some data...
    Recno:: 5
    some data...
    Recno:: x
    some more data...

This is what I want:

file1.txt (2 GB +/-)

    Recno:: 0
    some data...
    Recno:: 1
    some data...

file2.txt (2GB +/-)

    Recno:: 2
    some data...
    Recno:: 4
    some data...
    Recno:: 5
    some data...

And so on.

Thanks!
Best answer
You could do something like this:

    import sys

    try:
        _, size, filename = sys.argv
        size = int(size)
    except ValueError:
        sys.exit('Usage: splitter.py <size in bytes> <filename to split>')

    # Binary mode, so that len(line) counts bytes and matches the size argument.
    with open(filename, 'rb') as infile:
        count = 0
        current_size = 0
        # You could do something fancier with the name,
        # such as using os.path.splitext.
        outfile = open('{}_{}'.format(filename, count), 'wb')
        for line in infile:
            # Start a new output file only at a record boundary.
            if current_size > size and line.startswith(b'Recno'):
                outfile.close()
                count += 1
                current_size = 0
                outfile = open('{}_{}'.format(filename, count), 'wb')
            current_size += len(line)
            outfile.write(line)
        outfile.close()
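To try the idea out on something small before pointing it at a 20GB file, the same logic can be wrapped in a function. This is only a sketch: the function name split_before_marker and its parameters max_bytes and marker are made up for illustration, and the sample file is generated on the spot with tiny records, assuming each record is a Recno:: line followed by one data line.

```python
import os
import tempfile


def split_before_marker(path, max_bytes, marker=b'Recno'):
    """Split `path` into numbered parts, cutting only before a marker line.

    Hypothetical helper: returns the list of part paths that were written.
    """
    parts = []
    outfile = None
    current_size = 0
    with open(path, 'rb') as infile:  # binary mode: len(line) is in bytes
        try:
            for line in infile:
                at_boundary = line.startswith(marker)
                # Open the first part lazily, then roll over only when the
                # current part is already over the limit AND a record starts.
                if outfile is None or (current_size > max_bytes and at_boundary):
                    if outfile is not None:
                        outfile.close()
                    part_path = '{}_{}'.format(path, len(parts))
                    parts.append(part_path)
                    outfile = open(part_path, 'wb')
                    current_size = 0
                outfile.write(line)
                current_size += len(line)
        finally:
            if outfile is not None:
                outfile.close()
    return parts


# Tiny made-up sample: 6 records of 23 bytes each (marker line + data line).
workdir = tempfile.mkdtemp()
sample = os.path.join(workdir, 'bigfile.txt')
with open(sample, 'wb') as f:
    for i in range(6):
        f.write('Recno:: {}\nsome data...\n'.format(i).encode('ascii'))

# A 40-byte limit is exceeded after two records, so each part holds two.
for part in split_before_marker(sample, max_bytes=40):
    print(os.path.basename(part), os.path.getsize(part))
```

As in the answer above, a part is closed only once it has already grown past the limit, so parts come out slightly larger than max_bytes rather than slightly smaller. That matches the "2 GB +/-" requirement, but if the limit is a hard ceiling the size check would need to happen before the record is written.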