How to concatenate multiple files for stdin of Popen

I'm porting a bash script to python 2.6, and want to replace some code:

    cat $( ls -tr xyz_`date +%F`_*.log ) | filter args > bzip2

I guess I want something similar to the "Replacing shell pipe line" example at http://docs.python.org/release/2.6/library/subprocess.html, along the lines of:

    p1 = Popen(["filter", "args"], stdin=*?WHAT?*, stdout=PIPE)
    p2 = Popen(["bzip2"], stdin=p1.stdout, stdout=PIPE)
    output = p2.communicate()[0]

But, I'm not sure how best to provide p1's stdin value so it concatenates the input files. Seems I could add...

    p0 = Popen(["cat", "file1", "file2"...], stdout=PIPE)
    p1 = ... stdin=p0.stdout ...

...but that seems to go beyond using (slow, inefficient) pipes only to call external programs with significant functionality. (Any decent shell performs the cat internally.)

So, I can imagine a custom class that satisfies the file object API requirements and can therefore be used for p1's stdin, concatenating arbitrary other file objects. (EDIT: existing answers explain why this isn't possible)
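A short illustration of why a custom file-like class cannot serve as p1's stdin: subprocess hands the child process an operating-system file descriptor, which it obtains by calling fileno() on the object it is given, and it never calls the object's read() method. The sketch below is only a demonstration under assumptions: the helper name try_popen_with is made up for this example, and sys.executable stands in for a real external command.

```python
import io
import os
import subprocess
import sys


def try_popen_with(stdin_obj):
    """Start a trivial child with stdin_obj as stdin.

    Returns the exception raised by Popen, or None if the child started.
    """
    try:
        # Hypothetical stand-in command: a Python child that does nothing.
        p = subprocess.Popen([sys.executable, "-c", "pass"], stdin=stdin_obj)
    except (AttributeError, ValueError, OSError) as exc:
        # No usable fileno(): nothing the OS could pass to the child.
        return exc
    p.wait()
    return None


# An in-memory object has read() but no real file descriptor.
in_memory = io.BytesIO(b"some log lines\n")
print("in-memory object:", type(try_popen_with(in_memory)).__name__)

# A real file is backed by a descriptor, so Popen accepts it.
real_file = open(os.devnull, "rb")
try:
    print("real file:", try_popen_with(real_file))
finally:
    real_file.close()
```

So any concatenation done in Python has to be written into a pipe (stdin=PIPE) rather than exposed through a wrapper object.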

Does python 2.6 have a mechanism addressing this need/want, or might another Popen to cat be considered perfectly fine in python circles?

Thanks.

Best answer

You can replace everything you're doing with Python code, except for your external utility. That way your program stays portable as long as the external utility is portable. You could also consider turning the C++ program into a library and using Cython to interface with it. As Messa showed, date is replaced with time.strftime, globbing is done with glob.glob, and cat can be replaced by reading every file in the list and writing it to the input of your program. The call to bzip2 can be replaced with the bz2 module, but that complicates the program because you have to read and write simultaneously. To do that, use p.communicate if the data fits in memory, or a thread if the data is huge (select.select would be a better choice, but it won't work with pipes on Windows).

    import os
    import bz2
    import glob
    import time
    import threading
    import subprocess

    output_filename = '../whatever.bz2'
    # glob.glob returns names in arbitrary order; sort oldest first like `ls -tr`.
    # %Y-%m-%d is the portable spelling of the %F used by `date +%F`.
    input_filenames = sorted(glob.glob(time.strftime("xyz_%Y-%m-%d_*.log")),
                             key=os.path.getmtime)

    p = subprocess.Popen(['filter', 'args'], stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    output = open(output_filename, 'wb')
    output_compressor = bz2.BZ2Compressor()

    def data_reader():
        # Runs in its own thread: feed every input file to the filter's stdin.
        for filename in input_filenames:
            with open(filename, 'rb') as f:
                p.stdin.writelines(iter(lambda: f.read(8192), b''))
        p.stdin.close()  # signal EOF so the filter can finish

    input_thread = threading.Thread(target=data_reader)
    input_thread.start()

    # Main thread: compress the filter's output as it arrives.
    with output:
        for chunk in iter(lambda: p.stdout.read(8192), b''):
            output.write(output_compressor.compress(chunk))
        output.write(output_compressor.flush())

    input_thread.join()
    p.wait()
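For inputs that fit comfortably in memory, the p.communicate route mentioned above avoids the thread entirely. The following is a minimal sketch of that variant, not a drop-in replacement: the function name filter_and_compress and its parameters are made up for illustration, and command is whatever argument list you would otherwise pass to Popen (for example ['filter', 'args']).

```python
import bz2
import subprocess


def filter_and_compress(command, input_filenames):
    """Pipe the concatenated files through `command`; return bz2-compressed output.

    Everything is held in memory, so this only suits modest amounts of data.
    """
    parts = []
    for filename in input_filenames:
        with open(filename, 'rb') as f:
            parts.append(f.read())

    p = subprocess.Popen(command, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    # communicate() writes all input, reads all output, and waits for exit,
    # so the two pipes cannot deadlock against each other.
    filtered, _ = p.communicate(b''.join(parts))
    if p.returncode != 0:
        raise RuntimeError('command exited with status %d' % p.returncode)
    return bz2.compress(filtered)
```

The returned bytes can be written to a file opened in 'wb' mode. Because both the input and the filtered output are buffered in full, the threaded version remains the safer choice for large logs.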

Addition: How to detect file input type

You can use either the file extension or the Python bindings for libmagic to detect how a file is compressed. Here's a code example that does both and automatically chooses magic if it is available. Take the part that suits your needs and adapt it. open_autodecompress detects the MIME type and opens the file with the appropriate decompressor if one is available; otherwise it falls back to the plain open.

    import os
    import gzip
    import bz2

    # libmagic's Python bindings are optional; fall back to file extensions.
    try:
        import magic
    except ImportError:
        has_magic = False
    else:
        has_magic = True

    # MIME type -> opener
    mime_openers = {
        'application/x-bzip2': bz2.BZ2File,
        'application/x-gzip': gzip.GzipFile,
    }

    # File extension -> opener
    ext_openers = {
        '.bz2': bz2.BZ2File,
        '.gz': gzip.GzipFile,
    }

    def open_autodecompress(filename, mode='r'):
        if has_magic:
            ms = magic.open(magic.MAGIC_MIME_TYPE)
            ms.load()
            mimetype = ms.file(filename)
            opener = mime_openers.get(mimetype, open)
        else:
            basepart, ext = os.path.splitext(filename)
            opener = ext_openers.get(ext, open)
        # Unknown types fall through to the built-in open.
        return opener(filename, mode)
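If libmagic is not installed and the file extensions cannot be trusted, a third option is to look at the first few bytes of the file yourself. This is a different technique from the one above, sketched here under the assumption that only gzip and bzip2 matter: gzip streams begin with the bytes 1f 8b and bzip2 streams begin with "BZh". The names SIGNATURES and open_by_signature are made up for illustration.

```python
import bz2
import gzip

# Leading bytes that identify each format (only these two are handled here).
SIGNATURES = [
    (b'\x1f\x8b', gzip.GzipFile),  # gzip magic number
    (b'BZh', bz2.BZ2File),         # bzip2 magic number
]


def open_by_signature(filename, mode='rb'):
    """Open `filename`, decompressing if its first bytes match a known format."""
    with open(filename, 'rb') as f:
        head = f.read(4)
    for signature, opener in SIGNATURES:
        if head.startswith(signature):
            return opener(filename, mode)
    # No known signature: treat it as an ordinary file.
    return open(filename, mode)
```

A plain file that happens to start with one of these byte sequences would be misdetected, so this check is a heuristic rather than a guarantee.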
