Python: on-the-fly MD5 as one reads a stream

Does python 3 have a structure for making a filtering stream? In particular, my goal here is to calculate an md5 checksum of the contents read from a REST service with requests without making an extra copy. If I could subclass some sort of filter stream and just shove the bytes into a hashlib-derived md5 object I'd be good.

Currently, my code includes:

    shutil.copyfileobj(r.raw, outstream)

where 'r' is the response object. Can I wrap a generator or some such thing around r.raw that will be called with each buffer of data as read, so that I can then pass it into md5?

Best answer

requests supports reading URL data in chunks, and the hashlib library lets you calculate an MD5 in chunks, so you have everything you need right there already. Use .iter_content() for this: .iter_lines() strips the line terminators, so hashing its output would not give the checksum of the bytes actually received.

    import hashlib
    import requests

    r = requests.get(url, stream=True)
    sig = hashlib.md5()
    for chunk in r.iter_content(chunk_size=8192):
        sig.update(chunk)
    print(sig.hexdigest())

If you have to view it as a filter, use a generator:

    class MD5TransparentFilter:
        def __init__(self, source):
            self._sig = hashlib.md5()
            self._source = source

        def __iter__(self):
            # Pass every chunk through unchanged, hashing it on the way.
            for chunk in self._source:
                self._sig.update(chunk)
                yield chunk

        def hexdigest(self):
            return self._sig.hexdigest()

Then use that on your .iter_content() iterator:

    r = requests.get(url, stream=True)
    filtered = MD5TransparentFilter(r.iter_content(1000))
    for chunk in filtered:
        # do something with the chunk
        pass
    print(filtered.hexdigest())

For shutil.copyfileobj() you'd need to implement a .read() interface instead of .__iter__(), but the principles are the same:

    class MD5TransparentFile:
        def __init__(self, source):
            self._sig = hashlib.md5()
            self._source = iter(source)

        def read(self, size=-1):
            # The requested size is ignored; each call returns the next
            # chunk from the source iterator, or b'' once it is exhausted.
            try:
                chunk = next(self._source)
            except StopIteration:
                return b''
            self._sig.update(chunk)
            return chunk

        def hexdigest(self):
            return self._sig.hexdigest()

The MD5TransparentFile() class takes your .iter_content() iterator and returns the next chunk from it on each call to .read(), updating the MD5 on the fly. It can be passed directly to shutil.copyfileobj() in place of r.raw, and the digest is available from .hexdigest() once the copy has finished.
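As a minimal sketch of how this plugs into shutil.copyfileobj(), the example below feeds the class from an in-memory list of byte chunks rather than a live response, so it runs without network access. The chunk list and the io.BytesIO destination are stand-ins (assumptions, not part of the original answer) for r.iter_content(...) and outstream; the class is repeated only so that the block is self-contained.

```python
import hashlib
import io
import shutil


class MD5TransparentFile:
    """File-like wrapper that hashes each chunk as it is read."""

    def __init__(self, source):
        self._sig = hashlib.md5()
        self._source = iter(source)

    def read(self, size=-1):
        # The requested size is ignored; return the next chunk, or b''
        # at the end, which is what tells copyfileobj() to stop.
        try:
            chunk = next(self._source)
        except StopIteration:
            return b''
        self._sig.update(chunk)
        return chunk

    def hexdigest(self):
        return self._sig.hexdigest()


# Hypothetical stand-ins for r.iter_content(...) and the output stream.
chunks = [b"hello ", b"streaming ", b"world"]
source = MD5TransparentFile(chunks)
outstream = io.BytesIO()

shutil.copyfileobj(source, outstream)

copied = outstream.getvalue()
digest = source.hexdigest()
# The on-the-fly digest should equal the digest of the full payload.
matches = digest == hashlib.md5(b"".join(chunks)).hexdigest()
print(matches)
```

With a real response, the same pattern would presumably read MD5TransparentFile(r.iter_content(8192)) as the first argument. Because .read() ignores the requested size, the chunk size passed to .iter_content() decides how much data is held in memory at once.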
