Does Python 3 have a structure for building a filtering stream? Specifically, I want to calculate an MD5 checksum of content read from a REST service with requests, without making an extra copy. If I could subclass some kind of filter stream and just feed the bytes into a hashlib MD5 object, that would do it.
Currently, my code includes:
    shutil.copyfileobj(r.raw, outstream)

where 'r' is the response object. Can I wrap a generator or something similar around r.raw that is called with each buffer of data as it is read, so that I can pass it on to the MD5 object?
Best answer
requests supports reading URL data in chunks, and the hashlib library lets you calculate an MD5 digest in chunks, so you already have everything you need. You can choose between .iter_lines() or .iter_content():
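    import hashlib

    import requests

    # `url` is whatever endpoint you are fetching
    r = requests.get(url, stream=True)
    sig = hashlib.md5()
    # iter_lines() strips line terminators, so hashing its output would not
    # give the checksum of the actual content; iter_content() avoids that
    # (the chunk size here is arbitrary)
    for chunk in r.iter_content(chunk_size=8192):
        sig.update(chunk)
    print(sig.hexdigest())

If you have to view it as a filter, use a generator: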
    class MD5TransparentFilter:
        """Iterate over `source`, hashing every chunk that passes through."""

        def __init__(self, source):
            self._sig = hashlib.md5()
            self._source = source

        def __iter__(self):
            for chunk in self._source:
                self._sig.update(chunk)
                yield chunk

        def hexdigest(self):
            return self._sig.hexdigest()

Then use that on your .iter_lines() or .iter_content() iterator:
    r = requests.get(url, stream=True)
    filtered = MD5TransparentFilter(r.iter_content(1000))
    for chunk in filtered:
        ...  # do something with the chunk
    # the digest is complete only once the iterator has been exhausted
    print(filtered.hexdigest())

For shutil.copyfileobj() you'd need to implement a .read() interface instead of .__iter__(), but the principle is the same:
    class MD5TransparentFile:
        """File-like wrapper that hashes each chunk handed out by read()."""

        def __init__(self, source):
            self._sig = hashlib.md5()
            self._source = iter(source)

        def read(self, size=-1):
            # The requested size is ignored; each call returns the next chunk
            # from the source iterator. Python 3 uses next(it), not it.next().
            try:
                chunk = next(self._source)
            except StopIteration:
                return b''  # an empty bytes object signals end of file
            self._sig.update(chunk)
            return chunk

        def hexdigest(self):
            return self._sig.hexdigest()

The MD5TransparentFile class takes your .iter_content() or .iter_lines() iterator and returns data from it on each call to .read(), calculating the MD5 on the fly. It can be used directly in your shutil.copyfileobj() example by passing it in place of r.raw.
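If you would rather wrap r.raw itself than an iterator, the same idea should work on any object that has a .read(size) method. Below is a minimal sketch, not part of the original answer: the class name HashingReader and its algorithm parameter are made up for illustration, and io.BytesIO stands in for r.raw so the example runs without a network connection.

```python
import hashlib
import io
import shutil


class HashingReader:
    """Hypothetical wrapper: forwards read() and hashes the bytes returned."""

    def __init__(self, fileobj, algorithm="md5"):
        self._fileobj = fileobj
        self._sig = hashlib.new(algorithm)

    def read(self, size=-1):
        # Unlike the iterator-based class, the requested size is honoured
        data = self._fileobj.read(size)
        self._sig.update(data)  # updating with b'' at EOF is harmless
        return data

    def hexdigest(self):
        return self._sig.hexdigest()


# io.BytesIO is a stand-in for r.raw (assumption made for this demo)
payload = b"example payload\r\nwith two lines\r\n" * 1000
source = HashingReader(io.BytesIO(payload))
outstream = io.BytesIO()

# copyfileobj() calls source.read(length) until it gets an empty result
shutil.copyfileobj(source, outstream)
print(source.hexdigest())
```

With requests, source would be HashingReader(r.raw) for a response fetched with stream=True. As far as I know, r.raw hands out the bytes as they arrived over the wire, so if the server compressed the response the digest would cover the compressed data; it is worth checking which form your service's checksum refers to.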