Python on-the-fly MD5 as one reads a stream

Does Python 3 have a structure for making a filtering stream? In particular, my goal here is to calculate an MD5 checksum of the contents read from a REST service with requests, without making an extra copy. If I could subclass some sort of filter stream and just shove the bytes into a hashlib-derived md5 object, I'd be good.

Currently, my code includes:

shutil.copyfileobj(r.raw, outstream)

where 'r' is the response object. Can I wrap a generator or some such thing around r.raw that will be called with each buffer of data as read, so that I can then pass it into md5?

Best answer

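requests supports reading URL data in chunks, and the hashlib library lets you calculate an MD5 in chunks, so you already have everything you need. You can iterate with .iter_content() or .iter_lines(). Note that .iter_lines() strips the line terminators, so use .iter_content() when the checksum has to match the body byte for byte: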

    import hashlib
    import requests

    r = requests.get(url, stream=True)
    sig = hashlib.md5()
    # iter_content() yields the body in chunks without dropping any bytes
    for chunk in r.iter_content(chunk_size=8192):
        sig.update(chunk)
    print(sig.hexdigest())
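To illustrate why chunked hashing works, and why the choice of iterator matters, here is a minimal sketch that needs no network access. It assumes an in-memory byte string as a stand-in for the response body; the helper name `md5_of_chunks` is illustrative and not part of requests or hashlib.

```python
import hashlib


def md5_of_chunks(chunks):
    """Return the hex MD5 digest of an iterable of bytes objects."""
    sig = hashlib.md5()
    for chunk in chunks:
        sig.update(chunk)
    return sig.hexdigest()


# Stand-in for a streamed response body (assumption: no real HTTP request).
body = b"line one\nline two\nline three\n"
chunks = [body[i:i + 7] for i in range(0, len(body), 7)]

# Incremental updates give the same digest as hashing the body in one call.
same = md5_of_chunks(chunks) == hashlib.md5(body).hexdigest()

# Newline-stripped lines (the shape .iter_lines() yields) hash differently.
differs = md5_of_chunks(body.splitlines()) != hashlib.md5(body).hexdigest()
print(same, differs)
```

The chunk size has no effect on the digest; only the bytes fed to `update()` and their order matter.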

If you have to view it as a filter, use a generator:

    class MD5TransparentFilter:
        def __init__(self, source):
            self._sig = hashlib.md5()
            self._source = source

        def __iter__(self):
            for line in self._source:
                self._sig.update(line)
                yield line

        def hexdigest(self):
            return self._sig.hexdigest()

Then use that on your .iter_lines() or .iter_content() iterator:

    r = requests.get(url, stream=True)
    filtered = MD5TransparentFilter(r.iter_content(1000))
    for line in filtered:
        ...  # do something with the line
    # the digest is complete only after the loop has consumed the stream
    print(filtered.hexdigest())
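The same idea can also be written as a plain generator function that takes the hash object as an argument, so any hashlib algorithm can be swapped in. This is only a sketch: `hash_while_iterating` is a hypothetical name, a list of byte strings stands in for `r.iter_content(...)`, and an `io.BytesIO` stands in for the output file.

```python
import hashlib
import io


def hash_while_iterating(source, sig):
    """Yield each chunk from source unchanged, updating sig as a side effect."""
    for chunk in source:
        sig.update(chunk)
        yield chunk


# Hypothetical stand-ins: a list of chunks instead of r.iter_content(...),
# and an in-memory buffer instead of a real output file.
chunks = [b"hello ", b"streaming ", b"world"]
outstream = io.BytesIO()

sig = hashlib.md5()
for chunk in hash_while_iterating(chunks, sig):
    outstream.write(chunk)

print(sig.hexdigest())
```

Because the caller owns `sig`, the digest can be read from it directly once the loop has finished.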

For shutil.copyfileobj() you'd need to implement a .read() interface instead of .__iter__(), but the principles are the same:

    class MD5TransparentFile:
        def __init__(self, source):
            self._sig = hashlib.md5()
            self._source = iter(source)

        def read(self, size=-1):
            # The requested size is ignored; each call returns the next
            # chunk from the source iterator, whatever its length.
            try:
                chunk = next(self._source)  # Python 3: next(it), not it.next()
            except StopIteration:
                return b''
            self._sig.update(chunk)
            return chunk

        def hexdigest(self):
            return self._sig.hexdigest()

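The MD5TransparentFile class takes your .iter_content() iterator, returns the next chunk from it on each call to .read(), and calculates the MD5 on the fly. It can be passed directly to shutil.copyfileobj() in place of r.raw. Prefer .iter_content() over .iter_lines() here: .iter_lines() strips line terminators and yields an empty bytes object for each blank line, which copyfileobj() would treat as end of file.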
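If you would rather wrap a file-like object such as r.raw directly, as the question suggests, a wrapper that forwards .read(size) and hashes whatever comes back is one option. The sketch below is an assumption-laden illustration, not part of requests: `HashingReader` is a hypothetical name, and an `io.BytesIO` stands in for r.raw. Keep in mind that r.raw typically exposes the bytes as transferred, so a compressed response may be hashed in its compressed form.

```python
import hashlib
import io
import shutil


class HashingReader:
    """File-like wrapper that hashes every block returned by read()."""

    def __init__(self, raw, sig=None):
        self._raw = raw
        self._sig = sig if sig is not None else hashlib.md5()

    def read(self, size=-1):
        data = self._raw.read(size)
        self._sig.update(data)  # updating with b"" at EOF changes nothing
        return data

    def hexdigest(self):
        return self._sig.hexdigest()


payload = b"x" * 100_000 + b"tail"
raw = io.BytesIO(payload)  # stand-in for r.raw (assumption)
outstream = io.BytesIO()   # stand-in for the real output file

reader = HashingReader(raw)
shutil.copyfileobj(reader, outstream)
print(reader.hexdigest())
```

Unlike the iterator-based class, this version honors the size that copyfileobj() requests, so the copy buffer stays bounded.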
