In one of my classes I have a number of methods that all draw values from the same dictionaries. However, if one of the methods tries to access a value that isn't there, it has to call another method to make the value associated with that key.
I currently have this implemented as follows, where findCrackDepth(tonnage) assigns a value to self.lowCrackDepth[tonnage].
```python
if tonnage not in self.lowCrackDepth:
    self.findCrackDepth(tonnage)
lcrack = self.lowCrackDepth[tonnage]
```

However, I could also do this:
```python
try:
    lcrack = self.lowCrackDepth[tonnage]
except KeyError:
    self.findCrackDepth(tonnage)
    lcrack = self.lowCrackDepth[tonnage]
```
I assume there is a performance difference between the two related to how often the value is already in the dictionary. How big is this difference? I'm generating a few million such values (spread across many dictionaries in many instances of the class), and for each time the value doesn't exist, there are probably two times where it does.
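For context, the caching pattern described in the question can be sketched with a minimal, hypothetical class. The actual findCrackDepth computation is not shown in the question, so the formula below is a stand-in; only the "check, compute on miss, then read" pattern mirrors the question:

```python
class CrackModel:
    """Hypothetical stand-in for the class in the question.

    Only the caching pattern is taken from the question; the
    depth formula here is an invented placeholder.
    """
    def __init__(self):
        self.lowCrackDepth = {}

    def findCrackDepth(self, tonnage):
        # An expensive computation would go here; it stores its
        # result in the dictionary rather than returning it.
        self.lowCrackDepth[tonnage] = tonnage * 0.5

    def depth(self, tonnage):
        # The "in" variant from the question: compute on a miss,
        # then read the now-guaranteed-present value.
        if tonnage not in self.lowCrackDepth:
            self.findCrackDepth(tonnage)
        return self.lowCrackDepth[tonnage]
```

On the second call with the same tonnage, depth() skips findCrackDepth entirely and just reads the cached value.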
Accepted answer:
It's a delicate problem to time this because you need care to avoid "lasting side effects" and the performance tradeoff depends on the % of missing keys. So, consider a dil.py file as follows:
```python
def make(percentmissing):
    global d
    d = dict.fromkeys(range(100 - percentmissing), 1)

def addit(d, k):
    d[k] = k

def with_in():
    dc = d.copy()
    for k in range(100):
        if k not in dc:
            addit(dc, k)
        lc = dc[k]

def with_ex():
    dc = d.copy()
    for k in range(100):
        try:
            lc = dc[k]
        except KeyError:
            addit(dc, k)
            lc = dc[k]

def with_ge():
    dc = d.copy()
    for k in range(100):
        lc = dc.get(k)
        if lc is None:
            addit(dc, k)
            lc = dc[k]
```
and a series of timeit calls such as:
```
$ python -mtimeit -s'import dil; dil.make(10)' 'dil.with_in()'
10000 loops, best of 3: 28 usec per loop
$ python -mtimeit -s'import dil; dil.make(10)' 'dil.with_ex()'
10000 loops, best of 3: 41.7 usec per loop
$ python -mtimeit -s'import dil; dil.make(10)' 'dil.with_ge()'
10000 loops, best of 3: 46.6 usec per loop
```
this shows that, with 10% missing keys, the in check is substantially the fastest way.
```
$ python -mtimeit -s'import dil; dil.make(1)' 'dil.with_in()'
10000 loops, best of 3: 24.6 usec per loop
$ python -mtimeit -s'import dil; dil.make(1)' 'dil.with_ex()'
10000 loops, best of 3: 23.4 usec per loop
$ python -mtimeit -s'import dil; dil.make(1)' 'dil.with_ge()'
10000 loops, best of 3: 42.7 usec per loop
```
with just 1% missing keys, the exception approach is marginally fastest (and the get approach remains the slowest one in either case).
So, for optimal performance, unless the vast majority (99%+) of lookups are going to succeed, the in approach is preferable.
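The same comparison can be reproduced from a single script with the timeit module, without the shell one-liners. This is a self-contained sketch that inlines simplified versions of the with_in and with_ex loops; absolute numbers will differ by machine and Python version, so it is the relative ordering at different miss rates that matters:

```python
import timeit

def bench(percentmissing, number=1000):
    """Time the `in`-check and exception styles for a given miss rate.

    Returns (seconds_for_in, seconds_for_ex); inlined variants of the
    with_in/with_ex functions from the answer above.
    """
    d = dict.fromkeys(range(100 - percentmissing), 1)

    def with_in():
        dc = d.copy()
        for k in range(100):
            if k not in dc:
                dc[k] = k
            lc = dc[k]

    def with_ex():
        dc = d.copy()
        for k in range(100):
            try:
                lc = dc[k]
            except KeyError:
                dc[k] = k
                lc = dc[k]

    # timeit.timeit accepts a callable directly.
    return timeit.timeit(with_in, number=number), \
           timeit.timeit(with_ex, number=number)

# With 10% of keys missing, `in` is typically faster; at 1% missing
# the gap narrows or reverses, matching the shell runs above.
t_in, t_ex = bench(10)
```

Which style wins at 1% missing is close enough that it can flip between runs, so no ordering is asserted there.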
Of course, there's another, elegant possibility: adding a dict subclass like...:
```python
class dd(dict):
    def __init__(self, *a, **k):
        dict.__init__(self, *a, **k)
    def __missing__(self, k):
        addit(self, k)
        return self[k]

def with_dd():
    dc = dd(d)
    for k in range(100):
        lc = dc[k]
```

However...:
```
$ python -mtimeit -s'import dil; dil.make(1)' 'dil.with_dd()'
10000 loops, best of 3: 46.1 usec per loop
$ python -mtimeit -s'import dil; dil.make(10)' 'dil.with_dd()'
10000 loops, best of 3: 55 usec per loop
```
...while slick indeed, this is not a performance winner -- it's about even with the get approach, or slower, just with much nicer-looking code at the call site. (defaultdict, semantically analogous to this dd class, would be a performance win if it were applicable, but that's because its __missing__ special method is, in that case, implemented in well-optimized C code.)
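To illustrate the defaultdict point: it applies when the missing-value logic fits the default-factory form. Unlike the dd class above, the factory takes no arguments, so the computed value cannot depend on the key itself; a minimal example:

```python
from collections import defaultdict

# default_factory is called with no arguments on a missing key,
# so the filled-in value cannot be derived from the key (the dd
# class above can do that; defaultdict cannot).
counts = defaultdict(int)
for word in ["a", "b", "a"]:
    counts[word] += 1  # missing keys start at int() == 0
```

Since the question's findCrackDepth computes the value *from* the tonnage key, defaultdict would not fit there, which is exactly why the answer hedges with "if it was applicable".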