下面的代码表明,至少在函数clip()的特定情况下,熊猫可能比numpy慢得多.令人惊讶的是,在以numpy进行计算的同时,从熊猫到numpy再返回到熊猫的往返仍然比在熊猫中进行的速度要快得多.
熊猫功能是否应该以这种回旋方式实现?
In [49]: arr = np.random.randn(1000, 1000) In [50]: df=pd.DataFrame(arr) In [51]: %timeit np.clip(arr, 0, None) 100 loops, best of 3: 8.18 ms per loop In [52]: %timeit df.clip_lower(0) 1 loops, best of 3: 344 ms per loop In [53]: %timeit pd.DataFrame(np.clip(df.values, 0, None)) 100 loops, best of 3: 8.4 ms per loop解决方案
在master/0.13(很快发布)中,这要快得多(由于对alignment/dtype/nans的处理,它比本地numpy还要慢). /p>
每列应用0.12,因此这是一个相对昂贵的操作.
In [4]: arr = np.random.randn(1000, 1000) In [5]: df=pd.DataFrame(arr) In [6]: %timeit np.clip(arr, 0, None) 100 loops, best of 3: 6.62 ms per loop In [7]: %timeit df.clip_lower(0) 100 loops, best of 3: 12.9 ms per loopThe code below suggests that pandas may be much slower than numpy, at least in the specifi case of the function clip(). What is surprising is that making a roundtrip from pandas to numpy and back to pandas, while performing the calculations in numpy, is still much faster than doing it in pandas.
Shouldn't the pandas function have been implemented in this roundabout way?
In [49]: arr = np.random.randn(1000, 1000) In [50]: df=pd.DataFrame(arr) In [51]: %timeit np.clip(arr, 0, None) 100 loops, best of 3: 8.18 ms per loop In [52]: %timeit df.clip_lower(0) 1 loops, best of 3: 344 ms per loop In [53]: %timeit pd.DataFrame(np.clip(df.values, 0, None)) 100 loops, best of 3: 8.4 ms per loop解决方案
In master/0.13 (release very shortly), this is much faster (still slightly slower that native numpy because of handling of alignment/dtype/nans).
In 0.12 it was applying per column, so this was a relatively expensive operation.
In [4]: arr = np.random.randn(1000, 1000) In [5]: df=pd.DataFrame(arr) In [6]: %timeit np.clip(arr, 0, None) 100 loops, best of 3: 6.62 ms per loop In [7]: %timeit df.clip_lower(0) 100 loops, best of 3: 12.9 ms per loop
更多推荐
pandas 比numpy慢得多?
发布评论