Is pandas much slower than numpy?

Updated: 2024-10-28 18:32:26

Problem description

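The code below suggests that pandas may be much slower than numpy, at least in the specific case of the function clip(). Surprisingly, a round trip from pandas to numpy and back to pandas, with the calculation done in numpy, is still much faster than doing the same thing in pandas.

Shouldn't the pandas function have been implemented in this roundabout way?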

In [49]: arr = np.random.randn(1000, 1000)
In [50]: df = pd.DataFrame(arr)
In [51]: %timeit np.clip(arr, 0, None)
100 loops, best of 3: 8.18 ms per loop
In [52]: %timeit df.clip_lower(0)
1 loops, best of 3: 344 ms per loop
In [53]: %timeit pd.DataFrame(np.clip(df.values, 0, None))
100 loops, best of 3: 8.4 ms per loop
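For readers on a current pandas release, where `clip_lower` no longer exists (it was removed in pandas 1.0), the same comparison can be sketched with `DataFrame.clip(lower=0)` and the standard `timeit` module. This is only a rough reproduction under those assumptions: the helper `best_ms` and the three wrapper functions are hypothetical names introduced here, and the timings will differ by machine and library version.

```python
import timeit

import numpy as np
import pandas as pd

arr = np.random.randn(1000, 1000)
df = pd.DataFrame(arr)


def best_ms(fn, repeat=3, number=10):
    """Hypothetical helper: best per-call time in milliseconds."""
    return min(timeit.repeat(fn, repeat=repeat, number=number)) / number * 1e3


def clip_numpy():
    # Plain numpy on the raw array.
    return np.clip(arr, 0, None)


def clip_pandas():
    # Modern replacement for the removed df.clip_lower(0).
    return df.clip(lower=0)


def clip_roundtrip():
    # pandas -> numpy -> pandas, keeping the original labels.
    return pd.DataFrame(
        np.clip(df.values, 0, None), index=df.index, columns=df.columns
    )


# The three approaches should agree on the values before being timed.
assert np.array_equal(clip_numpy(), clip_pandas().to_numpy())
assert np.array_equal(clip_roundtrip().to_numpy(), clip_pandas().to_numpy())

for name, fn in [
    ("numpy", clip_numpy),
    ("pandas", clip_pandas),
    ("round trip", clip_roundtrip),
]:
    print(f"{name}: {best_ms(fn):.2f} ms per call")
```

Note that the round trip in the question drops nothing here only because the frame has default labels; passing `index` and `columns` explicitly, as above, keeps the labels in the general case.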

Solution

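In master/0.13 (to be released very shortly), this is much faster (still slightly slower than native numpy because of the handling of alignment/dtype/nans).

In 0.12 the operation was applied per column, so it was relatively expensive.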

In [4]: arr = np.random.randn(1000, 1000)
In [5]: df = pd.DataFrame(arr)
In [6]: %timeit np.clip(arr, 0, None)
100 loops, best of 3: 6.62 ms per loop
In [7]: %timeit df.clip_lower(0)
100 loops, best of 3: 12.9 ms per loop
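The answer attributes the 0.12 slowdown to the operation being applied column by column. The sketch below is not pandas' internal code; it only imitates the two strategies with two hypothetical functions, `clip_lower_per_column` and `clip_lower_block`, to show that they produce the same values while the first makes one small call per column and the second makes a single vectorized call over the whole 2-D array.

```python
import numpy as np
import pandas as pd


def clip_lower_per_column(df, threshold):
    """Hypothetical: clip one column at a time (1000 calls for 1000 columns)."""
    clipped = {
        col: np.maximum(df[col].to_numpy(), threshold) for col in df.columns
    }
    return pd.DataFrame(clipped, index=df.index, columns=df.columns)


def clip_lower_block(df, threshold):
    """Hypothetical: a single vectorized call over the whole 2-D array."""
    return pd.DataFrame(
        np.maximum(df.to_numpy(), threshold),
        index=df.index,
        columns=df.columns,
    )


df = pd.DataFrame(np.random.randn(1000, 1000))
per_column = clip_lower_per_column(df, 0)
block = clip_lower_block(df, 0)

# Same values either way; only the amount of per-call overhead differs.
assert np.array_equal(per_column.to_numpy(), block.to_numpy())
```

Timing the two functions with `%timeit` should show the per-column version paying a fixed overhead for every column, which is the kind of cost the answer describes for 0.12.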
