In writing some numerical analysis code, I have hit a bottleneck in a function that requires many NumPy calls. I am not entirely sure how to approach further performance optimization.
Problem:
The function determines the error by calculating the following:
Code:
import numpy as np

def foo(B_Mat, A_Mat):
    Temp = np.absolute(B_Mat)
    Temp /= np.amax(Temp)
    return np.sqrt(np.sum(np.absolute(A_Mat - Temp*Temp))) / B_Mat.shape[0]

What would be the best way to squeeze some extra performance out of the code? Would my best course of action be performing the majority of the operations in a single for loop with Cython, to cut down on the temporary arrays?
Accepted Answer
There are specific functions in the implementation that can be off-loaded to the numexpr module, which is known to be very efficient for arithmetic computations. For our case, specifically, we can perform the squaring, summation, and absolute-value computations with it. Thus, a numexpr-based solution to replace the last step of the original code would look like so -
import numexpr as ne

out = np.sqrt(ne.evaluate('sum(abs(A_Mat - Temp**2))')) / B_Mat.shape[0]

A further performance boost can be achieved by embedding the normalization step into numexpr's evaluate expression. Thus, the entire function modified to use numexpr would be -
def numexpr_app1(B_Mat, A_Mat):
    Temp = np.absolute(B_Mat)
    M = np.amax(Temp)
    return np.sqrt(ne.evaluate('sum(abs(A_Mat*M**2 - Temp**2))')) / (M*B_Mat.shape[0])

(This is equivalent to the original because A_Mat - (Temp/M)**2 = (A_Mat*M**2 - Temp**2)/M**2, so the factor 1/M can be pulled outside the square root.)

Runtime test -
In [198]: # Random arrays
     ...: A_Mat = np.random.randn(4000,5000)
     ...: B_Mat = np.random.randn(4000,5000)

In [199]: np.allclose(foo(B_Mat, A_Mat), numexpr_app1(B_Mat, A_Mat))
Out[199]: True

In [200]: %timeit foo(B_Mat, A_Mat)
1 loops, best of 3: 891 ms per loop

In [201]: %timeit numexpr_app1(B_Mat, A_Mat)
1 loops, best of 3: 400 ms per loop
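As an aside, even without numexpr, some of the temporary arrays in the original foo can be eliminated in pure NumPy by reusing a single buffer through the out= parameter of the ufuncs. A minimal sketch (foo_inplace is a name chosen here for illustration, not from the original post):

```python
import numpy as np

def foo_inplace(B_Mat, A_Mat):
    # np.absolute allocates the one full-size buffer we then keep reusing
    Temp = np.absolute(B_Mat)
    Temp /= np.amax(Temp)                 # normalize in place
    np.multiply(Temp, Temp, out=Temp)     # Temp <- Temp**2, no new array
    np.subtract(A_Mat, Temp, out=Temp)    # Temp <- A_Mat - Temp**2
    np.absolute(Temp, out=Temp)           # Temp <- |A_Mat - Temp**2|
    return np.sqrt(np.sum(Temp)) / B_Mat.shape[0]
```

This keeps all the arithmetic in NumPy's C loops while allocating only one full-size temporary instead of several; whether it beats the numexpr version will depend on array size and available threads, so it is worth timing both.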