Fast replace in numpy array

I have been trying to speed up this pseudocode:

>>> A=np.array([1,1,1,2,2,2,3,3,3])
>>> B=np.array([np.power(A,n) for n in [3,4,5]])
>>> B
array([[  1,   1,   1,   8,   8,   8,  27,  27,  27],
       [  1,   1,   1,  16,  16,  16,  81,  81,  81],
       [  1,   1,   1,  32,  32,  32, 243, 243, 243]])

Elements of A are often repeated 10-20 times, and the shape of B must be retained because it is later multiplied by another array of the same shape.

My first idea was to use the following code:

uA=np.unique(A)
uB=np.array([np.power(uA,n) for n in [3,4,5]])
B=[]
for num in range(uB.shape[0]):
    Temp=np.copy(A)
    for k,v in zip(uA,uB[num]):
        Temp[A==k] = v
    B.append(Temp)
B=np.array(B) ### Also any better way to create the numpy array B?

This seems fairly terrible and there is likely a better way. Any idea on how to speed this up would be much appreciated.

Thank you for your time.

Here is an update. I realized that my function was poorly coded. Thank you to everyone for the suggestions. I will try to phrase my questions better in the future so that they show everything required.

import timeit

# Baseline: recursive partial sum of value**k / k! for k = 0..n
Normal='''
import numpy as np
from scipy.special import factorial
def func(value,n):
    if n==0:
        return 1
    else:
        return np.power(value,n)/factorial(n,exact=False)+func(value,n-1)
A=np.random.randint(10,size=250)
A=np.unique(A)
B=np.array([func(A,n) for n in [6,8,10]])
'''

# My attempt: compute per unique value, then replace with boolean masks
Me='''
import numpy as np
from scipy.special import factorial
def func(value,n):
    if n==0:
        return 1
    else:
        return np.power(value,n)/factorial(n,exact=False)+func(value,n-1)
A=np.random.randint(10,size=250)
uA=np.unique(A)
uB=np.array([func(A,n) for n in [6,8,10]])
B=[]
for num in range(uB.shape[0]):
    Temp=np.copy(A)
    for k,v in zip(uA,uB[num]):
        Temp[A==k] = v
    B.append(Temp)
B=np.array(B)
'''

# Broadcasting over all exponents at once, then summing slices of terms
Alex='''
import numpy as np
from scipy.special import factorial
A=np.random.randint(10,size=250)
power=np.arange(11)
fact=factorial(np.arange(11),exact=False).reshape(-1,1)
power=np.power(A,np.arange(11).reshape(-1,1))
value=power/fact
six=np.sum(value[:6],axis=0)
eight=six+np.sum(value[6:8],axis=0)
ten=eight+np.sum(value[8:],axis=0)
B=np.vstack((six,eight,ten))
'''

# Broadcasting on the unique values, then tile and hstack
Alex2='''
import numpy as np
from scipy.special import factorial
def find_count(the_list):
    count = list(the_list).count
    result = [count(item) for item in set(the_list)]
    return result
A=np.random.randint(10,size=250)
A_unique=np.unique(A)
A_counts = np.array(find_count(A_unique))
fact=factorial(np.arange(11),exact=False).reshape(-1,1)
power=np.power(A_unique,np.arange(11).reshape(-1,1))
value=power/fact
six=np.sum(value[:6],axis=0)
eight=six+np.sum(value[6:8],axis=0)
ten=eight+np.sum(value[8:],axis=0)
B_nodup=np.vstack((six,eight,ten))
B_list = [ np.transpose( np.tile( B_nodup[:,i], (A_counts[i], 1) ) ) for i in range(A_unique.shape[0]) ]
B = np.hstack( B_list )
'''

print(timeit.timeit(Normal, number=10000))
print(timeit.timeit(Me, number=10000))
print(timeit.timeit(Alex, number=10000))
print(timeit.timeit(Alex2, number=10000))

Timings in seconds for 10000 runs:

Normal: 10.7544178963
Me: 23.2039361
Alex: 4.85648703575
Alex2: 4.18024992943
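The partial sums in the broadcasting variants can also be written with a cumulative sum. The following is only a sketch under a few assumptions, not code from the thread: it uses `math.factorial` in place of SciPy's factorial so that it depends only on NumPy, it casts `A` to float to avoid integer overflow, and the names `orders`, `terms`, `partial` and `reference` are made up for illustration. Because `func(value, n)` includes the n-th term, rows 6, 8 and 10 of the cumulative sum are the ones selected.

```python
import math
import numpy as np

def func(value, n):
    """Recursive reference: sum of value**k / k! for k = 0..n."""
    if n == 0:
        return 1
    return np.power(value, n) / math.factorial(n) + func(value, n - 1)

rng = np.random.default_rng(0)
A = rng.integers(10, size=250).astype(float)

orders = np.arange(11)                                     # exponents 0..10
fact = np.array([math.factorial(k) for k in range(11)], dtype=float)

# Column vectors of shape (11, 1) broadcast against A of shape (250,)
terms = np.power(A, orders[:, None]) / fact[:, None]       # shape (11, 250)
partial = np.cumsum(terms, axis=0)                         # row k sums terms 0..k
B = partial[[6, 8, 10]]                                    # shape (3, 250)

# Compare against the recursive definition
reference = np.array([func(A, n) for n in [6, 8, 10]])
print(np.allclose(B, reference))
```

Taking rows from one cumulative sum avoids tracking slice boundaries by hand when several truncation orders are needed.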

Best answer

Use a combination of numpy.tile() and numpy.hstack(), as follows:

A = np.array([1,2,3])
A_counts = np.array([3,3,3])
A_powers = np.array([[3],[4],[5]])
B_nodup = np.power(A, A_powers)
B_list = [ np.transpose( np.tile( B_nodup[:,i], (A_counts[i], 1) ) ) for i in range(A.shape[0]) ]
B = np.hstack( B_list )

The transpose and stack can be swapped, which may be faster:

B_list = [ np.tile( B_nodup[:,i], (A_counts[i], 1) ) for i in range(A.shape[0]) ]
B = np.transpose( np.vstack( B_list ) )
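As a possible alternative to the tile-and-stack step, `numpy.repeat` can expand each column by its count in a single call. This is a sketch rather than part of the original answer. It assumes equal values in `A` are adjacent and in ascending order (as in the question's example), and it obtains the counts from `np.unique(..., return_counts=True)` instead of writing them by hand.

```python
import numpy as np

A = np.array([1, 1, 1, 2, 2, 2, 3, 3, 3])   # sorted, so equal values are adjacent
A_unique, A_counts = np.unique(A, return_counts=True)
A_powers = np.array([[3], [4], [5]])

# Compute the powers once per unique value: shape (3, 3)
B_nodup = np.power(A_unique, A_powers)

# Repeat column i of B_nodup A_counts[i] times along axis 1: shape (3, 9)
B = np.repeat(B_nodup, A_counts, axis=1)
print(B)
```

If the duplicates in `A` are not grouped, the repeated columns would no longer line up with the positions in `A`, so this form only fits grouped input.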

This is likely only worth doing if the function you are calculating is quite expensive, or if each value is duplicated many, many times (more than 10); tiling and stacking just to avoid calculating the power function an extra 10 times is probably not worth it. Please benchmark and let us know.
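When the function really is expensive and the duplicates are not grouped, one way to compute it only on the unique values is the inverse index that `np.unique` can return. This is a different technique from the tile-and-stack above, shown here as a minimal sketch. The function `expensive` is a hypothetical stand-in for the costly computation, and `exponents` is an illustrative name.

```python
import numpy as np

def expensive(values, exponents):
    """Hypothetical stand-in for a costly elementwise function."""
    return np.power(values, exponents)

A = np.array([3, 1, 2, 1, 3, 2, 2, 1, 3])    # duplicates in arbitrary order
exponents = np.array([[3], [4], [5]])

# inverse[i] is the index into uA such that uA[inverse[i]] == A[i]
uA, inverse = np.unique(A, return_inverse=True)

uB = expensive(uA, exponents)    # evaluated once per unique value: shape (3, 3)
B = uB[:, inverse]               # mapped back to A's layout: shape (3, 9)
print(B)
```

The fancy-indexing step replaces the per-value boolean masks from the question with a single gather, and it preserves the original order of `A`.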

EDIT: Or, you could just use broadcasting to get rid of the list comprehension:

>>> A=np.array([1,1,1,2,2,2,3,3,3])
>>> B = np.power(A,[[3],[4],[5]])
>>> B
array([[  1,   1,   1,   8,   8,   8,  27,  27,  27],
       [  1,   1,   1,  16,  16,  16,  81,  81,  81],
       [  1,   1,   1,  32,  32,  32, 243, 243, 243]])
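The broadcasting call works because the exponents form a column of shape (3, 1) while `A` has shape (9,), so NumPy stretches both to a common shape of (3, 9). The short sketch below spells out those shapes and checks the result against the list comprehension from the question; the names `exponents` and `B_loop` are illustrative only.

```python
import numpy as np

A = np.array([1, 1, 1, 2, 2, 2, 3, 3, 3])        # shape (9,)
exponents = np.array([3, 4, 5])[:, np.newaxis]   # shape (3, 1)

# (3, 1) broadcast against (9,) gives a (3, 9) result
B = np.power(A, exponents)

# Same result built row by row, as in the question
B_loop = np.array([np.power(A, n) for n in [3, 4, 5]])

print(B.shape)
print(np.array_equal(B, B_loop))
```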

This is probably pretty fast, but it doesn't actually do what you asked: it still computes the power for every duplicated element instead of only for the unique values.
