我遇到了一些性能问题,因此我想加快那些运行缓慢的脚本.但是我对如何加快速度没有更多的想法.因为我发现我经常被索引挡住了.我发现抽象思维对我来说非常困难.
I've encountered some performance problems thus I want to speed up those running-slow scripts. But I have no more ideas on how to speed up them. Because I found I was often blocked with the indices. I found the abstract thinking is very difficult for me.
脚本是
tic, n = 1000; d = 500; X = rand(n, d); R = rand(n, n); F = zeros(d, d); for i=1:n for j=1:n F = F + R(i,j)* ((X(i,:)-X(j,:))' * (X(i,:)-X(j,:))); end end toc推荐答案
讨论&解决方案代码
这里很少建议使用bsxfun的方法.另外,请继续阅读以了解如何在这样的问题上实现 30x+ 加速!
Discussion & Solution Codes
Few approaches with bsxfun could be suggested here. Also, read on to see how one can get 30x+ speedup on a problem like this!
方法1(天真向量化方法)
为适应X行之间相减的两个操作,然后适应它们之间的后续逐元素乘法,基于朴素的bsxfun的方法将导致一个与((X(i,:)-X(j,:))' * (X(i,:)-X(j,:)))相对应的4D中间数组.之后,需要乘以R以获得最终输出F.如下所示实现-
To accommodate the two operations of subtractions between rows of X and then the subsequent element-wise multiplications between them, a naive bsxfun based approach would lead to a 4D intermediate array which would correspond to ((X(i,:)-X(j,:))' * (X(i,:)-X(j,:))). After that, one needs to multiply R to have the final output F. This is implemented as shown next -
v1 = bsxfun(@minus,X,permute(X,[3 2 1])); v2 = bsxfun(@times,permute(v1,[1 3 2]),permute(v1,[1 3 4 2])); F = reshape(R(:).'*reshape(v2,[],d^2),d,[]);方法2(不是天真的矢量化方法)
前面提到的方法进入4D,这可能会减慢速度.因此,您可以通过重塑将中间数据保留到3D.接下来列出-
The earlier mentioned approach goes into 4D which could slow down things. So, instead you can keep the intermediate data until 3D by reshaping. This is listed next -
sub1 = bsxfun(@minus,X,permute(X,[3 2 1])); sub1_2d = reshape(permute(sub1,[1 3 2]),n^2,[]) mult1 = bsxfun(@times,sub1_2d,permute(sub1_2d,[1 3 2])) F = reshape(R(:).'*reshape(mult1,[],d^2),d,[])方法3(混合方法)
现在,您可以基于方法2 (vectorized subtractions + loopy multiplications)进行混合处理.这种方法的好处是它使用fast matrix multiplication进行乘法运算,并将复杂度从早期的O(n ^ 2)降低到O(n),这应该使其效率更高.感谢@ Dev-iL,提出了这个想法!这是代码-
Now, you can make a hybrid approach based on Approach #2 (vectorized subtractions + loopy multiplications). Benefit of this approach would be that it uses the fast matrix multiplication to perform the multiplications and reduces the complexity to O(n) from the earlier O(n^2) and this should make it much more efficient. Thanks to @Dev-iL, for suggesting this idea! Here's the code -
sub1 = bsxfun(@minus,X,permute(X,[3 2 1])); sub1 = bsxfun(@times,sub1,permute(sqrt(R),[1 3 2])); F = zeros(d); for k = 1:size(sub1,3) blk = sub1(:,:,k); F = F + blk.'*blk; end基准化
基准化代码,将原始方法与方法#3
Benchmarking
Benchmarking code comparing the original approach against Approach #3
%// Parameters n = 500; d = 250; X = rand(n, d); R = rand(n, n); %// Warm up tic/toc. for k = 1:100000 tic(); elapsed = toc(); end disp('------------------------------ With Original Approach') tic F1 = zeros(d, d); for i=1:n for j=1:n F1 = F1 + R(i,j)*((X(i,:)-X(j,:))' * (X(i,:)-X(j,:))); end end toc, clear F1 i j disp('------------------------------ With Proposed Approach #3') tic sub1 = bsxfun(@minus,X,permute(X,[3 2 1])); sub1 = bsxfun(@times,sub1,permute(sqrt(R),[1 3 2])); F = zeros(d); for k = 1:size(sub1,3) blk = sub1(:,:,k); F = F + blk.'*blk; end toc运行时结果
------------------------------ With Original Approach Elapsed time is 29.728571 seconds. ------------------------------ With Proposed Approach #3 Elapsed time is 0.839726 seconds.那么,谁准备好进行 30x + 的提速!?
So, who's ready for a 30x+ speedup!?
更多推荐
是否可以加快此MATLAB脚本的速度?
发布评论