在Matlab中加速inv（X'* X）* Q * inv（X'* X）的计算？(Accelerate the calculation of inv(X'X)Q*in

编程入门行业动态更新时间:2024-10-28 02:31:50

在Matlab中加速inv（X'* X）* Q * inv（X'* X）的计算？(Accelerate the calculation of inv(X'*X)*Q*inv(X'*X) in Matlab?)

我必须计算大型多元回归模型的Newey-West标准误差。

这个计算的最后一步是获得

nwse = sqrt(diag(N.*inv(X'*X)*Q*inv(X'*X)));

此文件交换贡献实现了这一点

nwse = sqrt(diag(N.*((X'*X)\Q/(X'*X))));

这看起来很合理，但在我的情况下（5000x5000稀疏Q和X'* X）它对我的需求来说太慢了（大约30秒，我必须为大约一百万个不同型号重复这个）。任何想法如何使这条线更快？

请注意，我只需要对角线，而不是整个矩阵，Q和（X'* X）都是正定的。

I have to calculate Newey-West standard errors for large multiple regression models.

The final step of this calculation is to obtain

nwse = sqrt(diag(N.*inv(X'*X)*Q*inv(X'*X)));

This file exchange contribution implements this as

nwse = sqrt(diag(N.*((X'*X)\Q/(X'*X))));

This looks reasonable, but in my case (5000x5000 sparse Q and X'*X) it's far too slow for my needs (about 30secs, I have to repeat this for about one million different models). Any ideas how to make this line faster?

Please note that I need only the diagonal, not the entire matrix and that both Q and (X'*X) are positive-definite.

最满意答案

我相信你可以通过明确地进行LU分解来节省大量的计算时间， [l, u, p, q] = lu(X'*X); 并在进行计算时使用这些因素。此外，由于X对于大约100个模型是恒定的，因此预先计算X'*X很可能会节省一些时间。

请注意，在您的情况下，最需要时间操作的可能就是sqrt函数。

% Constant for every 100 models or so: M = X'*X; [l, u, p, q] = lu(M); % Now, I guess this should be quite a bit faster (I might have messed up the order): nwse = sqrt(diag(N.*(q * ( u \ (l \ (p * Q))) * q * (u \ (l \ p)))));

前两个术语是常用的：

l下三角矩阵

u是上三角矩阵

现在， p和q有点不常见。

p是用于获得数值稳定性的行置换矩阵。与稀疏矩阵的[l, u, p] = lu(M)相比[l, u, p] = lu(M)没有太大的性能提升。

然而， q可以显着提高性能。 q是列置换矩阵，用于在进行因式分解时减少填充量。

注意， [l, u, p, q] = lu(M)只是稀疏矩阵的有效语法。

至于为什么使用如上所述的完全旋转应该更快：

请尝试以下方法来查看列置换矩阵q的用途。使用围绕对角线对齐的元素更容易。

S = sprand(100,100,0.01); [l, u, p] = lu(S); spy(l) figure spy(u)

现在，将其与此进行比较：

[ll, uu, pp, qq] = lu(S); spy(ll); figure spy(uu);

^{不幸的是，我现在没有MATLAB，所以我不能保证我把所有的参数都按正确的顺序排列，但我认为这是正确的。}

I believe you can save a lot of computation time by explicitly doing an LU factorization, [l, u, p, q] = lu(X'*X); and use those factors when doing the calculations. Also, since X are constant for about 100 models, pre-calculating X'*X will most likely save you some time.

Note that in your case, the most time demanding operation might very well be the sqrt-function.

The first two terms are commonly used:

l the lower triangular matrix

u the upper triangular matrix

Now, p and q are a bit more uncommon.

p is a row permutation matrix used to obtain numerical stability. There is not much performance gain in doing [l, u, p] = lu(M) compared to [l, u] = lu(M) for sparse matrices.

q however offers a significant performance gain. q is a column permutation matrix that is used to reduce the amount of fill when doing the factorization.

Note that the [l, u, p, q] = lu(M) is only a valid syntax for sparse matrices.

As for why using full pivoting as described above should be faster:

Try the following to see the purpose of the column permutation matrix q. It is easier to work with elements that are aligned around the diagonal.

S = sprand(100,100,0.01); [l, u, p] = lu(S); spy(l) figure spy(u)

Now, compare it with this:

[ll, uu, pp, qq] = lu(S); spy(ll); figure spy(uu);

^{Unfortunately, I don't have MATLAB here right now, so I can't guarantee that I put all the arguments in the correct order, but I think it's correct.}

更多推荐

本文发布于:2023-08-05 04:55:00，感谢您对本站的认可！

本文链接:https://www.elefans.com/category/jswz/34/1427656.html