Avoiding powers of 2 for cache friendliness

Updated: 2024-10-23 04:51:20

Suppose that in speed-critical code we have a pair of arrays that are frequently used together. Their exact size doesn't matter; it just needs to be set to something reasonable, e.g.

int a[256], b[256];

Could this be a pessimization, because identical low address bits might make it harder for the cache to hold both arrays at the same time? Would it be better to specify e.g. 300 instead of 256?

Best answer

Moving my comment to an answer:

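You are correct to suspect that powers of two could be problematic. But it usually only matters when you have more than two such strides, for example more than two arrays sharing the same power-of-two alignment. It doesn't get really bad until the number of such streams exceeds your L1 cache associativity. Even before that, though, you might run into false aliasing issues. One example is 4K aliasing: a load is wrongly treated as depending on an earlier store whose address matches it in the low 12 bits.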

Here are two examples where powers-of-two actually become problematic:

1. "Why are elementwise additions much faster in separate loops than in a combined loop?"
2. "Matrix multiplication: Small difference in matrix size, large difference in timings"

In the first example, there are four arrays, all aligned to the same offset from the start of a 4 KiB page.

In the second example, walking down a column of the matrix completely destroys performance when the row length is a power of two. In that case, successive elements of a column map to the same cache sets.

In any case, note that the key concept is actually the alignment of the arrays, not their size. If you find that you are running into slowdowns, just add some padding between your arrays to break the alignment.

Published: 2023-07-04 10:58:00

Article link: https://www.elefans.com/category/jswz/34/1023549.html