Updated: 2024-10-22 20:30:25

Auto-vectorizing vs. vectorized code by hand

Is it better, in some sense, to vectorize code by hand using explicit pragmas, or to rely on auto-vectorization? To get optimum performance from auto-vectorization, one would have to monitor the compiler output to make sure that loops are actually being vectorized, and modify them until they are vectorizable.

With hand coding, one can be certain that the desired instructions are emitted, but the code is then likely not portable (either to other architectures or to other compilers).
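To make the "monitor the compiler output" point concrete, here is a minimal sketch of a loop written so that an auto-vectorizer has a fair chance. The function name `saxpy` is only an illustration and is not from the question. The `restrict` qualifiers promise that the arrays do not overlap, and `#pragma omp simd` is a hint that GCC and Clang honor when built with `-fopenmp-simd` (without that flag the pragma is ignored).

```c
#include <stddef.h>

/* y[i] += a * x[i]
 * `restrict` tells the compiler that x and y never alias, which removes
 * one common reason for a loop not being vectorized. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    /* Hint only: takes effect with -fopenmp-simd, otherwise ignored. */
    #pragma omp simd
    for (size_t i = 0; i < n; ++i)
        y[i] += a * x[i];
}
```

Whether the loop was vectorized can then be checked from the compiler's own reports, for example `gcc -O3 -fopt-info-vec-optimized -fopt-info-vec-missed` or `clang -O2 -Rpass=loop-vectorize -Rpass-missed=loop-vectorize`, rather than by reading the assembly for every loop.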

Best answer

Auto-vectorization has never worked out well for me. To me it seems that auto-vectorization only works for very trivial loops at the moment.

I use the pragma/intrinsic approach and take a look at the assembly. If the compiler generates bad code (like spilling SSE registers onto the stack or adding redundant moves), I use inline assembly for the whole loop body.
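As a rough illustration of the intrinsic approach, the sketch below adds two float arrays with SSE. The function name `add_f32_sse` and the choice of operation are assumptions made for the example, and the code only builds on x86 targets that provide `<xmmintrin.h>`.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics (x86 only) */

/* dst[i] = a[i] + b[i], processing four floats per iteration. */
void add_f32_sse(float *dst, const float *a, const float *b, size_t n)
{
    size_t i = 0;

    /* Main loop: unaligned loads/stores, so no alignment is assumed. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(dst + i, _mm_add_ps(va, vb));
    }

    /* Scalar tail for the remaining 0..3 elements. */
    for (; i < n; ++i)
        dst[i] = a[i] + b[i];
}
```

Compiling with something like `gcc -O2 -S` and reading the generated loop is one way to look for the spills and redundant moves mentioned above before deciding whether inline assembly is worth it.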

Portability is, by the way, not a problem. Often you start with a C/C++ loop and optimize it using intrinsics. Just keep the old loop and use it as a unit test and fallback for your SIMD implementation. It is also always wise to be able to remove all SIMD code from a project via a compile-time define. Debugging an application is much easier that way, and the same define can be used for cross-compilation.
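The fallback pattern described above might look roughly like this. The macro name `MYLIB_NO_SIMD` and the functions `scale`, `scale_ref`, and `scale_matches_ref` are hypothetical names chosen for the sketch. Defining `MYLIB_NO_SIMD` at compile time (or building for a target without SSE) removes the SIMD path entirely and leaves only the original loop.

```c
#include <stddef.h>

/* Hypothetical switch: build with -DMYLIB_NO_SIMD to strip all SIMD code. */
#if !defined(MYLIB_NO_SIMD) && defined(__SSE__)
#include <xmmintrin.h>
#define MYLIB_USE_SSE 1
#else
#define MYLIB_USE_SSE 0
#endif

/* The original scalar loop, kept as reference and fallback. */
void scale_ref(float *v, size_t n, float k)
{
    for (size_t i = 0; i < n; ++i)
        v[i] *= k;
}

/* Public entry point: SSE when available, scalar loop otherwise. */
void scale(float *v, size_t n, float k)
{
#if MYLIB_USE_SSE
    const __m128 vk = _mm_set1_ps(k);
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        _mm_storeu_ps(v + i, _mm_mul_ps(_mm_loadu_ps(v + i), vk));
    for (; i < n; ++i)          /* scalar tail */
        v[i] *= k;
#else
    scale_ref(v, n, k);
#endif
}

/* Unit-test helper: returns 1 if both paths agree on up to 64 elements. */
int scale_matches_ref(size_t n, float k)
{
    float a[64], b[64];
    if (n > 64)
        n = 64;
    for (size_t i = 0; i < n; ++i)
        a[i] = b[i] = (float)i * 0.5f;
    scale(a, n, k);
    scale_ref(b, n, k);
    for (size_t i = 0; i < n; ++i)
        if (a[i] != b[i])
            return 0;
    return 1;
}
```

Exact comparison is reasonable here because each element is a single multiplication. For kernels that reorder additions, such as reductions, a small tolerance is likely the safer check.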

Published: 2023-07-04 10:59:00

Article link: https://www.elefans.com/category/jswz/34/1023552.html