Proper way to enable SSE4 on a per-function / per-block of code basis?

Updated: 2024-10-11 21:25:32

Question

For one of my OS X programs, I have a few optimized cases which use SSE4.1 instructions. On SSE3-only machines, the non-optimized branch is run:

    // SupportsSSE4_1 returns true on CPUs that support SSE4.1, false otherwise
    if (SupportsSSE4_1()) {
        // Code that uses _mm_dp_ps, an SSE4 instruction
        ...
        __m128 hDelta = _mm_sub_ps(here128, right128);
        __m128 vDelta = _mm_sub_ps(here128, down128);
        hDelta = _mm_sqrt_ss(_mm_dp_ps(hDelta, hDelta, 0x71));
        vDelta = _mm_sqrt_ss(_mm_dp_ps(vDelta, vDelta, 0x71));
        ...
    } else {
        // Equivalent code that uses SSE3 instructions
        ...
    }

To get the above to compile, I had to set CLANG_X86_VECTOR_INSTRUCTIONS to sse4.1.

However, this seems to tell clang that it is OK to use SSE4.1 instructions such as ROUNDSD anywhere in my program. As a result, the program crashes on SSE3-only machines with SIGILL: ILL_ILLOPC.

What is the best practice for enabling SSE4.1 for just the lines of code inside the true branch of the SupportsSSE4_1() if block?

Recommended answer

There is currently no way to target different ISA extensions at block or function granularity in clang. You can only do it at file granularity: put your SSE4.1 code into a separate file and compile just that file with -msse4.1. If this is an important feature for you, please file a bug report to request it!
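
The file-granularity split described above could be laid out roughly as follows. This is only a sketch under stated assumptions: the file names, the DeltaFn type, and the DeltaBaseline / DeltaSSE41 / SelectDelta functions are hypothetical, and the body of SupportsSSE4_1() is a stand-in, since the question does not show how it is implemented. So that the sketch compiles as a single unit without -msse4.1, the SSE4.1 function has a placeholder body; in a real project that function would hold the _mm_dp_ps code and live in the one file built with -msse4.1.

```c
#include <stdbool.h>

/* Shared signature (hypothetical): squared length of the xyz difference. */
typedef float (*DeltaFn)(const float a[3], const float b[3]);

/* --- delta_baseline.c (assumed name): built with the project-wide flags --- */
float DeltaBaseline(const float a[3], const float b[3])
{
    float dx = a[0] - b[0];
    float dy = a[1] - b[1];
    float dz = a[2] - b[2];
    return dx * dx + dy * dy + dz * dz;
}

/* --- delta_sse41.c (assumed name): the only file built with -msse4.1 ---
 * Placeholder body so this sketch compiles as one unit without -msse4.1.
 * In the real file, the _mm_dp_ps code from the question would go here. */
float DeltaSSE41(const float a[3], const float b[3])
{
    return DeltaBaseline(a, b);
}

/* --- dispatch.c (assumed name): built with the project-wide flags --- */

/* Stand-in for the question's SupportsSSE4_1(); the real body is not shown
 * in the question. Uses a GCC/clang builtin on x86, otherwise reports false. */
static bool SupportsSSE4_1(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    return __builtin_cpu_supports("sse4.1") != 0;
#else
    return false;
#endif
}

/* Pick the implementation at run time; callers only see a function pointer,
 * so no SSE4.1 code is compiled into this file. */
DeltaFn SelectDelta(void)
{
    return SupportsSSE4_1() ? DeltaSSE41 : DeltaBaseline;
}
```

A caller would typically invoke SelectDelta() once at startup and reuse the returned pointer. With this layout, the project-wide CLANG_X86_VECTOR_INSTRUCTIONS setting can stay at the SSE3 level, and only the SSE4.1 file gets -msse4.1 as a per-file compiler flag.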

However, I should note that the actual benefit of DPPS is pretty small in most real scenarios (and using DPPS even slows down some code sequences!). Unless this particular code sequence is critical, and you have carefully measured the effect of using DPPS, it may not be worth the hassle to special-case SSE4.1 even if that compiler feature is available.
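
For illustration, here is one way the `_mm_sqrt_ss(_mm_dp_ps(d, d, 0x71))` step from the question might be written without DPPS, using only baseline SSE intrinsics. In the mask 0x71, the high nibble selects lanes 0 to 2 for the multiply and the low nibble writes the sum to lane 0, so the result is the length of the xyz part of the difference. The function name DeltaLength3 and the array-based interface are assumptions made for this sketch, not the asker's actual SSE3 branch, and no claim is made here about which version is faster; that would have to be measured.

```c
#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>

/* Length of the xyz part of (here - other); lane 3 is ignored. */
float DeltaLength3(const float here[4], const float other[4])
{
    __m128 d   = _mm_sub_ps(_mm_loadu_ps(here), _mm_loadu_ps(other));
    __m128 sq  = _mm_mul_ps(d, d);                        /* x^2 y^2 z^2 w^2 */
    __m128 y   = _mm_shuffle_ps(sq, sq, _MM_SHUFFLE(1, 1, 1, 1)); /* y^2 in lane 0 */
    __m128 z   = _mm_shuffle_ps(sq, sq, _MM_SHUFFLE(2, 2, 2, 2)); /* z^2 in lane 0 */
    __m128 sum = _mm_add_ss(_mm_add_ss(sq, y), z);        /* lane 0: x^2+y^2+z^2 */
    return _mm_cvtss_f32(_mm_sqrt_ss(sum));               /* sqrt of lane 0 */
}
#else
#include <math.h>

/* Scalar fallback so the sketch also builds on non-x86 targets. */
float DeltaLength3(const float here[4], const float other[4])
{
    float dx = here[0] - other[0];
    float dy = here[1] - other[1];
    float dz = here[2] - other[2];
    return sqrtf(dx * dx + dy * dy + dz * dz);
}
#endif
```

Because this version needs no SSE4.1 instructions, it can be compiled with the project-wide flags and called unconditionally, with no runtime check.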

Published: 2023-11-12 21:34:44
Link: https://www.elefans.com/category/jswz/34/1582585.html