Computing floor and ceil of a vector2 of doubles using pre-SSE4 instructions

Updated: 2024-10-11 19:22:54
Problem description

With SSE4.1 this can be done using the intrinsics _mm_floor_pd and _mm_ceil_pd, which translate into roundpd xmm,xmm,1 and roundpd xmm,xmm,2.
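For reference, here is a minimal sketch of the SSE4.1 route mentioned above. The wrapper name floor_ceil_sse41 and its array-based interface are illustrative assumptions, not part of the original question. The target attribute is specific to GCC and Clang; it lets this one function use SSE4.1 without a global -msse4.1 flag.

```cpp
#include <smmintrin.h> // SSE4.1: _mm_floor_pd, _mm_ceil_pd

// Hypothetical helper (not from the original post): computes floor and ceil
// of two doubles at once with the SSE4.1 rounding instruction.
#if defined(__GNUC__)
__attribute__((target("sse4.1")))
#endif
void floor_ceil_sse41(const double in[2], double lo[2], double hi[2]) {
    __m128d v = _mm_loadu_pd(in);       // unaligned load of both doubles
    _mm_storeu_pd(lo, _mm_floor_pd(v)); // roundpd, rounding toward -infinity
    _mm_storeu_pd(hi, _mm_ceil_pd(v));  // roundpd, rounding toward +infinity
}
```

Calling it with in = {3.1, -4.6} fills lo with {3.0, -5.0} and hi with {4.0, -4.0}. The rest of the page is about getting the same results when roundpd is not available.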

What is the best way to calculate them using only SSE/SSE2/SSE3?

Recommended answer

Here is code that performs floor/ceil on pre-SSE4.1 CPUs. Please note that compiling with '-ffast-math' will break it!

    #include <cmath>
    #include <emmintrin.h>
    #include <cstdio> // for printf

    #ifdef _MSC_VER
    #define __attribute__(P) // MSVC has no __attribute__, so make it a no-op
    #endif

    struct vec2d {
        double x;
        double y;
    };

    // SSE2 substitute for SSE4.1 blendvpd: takes b where mask bits are set, a elsewhere.
    static __m128d mm_blendv_pd(const __m128d& a, const __m128d& b, const __m128d& mask) noexcept {
        return _mm_or_pd(_mm_andnot_pd(mask, a), _mm_and_pd(b, mask));
    }

    // Note: the add/subtract rounding step below is exact for |v| < 2^51;
    // lanes >= 2^52 (and NaN) are passed through unchanged.
    __attribute__((optimize("-fno-associative-math")))
    vec2d _floor(vec2d v) noexcept {
        __m128d src = _mm_set_pd(v.x, v.y); // x in the high lane, y in the low lane
        __m128d maxn = _mm_set_pd(4503599627370496.0, 4503599627370496.0);  // pow(2, 52)
        __m128d magic = _mm_set_pd(6755399441055744.0, 6755399441055744.0); // pow(2, 52) + pow(2, 51)
        __m128d msk = _mm_cmpnlt_pd(src, maxn);
        __m128d rounded = _mm_sub_pd(_mm_add_pd(src, magic), magic); //! -ffast-math will break this!
        // 1.0 in every lane where rounding to nearest went above src
        __m128d maybeone = _mm_and_pd(_mm_cmplt_pd(src, rounded), _mm_set_pd(1.0, 1.0));
        __m128d res = mm_blendv_pd(_mm_sub_pd(rounded, maybeone), src, msk);
        return {_mm_cvtsd_f64(_mm_unpackhi_pd(res, res)), _mm_cvtsd_f64(res)};
    }

    __attribute__((optimize("-fno-associative-math")))
    vec2d _ceil(vec2d v) noexcept {
        __m128d src = _mm_set_pd(v.x, v.y); // x in the high lane, y in the low lane
        __m128d maxn = _mm_set_pd(4503599627370496.0, 4503599627370496.0);  // pow(2, 52)
        __m128d magic = _mm_set_pd(6755399441055744.0, 6755399441055744.0); // pow(2, 52) + pow(2, 51)
        __m128d msk = _mm_cmpnlt_pd(src, maxn);
        __m128d rounded = _mm_sub_pd(_mm_add_pd(src, magic), magic); //! -ffast-math will break this!
        // 1.0 in every lane where rounding to nearest went below src
        __m128d maybeone = _mm_and_pd(_mm_cmpnle_pd(src, rounded), _mm_set_pd(1.0, 1.0));
        __m128d res = mm_blendv_pd(_mm_add_pd(rounded, maybeone), src, msk);
        return {_mm_cvtsd_f64(_mm_unpackhi_pd(res, res)), _mm_cvtsd_f64(res)};
    }

    int main() {
        vec2d v{3.1, 4.6};
        vec2d v2 = _floor(v);
        vec2d v3 = _ceil(v);
        printf(" v2: %f %f\n", v2.x, v2.y); // prints " v2: 3.000000 4.000000"
        printf(" v3: %f %f\n", v3.x, v3.y); // prints " v3: 4.000000 5.000000"
        return 0;
    }
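The step that does the real work in the code above is the add followed by the subtract. As a scalar illustration (the function names here are hypothetical, and this assumes the default round-to-nearest-even mode and strict IEEE double arithmetic): adding 2^52 + 2^51 moves the value into a range where consecutive doubles are exactly 1 apart, so the addition itself rounds the fraction away, and subtracting the constant brings the rounded integer back. With -ffast-math the compiler is allowed to simplify (x + magic) - magic to x, which is why the flag breaks the trick.

```cpp
// Hypothetical scalar helper: rounds x to the nearest integer (ties to even)
// using only an add and a subtract. Assumes |x| < 2^51, the default rounding
// mode, and a compiler that does not reassociate FP math (no -ffast-math).
double round_via_magic(double x) {
    const double magic = 6755399441055744.0; // 2^52 + 2^51
    double shifted = x + magic; // the fraction no longer fits, so it is rounded off
    return shifted - magic;     // exact subtraction, leaves the rounded integer
}

// floor = nearest integer, minus 1 when rounding went above x.
double floor_via_magic(double x) {
    double r = round_via_magic(x);
    return (x < r) ? r - 1.0 : r;
}

// ceil = nearest integer, plus 1 when rounding went below x.
double ceil_via_magic(double x) {
    double r = round_via_magic(x);
    return (x > r) ? r + 1.0 : r;
}
```

The vector code does the same thing per lane, with the comparison producing a mask that selects 1.0 or 0.0 instead of a branch.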

Live code

It is based on the blog post "C++ Compilers and FP Rounding on X86", but the code there has bugs.
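One caveat about the answer's own code: with the constant 2^52 + 2^51 the add/subtract step is exact only while |x| < 2^51, so an input such as 2^51 + 1 does not come back unchanged. The following is a sketch of one possible variant, not the original author's method: it works on |x| with the constant 2^52 and restores the sign afterwards, which covers every finite double. The helper names are illustrative assumptions, and it still relies on the default round-to-nearest mode and on not compiling with -ffast-math.

```cpp
#include <emmintrin.h> // SSE2 only

// Hypothetical helper: rounds each lane to the nearest integer (ties to even).
// Lanes with |x| >= 2^52 (already integral) and NaN lanes pass through.
static inline __m128d round_nearest_sse2(__m128d x) {
    const __m128d sign_mask = _mm_set1_pd(-0.0);               // only the sign bit set
    const __m128d two52     = _mm_set1_pd(4503599627370496.0); // 2^52
    __m128d sign  = _mm_and_pd(x, sign_mask);
    __m128d ax    = _mm_andnot_pd(sign_mask, x);               // |x|
    __m128d r     = _mm_sub_pd(_mm_add_pd(ax, two52), two52);  // nearest integer of |x|
    r = _mm_or_pd(r, sign);                                    // restore the sign
    __m128d small = _mm_cmplt_pd(ax, two52);                   // false for big values and NaN
    return _mm_or_pd(_mm_and_pd(small, r), _mm_andnot_pd(small, x));
}

static inline __m128d floor_sse2(__m128d x) {
    __m128d r   = round_nearest_sse2(x);
    // 1.0 in every lane where rounding went above x
    __m128d one = _mm_and_pd(_mm_cmpgt_pd(r, x), _mm_set1_pd(1.0));
    return _mm_sub_pd(r, one);
}

static inline __m128d ceil_sse2(__m128d x) {
    __m128d r    = round_nearest_sse2(x);
    // 1.0 in every lane where rounding went below x
    __m128d one  = _mm_and_pd(_mm_cmplt_pd(r, x), _mm_set1_pd(1.0));
    __m128d sign = _mm_and_pd(x, _mm_set1_pd(-0.0));
    // OR the sign back in so that inputs in (-1, 0] yield -0.0, like std::ceil
    return _mm_or_pd(_mm_add_pd(r, one), sign);
}
```

These take and return __m128d directly, so they can replace the bodies of _floor and _ceil after the _mm_set_pd load. Remember that _mm_set_pd takes the high lane first, so _mm_set_pd(v.x, v.y) puts y in lane 0.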

Published: 2023-11-12 21:34:28
Article link: https://www.elefans.com/category/jswz/34/1582584.html