Horizontal minimum and position in SSE for unsigned 32-bit integers

Updated: 2024-10-25 18:28:22

I am looking for a way to find the minimum and its position in SSE for unsigned 32-bit integers (similar to _mm_minpos_epu16). I know I can find the minimum through a series of _mm_min_epu32 and shuffles/shifts but that doesn't get me the position.

Does anyone have any cool ways of doing this?

Best answer

There is probably a cleverer method, but for now here's a brute force approach:

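#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>
#include <smmintrin.h> // SSE4.1

// Print the four 32-bit lanes of a vector as unsigned values
// (the %vlu / %vld printf vector formats are not portable).
static void print_u32x4(const char *name, __m128i x)
{
    uint32_t a[4];
    _mm_storeu_si128((__m128i *)a, x);
    printf("%s = %" PRIu32 " %" PRIu32 " %" PRIu32 " %" PRIu32 "\n",
           name, a[0], a[1], a[2], a[3]);
}

int main(void)
{
    __m128i v = _mm_setr_epi32(42, 1, 43, 2);

    print_u32x4("v    ", v);

    __m128i vmin = v;

    vmin = _mm_min_epu32(vmin, _mm_alignr_epi8(vmin, vmin, 4));
    vmin = _mm_min_epu32(vmin, _mm_alignr_epi8(vmin, vmin, 8));
                                                   // get min value in all elements of vmin

    print_u32x4("vmin ", vmin);

    __m128i vmask = _mm_cmpeq_epi32(v, vmin);      // set min element(s) in mask to all ones (-1),
                                                   // all others to 0 [1]

    print_u32x4("vmask", vmask);

    int mask = _mm_movemask_epi8(vmask);           // get mask as scalar, 4 bits per element [2]

    printf("mask  = %#x\n", mask);

    int pos = __builtin_ctz(mask) >> 2;            // convert scalar mask to index [3]

    printf("pos   = %d\n", pos);

    return 0;
}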
 

If you can use a mask which is set at the position(s) of the minimum element(s) then you can just stop at [1], otherwise continue to [3] to get the index of the (least significant) minimum element.
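The steps [1] to [3] can also be packaged as a small reusable function. The following is only a sketch under a few assumptions: the names minpos_u32 and minpos_epu32 are made up for illustration, the compiler is GCC or Clang (for __builtin_ctz and the target attribute), and the lowest-indexed element wins on ties.

```c
#include <assert.h>
#include <stdint.h>
#include <smmintrin.h> // SSE4.1 (_mm_min_epu32)

// Hypothetical result type: the minimum value and the index (0..3)
// of its first occurrence.
typedef struct {
    uint32_t min;
    int pos;
} minpos_u32;

// The target attribute (GCC/Clang) lets this function use SSE4.1 even when
// the file is compiled without -msse4.1; it can be dropped if that flag is set.
__attribute__((target("sse4.1")))
static minpos_u32 minpos_epu32(__m128i v)
{
    // Rotate by one element, then by two, so every element holds the minimum.
    __m128i vmin = _mm_min_epu32(v, _mm_alignr_epi8(v, v, 4));
    vmin = _mm_min_epu32(vmin, _mm_alignr_epi8(vmin, vmin, 8));

    // Elements equal to the minimum become all ones; 4 mask bits per element.
    int mask = _mm_movemask_epi8(_mm_cmpeq_epi32(v, vmin));
    assert(mask != 0); // at least one element always equals the minimum

    minpos_u32 r;
    r.min = (uint32_t)_mm_cvtsi128_si32(vmin);   // element 0 of vmin
    r.pos = __builtin_ctz((unsigned)mask) >> 2;  // byte-mask bit -> element index
    return r;
}
```

For the vector used above, minpos_epu32(_mm_setr_epi32(42, 1, 43, 2)) should give min 1 at pos 1. Because the comparison is for equality only, the signed _mm_cmpeq_epi32 is fine for unsigned data; only the min step needs the unsigned instruction.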

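Note also that __builtin_ctz is a GCC builtin (it is also available in Clang and other GCC-compatible compilers), and its result is undefined for an argument of 0; that cannot happen here, because at least one element always equals the minimum. If you're using MSVC, you'll need the equivalent Microsoft intrinsic, _BitScanForward.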
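One way to hide the compiler difference is a thin wrapper selected by the preprocessor. This is a sketch, not tested on MSVC here; ctz32 is a hypothetical helper name, and the MSVC branch assumes _BitScanForward from <intrin.h> with its documented (unsigned long *index, unsigned long mask) signature.

```c
#include <assert.h>
#if defined(_MSC_VER)
#include <intrin.h> // _BitScanForward
#endif

// Hypothetical helper: index of the lowest set bit of x (x must be non-zero).
static int ctz32(unsigned int x)
{
    assert(x != 0); // neither intrinsic gives a usable result for zero
#if defined(_MSC_VER)
    unsigned long index;
    _BitScanForward(&index, x); // writes the bit index into 'index'
    return (int)index;
#else
    return __builtin_ctz(x);    // GCC, Clang and compatible compilers
#endif
}
```

Step [3] would then read int pos = ctz32((unsigned)mask) >> 2; on either compiler.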

Published: 2023-08-07 12:10:00
Link: https://www.elefans.com/category/jswz/34/1464280.html