I am looking for a way to find the minimum and its position in SSE for unsigned 32-bit integers (similar to _mm_minpos_epu16). I know I can find the minimum through a series of _mm_min_epu32 and shuffles/shifts but that doesn't get me the position.
Does anyone have any cool ways of doing this?
Best answer
There is probably a cleverer method, but for now here's a brute force approach:

#include <stdio.h>
#include <stdint.h>
#include <smmintrin.h> // SSE4.1

// print the four 32-bit elements of a vector as unsigned values
static void print_u32x4(const char *name, __m128i x)
{
    uint32_t e[4];
    _mm_storeu_si128((__m128i *)e, x);
    printf("%s = %u %u %u %u\n", name,
           (unsigned)e[0], (unsigned)e[1], (unsigned)e[2], (unsigned)e[3]);
}

int main(void)
{
    __m128i v = _mm_setr_epi32(42, 1, 43, 2);
    print_u32x4("v", v);

    __m128i vmin = v;
    vmin = _mm_min_epu32(vmin, _mm_alignr_epi8(vmin, vmin, 4));  // min with vector rotated by 1 element
    vmin = _mm_min_epu32(vmin, _mm_alignr_epi8(vmin, vmin, 8));  // min with vector rotated by 2 elements
                                              // => min value in all elements of vmin
    print_u32x4("vmin", vmin);

    __m128i vmask = _mm_cmpeq_epi32(v, vmin); // set min element(s) in mask to -1,
                                              // all others to 0 [1]
    print_u32x4("vmask", vmask);

    int mask = _mm_movemask_epi8(vmask);      // get mask as scalar, 4 bits per element [2]
    printf("mask = %#x\n", mask);

    int pos = __builtin_ctz(mask) >> 2;       // convert scalar mask to index [3]
    printf("pos = %d\n", pos);

    return 0;
}

If you can use a mask which is set at the position(s) of the minimum element(s) then you can just stop at [1]; otherwise continue to [3] to get the index of the first (lowest-indexed) minimum element.
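As a possible variation on the approach above, the same steps can be wrapped in a function, and the comparison mask can be read with _mm_movemask_ps so that each element contributes one bit instead of four, which makes the shift unnecessary. This is only a sketch: the function name min_pos_u32x4, the MINPOS_TARGET macro, and the array-based interface are assumptions made for illustration, and it still relies on the GCC/Clang builtin __builtin_ctz.

```c
#include <stdint.h>
#include <smmintrin.h> // SSE4.1

// Assumption: let GCC/Clang compile this one function for SSE4.1 even when
// the file is not built with -msse4.1. Other compilers get an empty macro.
#if defined(__GNUC__) || defined(__clang__)
#define MINPOS_TARGET __attribute__((target("sse4.1")))
#else
#define MINPOS_TARGET
#endif

// Hypothetical helper: returns the minimum of four unsigned 32-bit values
// and stores the index of its first occurrence in *pos.
MINPOS_TARGET uint32_t min_pos_u32x4(const uint32_t a[4], int *pos)
{
    __m128i v = _mm_loadu_si128((const __m128i *)a);

    // Horizontal minimum: after both steps every element holds the minimum.
    __m128i vmin = _mm_min_epu32(v, _mm_alignr_epi8(v, v, 4));
    vmin = _mm_min_epu32(vmin, _mm_alignr_epi8(vmin, vmin, 8));

    // All-ones in every element equal to the minimum, zero elsewhere.
    __m128i vmask = _mm_cmpeq_epi32(v, vmin);

    // One bit per 32-bit element (the sign bit of each lane), so the
    // trailing-zero count is already the element index.
    int mask = _mm_movemask_ps(_mm_castsi128_ps(vmask));
    *pos = __builtin_ctz((unsigned)mask); // mask is never 0 here

    return (uint32_t)_mm_cvtsi128_si32(vmin);
}
```

With the input from the answer, {42, 1, 43, 2}, this returns 1 and sets *pos to 1. When several elements share the minimum, the lowest index is reported.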
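Note also that __builtin_ctz is a GCC builtin (it is also available in GCC-compatible compilers such as Clang), and its result is undefined for a zero argument; that cannot happen here, because at least one element always equals the minimum. If you're using MSVC then you'll need to use the equivalent Microsoft intrinsic (_BitScanForward).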
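One way to hide the compiler difference is a small wrapper that picks the right intrinsic at compile time. The sketch below is an illustration only: the names ctz32 and mask_to_pos are hypothetical, and the caller is assumed to pass a nonzero mask, as step [3] in the answer always does.

```c
#include <stdint.h>
#ifdef _MSC_VER
#include <intrin.h> // _BitScanForward
#endif

// Hypothetical helper: index of the lowest set bit. mask must be nonzero.
static inline int ctz32(uint32_t mask)
{
#ifdef _MSC_VER
    unsigned long index;
    _BitScanForward(&index, mask); // writes the bit index into 'index'
    return (int)index;
#else
    return __builtin_ctz(mask);    // GCC/Clang builtin
#endif
}

// Hypothetical helper: converts the 16-bit result of _mm_movemask_epi8
// (4 bits per 32-bit element) into an element index, as in step [3].
static inline int mask_to_pos(int mask)
{
    return ctz32((uint32_t)mask) >> 2;
}
```

For the example vector, the byte mask is 0xf0, so ctz32 gives 4 and mask_to_pos gives element index 1.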