而不是仅仅的最低设置位,我想找到 N 的日最低设置位的位置。 (我的不可以谈论价值的 N 个位的位置)
Instead of just the lowest set bit, I want to find the position of the nth lowest set bit. (I'm NOT talking about value on the nth bit position)
例如,说我有:结果 0000 1101 1000 0100 1100 1000 1010 0000
和我想找到所设置的第4位。然后我想它返回:结果 0000 0000 0000 0000 0100 0000 0000 0000
And I want to find the 4th bit that is set. Then I want it to return: 0000 0000 0000 0000 0100 0000 0000 0000
如果 POPCNT(V)LT; ñ,它将使意义,如果这个函数返回 0 ,但这种情况下的行为是可以接受的我。
If popcnt(v) < n, it would make sense if this function returned 0, but any behavior for this case is acceptable for me.
我在寻找的东西比如果可能的话一个循环更快。
I'm looking for something faster than a loop if possible.
推荐答案事实证明,它的确是可能的,没有循环来做到这一点。它是最快precompute这个问题的(至少)8位版本。当然,这些表使用了缓存空间,但仍然应该在几乎所有的现代PC的情景净加速。在这个code,N = 0的回报至少设置位,N = 1秒到至少等。
It turns out that it is indeed possible to do this with no loops. It is fastest to precompute the (at least) 8 bit version of this problem. Of course, these tables use up cache space, but there should still be a net speedup in virtually all modern pc scenarios. In this code, n=0 returns the least set bit, n=1 is second-to-least, etc.
解决方案与__popcnt
有是使用__popcnt内在溶液(你需要__popcnt是极快或通过一个简单的循环解决方案的任何PERF涨势将变得毫无意义。幸运的是大多数SSE4 +时代的处理器支持它)。
There is a solution using the __popcnt intrinsic (you need __popcnt to be extremely fast or any perf gains over a simple loop solution will be moot. Fortunately most SSE4+ era processors support it).
// lookup table for sub-problem: 8-bit v byte PRECOMP[256][8] = { .... } // PRECOMP[v][n] for v < 256 and n < 8 ulong nthSetBit(ulong v, ulong n) { ulong p = __popcnt(v & 0xFFFF); ulong shift = 0; if (p <= n) { v >>= 16; shift += 16; n -= p; } p = __popcnt(v & 0xFF); if (p <= n) { shift += 8; v >>= 8; n -= p; } if (n >= 8) return 0; // optional safety, in case n > # of set bits return PRECOMP[v & 0xFF][n] << shift; }这说明了分而治之是如何工作的方法。
This illustrates how the divide and conquer approach works.
常规的解决方案
有也是没有__popcnt一般architectures-的解决方案。它可以通过处理在8位的块来完成。你需要多一个查找表,告诉你一个字节的POPCNT:
There is also a solution for "general" architectures- without __popcnt. It can be done by processing in 8-bit chunks. You need one more lookup table that tells you the popcnt of a byte:
byte PRECOMP[256][8] = { .... } // PRECOMP[v][n] for v<256 and n < 8 byte POPCNT[256] = { ... } // POPCNT[v] is the number of set bits in v. (v < 256) ulong nthSetBit(ulong v, ulong n) { ulong p = POPCNT[v & 0xFF]; ulong shift = 0; if (p <= n) { n -= p; v >>= 8; shift += 8; p = POPCNT[v & 0xFF]; if (p <= n) { n -= p; shift += 8; v >>= 8; p = POPCNT[v & 0xFF]; if (p <= n) { n -= p; shift += 8; v >>= 8; } } } if (n >= 8) return 0; // optional safety, in case n > # of set bits return PRECOMP[v & 0xFF][n] << shift; }这可以,当然,可以用一个循环中完成,但展开形式是更快和循环的不寻常的形式将使它不可能的编译器能够自动展开它。
This could, of course, be done with a loop, but the unrolled form is faster and the unusual form of the loop would make it unlikely that the compiler could automatically unroll it for you.
更多推荐
查找第n个位设定在一个int
发布评论