Why is memcmp so much faster than a for loop check?

Updated: 2024-10-26 17:23:03
Question

Why is memcmp(a, b, size) so much faster than:

    for (i = 0; i < nelements; i++) {
        if (a[i] != b[i])
            return 0;
    }
    return 1;

Is memcmp a CPU instruction or something? It must be pretty deep because I got a massive speedup using memcmp over the loop.
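For anyone who wants to reproduce the comparison, a compilable version of the two approaches might look like the sketch below. The names equal_loop and equal_memcmp are made up for illustration, and the arrays are assumed to hold bytes; for wider element types the memcmp size would be nelements multiplied by the element size.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Byte-by-byte comparison, following the loop in the question.
 * equal_loop is a hypothetical name used only in this sketch. */
int equal_loop(const unsigned char *a, const unsigned char *b, size_t nelements)
{
    assert(nelements == 0 || (a != NULL && b != NULL));

    for (size_t i = 0; i < nelements; i++) {
        if (a[i] != b[i])
            return 0;   /* first mismatch: not equal */
    }
    return 1;           /* no mismatch found: equal */
}

/* Same answer, delegated to the library routine (also a hypothetical name). */
int equal_memcmp(const void *a, const void *b, size_t size)
{
    assert(size == 0 || (a != NULL && b != NULL));

    return memcmp(a, b, size) == 0;
}
```

Both helpers return 1 for identical buffers and 0 otherwise, so either one can be dropped into a timing harness to compare them on the same data.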

Recommended answer

memcmp is often implemented in assembly to take advantage of a number of architecture-specific features, which can make it much faster than a simple loop in C.

GCC supports memcmp (as well as a ton of other functions) as builtins. In some versions / configurations of GCC, a call to memcmp will be recognized as __builtin_memcmp. Instead of emitting a call to the memcmp library function, GCC will emit a handful of instructions to act as an optimized inline version of the function.
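As a rough illustration, the sketch below compares a fixed-size key both through a plain memcmp call and through the explicit __builtin_memcmp spelling. The struct key type and the two function names are invented for this example. Whether the compiler actually expands either call inline depends on the compiler version, target, and optimization flags.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical fixed-size key. A compile-time-constant length gives the
 * compiler the most freedom to expand the comparison inline. */
struct key {
    unsigned char bytes[16];
};

/* Ordinary library call; GCC may recognize it as __builtin_memcmp. */
int key_equal(const struct key *a, const struct key *b)
{
    assert(a != NULL && b != NULL);
    return memcmp(a->bytes, b->bytes, sizeof a->bytes) == 0;
}

/* Explicit builtin spelling, with a portable fallback for other compilers. */
int key_equal_builtin(const struct key *a, const struct key *b)
{
    assert(a != NULL && b != NULL);
#if defined(__GNUC__)
    return __builtin_memcmp(a->bytes, b->bytes, sizeof a->bytes) == 0;
#else
    return memcmp(a->bytes, b->bytes, sizeof a->bytes) == 0;
#endif
}
```

Compiling with gcc -O2 -S and reading the generated assembly shows whether a call to memcmp was emitted. Adding -fno-builtin-memcmp asks GCC not to treat the plain call as a builtin, which makes the difference easy to see.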

On x86, this uses the cmpsb instruction, which compares a byte at one memory location with a byte at another. Combined with the repe prefix, the comparison repeats over successive bytes until a pair differs or the count is exhausted (exactly what memcmp does).

Given the following code:

    #include <string.h>

    int test(const void* s1, const void* s2, int count)
    {
        return memcmp(s1, s2, count) == 0;
    }

gcc version 3.4.4 on Cygwin generates the following assembly:

    ; (prologue)
    mov     esi, [ebp+arg_0]    ; Move first pointer to esi
    mov     edi, [ebp+arg_4]    ; Move second pointer to edi
    mov     ecx, [ebp+arg_8]    ; Move length to ecx
    cld                         ; Clear DF, the direction flag, so comparisons happen
                                ; at increasing addresses
    cmp     ecx, ecx            ; Set ZF. Special case: if the length parameter to
                                ; memcmp is zero, repe cmpsb compares no bytes and
                                ; ZF stays set, so the result is "equal".
    repe cmpsb                  ; Compare bytes at DS:ESI and ES:EDI, setting flags.
                                ; Repeat while ZF is set (bytes equal) and ecx != 0.
    setz    al                  ; Set al (return value) to 1 if ZF is still set
                                ; (all bytes were equal).
    ; (epilogue)
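To make the flag handling easier to follow, here is a small C model of the cmp / repe cmpsb / setz sequence. It is only a sketch of the semantics, not what the compiler emits: model_repe_cmpsb is a made-up name, the parameters are named after the registers they stand for, and the zf variable stands in for the CPU's zero flag.

```c
#include <assert.h>
#include <stddef.h>

/* C model of "cmp ecx, ecx / repe cmpsb / setz al".
 * Hypothetical helper: esi and edi mirror the pointer registers,
 * ecx mirrors the count register, zf mirrors the zero flag. */
int model_repe_cmpsb(const unsigned char *esi, const unsigned char *edi,
                     size_t ecx)
{
    assert(ecx == 0 || (esi != NULL && edi != NULL));

    int zf = 1;                 /* cmp ecx, ecx: equal operands set ZF */

    while (ecx != 0) {          /* rep: no iteration once the count is zero */
        zf = (*esi == *edi);    /* cmpsb: compare one byte, update ZF */
        esi++;                  /* DF is clear, so both pointers advance */
        edi++;
        ecx--;
        if (!zf)                /* repe: stop at the first unequal byte */
            break;
    }
    return zf;                  /* setz al: 1 if every byte matched */
}
```

With a count of zero the loop body never runs, so the value set by the initial cmp is what gets returned. That is why the generated code sets ZF before the string instruction.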

Reference:

  • cmpsb instruction

Highly-optimized versions of memcmp exist in many C standard libraries. These will usually take advantage of architecture-specific instructions to work with lots of data in parallel.
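The general idea of handling more data per step can be approximated in portable C by comparing eight bytes at a time. The sketch below is not how any particular library implements memcmp; equal_wordwise is a hypothetical name, and it only answers equal or not equal rather than returning memcmp's ordering result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: equality test that compares 8 bytes per step. */
int equal_wordwise(const void *a, const void *b, size_t size)
{
    const unsigned char *pa = a;
    const unsigned char *pb = b;

    assert(size == 0 || (a != NULL && b != NULL));

    /* Main loop: load 8 bytes from each buffer and compare them as one
     * integer. memcpy is used for the loads so that unaligned addresses
     * are handled without undefined behavior. */
    while (size >= sizeof(uint64_t)) {
        uint64_t wa, wb;
        memcpy(&wa, pa, sizeof wa);
        memcpy(&wb, pb, sizeof wb);
        if (wa != wb)
            return 0;
        pa += sizeof wa;
        pb += sizeof wb;
        size -= sizeof wa;
    }

    /* Tail: fewer than 8 bytes remain, compare them one at a time. */
    while (size > 0) {
        if (*pa++ != *pb++)
            return 0;
        size--;
    }
    return 1;
}
```

Library versions go further by using vector registers and by handling alignment and page boundaries carefully, but the main loop has the same shape: one comparison covers many bytes.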

In Glibc, there are versions of memcmp for x86_64 that can take advantage of the following instruction set extensions:

  • SSE2 - sysdeps/x86_64/memcmp.S
  • SSE4 - sysdeps/x86_64/multiarch/memcmp-sse4.S
  • SSSE3 - sysdeps/x86_64/multiarch/memcmp-ssse3.S
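To give a feel for what an SSE2 version does, here is a heavily simplified equality check written with compiler intrinsics instead of assembly. It is a sketch and not the glibc code: equal_sse2 is a hypothetical name, it only reports equal or not equal, and it falls back to a byte loop when SSE2 is not available at compile time.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics */
#endif

/* Hypothetical helper: equality test that compares 16 bytes per step
 * when the compiler targets SSE2. */
int equal_sse2(const void *a, const void *b, size_t size)
{
    const unsigned char *pa = a;
    const unsigned char *pb = b;

    assert(size == 0 || (a != NULL && b != NULL));

#if defined(__SSE2__)
    while (size >= 16) {
        /* Unaligned 16-byte loads from each buffer. */
        __m128i va = _mm_loadu_si128((const __m128i *)pa);
        __m128i vb = _mm_loadu_si128((const __m128i *)pb);

        /* Each byte lane becomes 0xFF where the inputs match; movemask
         * collects one bit per lane, so 0xFFFF means all 16 bytes match. */
        if (_mm_movemask_epi8(_mm_cmpeq_epi8(va, vb)) != 0xFFFF)
            return 0;

        pa += 16;
        pb += 16;
        size -= 16;
    }
#endif

    /* Remaining bytes (or all bytes when SSE2 is unavailable). */
    while (size > 0) {
        if (*pa++ != *pb++)
            return 0;
        size--;
    }
    return 1;
}
```

On x86_64, SSE2 is part of the baseline instruction set, so the vector loop is normally compiled in without extra flags.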

The cool part is that glibc will detect (at run-time) the newest instruction set your CPU has, and execute the version optimized for it. See this snippet from sysdeps/x86_64/multiarch/memcmp.S:

    ENTRY(memcmp)
        .type   memcmp, @gnu_indirect_function
        LOAD_RTLD_GLOBAL_RO_RDX
        HAS_CPU_FEATURE (SSSE3)
        jnz     2f
        leaq    __memcmp_sse2(%rip), %rax
        ret

    2:  HAS_CPU_FEATURE (SSE4_1)
        jz      3f
        leaq    __memcmp_sse4_1(%rip), %rax
        ret

    3:  leaq    __memcmp_ssse3(%rip), %rax
        ret
    END(memcmp)
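The selection logic in that snippet can be mimicked in C with a function pointer that is resolved once. This is only an analogy under several assumptions: glibc uses the indirect-function mechanism at dynamic-link time rather than a lazily set pointer, the three my_memcmp_* variants here are placeholders that all forward to memcmp, and the CPU checks rely on the GCC/Clang builtin __builtin_cpu_supports on x86.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*memcmp_fn)(const void *, const void *, size_t);

/* Placeholder variants (hypothetical names). In a real library each one
 * would be a separately optimized implementation. */
static int my_memcmp_sse2(const void *a, const void *b, size_t n)
{
    return memcmp(a, b, n);
}

static int my_memcmp_ssse3(const void *a, const void *b, size_t n)
{
    return memcmp(a, b, n);
}

static int my_memcmp_sse4_1(const void *a, const void *b, size_t n)
{
    return memcmp(a, b, n);
}

/* Follows the branch order of the assembly: no SSSE3 selects the SSE2
 * version, SSE4.1 selects the SSE4.1 version, otherwise SSSE3. */
memcmp_fn resolve_memcmp(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (!__builtin_cpu_supports("ssse3"))
        return my_memcmp_sse2;
    if (__builtin_cpu_supports("sse4.1"))
        return my_memcmp_sse4_1;
    return my_memcmp_ssse3;
#else
    return my_memcmp_sse2;      /* generic fallback on other targets */
#endif
}

/* Resolves the implementation on first use, then reuses it.
 * Note: this lazy initialization is not thread-safe as written. */
int dispatched_memcmp(const void *a, const void *b, size_t n)
{
    static memcmp_fn impl;

    assert(n == 0 || (a != NULL && b != NULL));

    if (impl == NULL)
        impl = resolve_memcmp();
    return impl(a, b, n);
}
```

Callers use dispatched_memcmp exactly like memcmp. The CPU check runs only on the first call, and later calls cost one indirect jump.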

In the Linux kernel

Linux does not seem to have an optimized version of memcmp for x86_64, but it does for memcpy, in arch/x86/lib/memcpy_64.S. Note that it uses the alternatives infrastructure (arch/x86/kernel/alternative.c) not only to decide at runtime which version to use, but to actually patch itself so that this decision is made only once, at boot-up.

Published: 2023-10-15 12:06:48
Article link: https://www.elefans.com/category/jswz/34/1494292.html