Why is memcmp so much faster than a for loop check?

Updated: 2024-10-26 15:18:42
Question

Why is memcmp(a, b, size) so much faster than:

    for (i = 0; i < nelements; i++) {
        if (a[i] != b[i])
            return 0;
    }
    return 1;

Is memcmp a CPU instruction or something? It must be pretty deep, because I got a massive speedup by using memcmp instead of the loop.

Recommended answer

memcmp is often implemented in assembly to take advantage of a number of architecture-specific features, which can make it much faster than a simple loop in C.
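One common idea behind such speedups, independent of any particular library, is to compare several bytes per iteration instead of one. The following is a minimal sketch under that assumption; the helper name equal_wordwise is hypothetical and this is not how any specific memcmp is written.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the first n bytes of a and b are
   equal, 0 otherwise (same result as the loop in the question). */
int equal_wordwise(const void *a, const void *b, size_t n)
{
    const unsigned char *pa = a;
    const unsigned char *pb = b;

    /* Compare 8 bytes per iteration. The fixed-size memcpy calls are a
       portable way to express unaligned loads; compilers typically turn
       each one into a single load instruction. */
    while (n >= sizeof(uint64_t)) {
        uint64_t wa, wb;
        memcpy(&wa, pa, sizeof wa);
        memcpy(&wb, pb, sizeof wb);
        if (wa != wb)
            return 0;
        pa += sizeof wa;
        pb += sizeof wb;
        n -= sizeof wa;
    }

    /* Compare the remaining 0 to 7 bytes one at a time. */
    while (n--) {
        if (*pa++ != *pb++)
            return 0;
    }
    return 1;
}
```

The loop does roughly one eighth of the compare-and-branch work of the byte loop on long inputs, which is the same effect the hand-tuned versions exploit with wider registers.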

GCC supports memcmp (as well as a ton of other functions) as builtins. In some versions and configurations of GCC, a call to memcmp is recognized as __builtin_memcmp. Instead of emitting a call to the memcmp library function, GCC emits a handful of instructions that act as an optimized inline version of the function.

On x86, this uses the cmpsb instruction, which compares a string of bytes at one memory location to another. It is coupled with the repe prefix, so the strings are compared until they are no longer equal or the count is exhausted (exactly what memcmp does).

Given the following code:

    #include <string.h>

    int test(const void* s1, const void* s2, int count)
    {
        return memcmp(s1, s2, count) == 0;
    }

gcc version 3.4.4 on Cygwin generates the following assembly:

    ; (prologue)
    mov     esi, [ebp+arg_0]    ; Move first pointer to esi
    mov     edi, [ebp+arg_4]    ; Move second pointer to edi
    mov     ecx, [ebp+arg_8]    ; Move length to ecx

    cld                         ; Clear DF, the direction flag, so comparisons
                                ; happen at increasing addresses
    cmp     ecx, ecx            ; Sets ZF. Special case: if the length passed to
                                ; memcmp is zero, repe cmpsb compares no bytes
                                ; and ZF stays set, so the result is "equal".
    repe cmpsb                  ; Compare bytes at DS:ESI and ES:EDI, setting flags
                                ; Repeat while equal (ZF is set)
    setz    al                  ; Set al (return value) to 1 if ZF is still set
                                ; (all bytes were equal).
    ; (epilogue)

Reference:

  • CMPSB instruction
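To make the flag handling in the cmp ecx, ecx / repe cmpsb / setz al sequence easier to follow, here is a rough C model of it. The function name repe_cmpsb_model and the zf variable are illustrative assumptions; the parameter names simply mirror the registers used above.

```c
#include <stddef.h>

/* Hypothetical C model of "cmp ecx, ecx / repe cmpsb / setz al".
   Returns 1 if the first ecx bytes at esi and edi are equal. */
int repe_cmpsb_model(const unsigned char *esi, const unsigned char *edi,
                     size_t ecx)
{
    int zf = 1;                  /* cmp ecx, ecx always sets ZF */

    while (ecx != 0) {           /* rep: stop when the count is exhausted */
        zf = (*esi == *edi);     /* cmpsb: compare one byte, update ZF */
        esi++;                   /* DF is clear, so both pointers advance */
        edi++;
        ecx--;
        if (!zf)                 /* repe: stop at the first mismatch */
            break;
    }
    return zf;                   /* setz al */
}
```

With a count of zero the loop body never runs, so the ZF value left by cmp ecx, ecx is what setz reads, and the function reports the buffers as equal.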

Highly optimized versions of memcmp exist in many C standard libraries. These usually take advantage of architecture-specific instructions to work on lots of data in parallel.
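As an illustration of working on data in parallel, the sketch below checks 16 bytes per iteration with SSE2 intrinsics. It is only an assumption-laden example, not code from any C library: the helper name equal_sse2 is hypothetical, and on targets without SSE2 it falls back to the plain byte loop.

```c
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics */
#endif

/* Hypothetical helper: returns 1 if the first n bytes of a and b are
   equal, 0 otherwise. */
int equal_sse2(const void *a, const void *b, size_t n)
{
    const unsigned char *pa = a;
    const unsigned char *pb = b;

#if defined(__SSE2__)
    while (n >= 16) {
        /* Unaligned 16-byte loads from both buffers. */
        __m128i va = _mm_loadu_si128((const __m128i *)pa);
        __m128i vb = _mm_loadu_si128((const __m128i *)pb);
        /* Each byte of eq is 0xFF where the inputs match, 0x00 otherwise. */
        __m128i eq = _mm_cmpeq_epi8(va, vb);
        /* Collect the top bit of every byte: 0xFFFF means all 16 matched. */
        if (_mm_movemask_epi8(eq) != 0xFFFF)
            return 0;
        pa += 16;
        pb += 16;
        n -= 16;
    }
#endif

    /* Remaining bytes (or everything, when SSE2 is unavailable). */
    while (n--) {
        if (*pa++ != *pb++)
            return 0;
    }
    return 1;
}
```

A single compare instruction here does the work of sixteen iterations of the byte loop, which is the kind of gain the library versions build on.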

In glibc, there are versions of memcmp for x86_64 that can take advantage of the following instruction set extensions:

  • SSE2 - sysdeps/x86_64/memcmp.S
  • SSE4 - sysdeps/x86_64/multiarch/memcmp-sse4.S
  • SSSE3 - sysdeps/x86_64/multiarch/memcmp-ssse3.S

The cool part is that glibc detects (at run time) the newest instruction set your CPU supports and executes the version optimized for it. See this snippet from sysdeps/x86_64/multiarch/memcmp.S:

    ENTRY(memcmp)
        .type   memcmp, @gnu_indirect_function
        LOAD_RTLD_GLOBAL_RO_RDX
        HAS_CPU_FEATURE (SSSE3)
        jnz     2f
        leaq    __memcmp_sse2(%rip), %rax
        ret

    2:  HAS_CPU_FEATURE (SSE4_1)
        jz      3f
        leaq    __memcmp_sse4_1(%rip), %rax
        ret

    3:  leaq    __memcmp_ssse3(%rip), %rax
        ret

    END(memcmp)
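The same selection logic can be sketched in C with a function pointer. This is only an approximation of the idea, not glibc's mechanism: the names memcmp_fn, my_memcmp_*, select_memcmp and dispatched_memcmp are hypothetical, the three "specialized" routines are stand-ins that all forward to memcmp, and feature detection relies on the GCC/Clang x86 builtins __builtin_cpu_init and __builtin_cpu_supports.

```c
#include <stddef.h>
#include <string.h>

typedef int (*memcmp_fn)(const void *, const void *, size_t);

/* Hypothetical stand-ins for the specialized routines. In this sketch
   they all forward to memcmp; real ones would differ internally. */
static int my_memcmp_sse2(const void *a, const void *b, size_t n)
{
    return memcmp(a, b, n);
}

static int my_memcmp_ssse3(const void *a, const void *b, size_t n)
{
    return memcmp(a, b, n);
}

static int my_memcmp_sse4_1(const void *a, const void *b, size_t n)
{
    return memcmp(a, b, n);
}

/* Mirrors the branch order of the snippet above:
   no SSSE3 -> sse2; SSSE3 and SSE4.1 -> sse4_1; SSSE3 only -> ssse3. */
memcmp_fn select_memcmp(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (!__builtin_cpu_supports("ssse3"))
        return my_memcmp_sse2;
    if (__builtin_cpu_supports("sse4.1"))
        return my_memcmp_sse4_1;
    return my_memcmp_ssse3;
#else
    /* Assumption: no feature detection here, so use the baseline. */
    return my_memcmp_sse2;
#endif
}

/* Resolve on first use, then call through the cached pointer.
   Not thread-safe as written. */
int dispatched_memcmp(const void *a, const void *b, size_t n)
{
    static memcmp_fn impl;
    if (impl == NULL)
        impl = select_memcmp();
    return impl(a, b, n);
}
```

With @gnu_indirect_function the dynamic linker performs the equivalent of select_memcmp when the symbol is resolved, so later calls go straight to the chosen routine without the pointer check shown here.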

In the Linux kernel

Linux does not seem to have an optimized version of memcmp for x86_64, but it does for memcpy, in arch/x86/lib/memcpy_64.S. Note that it uses the alternatives infrastructure (arch/x86/kernel/alternative.c) not only to decide at run time which version to use, but to actually patch itself so that the decision is made only once, at boot-up.

Published: 2023-10-15 12:06:35
Article link: https://www.elefans.com/category/jswz/34/1494291.html