Add the upper and lower 64-bits of a 128-bit xmm register

Updated: 2024-10-27 08:39:49

I have two packed quadword integers in xmm0, and I need to add them together and store the result in a memory location. I can guarantee that the value of each integer is less than 2^15. Right now, I'm doing the following:

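    int temp;
    ....
    movdq2q mm0, xmm0     ; low quadword of xmm0 into mm0
    psrldq  xmm0, 8       ; shift the high quadword down into the low half
    movdq2q mm1, xmm0     ; former high quadword into mm1
    paddq   mm0, mm1      ; add the two quadwords
    movd    temp, mm0     ; store the low 32 bits of the sum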

Is there a better way to do this?

Best answer

First off, why are you using quadwords to represent values that would fit in a 16-bit format? Leaving that aside, here are a couple of solutions:

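    pshufd  xmm1, xmm0, EEh   ; copy the high quadword of xmm0 into both halves of xmm1
    paddq   xmm0, xmm1        ; low quadword of xmm0 = low + high
    movd    temp, xmm0        ; store the low 32 bits of the sum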

or

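    movdqa  xmm1, xmm0        ; work on a copy so xmm0 keeps the low quadword
    psrldq  xmm1, 8           ; shift the high quadword down by 8 bytes
    paddq   xmm0, xmm1        ; low quadword of xmm0 = low + high
    movd    temp, xmm0        ; store the low 32 bits of the sum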

or

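    movhlps xmm1, xmm0        ; high quadword of xmm0 into the low half of xmm1
    paddq   xmm0, xmm1        ; low quadword of xmm0 = low + high
    movd    temp, xmm0        ; store the low 32 bits of the sum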

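Note that you don't actually need to use paddq. Because each value is below 2^15, the sum fits in the low doubleword with no carry out of it, so you can get away with one of the narrower adds (paddd, for example) if you prefer.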
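For anyone working from C rather than inline assembly, here is a rough sketch of the pshufd variant using SSE2 intrinsics. The function names are made up for illustration, and the second function assumes, as in the question, that both values are below 2^15 so a 32-bit add cannot lose a carry.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: adds the two 64-bit lanes of v and returns the
   low 32 bits of the sum (the pshufd / paddq / movd sequence). */
static int32_t sum_two_qwords(__m128i v)
{
    /* 0xEE selects dwords 2,3,2,3, so the high qword lands in the low qword. */
    __m128i hi = _mm_shuffle_epi32(v, 0xEE);
    __m128i sum = _mm_add_epi64(v, hi);   /* paddq */
    return _mm_cvtsi128_si32(sum);        /* movd: low 32 bits */
}

/* Same idea with a narrower add (paddd). Only valid under the assumption
   that each input value fits in the low dword of its lane. */
static int32_t sum_two_qwords_narrow(__m128i v)
{
    __m128i hi = _mm_shuffle_epi32(v, 0xEE);
    __m128i sum = _mm_add_epi32(v, hi);   /* paddd */
    return _mm_cvtsi128_si32(sum);
}
```

Calling `sum_two_qwords(_mm_set_epi64x(7, 5))` adds the high lane (7) to the low lane (5). A compiler is free to pick a different shuffle instruction than pshufd for this, so treat it as equivalent in result rather than instruction-for-instruction.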

Edit: for summing four double quadwords, what you have is pretty much fine. Given that you know that all the data in them fits into the low doubleword of each slot, you could try something like:

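    shufps  xmm0, xmm2, 88h   ; gather the low dwords of the quadwords in xmm0 and xmm2
    shufps  xmm4, xmm6, 88h   ; gather the low dwords of the quadwords in xmm4 and xmm6
    paddd   xmm0, xmm4        ; four partial sums, one per dword
    movdqa  xmm1, xmm0        ; SSE shifts are two-operand, so copy first
    psrlq   xmm1, 32          ; move dwords 1 and 3 down to positions 0 and 2
    paddd   xmm0, xmm1        ; dword 0 += dword 1, dword 2 += dword 3
    movhlps xmm1, xmm0        ; high quadword of xmm0 into the low half of xmm1
    paddd   xmm0, xmm1        ; dword 0 now holds the total
    movd    temp, xmm0        ; store the 32-bit total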

which may or may not prove to be faster.
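As a rough C sketch of the same reduction with SSE2 intrinsics (the function name and the four-argument signature are assumptions for illustration; the inputs stand in for xmm0, xmm2, xmm4 and xmm6, and every value is assumed to fit in the low dword of its quadword):

```c
#include <emmintrin.h>  /* SSE2 intrinsics; also provides _mm_shuffle_ps */
#include <stdint.h>

/* Hypothetical helper: sums the eight quadword lanes held in four
   registers, assuming each value fits in the low dword of its lane. */
static int32_t sum_eight_qwords(__m128i a, __m128i b, __m128i c, __m128i d)
{
    /* shufps with 0x88 takes dwords 0 and 2 from each source,
       i.e. the low dword of every quadword. */
    __m128i ab = _mm_castps_si128(
        _mm_shuffle_ps(_mm_castsi128_ps(a), _mm_castsi128_ps(b), 0x88));
    __m128i cd = _mm_castps_si128(
        _mm_shuffle_ps(_mm_castsi128_ps(c), _mm_castsi128_ps(d), 0x88));

    __m128i s = _mm_add_epi32(ab, cd);               /* four partial sums */
    s = _mm_add_epi32(s, _mm_srli_epi64(s, 32));     /* dword 0 += dword 1, dword 2 += dword 3 */
    s = _mm_add_epi32(s, _mm_unpackhi_epi64(s, s));  /* dword 0 += dword 2 */
    return _mm_cvtsi128_si32(s);                     /* low dword holds the total */
}
```

The casts between integer and float vector types only reinterpret bits, so the shuffle moves the integers unchanged. Whether this beats four separate quadword adds is something to measure on the target CPU.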

As for EMMS, it's just another instruction. The MMX registers alias the x87 register stack, so you need an emms after any code that touches the MMX registers and before any code that uses the x87 floating-point instructions.
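To show where the emms goes, here is a sketch of the MMX approach from the question written with intrinsics. It assumes GCC or Clang on x86 (MSVC does not provide the MMX intrinsics for x64 targets), and the function name is invented for the example.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; pulls in the MMX header as well */
#include <stdint.h>

/* Hypothetical helper mirroring the question's MMX code path. */
static int32_t sum_two_qwords_mmx(__m128i v)
{
    __m64 lo = _mm_movepi64_pi64(v);                     /* movdq2q: low quadword */
    __m64 hi = _mm_movepi64_pi64(_mm_srli_si128(v, 8));  /* psrldq by 8 bytes, then movdq2q */
    int32_t result = _mm_cvtsi64_si32(_mm_add_si64(lo, hi)); /* paddq, then movd */
    _mm_empty();  /* emms: leave MMX state before any later x87 code runs */
    return result;
}
```

Staying entirely in the xmm registers, as in the solutions above, avoids the MMX state and the emms altogether.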

Published: 2023-08-07 21:37:00
Link: https://www.elefans.com/category/jswz/34/1466378.html