imull operation in IA-32 assembly

Updated: 2024-10-28 03:28:55

I want to do an imull operation in assembly and return the result to C.

The signature of my function is 'long long multiplicar(void)' and the code is:

multiplicar:
    movl op1, %eax
    imull op2, %eax
    adcl $0, %edx
    ret

My op2 is 3. When my op1 is 399 it works well (gives 1197). But when my op1 is -399 I get 4294966093 and I don't know why. Do I have to use cdc?

My op1 and op2 are of type long long. Thanks.

Best answer

The imul instruction, when given 32-bit operands, performs a signed 32x32-bit multiplication. This yields a result of up to 64 bits; however, in the two- and three-operand forms only the least-significant 32 bits are kept, with overflow indicated through the carry flag.

Note that carry is only a single-bit flag used for error detection and cannot carry the information required to chain several extended-precision multiplications together.
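To make the -399 case concrete, here is a minimal C sketch (not the asker's code) that models the register behaviour described above. The function names are hypothetical, and the sketch assumes EDX happened to hold 0 when the function was entered, which the question does not state.

```c
#include <stdint.h>

/* Models two-operand imull: only the low 32 bits of the product land in EAX. */
static uint32_t imull_low32(int32_t a, int32_t b) {
    return (uint32_t)((int64_t)a * (int64_t)b);
}

/* Models returning EDX:EAX when EDX is 0 (assumption): the low half is
   zero-extended, so a negative 32-bit result turns into a large positive
   64-bit value just below 2^32. */
static int64_t returned_with_zero_edx(int32_t a, int32_t b) {
    return (int64_t)(uint64_t)imull_low32(a, b);
}

/* Models sign-extending EAX into EDX before returning. The unsigned to
   signed cast is implementation-defined in older C standards but wraps
   on mainstream compilers. */
static int64_t returned_with_sign_extension(int32_t a, int32_t b) {
    return (int64_t)(int32_t)imull_low32(a, b);
}

/* Models one-operand imull: the full signed 64-bit product in EDX:EAX. */
static int64_t imull_full64(int32_t a, int32_t b) {
    return (int64_t)a * (int64_t)b;
}
```

With a positive product the zero-extended and sign-extended results agree, which is why 399 * 3 looked fine. In assembly, the sign extension corresponds to cltd (cdq in Intel syntax) after the multiply, and the full product corresponds to the one-operand form of imull; either sets EDX explicitly instead of relying on adcl.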

In this case, after the latest edit, it seems the goal is to multiply two 64-bit variables together and grab the truncated 64-bit result. Achieving this with a 32x32=>64 bit primitive requires chaining together four multiplications by what amounts to the grade-school method. That is (a<<32|b) * (c<<32|d) = (a*c<<64) + (a*d<<32) + (b*c<<32) + (b*d<<0). The a*c term can be dropped here, however, since we only require the least-significant 64 bits of the result.

While this is straightforward in theory, in practice keeping the temporaries and carries straight in assembly language is subtle and error-prone. An added wrinkle is that the operations are signed, for which my suggestion would be to build a basic unsigned multiplication primitive and adjust for the signs separately.
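The grade-school decomposition above can be sketched in C before committing to assembly. This is an illustration under stated assumptions, not a definitive implementation: the names mul32x32, mul64_low and mul64_signed are hypothetical, and the sketch relies on the fact that in two's complement the low 64 bits of a product are the same for signed and unsigned operands, so for the truncated result the sign adjustment reduces to a cast.

```c
#include <stdint.h>

/* Unsigned 32x32 => 64 primitive, the role one-operand mull plays on IA-32. */
static uint64_t mul32x32(uint32_t x, uint32_t y) {
    return (uint64_t)x * y;
}

/* Low 64 bits of (a<<32|b) * (c<<32|d); the a*c term is dropped because
   it only affects bits 64 and above. */
static uint64_t mul64_low(uint64_t lhs, uint64_t rhs) {
    uint32_t a = (uint32_t)(lhs >> 32), b = (uint32_t)lhs;
    uint32_t c = (uint32_t)(rhs >> 32), d = (uint32_t)rhs;

    uint64_t low = mul32x32(b, d);
    /* The sum may wrap modulo 2^64; that is harmless because only its
       low 32 bits survive the shift below. */
    uint64_t cross = mul32x32(a, d) + mul32x32(b, c);

    return low + (cross << 32);
}

/* Signed wrapper: reinterpret the bits, multiply, reinterpret back.
   The final cast is implementation-defined in older C standards but
   wraps on mainstream compilers. */
static int64_t mul64_signed(int64_t lhs, int64_t rhs) {
    return (int64_t)mul64_low((uint64_t)lhs, (uint64_t)rhs);
}
```

Since the cross terms are shifted left by 32, only their low halves matter, so in assembly they could be computed with the truncating two-operand imull; only b*d needs the full 64-bit product.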

Thankfully the CPU does in fact support 64-bit multiplication natively if we instead use the 8087 floating-point unit. Note that to avoid rounding errors the floating-point control word must be set to full 64-bit precision (_controlfp(_PC_64,_MCW_PC)) as opposed to the 53 bits which are typically used.

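multiply:                     # int64_t __cdecl multiply(int64_t lhs, int64_t rhs)
    fildq   4(%esp)           # push lhs onto the x87 stack
    fildq   12(%esp)          # push rhs
    fmulp                     # st(1) = st(1) * st(0), then pop
    fistpq  4(%esp)           # store the product as a 64-bit integer over lhs
    movl    4(%esp), %eax     # low half of the result
    movl    8(%esp), %edx     # high half of the result
    ret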

Note, however, that products which overflow 64 bits (and would need up to the full 128 bits of precision) will not yield a correctly truncated 64-bit result, and the question does not state whether overflow is to be handled.
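As a rough illustration of why the precision setting matters, the following C sketch checks whether a 64-bit integer survives a round trip through a 53-bit significand, using a plain double as a stand-in for an x87 unit running at reduced precision. The helper name survives_double is hypothetical, and the sketch is only meant for magnitudes well below 2^63.

```c
#include <stdint.h>

/* Returns 1 if v is unchanged after being rounded to a double, whose
   significand holds 53 bits. The volatile store forces the value to be
   rounded to double precision even on compilers that keep intermediates
   in wider x87 registers. */
static int survives_double(int64_t v) {
    volatile double d = (double)v;
    return (int64_t)d == v;
}
```

Small values such as 1197 pass, while an odd value just above 2^53 does not, which is the kind of silent rounding the 64-bit precision setting is meant to avoid.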

Published: 2023-07-30 01:24:00

Article link: https://www.elefans.com/category/jswz/34/1321356.html