I want to do an imull operation in assembly and return the result to C.
The signature of my function is 'long long multiplicar(void)' and the code is:

    multiplicar:
        movl op1, %eax
        imull op2, %eax
        adcl $0, %edx
        ret

My op2 is 3. When my op1 is 399 it works well (gives 1197). But when my op1 is -399 I get 4294966093 and don't know why. Do I have to use cdc?
My op1 and op2 are long long types. Thanks
Best answer
The imul instruction, when given 32-bit operands, performs a signed 32x32-bit multiplication. This yields a result of up to 64 bits; however, in the two- and three-operand forms only the least-significant 32 bits are kept, with overflow indicated through the carry and overflow flags.
Note that carry is only a single-bit flag used for error detection and cannot carry the information required to chain several extended-precision multiplications together.
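As a rough illustration of the difference, here is a minimal C model of the two behaviours. The function names are hypothetical, and `zero_extended` assumes the high half of the return value (edx) happens to be zero, which the code in the question does not guarantee. The one-operand form of imul, by contrast, leaves the full 64-bit product in edx:eax.

```c
#include <stdint.h>

/* Models the one-operand form: full signed 64-bit product (edx:eax). */
int64_t imul_full(int32_t a, int32_t b) {
    return (int64_t)a * (int64_t)b;
}

/* Models the two-operand form: only the low 32 bits of the product survive. */
uint32_t imul_low(int32_t a, int32_t b) {
    return (uint32_t)a * (uint32_t)b;   /* unsigned arithmetic wraps modulo 2^32 */
}

/* Assumption: edx is zero on return, so the caller reads the low word
   zero-extended to 64 bits instead of sign-extended. */
int64_t zero_extended(int32_t a, int32_t b) {
    return (int64_t)(uint64_t)imul_low(a, b);
}
```

Under that assumption, a negative 32-bit product read back as a long long looks like a large positive number just below 2^32, while `imul_full` returns the expected negative value.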
In this case, after the latest edit, it seems the goal is to multiply two 64-bit variables together and grab the truncated 64-bit result. Achieving this with a 32x32=>64 bit primitive requires chaining together four multiplications by what amounts to the grade-school method. That is, (a<<32|b) * (c<<32|d) = (a*c<<64) + (a*d<<32) + (b*c<<32) + (b*d<<0). The a*c term can be dropped here, however, since we only require the least-significant 64 bits of the result.
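The decomposition above can be sketched in C before committing it to assembly. This is only a model under stated assumptions: `mul32x32` and `mul64_low` are hypothetical names, and `mul32x32` stands in for the one-operand 32-bit multiply.

```c
#include <stdint.h>

/* 32x32 => 64 unsigned primitive (stand-in for a one-operand multiply). */
static uint64_t mul32x32(uint32_t x, uint32_t y) {
    return (uint64_t)x * y;
}

/* Low 64 bits of lhs * rhs, built from 32-bit halves. */
uint64_t mul64_low(uint64_t lhs, uint64_t rhs) {
    uint32_t a = (uint32_t)(lhs >> 32), b = (uint32_t)lhs;  /* lhs = a<<32 | b */
    uint32_t c = (uint32_t)(rhs >> 32), d = (uint32_t)rhs;  /* rhs = c<<32 | d */

    uint64_t low   = mul32x32(b, d);                   /* b*d << 0 */
    uint64_t cross = mul32x32(a, d) + mul32x32(b, c);  /* only its low 32 bits matter */

    /* a*c << 64 lies entirely above bit 63, so it is dropped. */
    return low + (cross << 32);
}
```

Since only the low 32 bits of each cross term reach the result, an assembly version could compute a*d and b*c with the truncating two-operand imul and reserve the widening multiply for b*d.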
While this is straightforward in theory, in practice keeping the temporaries and carries straight in assembly language is subtle and error-prone. An added wrinkle is that the operations are signed, for which my suggestion would be to build a basic unsigned multiplication primitive and adjust for the signs separately.
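One way to picture the sign handling is the following C sketch. `mul64_signed` is a hypothetical name, the plain `ul * ur` is a stand-in for whatever unsigned primitive gets built, and the final cast assumes the usual two's-complement conversion found on mainstream compilers.

```c
#include <stdint.h>

/* Signed 64-bit multiply layered on an unsigned primitive. */
int64_t mul64_signed(int64_t lhs, int64_t rhs) {
    /* Take magnitudes in unsigned arithmetic so INT64_MIN is handled too. */
    uint64_t ul = lhs < 0 ? 0 - (uint64_t)lhs : (uint64_t)lhs;
    uint64_t ur = rhs < 0 ? 0 - (uint64_t)rhs : (uint64_t)rhs;

    uint64_t p = ul * ur;           /* stand-in for the unsigned primitive */

    if ((lhs < 0) != (rhs < 0))     /* signs differ: negate the product */
        p = 0 - p;

    return (int64_t)p;              /* assumes two's-complement conversion */
}
```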
Thankfully, the CPU does in fact support 64-bit multiplication natively if we instead use the 8087 floating-point unit. Note that to avoid rounding errors the floating-point control word must be set to full 64-bit precision (_controlfp(_PC_64,_MCW_PC)) as opposed to the 53 bits which are typically used.
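
    multiply:                   # int64_t __cdecl multiply(int64_t lhs, int64_t rhs)
        fildq 4(%esp)           # load lhs as a 64-bit integer
        fildq 12(%esp)          # load rhs as a 64-bit integer
        fmulp                   # multiply and pop, product left in st(0)
        fistpq 4(%esp)          # store the product back as a 64-bit integer
        movl 4(%esp),%eax       # low half of the result
        movl 8(%esp),%edx       # high half of the result
        ret

Note, however, that products whose exact value requires the full 128-bit precision will not yield a correctly truncated 64-bit result, and the question does not state whether overflow is to be handled.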
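To see why the 53-bit precision setting mentioned above is not enough, here is a small C check. It assumes `double` is an IEEE 754 binary64 type with a 53-bit significand; `through_double` is a hypothetical helper.

```c
#include <stdint.h>

/* Round-trips x through a 53-bit significand (IEEE 754 double assumed). */
int64_t through_double(int64_t x) {
    volatile double d = (double)x;  /* volatile forces a real store as double */
    return (int64_t)d;
}
```

Values up to 2^53 in magnitude survive the round trip, but 2^53 + 1 does not, which is the kind of rounding error the 64-bit precision setting avoids for results that fit in 64 bits.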