FPTAN example (x86)

Updated: 2024-10-28 15:32:39

Problem description

According to the Intel documentation, this is what FPTAN does:

Replace ST(0) with its approximate tangent and push 1 onto the FPU stack.

And this is the code I wrote in NASM:

    section .data
    fVal:   dd 4
    fSt0:   dq 0.0
    fSt1:   dq 0.0

    section .text
        fldpi
        fdiv    dword [fVal]    ; divide pi by 4 and store result in ST(0)
        fptan
        fstp    qword [fSt0]    ; store ST(0)
        fstp    qword [fSt1]    ; store ST(1)

At this point, I find that the values of fSt0 and fSt1 are:

    fSt0 = 5.60479e+044
    fSt1 = -1.#IND

But shouldn't fSt0 and fSt1 both be 1?

Recommended answer

As Michael Petch has already pointed out in a comment, you have a simple typo. Instead of declaring fVal as a floating-point value (as intended), you declared it as a 32-bit integer. Change:

    fVal: dd 4

to:

    fVal: dd 4.0

Then your code will work as intended. It is otherwise correctly written.

If you wanted to take an integer input, you could do it by changing your code to use the FIDIV instruction. This instruction first converts the integer operand to double extended-precision floating-point format, and then does the divide:

    fldpi
    fidiv   dword [fVal]    ; st(0) = pi / fVal
    fptan                   ; st(1) = tan(old st(0)), st(0) = 1.0
    fstp    qword [fSt0]    ; fSt0 = 1.0
    fstp    qword [fSt1]    ; fSt1 = tan(pi / fVal)

But because the conversion is required, this is slightly less efficient than supplying the input as a floating-point value in the first place.

Note that, if you were going to do this, it would be more efficient on certain older CPUs to break up the load so that it is done separately from the division, e.g.:

    fldpi
    fild    dword [fVal]
    fdivp   st(1), st(0)    ; st(0) = pi / fVal
    fptan                   ; st(1) = tan(old st(0)), st(0) = 1.0
    fstp    qword [fSt0]    ; fSt0 = 1.0
    fstp    qword [fSt1]    ; fSt1 = tan(pi / fVal)

In other words, we break the FIDIV instruction apart into separate FILD (integer load) and FDIVP (divide-and-pop) instructions. This improves instruction overlap and so shaves a couple of clock cycles off the execution time. (On newer CPUs, from AMD Family 15h [Bulldozer] and Intel Pentium II onward, there is no real advantage to breaking up FIDIV into FILD+FDIV; either way you write it should perform equally well.)

Of course, since everything you have here is a constant, and tan(pi/4) == 1, your code is equivalent to:

    fld1
    fld1

…which is what an optimizing compiler would generate. :-)


Published: 2023-11-15 21:47:42

Article link: https://www.elefans.com/category/jswz/34/1598239.html