With GCC 5.3, the following code compiled with -O3 -mfma

    float mul_add(float a, float b, float c) { return a*b + c; }

produces the following assembly:

    vfmadd132ss %xmm1, %xmm2, %xmm0
    ret

I noticed GCC doing this with -O3 already in GCC 4.8.
Clang 3.7 with -O3 -mfma produces

    vmulss %xmm1, %xmm0, %xmm0
    vaddss %xmm2, %xmm0, %xmm0
    retq

but Clang 3.7 with -Ofast -mfma produces the same code as GCC with -O3 -mfma.
I am surprised that GCC does this with -O3, because this answer says
The compiler is not allowed to fuse a separated add and multiply unless you allow for a relaxed floating-point model.
This is because an FMA has only one rounding, while an ADD + MUL has two. So the compiler will violate strict IEEE floating-point behaviour by fusing.
However, this link says
Regardless of the value of FLT_EVAL_METHOD, any floating-point expression may be contracted, that is, calculated as if all intermediate results have infinite range and precision.
So now I am confused and concerned.
Since FMA can be emulated in software, it seems to me there should be two compiler switches for FMA: one to tell the compiler to use FMA in calculations and one to tell the compiler that the hardware has FMA.
Apparently this can be controlled with the option -ffp-contract. With GCC the default is -ffp-contract=fast; with Clang it is not. Other options such as -ffp-contract=on and -ffp-contract=off do not produce the FMA instruction.
For example Clang 3.7 with -O3 -mfma -ffp-contract=fast produces vfmadd132ss.
I checked some permutations of #pragma STDC FP_CONTRACT set to ON and OFF with -ffp-contract set to on, off, and fast. In all cases I also used -O3 -mfma.
With GCC the answer is simple. #pragma STDC FP_CONTRACT ON or OFF makes no difference. Only -ffp-contract matters.
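With GCC it uses fma with -ffp-contract=fast (the default).

With Clang it uses fma with -ffp-contract=fast, or with -ffp-contract=on (the default) combined with #pragma STDC FP_CONTRACT ON.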
In other words, with Clang you can get fma with #pragma STDC FP_CONTRACT ON (since -ffp-contract=on is the default) or with -ffp-contract=fast. -ffast-math (and hence -Ofast) sets -ffp-contract=fast.
I looked into MSVC and ICC.
MSVC uses the fma instruction with /O2 /arch:AVX2 /fp:fast; /fp:precise is the default.
With ICC it uses fma with -O3 -march=core-avx2 (actually -O1 is sufficient). This is because by default ICC uses -fp-model fast. But ICC uses fma even with -fp-model precise. To disable fma with ICC, use -fp-model strict or -no-fma.
So by default GCC and ICC use fma when fma is enabled (with -mfma for GCC/Clang or -march=core-avx2 with ICC) but Clang and MSVC do not.
Recommended answer
It doesn't violate IEEE-754, because IEEE-754 defers to languages on this point:
A language standard should also define, and require implementations to provide, attributes that allow and disallow value-changing optimizations, separately or collectively, for a block. These optimizations might include, but are not limited to:
...
― Synthesis of a fusedMultiplyAdd operation from a multiplication and an addition.
In standard C, the STDC FP_CONTRACT pragma provides the means to control this value-changing optimization. So GCC is licensed to perform the fusion by default, so long as it allows you to disable the optimization by setting STDC FP_CONTRACT OFF. Not supporting that means not adhering to the C standard.