Here's some code (full program follows later in the question):
    template <typename T>
    T fizzbuzz(T n) {
        T count(0);
    #if CONST
        const T div(3);
    #else
        T div(3);
    #endif
        for (T i(0); i <= n; ++i) {
            if (i % div == T(0)) count += i;
        }
        return count;
    }

Now, if I call this template function with int, then I get a factor of 6 performance difference according to whether I define CONST or not:
    $ gcc --version
    gcc (GCC) 3.4.4 (cygming special, gdc 0.12, using dmd 0.125)

    $ make -B wrappedint CPPFLAGS="-O3 -Wall -Werror -DWRAP=0 -DCONST=0" && time ./wrappedint
    g++ -O3 -Wall -Werror -DWRAP=0 -DCONST=0 wrappedint.cpp -o wrappedint
    484573652

    real    0m2.543s
    user    0m2.059s
    sys     0m0.046s

    $ make -B wrappedint CPPFLAGS="-O3 -Wall -Werror -DWRAP=0 -DCONST=1" && time ./wrappedint
    g++ -O3 -Wall -Werror -DWRAP=0 -DCONST=1 wrappedint.cpp -o wrappedint
    484573652

    real    0m0.655s
    user    0m0.327s
    sys     0m0.046s

Examining the disassembly shows that in the fast (const) case, the modulo has been turned into a multiply-and-shift sequence, whereas in the slow (non-const) case it's using idivl.
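For readers who haven't seen the multiply-and-shift trick, here is a rough sketch of the idea, written by hand for unsigned 32-bit values. It is not the code gcc emitted in the question. The function names are made up for illustration, and the signed `int` case from the question needs an extra sign correction that is left out here.

```cpp
#include <cassert>
#include <cstdint>

// Plain version: the compiler may or may not turn this into a divide
// instruction, depending on what it knows about the divisor.
inline std::uint32_t mod3_plain(std::uint32_t n) { return n % 3u; }

// Hypothetical hand-written quotient for unsigned 32-bit n.
// 0xAAAAAAAB == (2^33 + 1) / 3, so (n * 0xAAAAAAAB) >> 33 == n / 3
// for every 32-bit n; the product always fits in 64 bits.
inline std::uint32_t div3_mulshift(std::uint32_t n) {
    const std::uint64_t product = static_cast<std::uint64_t>(n) * 0xAAAAAAABull;
    return static_cast<std::uint32_t>(product >> 33);
}

// Remainder recovered from the quotient: n - 3 * (n / 3).
inline std::uint32_t mod3_mulshift(std::uint32_t n) {
    return n - 3u * div3_mulshift(n);
}

// Spot-check both versions against each other, including the boundaries.
inline void check_samples() {
    const std::uint32_t samples[] = {0u, 1u, 2u, 3u, 4u, 123456789u,
                                     0x7FFFFFFFu, 0xFFFFFFFEu, 0xFFFFFFFFu};
    for (unsigned k = 0; k < sizeof(samples) / sizeof(samples[0]); ++k) {
        assert(mod3_plain(samples[k]) == mod3_mulshift(samples[k]));
    }
}
```

The rewrite is only possible when the divisor is known at compile time, which is presumably why the const and non-const builds behaved so differently on this compiler.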
Even worse, if I try to wrap my integer in a class, then the optimisation doesn't happen whether I use const or not. The code always uses idivl and runs slow:
    #include <iostream>

    struct WrappedInt {
        int v;
        explicit WrappedInt(const int &val) : v(val) {}
        bool operator<=(const WrappedInt &rhs) const { return v <= rhs.v; }
        bool operator==(const WrappedInt &rhs) const { return v == rhs.v; }
        WrappedInt &operator++() { ++v; return *this; }
        WrappedInt &operator+=(const WrappedInt &rhs) { v += rhs.v; return *this; }
        WrappedInt operator%(const WrappedInt &rhs) const
            { return WrappedInt(v%rhs.v); }
    };

    std::ostream &operator<<(std::ostream &s, WrappedInt w) {
        return s << w.v;
    }

    template <typename T>
    T fizzbuzz(T n) {
        T count(0);
    #if CONST
        const T div(3);
    #else
        T div(3);
    #endif
        for (T i(0); i <= n; ++i) {
            if (i % div == T(0)) count += i;
        }
        return count;
    }

    int main() {
    #if WRAP
        WrappedInt w(123456789);
        std::cout << fizzbuzz(w) << "\n";
    #else
        std::cout << fizzbuzz<int>(123456789) << "\n";
    #endif
    }

My questions are:
1) Is there a simple principle of C++ itself, or gcc's optimisation, which explains why this happens, or is it just a case of "various heuristics run, this is the code you get"?
2) Is there any way to make the compiler realise that my locally-declared and never-referenced const WrappedInt can be treated as a compile-time const value? I want this thing to be a straight replacement for int in templates.
3) Is there a known way of wrapping an int such that the compiler can discard the wrapping when optimising? The goal is that WrappedInt will be a policy-based template. But if a "do-nothing" policy results in essentially arbitrary 6x speed penalties over int, I'm better off special-casing that situation and using int directly.
Solution

I'm guessing it's just the severely old GCC version you are running. The oldest compiler I have on my machine, gcc-4.1.2, performs the fast way with both the non-const and the wrap versions (and does so at only -O1).
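If you want to try this on your own compiler, something like the following sketch could serve as a starting point. `Wrapped` and `check_wrapper_matches_int` are hypothetical names, and taking operands by value rather than by const reference is my own variation, not something the question or answer prescribes. The check only confirms that the wrapper computes the same result as `int`. To see whether the divide was actually replaced, you would still need to look at the generated assembly (for example with `g++ -O2 -S`) and search for `idivl`.

```cpp
#include <cassert>

// Hypothetical thin wrapper: every operator is defined in-class (so it is
// implicitly inline) and takes its operand by value.
struct Wrapped {
    int v;
    explicit Wrapped(int val) : v(val) {}
    bool operator<=(Wrapped rhs) const { return v <= rhs.v; }
    bool operator==(Wrapped rhs) const { return v == rhs.v; }
    Wrapped &operator++() { ++v; return *this; }
    Wrapped &operator+=(Wrapped rhs) { v += rhs.v; return *this; }
    Wrapped operator%(Wrapped rhs) const { return Wrapped(v % rhs.v); }
};

// Same loop as in the question, with the divisor always const.
template <typename T>
T fizzbuzz(T n) {
    T count(0);
    const T div(3);
    for (T i(0); i <= n; ++i) {
        if (i % div == T(0)) count += i;
    }
    return count;
}

// Sanity check: the wrapped and plain instantiations must agree.
// Keep n small enough that the running sum does not overflow int.
inline void check_wrapper_matches_int(int n) {
    assert(fizzbuzz(Wrapped(n)).v == fizzbuzz<int>(n));
}
```

Agreement on results says nothing about speed, so timing both instantiations on the compiler you actually ship with is still the deciding test.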