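# How to confirm that your code was SIMD-vectorized at compile time
I learned the basic procedure from the video above.
## A basic example
Here is an example. The code: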
1 // Copyright 1999-2023 Alibaba Inc. All Rights Reserved.
2 // Author:
3 // xiaochu.yh@alipay
4 //
5
6 #include <stdio.h>
7
8 int main(int argc, const char *argv[])
9 {
10 int j = 0;
11 for (int i = 0; i < 1024; ++i) {
12 j += i;
13 }
14 printf("j = %d\n", j);
15 return 0;
16 }
Compile with all of the vectorization report flags:
[xiaochu.yh ~/tools/vector] $g++ vector.cpp -fopt-info-vec-optimized -fopt-info-vec-missed -fopt-info-vec-note -fopt-info-vec-all
[xiaochu.yh ~/tools/vector] $./a.out
j = 523776
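No vectorization report at all? It turns out the -O3 optimization flag was missing: without optimization GCC does not run the vectorizer, so there is nothing to report. After adding -O3 and compiling again, the output is as follows.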
[xiaochu.yh ~/tools/vector] $g++ vector.cpp -O3 -fopt-info-vec-optimized -fopt-info-vec-missed -fopt-info-vec-note -fopt-info-vec-all
Analyzing loop at vector.cpp:11
vector.cpp:11: note: ===== analyze_loop_nest =====
vector.cpp:11: note: === vect_analyze_loop_form ===
vector.cpp:11: note: === get_loop_niters ===
vector.cpp:11: note: ==> get_loop_niters:1024
vector.cpp:11: note: === vect_analyze_data_refs ===
vector.cpp:11: note: === vect_analyze_scalar_cycles ===
vector.cpp:11: note: Analyze phi: j_10 = PHI <j_3(4), 0(2)>
vector.cpp:11: note: Access function of PHI: {0, +, i_11}_1
vector.cpp:11: note: step: i_11, init: 0
vector.cpp:11: note: step unknown.
vector.cpp:11: note: Analyze phi: i_11 = PHI <i_4(4), 0(2)>
vector.cpp:11: note: Access function of PHI: {0, +, 1}_1
vector.cpp:11: note: step: 1, init: 0
vector.cpp:11: note: Detected induction.
vector.cpp:11: note: Analyze phi: ivtmp_2 = PHI <ivtmp_1(4), 1024(2)>
vector.cpp:11: note: Access function of PHI: {1024, +, 4294967295}_1
vector.cpp:11: note: step: 4294967295, init: 1024
vector.cpp:11: note: Detected induction.
vector.cpp:11: note: Analyze phi: j_10 = PHI <j_3(4), 0(2)>
vector.cpp:11: note: detected reduction: need to swap operands: j_3 = j_10 + i_11;
vector.cpp:11: note: Detected reduction.
vector.cpp:11: note: === vect_pattern_recog ===
vector.cpp:11: note: vect_is_simple_use: operand i_11
vector.cpp:11: note: def_stmt: i_11 = PHI <i_4(4), 0(2)>
vector.cpp:11: note: type of def: 4.
vector.cpp:11: note: vect_is_simple_use: operand i_11
vector.cpp:11: note: def_stmt: i_11 = PHI <i_4(4), 0(2)>
vector.cpp:11: note: type of def: 4.
vector.cpp:11: note: === vect_mark_stmts_to_be_vectorized ===
vector.cpp:11: note: init: phi relevant? j_10 = PHI <j_3(4), 0(2)>
vector.cpp:11: note: init: phi relevant? i_11 = PHI <i_4(4), 0(2)>
vector.cpp:11: note: init: phi relevant? ivtmp_2 = PHI <ivtmp_1(4), 1024(2)>
vector.cpp:11: note: init: stmt relevant? j_3 = i_11 + j_10;
vector.cpp:11: note: vec_stmt_relevant_p: used out of loop.
vector.cpp:11: note: mark relevant 0, live 1.
vector.cpp:11: note: init: stmt relevant? i_4 = i_11 + 1;
vector.cpp:11: note: init: stmt relevant? ivtmp_1 = ivtmp_2 - 1;
vector.cpp:11: note: init: stmt relevant? if (ivtmp_1 != 0)
vector.cpp:11: note: worklist: examine stmt: j_3 = i_11 + j_10;
vector.cpp:11: note: vect_is_simple_use: operand j_10
vector.cpp:11: note: def_stmt: j_10 = PHI <j_3(4), 0(2)>
vector.cpp:11: note: type of def: 5.
vector.cpp:11: note: mark relevant 3, live 0.
vector.cpp:11: note: vect_is_simple_use: operand i_11
vector.cpp:11: note: def_stmt: i_11 = PHI <i_4(4), 0(2)>
vector.cpp:11: note: type of def: 4.
vector.cpp:11: note: mark relevant 3, live 0.
vector.cpp:11: note: worklist: examine stmt: i_11 = PHI <i_4(4), 0(2)>
vector.cpp:11: note: vect_is_simple_use: operand i_4
vector.cpp:11: note: def_stmt: i_4 = i_11 + 1;
vector.cpp:11: note: type of def: 3.
vector.cpp:11: note: mark relevant 3, live 0.
vector.cpp:11: note: vect_is_simple_use: operand 0
vector.cpp:11: note: worklist: examine stmt: i_4 = i_11 + 1;
vector.cpp:11: note: vect_is_simple_use: operand i_11
vector.cpp:11: note: def_stmt: i_11 = PHI <i_4(4), 0(2)>
vector.cpp:11: note: type of def: 4.
vector.cpp:11: note: mark relevant 3, live 0.
vector.cpp:11: note: already marked relevant/live.
vector.cpp:11: note: worklist: examine stmt: j_10 = PHI <j_3(4), 0(2)>
vector.cpp:11: note: vect_is_simple_use: operand j_3
vector.cpp:11: note: def_stmt: j_3 = i_11 + j_10;
vector.cpp:11: note: type of def: 5.
vector.cpp:11: note: reduc-stmt defining reduc-phi in the same nest.
vector.cpp:11: note: vect_is_simple_use: operand 0
vector.cpp:11: note: === vect_analyze_dependences ===
vector.cpp:11: note: === vect_determine_vectorization_factor ===
vector.cpp:11: note: ==> examining phi: j_10 = PHI <j_3(4), 0(2)>
vector.cpp:11: note: get vectype for scalar type: int
vector.cpp:11: note: get vectype with 4 units of type int
vector.cpp:11: note: vectype: vector(4) int
vector.cpp:11: note: vectype: vector(4) int
vector.cpp:11: note: nunits = 4
vector.cpp:11: note: ==> examining phi: i_11 = PHI <i_4(4), 0(2)>
vector.cpp:11: note: get vectype for scalar type: int
vector.cpp:11: note: get vectype with 4 units of type int
vector.cpp:11: note: vectype: vector(4) int
vector.cpp:11: note: vectype: vector(4) int
vector.cpp:11: note: nunits = 4
vector.cpp:11: note: ==> examining phi: ivtmp_2 = PHI <ivtmp_1(4), 1024(2)>
vector.cpp:11: note: ==> examining statement: j_3 = i_11 + j_10;
vector.cpp:11: note: get vectype for scalar type: int
vector.cpp:11: note: get vectype with 4 units of type int
vector.cpp:11: note: vectype: vector(4) int
vector.cpp:11: note: get vectype for scalar type: int
vector.cpp:11: note: get vectype with 4 units of type int
vector.cpp:11: note: vectype: vector(4) int
vector.cpp:11: note: vectype: vector(4) int
vector.cpp:11: note: nunits = 4
vector.cpp:11: note: ==> examining statement: i_4 = i_11 + 1;
vector.cpp:11: note: get vectype for scalar type: int
vector.cpp:11: note: get vectype with 4 units of type int
vector.cpp:11: note: vectype: vector(4) int
vector.cpp:11: note: get vectype for scalar type: int
vector.cpp:11: note: get vectype with 4 units of type int
vector.cpp:11: note: vectype: vector(4) int
vector.cpp:11: note: vectype: vector(4) int
vector.cpp:11: note: nunits = 4
vector.cpp:11: note: ==> examining statement: ivtmp_1 = ivtmp_2 - 1;
vector.cpp:11: note: skip.
vector.cpp:11: note: ==> examining statement: if (ivtmp_1 != 0)
vector.cpp:11: note: skip.
vector.cpp:11: note: vectorization factor = 4
vector.cpp:11: note: === vect_analyze_data_refs_alignment ===
vector.cpp:11: note: === vect_analyze_data_ref_accesses ===
vector.cpp:11: note: === vect_prune_runtime_alias_test_list ===
vector.cpp:11: note: === vect_enhance_data_refs_alignment ===
vector.cpp:11: note: vect_can_advance_ivs_p:
vector.cpp:11: note: Analyze phi: j_10 = PHI <j_3(4), 0(2)>
vector.cpp:11: note: reduc phi. skip.
vector.cpp:11: note: Analyze phi: i_11 = PHI <i_4(4), 0(2)>
vector.cpp:11: note: Access function of PHI: {0, +, 1}_1
vector.cpp:11: note: Analyze phi: ivtmp_2 = PHI <ivtmp_1(4), 1024(2)>
vector.cpp:11: note: Access function of PHI: {1024, +, 4294967295}_1
vector.cpp:11: note: === vect_analyze_slp ===
vector.cpp:11: note: === vect_make_slp_decision ===
vector.cpp:11: note: === vect_detect_hybrid_slp ===
vector.cpp:11: note: === vect_analyze_loop_operations ===
vector.cpp:11: note: examining phi: j_10 = PHI <j_3(4), 0(2)>
vector.cpp:11: note: examining phi: i_11 = PHI <i_4(4), 0(2)>
vector.cpp:11: note: === vectorizable_induction ===
vector.cpp:11: note: vect_model_induction_cost: inside_cost = 1, prologue_cost = 2 .
vector.cpp:11: note: examining phi: ivtmp_2 = PHI <ivtmp_1(4), 1024(2)>
vector.cpp:11: note: ==> examining statement: j_3 = i_11 + j_10;
vector.cpp:11: note: vect_is_simple_use: operand i_11
vector.cpp:11: note: def_stmt: i_11 = PHI <i_4(4), 0(2)>
vector.cpp:11: note: type of def: 4.
vector.cpp:11: note: vect_is_simple_use: operand j_10
vector.cpp:11: note: def_stmt: j_10 = PHI <j_3(4), 0(2)>
vector.cpp:11: note: type of def: 5.
vector.cpp:11: note: detected reduction: j_3 = i_11 + j_10;
vector.cpp:11: note: reduc op not supported by target.
vector.cpp:11: note: get vectype with 4 units of type int
vector.cpp:11: note: vectype: vector(4) intvect_model_reduction_cost: inside_cost = 1, prologue_cost = 1, epilogue_cost = 5 .
vector.cpp:11: note: ==> examining statement: i_4 = i_11 + 1;
vector.cpp:11: note: vect_is_simple_use: operand i_11
vector.cpp:11: note: def_stmt: i_11 = PHI <i_4(4), 0(2)>
vector.cpp:11: note: type of def: 4.
vector.cpp:11: note: vect_is_simple_use: operand 1
vector.cpp:11: note: === vectorizable_operation ===
vector.cpp:11: note: vect_model_simple_cost: inside_cost = 1, prologue_cost = 1 .
vector.cpp:11: note: ==> examining statement: ivtmp_1 = ivtmp_2 - 1;
vector.cpp:11: note: irrelevant.
vector.cpp:11: note: ==> examining statement: if (ivtmp_1 != 0)
vector.cpp:11: note: irrelevant.
vector.cpp:11: note: vectorization_factor = 4, niters = 1024
vector.cpp:11: note: === vect_update_slp_costs_according_to_vf ===
vector.cpp:11: note: Cost model analysis:
Vector inside of loop cost: 3
Vector prologue cost: 4
Vector epilogue cost: 5
Scalar iteration cost: 2
Scalar outside cost: 0
Vector outside cost: 9
prologue iterations: 0
epilogue iterations: 0
Calculated minimum iters for profitability: 8
vector.cpp:11: note: Runtime profitability threshold = 7
vector.cpp:11: note: Static estimate profitability threshold = 7
Vectorizing loop at vector.cpp:11
vector.cpp:11: note: === vec_transform_loop ===
vector.cpp:11: note: ------>vectorizing phi: j_10 = PHI <j_3(4), 0(6)>
vector.cpp:11: note: ------>vectorizing phi: i_11 = PHI <i_4(4), 0(6)>
vector.cpp:11: note: transform phi.
vector.cpp:11: note: transform induction phi.
vector.cpp:11: note: get vectype with 4 units of type int
vector.cpp:11: note: vectype: vector(4) int
vector.cpp:11: note: get vectype with 4 units of type int
vector.cpp:11: note: vectype: vector(4) int
vector.cpp:11: note: created new init_stmt: stmp_var_.3_9 = 0 + 1;
vector.cpp:11: note: created new init_stmt: stmp_var_.3_8 = stmp_var_.3_9 + 1;
vector.cpp:11: note: created new init_stmt: stmp_var_.3_12 = stmp_var_.3_8 + 1;
vector.cpp:11: note: created new init_stmt: vect_cst_.4_13 = {0, stmp_var_.3_9, stmp_var_.3_8, stmp_var_.3_12};
vector.cpp:11: note: get vectype with 4 units of type int
vector.cpp:11: note: vectype: vector(4) int
vector.cpp:11: note: created new init_stmt: vect_cst_.5_14 = { 4, 4, 4, 4 };
vector.cpp:11: note: transform induction: created def-use cycle: vect_vec_iv_.6_15 = PHI <vect_vec_iv_.6_16(4), vect_cst_.4_13(6)>
vect_vec_iv_.6_16 = vect_vec_iv_.6_15 + vect_cst_.5_14;
vector.cpp:11: note: ------>vectorizing phi: ivtmp_2 = PHI <ivtmp_1(4), 1024(6)>
vector.cpp:11: note: ------>vectorizing phi: vect_vec_iv_.6_15 = PHI <vect_vec_iv_.6_16(4), vect_cst_.4_13(6)>
vector.cpp:11: note: ------>vectorizing statement: vect_vec_iv_.6_16 = vect_vec_iv_.6_15 + vect_cst_.5_14;
vector.cpp:11: note: ------>vectorizing statement: j_3 = i_11 + j_10;
vector.cpp:11: note: transform statement.
vector.cpp:11: note: vect_is_simple_use: operand i_11
vector.cpp:11: note: def_stmt: i_11 = PHI <i_4(4), 0(6)>
vector.cpp:11: note: type of def: 4.
vector.cpp:11: note: vect_is_simple_use: operand j_10
vector.cpp:11: note: def_stmt: j_10 = PHI <j_3(4), 0(6)>
vector.cpp:11: note: type of def: 5.
vector.cpp:11: note: detected reduction: j_3 = i_11 + j_10;
vector.cpp:11: note: reduc op not supported by target.
vector.cpp:11: note: transform reduction.
vector.cpp:11: note: vect_get_vec_def_for_operand: i_11
vector.cpp:11: note: vect_is_simple_use: operand i_11
vector.cpp:11: note: def_stmt: i_11 = PHI <i_4(4), 0(6)>
vector.cpp:11: note: type of def: 4.
vector.cpp:11: note: def = i_11 def_stmt = i_11 = PHI <i_4(4), 0(6)>
vector.cpp:11: note: add new stmt: vect_j.7_18 = vect_vec_iv_.6_15 + vect_j.7_17;
vector.cpp:11: note: get vectype with 4 units of type int
vector.cpp:11: note: vectype: vector(4) int
vector.cpp:11: note: vect_get_vec_def_for_operand: j_10
vector.cpp:11: note: vect_is_simple_use: operand j_10
vector.cpp:11: note: def_stmt: j_10 = PHI <j_3(4), 0(6)>
vector.cpp:11: note: type of def: 5.
vector.cpp:11: note: def = j_10 def_stmt = j_10 = PHI <j_3(4), 0(6)>
vector.cpp:11: note: get vectype with 4 units of type int
vector.cpp:11: note: vectype: vector(4) int
vector.cpp:11: note: transform reduction: created def-use cycle: vect_j.7_17 = PHI <vect_j.7_18(4), { 0, 0, 0, 0 }(6)>
vect_j.7_18 = vect_vec_iv_.6_15 + vect_j.7_17;
vector.cpp:11: note: Reduce using vector shifts
vector.cpp:11: note: extract scalar result
vector.cpp:11: note: ------>vectorizing statement: i_4 = i_11 + 1;
vector.cpp:11: note: transform statement.
vector.cpp:11: note: vect_is_simple_use: operand i_11
vector.cpp:11: note: def_stmt: i_11 = PHI <i_4(4), 0(6)>
vector.cpp:11: note: type of def: 4.
vector.cpp:11: note: vect_is_simple_use: operand 1
vector.cpp:11: note: transform binary/unary operation.
vector.cpp:11: note: vect_get_vec_def_for_operand: i_11
vector.cpp:11: note: vect_is_simple_use: operand i_11
vector.cpp:11: note: def_stmt: i_11 = PHI <i_4(4), 0(6)>
vector.cpp:11: note: type of def: 4.
vector.cpp:11: note: def = i_11 def_stmt = i_11 = PHI <i_4(4), 0(6)>
vector.cpp:11: note: vect_get_vec_def_for_operand: 1
vector.cpp:11: note: vect_is_simple_use: operand 1
vector.cpp:11: note: get vectype with 4 units of type int
vector.cpp:11: note: vectype: vector(4) int
vector.cpp:11: note: Create vector_cst. nunits = 4
vector.cpp:11: note: created new init_stmt: vect_cst_.12_26 = { 1, 1, 1, 1 };
vector.cpp:11: note: add new stmt: vect_i.11_27 = vect_vec_iv_.6_15 + vect_cst_.12_26;
vector.cpp:11: note: ------>vectorizing statement: ivtmp_1 = ivtmp_2 - 1;
vector.cpp:11: note: ------>vectorizing statement: if (ivtmp_1 != 0)
loop at vector.cpp:12: if (ivtmp_29 < 256)
vector.cpp:11: note: LOOP VECTORIZED.
vector.cpp:8: note: vectorized 1 loops in function.
vector.cpp:8: note: ===vect_slp_analyze_bb===
vector.cpp:8: note: === vect_analyze_data_refs ===
vector.cpp:8: note: not vectorized: not enough data-refs in basic block.
vector.cpp:12: note: ===vect_slp_analyze_bb===
vector.cpp:12: note: === vect_analyze_data_refs ===
vector.cpp:12: note: not vectorized: not enough data-refs in basic block.
vector.cpp:8: note: ===vect_slp_analyze_bb===
vector.cpp:8: note: === vect_analyze_data_refs ===
vector.cpp:8: note: not vectorized: not enough data-refs in basic block.
vector.cpp:14: note: ===vect_slp_analyze_bb===
vector.cpp:14: note: === vect_analyze_data_refs ===
vector.cpp:14: note: not vectorized: not enough data-refs in basic block.
That is far too much output. Trim the flags and compile again:
[xiaochu.yh ~/tools/vector] $g++ vector.cpp -O3 -fopt-info-vec-optimized -fopt-info-vec-missed
Analyzing loop at vector.cpp:11
vector.cpp:11: note: step unknown.
vector.cpp:11: note: reduc phi. skip.
vector.cpp:11: note: reduc op not supported by target.
Vectorizing loop at vector.cpp:11
vector.cpp:11: note: reduc op not supported by target.
vector.cpp:11: note: LOOP VECTORIZED.
vector.cpp:8: note: vectorized 1 loops in function.
vector.cpp:8: note: not vectorized: not enough data-refs in basic block.
vector.cpp:12: note: not vectorized: not enough data-refs in basic block.
vector.cpp:8: note: not vectorized: not enough data-refs in basic block.
vector.cpp:14: note: not vectorized: not enough data-refs in basic block.
Trim further:
[xiaochu.yh ~/tools/vector] $g++ vector.cpp -O3 -fopt-info-vec-optimized
Analyzing loop at vector.cpp:11
Vectorizing loop at vector.cpp:11
vector.cpp:11: note: LOOP VECTORIZED.
vector.cpp:8: note: vectorized 1 loops in function.
Here is the code once more. The loop on line 11 was vectorized, but what exactly did the compiler do to it?
1 // Copyright 1999-2023 Alibaba Inc. All Rights Reserved.
2 // Author:
3 // xiaochu.yh@alipay
4 //
5
6 #include <stdio.h>
7
8 int main(int argc, const char *argv[])
9 {
10 int j = 0;
11 for (int i = 0; i < 1024; ++i) {
12 j += i;
13 }
14 printf("j = %d\n", j);
15 return 0;
16 }
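The verbose dump earlier gives some hints about what happened to line 11. The induction variable becomes a four-lane vector that starts at {0, 1, 2, 3} (vect_cst_.4_13) and advances by {4, 4, 4, 4} (vect_cst_.5_14), j becomes a four-lane accumulator (vect_j), the loop runs 256 times instead of 1024 (`if (ivtmp_29 < 256)`), and the lanes are combined afterwards ("Reduce using vector shifts"). The following is a minimal scalar sketch of that transformation, not the code GCC actually emits. The function name `emulated_vector_sum` and the use of `std::array` to stand in for a SIMD register are assumptions made for illustration.

```cpp
#include <array>

// Scalar emulation of the four-lane loop described in the GCC dump.
// std::array<int, 4> stands in for one SIMD register (an assumption
// for illustration; real code would use vector instructions).
int emulated_vector_sum()
{
    std::array<int, 4> iv  = {0, 1, 2, 3};  // initial induction vector
    std::array<int, 4> acc = {0, 0, 0, 0};  // four partial sums of j
    const int step = 4;                     // every lane advances by 4

    // 1024 scalar iterations / vectorization factor 4 = 256 vector iterations
    for (int iter = 0; iter < 256; ++iter) {
        for (int lane = 0; lane < 4; ++lane) {
            acc[lane] += iv[lane];  // vect_j = vect_vec_iv + vect_j
            iv[lane] += step;       // vect_vec_iv = vect_vec_iv + {4, 4, 4, 4}
        }
    }

    // Epilogue: fold the upper half onto the lower half, twice,
    // mimicking a shift-based horizontal reduction.
    acc[0] += acc[2];
    acc[1] += acc[3];
    acc[0] += acc[1];
    return acc[0];  // extract the scalar result
}
```

Calling `emulated_vector_sum()` returns 523776, the same value the original program prints.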
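## A typical example
In the example below, all three loops are vectorized with SIMD. If the trip count is changed to 3, no vectorization takes place. So the compiler evidently has a cost model, much like a SQL optimizer (see the "Cost model analysis" section of the verbose dump in the first example): it estimates the cost before and after the transformation and keeps the cheaper version. Note that the line numbers GCC reports here are one higher than those in the nl listing, presumably because nl does not number blank lines by default.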
```text
[xiaochu.yh ~/tools/vector] $g++ vector.cpp -O3 -fopt-info-vec-optimized
Analyzing loop at vector.cpp:13
Vectorizing loop at vector.cpp:13
vector.cpp:13: note: LOOP VECTORIZED.
Analyzing loop at vector.cpp:10
Vectorizing loop at vector.cpp:10
vector.cpp:10: note: LOOP VECTORIZED.
Analyzing loop at vector.cpp:7
Vectorizing loop at vector.cpp:7
vector.cpp:7: note: LOOP VECTORIZED.
vector.cpp:3: note: vectorized 3 loops in function.
[xiaochu.yh ~/tools/vector] $cat vector.cpp | nl
1 #include <stdio.h>
2 int main(int argc, const char *argv[])
3 {
4     int result[32];
5     int j = 0;
6     for (int i = 0; i < 32; ++i) {
7         result[i] = i;
8     }
9     for (int i = 0; i < 32; ++i) {
10         result[i] = i + result[i];
11     }
12     for (int i = 0; i < 32; ++i) {
13         j += result[i];
14     }
15     printf("%d\n", j);
16     return 0;
17 }
```
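The cost-based decision can be made concrete with the figures from the "Cost model analysis" section of the verbose dump in the first example: scalar iteration cost 2, vector inside-of-loop cost 3, vector outside cost 9, vectorization factor 4. The sketch below is a simplified break-even model, not GCC's actual implementation, and the names `LoopCosts`, `vectorization_pays_off` and `min_profitable_iters` are hypothetical. It searches for the smallest trip count at which the vectorized loop is estimated to be cheaper than the scalar one.

```cpp
// Cost figures as printed in a GCC "Cost model analysis" dump section.
struct LoopCosts {
    int scalar_iter_cost;     // "Scalar iteration cost"
    int scalar_outside_cost;  // "Scalar outside cost"
    int vec_inside_cost;      // "Vector inside of loop cost"
    int vec_outside_cost;     // "Vector outside cost" (prologue + epilogue)
    int vf;                   // vectorization factor
};

// True if, under this simplified model, vectorizing n iterations is cheaper.
// Both sides are multiplied by vf so the comparison stays in integers:
//   scalar: scalar_outside + scalar_iter * n
//   vector: vec_outside + vec_inside * n / vf
bool vectorization_pays_off(const LoopCosts& c, long n)
{
    long scalar_total = (c.scalar_outside_cost + c.scalar_iter_cost * n) * c.vf;
    long vector_total = static_cast<long>(c.vec_outside_cost) * c.vf
                        + c.vec_inside_cost * n;
    return vector_total < scalar_total;
}

// Smallest trip count for which vectorization pays off,
// or -1 if none is found up to limit.
long min_profitable_iters(const LoopCosts& c, long limit = 1 << 20)
{
    for (long n = 1; n <= limit; ++n) {
        if (vectorization_pays_off(c, n)) {
            return n;
        }
    }
    return -1;
}
```

With the dump's figures, `LoopCosts{2, 0, 3, 9, 4}`, `min_profitable_iters` returns 8, which agrees with the line "Calculated minimum iters for profitability: 8". A trip count of 3 falls below that, while 1024 is far above it. The loops in this second example have their own cost figures, which are not shown, so this only illustrates the principle.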
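## Optimization level
In my tests, auto-vectorization was only enabled with -O3. This matches GCC releases before 12, where -O3 turns on -ftree-vectorize and -O2 does not; since GCC 12, -O2 enables it as well, with a more conservative cost model.
![image](https://img-blog.csdnimg/9abd99612880463697c9ca589dc3e2b0.png)
ChatGPT: When compiling C++ code with GCC, auto-vectorization can be enabled by adding the -ftree-vectorize option. This option enables loop vectorization in GCC's tree optimizer, which automatically vectorizes code to improve execution efficiency. You can also add -O3 to enable a higher level of optimization. The complete compile command is as follows: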