从C编译器了解MIPS汇编代码(Understanding MIPS assembly code from C compiler)

我将C代码转换为MIPS，我无法理解MIPS指令的一部分：

#include <inttypes.h> #include <stdint.h> uint16_t chksum(uint16_t sum, const uint8_t *data, uint16_t len) { uint16_t t; const uint8_t *dataptr; const uint8_t *last_byte; dataptr = data; last_byte = data + len - 1; while (dataptr < last_byte) { t = (dataptr[0] << 8) + dataptr[1]; sum += t; if (sum < t) { sum++; } dataptr += 2; } if (dataptr == last_byte) { t = (dataptr[0] << 8) + 0; sum += t; if (sum < t) { sum++; } } return sum; }

我在Godbolt编译器资源管理器上使用MIPS gcc5.4进行 -O2优化，默认情况下经典的MIPS1没有负载互锁：

chksum(unsigned short, unsigned char const*, unsigned short): andi $6,$6,0xffff addiu $6,$6,-1 addu $6,$5,$6 sltu $3,$5,$6 beq $3,$0,$L2 andi $2,$4,0xffff move $4,$5 $L4: lbu $3,0($4) lbu $7,1($4) sll $3,$3,8 addu $3,$3,$7 andi $3,$3,0xffff addu $2,$3,$2 andi $2,$2,0xffff addiu $4,$4,2 sltu $3,$2,$3 sltu $7,$4,$6 beq $3,$0,$L3 addiu $8,$2,1 andi $2,$8,0xffff $L3: bne $7,$0,$L4 nor $3,$0,$5 addu $3,$3,$6 srl $3,$3,1 addiu $3,$3,1 sll $3,$3,1 addu $5,$5,$3 $L2: beq $6,$5,$L8 nop $L9: j $31 nop $L8: lbu $3,0($6) nop sll $3,$3,8 addu $2,$3,$2 andi $2,$2,0xffff sltu $3,$2,$3 beq $3,$0,$L9 nop addiu $2,$2,1 j $31 andi $2,$2,0xffff

我将大部分指令与代码进行了匹配，但我无法理解$L3的部分，直到addu在$L2之前，才开始执行指令。

编译器资源管理器显示该部分与时间有关while但我不明白为什么它在$L2的分支之前操纵$5 。

I converted a C code to MIPS and I could not understand a part of the MIPS instructions:

#include <inttypes.h> #include <stdint.h> uint16_t chksum(uint16_t sum, const uint8_t *data, uint16_t len) { uint16_t t; const uint8_t *dataptr; const uint8_t *last_byte; dataptr = data; last_byte = data + len - 1; while (dataptr < last_byte) { t = (dataptr[0] << 8) + dataptr[1]; sum += t; if (sum < t) { sum++; } dataptr += 2; } if (dataptr == last_byte) { t = (dataptr[0] << 8) + 0; sum += t; if (sum < t) { sum++; } } return sum; }

I used MIPS gcc5.4 on the Godbolt compiler explorer with -O2 optimization, with the default -march of classic MIPS1 that doesn't have load interlocks:

chksum(unsigned short, unsigned char const*, unsigned short): andi $6,$6,0xffff addiu $6,$6,-1 addu $6,$5,$6 sltu $3,$5,$6 beq $3,$0,$L2 andi $2,$4,0xffff move $4,$5 $L4: lbu $3,0($4) lbu $7,1($4) sll $3,$3,8 addu $3,$3,$7 andi $3,$3,0xffff addu $2,$3,$2 andi $2,$2,0xffff addiu $4,$4,2 sltu $3,$2,$3 sltu $7,$4,$6 beq $3,$0,$L3 addiu $8,$2,1 andi $2,$8,0xffff $L3: bne $7,$0,$L4 nor $3,$0,$5 addu $3,$3,$6 srl $3,$3,1 addiu $3,$3,1 sll $3,$3,1 addu $5,$5,$3 $L2: beq $6,$5,$L8 nop $L9: j $31 nop $L8: lbu $3,0($6) nop sll $3,$3,8 addu $2,$3,$2 andi $2,$2,0xffff sltu $3,$2,$3 beq $3,$0,$L9 nop addiu $2,$2,1 j $31 andi $2,$2,0xffff

I matched most of the instructions with the code but I could not understand the part in $L3 starting with nor instruction until the addu before $L2.

The Compiler Explorer shows that part is related to the while but I do not get why it manipulates the $5 before the branch in $L2.

最满意答案

我们来分析一下代码在做什么。几个映射使代码易于遵循：

Initial parameters: $4: sum parameter $5: data parameter $6: len parameter Labels: $L4: while body $L3: while condition $L2: if condition Registers: $2: sum $4: dataptr $6: last_byte

相关代码：

// [...] sltu $3,$5,$6 // $3 = $5 (data parameter) < $6 (last_byte) ? 1 : 0 beq $3,$0,$L2 // if $3 == 0 goto $L2 (if condition) andi $2,$4,0xffff // $2 (sum) = $4 (sum parameter) & 0xFFFF move $4,$5 // $4 (dataptr) = $5 (data parameter) $L4: // while body // [...] sltu $7,$4,$6 // $7 = $4 (dataptr) < $6 (last_byte) ? 1 : 0 // [...] $L3: // while condition bne $7,$0,$L4 // if $7 != 0 goto $L4 (while body) [1] nor $3,$0,$5 // $3 = $5 (data) nor 0 addu $3,$3,$6 // $3 += $6 (last_byte) srl $3,$3,1 // $3 >>= 1 addiu $3,$3,1 // $3++ sll $3,$3,1 // $3 <<= 1 addu $5,$5,$3 // $5 += $3 $L2: // if condition beq $6,$5,$L8 // if $6 (last_byte) == $5 goto $L8 [2]

while循环在[1]处结束。直到[2]的其余指令计算一个值到寄存器$5 ，与$6 （ last_byte ）进行比较，这是源代码中的if 。

这里的问题是： $5的价值是多少？如果你把所有的操作放在一起，你会得到：

$5 = $5 + ((((($5 nor 0) + $6) >> 1) + 1) << 1)

让我们解开这个表达式。首先认识到：

x NOR 0 = NOT(x OR 0) = ~(x | 0) = ~x

所以这只是否定（补充） $5 。

然后，它增加$6 ，这是last_byte 。

接下来的3个操作（ >> 1 ， + 1 ， << 1 ）是计算下一个偶数整数的一种方法。看几个案例会发生什么：

0000 (0) -> 0010 (2) 0001 (1) -> 0010 (2) 0010 (2) -> 0100 (4) 0011 (3) -> 0100 (4) 0100 (4) -> 0110 (6) 0101 (5) -> 0110 (6) 0110 (6) -> 1000 (8) 0111 (7) -> 1000 (8)

最后，它将添加$5的原始值，这是data参数。

如果您将所有内容放在一起，并且为了清楚起见用C变量的名称替换，则可以到达：

$5 = data + next_even(~data + last_byte)

回想一下，对于二进制补码整数：

x - y == x + ~y + 1

因此：

$5 = data + next_even(last_byte - data - 1) = data + next_even(len - 2)

现在，在减去2之后计算下一个偶数值基本上是除去最低位的信息; 换句话说，是偶数的“底”。这可以表示为返回相同的数字，如果它是偶数，或者少一个，如果它是奇数，即：

$5 = data + (len % 2 ? len : len - 1)

最后，编译器将该寄存器与$6 （ last_byte ）进行比较。简化：

last_byte == data + (len % 2 ? len : len - 1) data + len - 1 == data + (len % 2 ? len : len - 1) len - 1 == len % 2 ? len : len - 1 len % 2 != 0

现在我们还可以看到，表达式实际上只取决于len ，而不取决于data 。

编译器和所有这些指令都有效地从data和last_bytes重新计算data last_bytes 。事实上，如果您认为该data仅从data中以2为增量进行升级，我们可以将其重写为：

data + 2 * n_increments data + 2 * (len / 2) data + (len % 2 ? len : len - 1)

这正是上面计算的$5的价值。

知道这一点，人们可能会奇怪为什么编译器能够达到这个解决方案。最近版本的GCC（8.1.0）和x86-64也是如此：

mov rdx, rsi not rdx add rdx, r8 shr rdx lea rsi, [rsi+2+rdx*2]

很明显，优化器意识到dataptr的最终值可以独立于while循环计算 - 但是，它dataptr为什么决定这样做，而不是从寄存器中选择值。也许它已经决定避免对循环结果的依赖性比其他方式更快（由于指令流水线）。

Let's analyze what the code is doing. A few mappings to make the code easy to follow:

Initial parameters: $4: sum parameter $5: data parameter $6: len parameter Labels: $L4: while body $L3: while condition $L2: if condition Registers: $2: sum $4: dataptr $6: last_byte

The relevant code:

// [...] sltu $3,$5,$6 // $3 = $5 (data parameter) < $6 (last_byte) ? 1 : 0 beq $3,$0,$L2 // if $3 == 0 goto $L2 (if condition) andi $2,$4,0xffff // $2 (sum) = $4 (sum parameter) & 0xFFFF move $4,$5 // $4 (dataptr) = $5 (data parameter) $L4: // while body // [...] sltu $7,$4,$6 // $7 = $4 (dataptr) < $6 (last_byte) ? 1 : 0 // [...] $L3: // while condition bne $7,$0,$L4 // if $7 != 0 goto $L4 (while body) [1] nor $3,$0,$5 // $3 = $5 (data) nor 0 addu $3,$3,$6 // $3 += $6 (last_byte) srl $3,$3,1 // $3 >>= 1 addiu $3,$3,1 // $3++ sll $3,$3,1 // $3 <<= 1 addu $5,$5,$3 // $5 += $3 $L2: // if condition beq $6,$5,$L8 // if $6 (last_byte) == $5 goto $L8 [2]

The while loop finishes at [1]. The rest of the instructions until [2] compute a value into register $5 for comparison with $6 (last_byte), which is the if in the source code.

The question here is: what is the value in $5? If you put together all the operations, you get:

$5 = $5 + ((((($5 nor 0) + $6) >> 1) + 1) << 1)

Let's unravel this expression. First, realize that:

x NOR 0 = NOT(x OR 0) = ~(x | 0) = ~x

So it is just negating (one's complement) on $5.

Then, it adds $6, which is last_byte.

The next 3 operations (>> 1, + 1, << 1) are a way to compute the next even integer. See what happens with a few cases:

0000 (0) -> 0010 (2) 0001 (1) -> 0010 (2) 0010 (2) -> 0100 (4) 0011 (3) -> 0100 (4) 0100 (4) -> 0110 (6) 0101 (5) -> 0110 (6) 0110 (6) -> 1000 (8) 0111 (7) -> 1000 (8)

Finally, it is adding the original value of $5, which was the data parameter.

If you put all together, and replace with the names of the C variables for clarity, you arrive at:

$5 = data + next_even(~data + last_byte)

Recall that, for two's complement integers:

x - y == x + ~y + 1

Therefore:

$5 = data + next_even(last_byte - data - 1) = data + next_even(len - 2)

Now, computing the next even value after subtracting 2 is basically removing the lowest bit of information; in other words, a "floor" to the even numbers. This can be expressed as returning the same number if it is even, or one less if it is odd, i.e.:

$5 = data + (len % 2 ? len : len - 1)

Finally, the compiler compares this register to $6 (last_byte). Simplifying:

last_byte == data + (len % 2 ? len : len - 1) data + len - 1 == data + (len % 2 ? len : len - 1) len - 1 == len % 2 ? len : len - 1 len % 2 != 0

Now we can also see that the expression actually only depends on len, and not on data.

The compiler, with all those instructions, is effectively re-computing dataptr from data and last_bytes. Indeed, if you consider that dataptr is only ever advanced from data in increments of 2, we can rewrite it as:

data + 2 * n_increments data + 2 * (len / 2) data + (len % 2 ? len : len - 1)

Which is precisely the value of $5 computed above.

Knowing this, one can wonder why the compiler arrived at this solution. The same happens for a recent version of GCC (8.1.0) and x86-64:

mov rdx, rsi not rdx add rdx, r8 shr rdx lea rsi, [rsi+2+rdx*2]

It is clear that the optimizer is realizing that the final value of dataptr can be computed independently of the while loop -- however, it is unclear why it decides to do so instead of picking the value from the register. Maybe it has decided that avoiding the dependency on the result of the loop is faster (due to instruction pipelining) than otherwise.

更多推荐

从C编译器了解MIPS汇编代码(Understanding MIPS assembly code from C compiler)

最满意答案

发布评论取消回复

最近发表

热门文章

标签列表