Endianness
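Byte order is a familiar topic that has been covered at length elsewhere, so only a brief summary is given here.
When writing hexadecimal data, the convention is to put the MSB on the left and the LSB on the right.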
- LSB: least significant byte
- MSB: most significant byte
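- Big-endian
The order in which the data is stored in memory (from low address to high) matches the written order.
Mnemonic: the low address holds the MSB of the data (the "big" end), hence big-endian.
The low address holds the MSB; the high address holds the LSB.
- Little-endian
The order in which the data is stored in memory (from low address to high) is the reverse of the written order.
Mnemonic: the low address holds the LSB of the data (the "little" end), hence little-endian. In other words, "high to high, low to low": the high-order byte goes to the high address and the low-order byte to the low address.
The low address holds the LSB; the high address holds the MSB.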
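The two definitions can be made concrete by writing out, byte by byte, what each kind of machine would place in memory. The sketch below is only an illustration: `store_be32` and `store_le32` are hypothetical helper names, and they use shifts so that the result does not depend on the machine running the code.

```c
#include <stdint.h>

/* Big-endian layout: the MSB goes to the lowest address (mem[0]). */
void store_be32(uint8_t mem[4], uint32_t value)
{
    mem[0] = (uint8_t)(value >> 24);
    mem[1] = (uint8_t)(value >> 16);
    mem[2] = (uint8_t)(value >> 8);
    mem[3] = (uint8_t)value;
}

/* Little-endian layout: the LSB goes to the lowest address (mem[0]). */
void store_le32(uint8_t mem[4], uint32_t value)
{
    mem[0] = (uint8_t)value;
    mem[1] = (uint8_t)(value >> 8);
    mem[2] = (uint8_t)(value >> 16);
    mem[3] = (uint8_t)(value >> 24);
}
```

For 0x11223344, `store_be32` fills the buffer with 11 22 33 44 (the written order), while `store_le32` fills it with 44 33 22 11 (the reverse).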
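How to write code that works on both byte orders
Code written without attention to byte order loses portability, and the resulting bugs are hard to track down. Pointer operations are especially risky. A common problem:
- Casting a pointer to a wide type into a pointer to a narrower type and writing through it
long long val = 0;
func((int *)&val);
...
check(val);
This code tends to break on big-endian machines. Suppose func stores the int value 0x11223344 at &val. Because val is 8 bytes wide, check(val) then actually reads 0x1122334400000000. Little-endian machines do not have this problem; readers may want to work out why.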
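One way to see the difference is to model the narrow store on a plain byte array. The sketch below is an illustration under stated assumptions, not the code from the example: `model_narrow_store`, `fill_int` and `get_value` are hypothetical names. The first function simulates both layouts explicitly, so it gives the same answer on any host; the other two show one portable alternative, which is to let the callee write into an object of the type it expects and then widen by value.

```c
#include <stdint.h>

/* Model: a 4-byte store of 0x11223344 lands at the lowest addresses of a
 * zeroed 8-byte object, which is then read back as a 64-bit value.
 * big_endian selects which byte order is being simulated. */
uint64_t model_narrow_store(int big_endian)
{
    uint8_t mem[8] = {0};
    const uint32_t v = 0x11223344u;
    uint64_t result = 0;

    for (int i = 0; i < 4; i++) {           /* the 32-bit write */
        int shift = big_endian ? 8 * (3 - i) : 8 * i;
        mem[i] = (uint8_t)(v >> shift);
    }
    for (int i = 0; i < 8; i++) {           /* the 64-bit read */
        int shift = big_endian ? 8 * (7 - i) : 8 * i;
        result |= (uint64_t)mem[i] << shift;
    }
    return result;
}

/* Hypothetical callee that produces a 32-bit result through a pointer. */
void fill_int(int32_t *out)
{
    *out = 0x11223344;
}

/* Portable pattern: no pointer cast; the widening is a value conversion,
 * so it does not depend on how bytes are laid out in memory. */
long long get_value(void)
{
    int32_t tmp = 0;
    fill_int(&tmp);
    return tmp;
}
```

In the model, the big-endian case places the four bytes in the high-order half of the 8-byte object, while the little-endian case places them in the low-order half, which is why only the former changes the value.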
Bit numbering
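A byte has 8 bits, numbered from bit 0 to bit 7. Bit numbering describes the order in which the bits of the data are placed within the byte.
See the Wikipedia article on bit numbering.
In this section LSB and MSB mean:
- LSB: least significant bit
- MSB: most significant bit
There are two bit-numbering schemes:
- LSB 0
Bit 0 of the byte holds the least significant bit of the data, that is, the lowest bit of the data is stored in bit 0.
Take decimal 150, 0b10010110, as an example. Under LSB 0 it is stored as:
| 7                           0 |
---------------------------------
| 1 | 0 | 0 | 1 | 0 | 1 | 1 | 0 |
---------------------------------
Least Significant Bit First means that the least significant bit will arrive first.
The data stream is sent starting from bit 0, so the bits appear in the stream in the order 01101001.
- MSB 0
Bit 0 of the byte holds the most significant bit of the data, that is, the highest bit of the data is stored in bit 0.
Take decimal 150, 0b10010110, as an example. Under MSB 0 it is stored as:
| 7                           0 |
---------------------------------
| 0 | 1 | 1 | 0 | 1 | 0 | 0 | 1 |
---------------------------------
Most Significant Bit First means that the most significant bit will arrive first.
In other words, the bits appear in the stream in the order 10010110.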
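The two stream orders for 150 can be reproduced with a pair of small loops. This is a minimal sketch; `bits_lsb_first` and `bits_msb_first` are hypothetical helper names, and each writes the bits as '0'/'1' characters in the order they would be sent.

```c
#include <stdint.h>

/* Emit the bits of value starting with the least significant bit.
 * out must have room for 8 characters plus the terminating NUL. */
void bits_lsb_first(uint8_t value, char out[9])
{
    for (int i = 0; i < 8; i++)
        out[i] = ((value >> i) & 1u) ? '1' : '0';
    out[8] = '\0';
}

/* Emit the bits of value starting with the most significant bit. */
void bits_msb_first(uint8_t value, char out[9])
{
    for (int i = 0; i < 8; i++)
        out[i] = ((value >> (7 - i)) & 1u) ? '1' : '0';
    out[8] = '\0';
}
```

Called with 150, the first helper produces the string 01101001 and the second produces 10010110, matching the two cases described above.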
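Actual machine types
CPUs with little-endian byte order usually use LSB 0 bit numbering. Not only is data stored "high to high, low to low" in memory; the bits follow the same rule, with the msb at bit 7 and the lsb at bit 0. Note that big-endian CPUs are less uniform: some machines use MSB 0 and others use LSB 0.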
Bit order usually follows the same endianness as the byte order for a given computer system. That is, in a big endian system the most significant bit is stored at the lowest bit address; in a little endian system, the least significant bit is stored at the lowest bit address.
Suppose we want to represent the integer 0x0a0b0c0d.
Write Integer for Big Endian System
byte addr       0         1         2         3
bit offset  01234567  01234567  01234567  01234567
binary      00001010  00001011  00001100  00001101
hex            0a        0b        0c        0d
Write Integer for Little Endian System
byte addr       3         2         1         0
bit offset  76543210  76543210  76543210  76543210
binary      00001010  00001011  00001100  00001101
hex            0a        0b        0c        0d
Bit Fields
For big-endian mode, bit fields are packed into registers from most significant bit (MSB) to least significant bit (LSB) in the order in which they are defined. Bit fields are packed in memory from most significant byte (MSbyte) to least significant byte (LSbyte).
For little-endian mode, bit fields are packed into registers from the LSB to the MSB in the order in which they are defined, and packed in memory from LSbyte to MSbyte.
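How endianness affects bit-field packing
In big-endian mode, fields are laid out in declaration order from the MSB toward the LSB.
In little-endian mode, fields are laid out in declaration order from the LSB toward the MSB.
Note that these layout rules come from the compiler and platform ABI; the C standard leaves bit-field allocation order implementation-defined.
- A bit-field type may be unsigned or signed
- The size and alignment of a struct containing bit-fields depend on the data type used to declare the bit-fields
For example, struct st { int a:4; }; has size 4 and alignment 4.
- Unnamed bit-fields affect the size and alignment of the struct or union
For example, struct st { char a:4; int :22; }; has size 4 and alignment 4.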
struct {
    int A:7;
    int B:10;
    int C:3;
    int D:2;
    int E:9;
} x;
LEGEND: X = not used, MS = most significant, LS = least significant
Big-endian register
MS                                                           LS
A A A A A A A B B B B B B B B B B C C C D D E E E E E E E E E X
6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 2 1 0 1 0 8 7 6 5 4 3 2 1 0 X
31                                                            0
Big-endian memory
Byte 0          Byte 1          Byte 2          Byte 3
A A A A A A A B B B B B B B B B B C C C D D E E E E E E E E E X
6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 2 1 0 1 0 8 7 6 5 4 3 2 1 0 X
Little-endian register
MS                                                           LS
X E E E E E E E E E D D C C C B B B B B B B B B B A A A A A A A
X 8 7 6 5 4 3 2 1 0 1 0 2 1 0 9 8 7 6 5 4 3 2 1 0 6 5 4 3 2 1 0
31                                                            0
Little-endian memory
Byte 0          Byte 1          Byte 2          Byte 3
B A A A A A A A B B B B B B B B E E D D C C C B X E E E E E E E
0 6 5 4 3 2 1 0 8 7 6 5 4 3 2 1 1 0 1 0 2 1 0 9 X 8 7 6 5 4 3 2
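Take the following data type as an example.
union {
    unsigned short value;
    unsigned char byte[2];
    struct {
        unsigned short one   : 1;
        unsigned short two   : 2;
        unsigned short three : 3;
        unsigned short four  : 4;
        unsigned short five  : 5;
    } field;
} u;
Applying the packing rules above to the 16-bit value, its layout is:
Big-endian: one occupies bit 15, two bits 14-13, three bits 12-10, four bits 9-6, five bits 5-1; bit 0 is unused.
Little-endian: one occupies bit 0, two bits 2-1, three bits 5-3, four bits 9-6, five bits 14-10; bit 15 is unused.
The following program can be used to verify this.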
#include <stdio.h>

int main(void)
{
    /* value, byte[] and field all overlay the same 16 bits */
    union {
        unsigned short value;
        unsigned char byte[2];
        struct {
            unsigned short one   : 1;
            unsigned short two   : 2;
            unsigned short three : 3;
            unsigned short four  : 4;
            unsigned short five  : 5;
        } field;
    } u;

    u.value = 0;            /* also clears the one bit no field covers */
    u.field.one = 1;
    u.field.two = 2;
    u.field.three = 3;
    u.field.four = 4;
    u.field.five = 5;

    printf("The fields are 1, 2, 3, 4, 5.\n");
    printf("The entire hex value is 0x%04x\n", u.value);
    printf("The first byte is 0x%02x\n", u.byte[0]);    /* lowest address */
    printf("The second byte is 0x%02x\n", u.byte[1]);
    return 0;
}
Test results on x86 (little-endian) and on arm and aarch64 in both little-endian and big-endian modes:
// x86, little-endian
The fields are 1, 2, 3, 4, 5.
The entire hex value is 0x151d
The first byte is 0x1d
The second byte is 0x15
// aarch64_be
The fields are 1, 2, 3, 4, 5.
The entire hex value is 0xcd0a
The first byte is 0xcd
The second byte is 0x0a
// armeb
The fields are 1, 2, 3, 4, 5.
The entire hex value is 0xcd0a
The first byte is 0xcd
The second byte is 0x0a
// arm
The fields are 1, 2, 3, 4, 5.
The entire hex value is 0x151d
The first byte is 0x1d
The second byte is 0x15
// aarch64
The fields are 1, 2, 3, 4, 5.
The entire hex value is 0x151d
The first byte is 0x1d
The second byte is 0x15
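The two hex values in the output above can be reproduced by hand with shifts and masks, which is also a common way to get a layout that does not depend on the compiler's bit-field rules. The sketch below is an illustration only: `pack_from_lsb` and `pack_from_msb` are hypothetical names, and the widths 1 to 5 are taken from the union in the test program.

```c
#include <stdint.h>

#define NFIELDS 5

/* Widths of the fields one..five, in declaration order. */
static const unsigned field_widths[NFIELDS] = {1, 2, 3, 4, 5};

/* Allocate fields starting at bit 0 and moving toward bit 15. */
uint16_t pack_from_lsb(const unsigned values[NFIELDS])
{
    uint16_t word = 0;
    unsigned pos = 0;                   /* next free bit, counted from bit 0 */

    for (int i = 0; i < NFIELDS; i++) {
        unsigned mask = (1u << field_widths[i]) - 1u;
        word |= (uint16_t)((values[i] & mask) << pos);
        pos += field_widths[i];
    }
    return word;
}

/* Allocate fields starting at bit 15 and moving toward bit 0. */
uint16_t pack_from_msb(const unsigned values[NFIELDS])
{
    uint16_t word = 0;
    unsigned pos = 16;                  /* one past the highest free bit */

    for (int i = 0; i < NFIELDS; i++) {
        unsigned mask = (1u << field_widths[i]) - 1u;
        pos -= field_widths[i];
        word |= (uint16_t)((values[i] & mask) << pos);
    }
    return word;
}
```

With the values 1, 2, 3, 4, 5, `pack_from_lsb` yields 0x151d and `pack_from_msb` yields 0xcd0a, the same values the little-endian and big-endian targets printed.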
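Byte order test
#include <stdio.h>

int main(void)
{
    /* c overlays the byte of i at the lowest address */
    union {
        char c;
        int i;
    } u;

    u.i = 0x11223344;
    if (u.c == 0x44) {              /* LSB at the low address */
        printf("little-endian\n");
    } else if (u.c == 0x11) {       /* MSB at the low address */
        printf("big-endian\n");
    } else {
        printf("unknown\n");
    }
    return 0;
}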
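Once the host byte order is known, a value can be converted to the other order by reversing its bytes. The sketch below is one possible way to do this, with hypothetical names throughout (`host_is_little_endian`, `swap32`, `to_big_endian32`); it assumes a host that is either little-endian or big-endian, and it probes the first byte with `memcpy` rather than a union.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 if the byte at the lowest address of an integer is its LSB. */
int host_is_little_endian(void)
{
    const uint32_t probe = 0x11223344u;
    uint8_t first;

    memcpy(&first, &probe, 1);      /* copy the byte at the lowest address */
    return first == 0x44;
}

/* Reverses the four bytes of value. */
uint32_t swap32(uint32_t value)
{
    return (value >> 24)
         | ((value >> 8) & 0x0000ff00u)
         | ((value << 8) & 0x00ff0000u)
         | (value << 24);
}

/* Returns value laid out in big-endian order, whatever the host order is. */
uint32_t to_big_endian32(uint32_t value)
{
    return host_is_little_endian() ? swap32(value) : value;
}
```

After `to_big_endian32(0x11223344)`, the byte at the lowest address of the result is 0x11 on either kind of host.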
Bit order test
#include <stdio.h>

/* x and b overlay the same byte */
union {
    struct {
        unsigned char a1:2;
        unsigned char a2:3;
        unsigned char a3:3;
    } x;
    unsigned char b;
} d;

int main(void)
{
    d.b = 150;
    printf("0x%x\n0x%x\n0x%x\n", d.x.a1, d.x.a2, d.x.a3);
    return 0;
}
Results:
// 150 == 0b1001 0110
arm, aarch64
/* bit numbering: lsb 0
 * 0x2 (== 10)
 * 0x5 (== 101)
 * 0x4 (== 100)
 */
armeb, aarch64_be
/* bit numbering: lsb 0
 * 0x2 (== 10)
 * 0x2 (== 010)
 * 0x6 (== 110)
 */
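The reported field values can be cross-checked by slicing 150 (0b10010110) by hand in both directions. This is a minimal sketch with hypothetical helper names (`field_from_lsb`, `field_from_msb`); the widths 2, 3, 3 are those of a1, a2 and a3 in the test program.

```c
#include <stdint.h>

/* Widths of a1, a2, a3 in declaration order. */
static const unsigned bit_widths[3] = {2, 3, 3};

/* Value of field `index` when fields are allocated upward from bit 0. */
unsigned field_from_lsb(uint8_t byte, int index)
{
    unsigned pos = 0;                       /* lowest bit of the field */

    for (int i = 0; i < index; i++)
        pos += bit_widths[i];
    return (byte >> pos) & ((1u << bit_widths[index]) - 1u);
}

/* Value of field `index` when fields are allocated downward from bit 7. */
unsigned field_from_msb(uint8_t byte, int index)
{
    unsigned pos = 8;                       /* one past the highest free bit */

    for (int i = 0; i <= index; i++)
        pos -= bit_widths[i];
    return (byte >> pos) & ((1u << bit_widths[index]) - 1u);
}
```

For 150, allocation from the low end gives 0x2, 0x5, 0x4 and allocation from the high end gives 0x2, 0x2, 0x6, matching the little-endian and big-endian results respectively.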