February 25th, 2024
Let's begin with a simple example of a misaligned struct:
// Misaligned struct
struct misaligned {
    char a; // 1 byte
    int b;  // 4 bytes
    char c; // 1 byte
};
struct misaligned initialise_misaligned() {
    struct misaligned tmp = {'0', 1, '2'};
    return tmp;
}
sizeof(struct misaligned) reports 12 bytes for this struct, even though its members only need 6.
This is the assembly code generated by gcc with -O3:
initialise_misaligned:
    mov BYTE PTR -20[rsp], 48
    mov DWORD PTR -16[rsp], 1
    mov rax, QWORD PTR -20[rsp]
    mov BYTE PTR -12[rsp], 50
    mov edx, DWORD PTR -12[rsp]
    ret
a == '0' == 48 is stored at [rsp-20], 20 bytes below rsp.
b == 1 is stored at [rsp-16], 4 bytes after a.
c == '2' == 50 is stored at [rsp-12], 4 bytes after b.
You can see that there's a lot of alignment and padding going on.
The compiler placed b 4 bytes after a even though a only needs a single byte.
This is because the compiler aligns the int b member on a 4-byte boundary, which produces the 3 bytes of padding observed between a and b.
On x86 and x86-64 architectures, accessing an int that is not aligned to a 4-byte boundary may require multiple memory accesses, which can be slower than accessing an aligned int in a single memory access.
The compiler also aligns the entire structure to its most strictly aligned member. If necessary, it increases the size of the structure to a multiple of that alignment by adding padding at the end. This is known as tail padding.
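Tail padding matters most in arrays, where the next element starts right after the previous one ends. The following is a minimal sketch, assuming a C11 compiler; the helper names every_b_is_aligned, element_stride and check_array_layout are hypothetical and only serve the illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct misaligned {
    char a; /* 1 byte */
    int b;  /* 4 bytes */
    char c; /* 1 byte */
};

/* Returns 1 if member b of every array element sits on an address that is
   a multiple of the alignment of int, 0 otherwise. */
static int every_b_is_aligned(void) {
    struct misaligned items[4];
    for (size_t i = 0; i < 4; i++) {
        uintptr_t addr = (uintptr_t)&items[i].b;
        if (addr % _Alignof(int) != 0) {
            return 0;
        }
    }
    return 1;
}

/* Distance in bytes between two consecutive array elements. */
static size_t element_stride(void) {
    struct misaligned items[2];
    return (size_t)((char *)&items[1] - (char *)&items[0]);
}

/* These checks follow from the language rules, not from one platform. */
static void check_array_layout(void) {
    assert(sizeof(struct misaligned) % _Alignof(struct misaligned) == 0);
    assert(element_stride() == sizeof(struct misaligned));
    assert(every_b_is_aligned());
}
```

Without the padding at the end, the second element would start right after c and its b would no longer fall on a 4-byte boundary.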
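Note that the return address that ret pops sits at [rsp]; everything this function writes lies below it, in the red zone of the System V x86-64 ABI.
The struct element c itself occupies the single byte at [rsp-12], while the struct as a whole extends up to [rsp-9]. This means that there are 3 bytes of padding at the end, from [rsp-11] to [rsp-9].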
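a: 1 byte
Padding between a and b: 3 bytes
b: 4 bytes
Padding between b and c: 0 bytes
c: 1 byte
Tail padding after c: 3 bytes
Total = 12 bytes.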
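The same breakdown can be computed with offsetof and sizeof rather than read off the assembly. This is a sketch, assuming a 4-byte int with 4-byte alignment as on x86-64; the helper names are made up for this example:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct misaligned {
    char a; /* 1 byte */
    int b;  /* 4 bytes */
    char c; /* 1 byte */
};

/* Gap between the end of a and the start of b. */
static size_t padding_after_a(void) {
    return offsetof(struct misaligned, b)
         - (offsetof(struct misaligned, a) + sizeof(char));
}

/* Gap between the end of b and the start of c. */
static size_t padding_after_b(void) {
    return offsetof(struct misaligned, c)
         - (offsetof(struct misaligned, b) + sizeof(int));
}

/* Gap between the end of c and the end of the struct. */
static size_t tail_padding(void) {
    return sizeof(struct misaligned)
         - (offsetof(struct misaligned, c) + sizeof(char));
}

/* Prints the layout and checks that members plus padding add up. */
static void print_misaligned_layout(void) {
    printf("padding after a: %zu\n", padding_after_a());
    printf("padding after b: %zu\n", padding_after_b());
    printf("tail padding:    %zu\n", tail_padding());
    printf("total size:      %zu\n", sizeof(struct misaligned));
    assert(sizeof(char) + padding_after_a() + sizeof(int) + padding_after_b()
           + sizeof(char) + tail_padding() == sizeof(struct misaligned));
}
```

Calling print_misaligned_layout() from main on a typical x86-64 build should report the 3, 0 and 3 bytes of padding discussed here.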
Minimise memory waste by ordering the structure elements so that the biggest element comes first, followed by the second biggest, and so on.
struct aligned {
    int b;  // 4 bytes
    char a; // 1 byte
    char c; // 1 byte
};
struct aligned initialise_aligned() {
    // Positional initialisers follow declaration order: b = '0' (48), a = 1, c = '2' (50)
    struct aligned tmp = {'0', 1, '2'};
    return tmp;
}
sizeof(struct aligned) is now 8 instead of 12.
The function initialise_aligned, when optimised, is simply:
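The saving is small for one struct but grows with every element of an array. A rough sketch, assuming the usual x86-64 sizes of 12 and 8 bytes; bytes_saved is a hypothetical helper for this example:

```c
#include <assert.h>
#include <stddef.h>

struct misaligned {
    char a;
    int b;
    char c;
};

struct aligned {
    int b;
    char a;
    char c;
};

/* Bytes saved across an array of `count` elements by using the reordered
   struct instead of the original one. */
static size_t bytes_saved(size_t count) {
    /* Assumption: reordering never makes the struct bigger on this target. */
    assert(sizeof(struct aligned) <= sizeof(struct misaligned));
    return count * (sizeof(struct misaligned) - sizeof(struct aligned));
}
```

With these sizes, an array of one million elements shrinks by about 4 MB, which also means more elements fit in each cache line.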
initialise_aligned:
    movabs rax, 54979876356144
    ret
Which in binary is:
0b110010 00000001 00000000 00000000 00000000 00110000
The whole value is aligned now and fits into rax, so the compiler aggressively optimised it.
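0b110010 = 50 == '2' == c.
00110000 = 48 == '0' == b.
00000001 = 1 == a.
Note that the three zero bytes between a and b are not padding: they are the upper three bytes of the 4-byte int b, which holds 48. The only padding left is the 2 bytes of tail padding after c (the two most significant bytes of rax, both zero). They round the size up to 8 so that b stays on a 4-byte boundary.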
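The constant loaded into rax can be rebuilt by placing each member at its byte offset. The following is a sketch, assuming a little-endian target with a 4-byte int and 8-bit bytes; pack_aligned is a hypothetical helper and not something the compiler emits:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

struct aligned {
    int b;  /* 4 bytes */
    char a; /* 1 byte */
    char c; /* 1 byte */
};

/* Builds the 8-byte value that a little-endian machine would read when
   loading a struct aligned as a single 64-bit integer. */
static uint64_t pack_aligned(int b, char a, char c) {
    uint64_t value = 0;
    /* b occupies the lowest bytes, starting at its offset. */
    value |= (uint64_t)(uint32_t)b << (offsetof(struct aligned, b) * CHAR_BIT);
    /* a and c each occupy one byte at their respective offsets. */
    value |= (uint64_t)(unsigned char)a << (offsetof(struct aligned, a) * CHAR_BIT);
    value |= (uint64_t)(unsigned char)c << (offsetof(struct aligned, c) * CHAR_BIT);
    /* The remaining high bytes are left as zero. */
    return value;
}
```

With the values from initialise_aligned, pack_aligned('0', 1, '2') gives 54979876356144, the same number that appears in the movabs instruction.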