February 25th, 2024

Let's begin with a simple example of a misaligned struct:

// Misaligned struct
struct misaligned {
    char a;        // 1 byte
    int b;         // 4 bytes
    char c;        // 1 byte
};

struct misaligned initialise_misaligned() {
  struct misaligned tmp = {'0', 1, '2'};
  return tmp;
}

The size of this struct, as reported by sizeof(struct misaligned), is 12.

This is the assembly code generated by gcc with -O3:

initialise_misaligned:
  mov  BYTE PTR -20[rsp], 48
  mov  DWORD PTR -16[rsp], 1
  mov  rax, QWORD PTR -20[rsp]
  mov  BYTE PTR -12[rsp], 50
  mov  edx, DWORD PTR -12[rsp]
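  ret

Step by step:

1. Stores a == '0' == 48 at rsp-20, i.e. 20 bytes below rsp.
2. Stores b == 1 at rsp-16, 4 bytes above a.
3. Loads the 8 bytes starting at rsp-20 (a, its padding, and b) into rax.
4. Stores c == '2' == 50 at rsp-12, 4 bytes above b.
5. Loads the 4 bytes starting at rsp-12 (c and its tail padding) into edx.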

You can see that there's a lot of alignment and padding going on. The compiler put b 4 bytes after a even though a only needs a single byte.

This is because the compiler aligns the int b member on a 4-byte boundary, which produces the 3 bytes of padding between a and b.

On x86 and x86-64 architectures, accessing an int that is not aligned to a 4-byte boundary may require multiple memory accesses (for example, when the value straddles a cache line), which can be slower than accessing an aligned int in a single memory access.

The compiler also aligns the entire structure to the alignment of its most strictly aligned member. If necessary, it increases the size of the structure to a multiple of that alignment by adding padding at the end. This is known as tail padding.
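The two rules above can be checked directly with C11's _Alignof. This is a minimal sketch, assuming a typical x86-64 compiler; check_struct_alignment is a hypothetical helper name used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

struct misaligned {
    char a;
    int b;
    char c;
};

/* Hypothetical helper: checks the two rules described in the text. */
void check_struct_alignment(void) {
    /* The struct takes the alignment of its most strictly aligned member (int b). */
    assert(_Alignof(struct misaligned) == _Alignof(int));
    /* Tail padding rounds the size up to a multiple of that alignment. */
    assert(sizeof(struct misaligned) % _Alignof(struct misaligned) == 0);
}
```

Because the size is a multiple of the alignment, every element of an array of struct misaligned keeps b on a 4-byte boundary.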

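Note that the struct is built below rsp, in the red zone that the System V x86-64 ABI lets leaf functions use without moving the stack pointer; the return address consumed by ret sits at [rsp]. The struct occupies the 12 bytes from [rsp-20] to [rsp-9]. The member c is the single byte at [rsp-12], so the 3 bytes from [rsp-11] to [rsp-9] are tail padding.

Total = 1 (a) + 3 (padding) + 4 (b) + 1 (c) + 3 (tail padding) = 12 bytes.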
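The same byte accounting can be confirmed from C with offsetof. The sketch below assumes a 4-byte int with 4-byte alignment, as on x86-64; check_misaligned_layout is a hypothetical helper, not part of the original example.

```c
#include <assert.h>
#include <stddef.h>  /* offsetof */

struct misaligned {
    char a;   /* offset 0 */
    int b;    /* offset 4, after 3 bytes of padding */
    char c;   /* offset 8, followed by 3 bytes of tail padding */
};

/* Hypothetical helper: asserts the layout described in the text. */
void check_misaligned_layout(void) {
    assert(offsetof(struct misaligned, a) == 0);
    assert(offsetof(struct misaligned, b) == 4);
    assert(offsetof(struct misaligned, c) == 8);
    assert(sizeof(struct misaligned) == 12);
}
```

On a platform with a different int size or alignment the asserted numbers would differ, so treat them as specific to this setup.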

Better aligning the struct

Minimize memory waste by ordering the struct members from largest to smallest: the biggest member comes first, followed by the second biggest, and so on.

struct aligned {
    int b;         // 4 bytes
    char a;        // 1 byte
    char c;        // 1 byte
};

struct aligned initialise_aligned() {
  // Positional initialisers follow the new member order:
  // b = '0' (48), a = 1, c = '2' (50)
  struct aligned tmp = {'0', 1, '2'};
  return tmp;
}

sizeof(struct aligned) now returns 8 instead of 12.

The function initialise_aligned, when optimised, is simply:
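As a quick check of the reordered layout, the offsets can again be inspected with offsetof. This is a sketch under the same assumption of a 4-byte, 4-byte-aligned int; check_aligned_layout is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>  /* offsetof */

struct aligned {
    int b;    /* offset 0 */
    char a;   /* offset 4 */
    char c;   /* offset 5, followed by tail padding */
};

/* Hypothetical helper: asserts the layout of the reordered struct. */
void check_aligned_layout(void) {
    assert(offsetof(struct aligned, b) == 0);
    assert(offsetof(struct aligned, a) == 4);
    assert(offsetof(struct aligned, c) == 5);
    assert(sizeof(struct aligned) == 8);
    /* Bytes left over after the last member are tail padding. */
    assert(sizeof(struct aligned) - (offsetof(struct aligned, c) + sizeof(char)) == 2);
}
```

No padding is needed between members here, because each char can sit on any byte boundary.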

initialise_aligned:
  movabs  rax, 54979876356144
  ret

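In binary, that constant is:

0b110010 00000001 00000000 00000000 00000000 00110000

Reading from the right (x86 is little-endian, so the least significant byte sits at the lowest address), the low 4 bytes are b == 48, the next byte is a == 1, and the byte after that is c == 50. The two most significant bytes of rax, omitted above, are zero and correspond to the tail padding.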

The whole struct now fits into 8 bytes, and therefore into rax alone, so the compiler folds the initialisation into a single constant load.
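The constant in the movabs instruction can be reproduced by hand by placing each member value at its byte offset within a 64-bit word. The sketch below is only an illustration of that arithmetic, assuming ASCII and a little-endian layout; pack_aligned is a hypothetical helper and not something the compiler emits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: builds the 64-bit value that holds b in bytes 0-3,
   a in byte 4 and c in byte 5, with the two padding bytes left as zero. */
uint64_t pack_aligned(int32_t b, char a, char c) {
    return (uint64_t)(uint32_t)b
         | ((uint64_t)(unsigned char)a << 32)
         | ((uint64_t)(unsigned char)c << 40);
}

/* Mirrors the initialiser {'0', 1, '2'}: b = 48, a = 1, c = 50. */
void check_packed_constant(void) {
    assert(pack_aligned('0', 1, '2') == 54979876356144ULL);
}
```

Working it through: 50 * 2^40 + 1 * 2^32 + 48 = 54979876356144, which matches the immediate in the listing.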

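Note that the reordered struct still carries 2 bytes of tail padding after c, so that its size stays a multiple of the 4-byte alignment of b. In the misaligned version we had 3 bytes of padding between a and b, so that b sat on a 4-byte boundary, plus 3 more bytes of tail padding.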