mov edi, edi

Created at: 2024-06-08
Updated at: 2024-07-27

I had a surprise today when I saw the instruction mov edi, edi as the first instruction of a function call.

This is my C code:

unsigned int func(unsigned int idx) {
  static unsigned int my_table[] = {10, 20, 30, 40};
  return my_table[idx];
}

Which returned the following x86 assembly (compiled via gcc):

func:
  mov  edi, edi
  lea  rax, my_table.0[rip]
  mov  eax, DWORD PTR [rax+rdi*4]
  ret
  .size  func, .-func
  .section  .rodata
  .align 16
  .type  my_table.0, @object
  .size  my_table.0, 16
my_table.0:
  .long  10
  .long  20
  .long  30
  .long  40

This code was compiled with the flag -O3, which I thought was going to eliminate all useless instructions. To my surprise, when I removed all the unsigned keywords from the function, the mov edi, edi disappeared in favour of a movsx rdi, edi! Here's the equivalent asm code:

func:
  movsx  rdi, edi
  lea  rax, my_table.0[rip]
  mov  eax, DWORD PTR [rax+rdi*4]
  ret
  .size  func, .-func
  .section  .rodata
  .align 16
  .type  my_table.0, @object
  .size  my_table.0, 16
my_table.0:
  .long  10
  .long  20
  .long  30
  .long  40

I went on a spiral of research, and I found many links pointing to this instruction being necessary in Microsoft Windows, so that the OS could operate hot-patching. source

However, I compiled this on Linux. This should not be relevant to me. Here is the catch; that mov edi, edi operation is used for zero'ing the most significant 32 bits of the rdi register.

It does not seem obvious, but the answer can be found in the x86 tour of Intel manuals source

General-purpose Registers (...)

32-bit operands generate a 32-bit result, zero-extended to a 64-bit result in the destination general-purpose register. (...)

UPDATE: In case it wasn't clear from the quote above, the zero-extension only works for 32 bit operands. If you run mov di, di (di is 16 bits long), the zero-extension will not happen.

The zero-extension is indeed what happens when I try to run the following mock assembly code below:

main:
  ; Load `rdi` with all one's.
  mov rdi, 0xFFFFFFFFFFFFFFFF
  ; After the instruction below,
  ; rdi will be 0x0000000011111111
  mov edi, edi
  ret

This was not obvious to me at all. The initial instruction mov edi, edi just looked like a nop equivalent with two bytes...

Coming back to my original function:

unsigned int func(unsigned int idx) {
  static unsigned int my_table[] = {10, 20, 30, 40};
  return my_table[idx];
}

Since I am using unsigned integers, the compiler can trust that the arguments passed to that function in assembly won't be more than 32 bits long in my machine.

UPDATE: The compiler actually doesn't need to "trust" anything, it actually does not matter. The movsx instruction accepts operands of different sizes. This means that the 32 bits in edi will be moved with sign-extension to fit the 64 bits of rdi. The underlying 32 bit value will remain the same, and it doesn't matter what bits were in the most-significant upper 32bits of rdi before the mov operation - they will just be completely ignored. That is why the instruction mov edi, edi is not necessary beforehand!

Remember that the ABI for C functions calls in assembly is that the first argument to the function, in this case idx, will be passed in the register rdi.

So this function is cleaning up the most significant bits of rdi for us. I am still not totally sure why this is necessary, but perhaps the compiler assumes that some garbage could be held in the most significant bits of rdi and tries to clean that up first to avoid potential bugs.

This assumption makes sense to me at first, because down in the assembly function body, we rely on rdi for finding the address offset of the element in the table that we want to return: mov eax, DWORD PTR [rax+rdi*4].

Now remains the question: "Why is there an assumption by the compiler that rdi can contain garbage in the most significant bits?".

This can happen if the function is being called with a "casted" value given that casting per-se does not clean up unused bits of a 64bit register. That could happen if a 64 bit integer was casted down to a 32 bit one.

Again, this is very much based on my own understanding on how assembly works in my platform, if you think that I got something wrong please send me an email at marceelofernandes@gmail.com.