April 05, 2023
Note 2024-03-23: It would be preferrable to adapt this to the "C" notes, as there is some C++ specificity to this that might not aggregate much to a "bit manipulation" note.
Bit shifting involves moving bits of a binary number to either the left or the
right. Most programming languages use the characters <<
to indicate a left
bit shift, and >>
to indicate a right bit shift.
There are a few common operations that can be performed through. These are:
int main() {
int twenty_three = 0b10111;
int ninety_two = twenty_three << 2;
std::cout << ninety_two;
}
int main() {
int twenty_two = 0b10110;
int eleven = twenty_two >> 1;
std::cout << eleven;
}
int main() {
// Extract the forth less-significant bit.
int binary_number = 0b01110;
int bit_value = binary_number >> 3 & 0b1;
std::cout << bit_value;
}
int main() {
int binary_number = 0b001001;
int one = 1;
// set the 3rd bit to 0
int result = binary_number | (one << 2);
std::cout << result; // 0b001101
}
A bit masking is a binary number used to enable or disable specific bits of another binary number. In the examples above we have already used bit masking.
In our insertion example above, we could have just used a masking number and
the |
or bit operator instead:
int main() {
int binary_number = 0b000001;
int mask = 0b000100;
std::cout << (binary_number | mask); // 5
}
Here's a simple code to set the second bit from right to left using a mask and
the |
operator:
int bit_number = 0b00100;
int mask = 0b00010;
int result = bit_number | mask;
And the assembly code:
bit_number:
.long 4
mask:
.long 2
result:
.zero 4
__static_initialization_and_destruction_0(int, int):
push rbp
mov rbp, rsp
mov DWORD PTR [rbp-4], edi
mov DWORD PTR [rbp-8], esi
cmp DWORD PTR [rbp-4], 1
jne .L3
cmp DWORD PTR [rbp-8], 65535
jne .L3
mov edx, DWORD PTR bit_number[rip]
mov eax, DWORD PTR mask[rip]
or eax, edx
mov DWORD PTR result[rip], eax
.L3:
nop
pop rbp
ret
_GLOBAL__sub_I_bit_number:
push rbp
mov rbp, rsp
mov esi, 65535
mov edi, 1
call __static_initialization_and_destruction_0(int, int)
pop rbp
ret
This assembly code defines a function called
"__static_initialization_and_destruction_0"
that takes two integer arguments
and initializes a global variable called "result" based on their values.
The code starts by reserving space on the stack for the function's local variables using the "push" and "mov" instructions. It then stores the function arguments in the stack at [rbp-4] and [rbp-8].
Next, the code checks if the first argument ([rbp-4]) is equal to 1 using the "cmp" instruction, and jumps to label ".L3" if it is not equal. It then checks if the second argument ([rbp-8]) is equal to 65535 using the "cmp" instruction, and also jumps to label ".L3" if it is not equal.
If both conditions are met, the code loads the value of the global variable
"bit_number"
into the register edx using the "mov" instruction with a
RIP-relative addressing mode. It then loads the value of the global variable
"mask" into the register eax, and performs a bitwise OR operation between eax
and edx using the "or" instruction. The resulting value is then stored in the
global variable "result" using a RIP-relative addressing mode.
Finally, the code executes a "nop" instruction, pops the stack to restore the previous stack pointer using the "pop" instruction, and returns from the function using the "ret" instruction.
There is also another function called "_GLOBAL__sub_I_bit_number"
that
initializes the global variables "edi" and "esi" with the values 1 and 65535
respectively, and then calls the "__static_initialization_and_destruction_0"
function with these values. This function is likely called automatically by the
compiler during program startup to initialize the global variables before
main() is called.
This looks a bit more convoluted than you would expect because the compiler
adds the overhead of using 1
and 65535
as "magic numbers" to initialise
those variables. I still don't know for sure why this needs to happen.
Now let's try to achieve the same result using shifting to see the code difference:
int bit_number = 0b00100;
int mask = 0b00001;
int result = bit_number | (mask << 3);
The assembly code looks like this:
bit_number:
.long 4
mask:
.long 1
result:
.zero 4
__static_initialization_and_destruction_0(int, int):
push rbp
mov rbp, rsp
mov DWORD PTR [rbp-4], edi
mov DWORD PTR [rbp-8], esi
cmp DWORD PTR [rbp-4], 1
jne .L3
cmp DWORD PTR [rbp-8], 65535
jne .L3
mov eax, DWORD PTR mask[rip]
lea edx, [0+rax*8]
mov eax, DWORD PTR bit_number[rip]
or eax, edx
mov DWORD PTR result[rip], eax
.L3:
nop
pop rbp
ret
_GLOBAL__sub_I_bit_number:
push rbp
mov rbp, rsp
mov esi, 65535
mov edi, 1
call __static_initialization_and_destruction_0(int, int)
pop rbp
ret
The only difference lies on how the bit shifting happens:
lea edx, [0+rax*8]
The instruction lea edx, [0+rax*8]
performs a "load effective address"
operation, which computes the memory address that corresponds to a given memory
operand and stores it in the destination register. In this case, the
instruction is calculating an address based on the value of rax, which was
previously loaded with the value of the mask variable.
The [0+rax*8]
memory operand specifies a scaled indexed addressing mode, where
the value of rax is multiplied by the scale factor of 8 (i.e., each element of
the array is 8 bytes wide), and then added to the base address of 0. Since the
base address is 0, the instruction is effectively multiplying rax by 8.
The reason that the lea edx, [0+rax*8]
instruction is using the lea (load
effective address) instruction instead of the mul (multiply) instruction is
that lea is a more efficient way of performing simple arithmetic operations
that involve multiplication by a power of two.
In this case, the mul instruction would not be appropriate because we are not multiplying rax by an arbitrary value, but rather by a power of two (i.e., 8). Since multiplication by a power of two is equivalent to shifting the binary representation of the number to the left by a certain number of bits, the lea instruction can be used to accomplish the same thing as the mul instruction, but more efficiently.
Specifically, the lea instruction can be used to compute an address that is a scaled version of a register value, where the scale factor can be 1, 2, 4, or 8. If we needed to multiply by a number greater than 8, we wouldn't be using this instruction any more.