x86 assembler is not injective

Look at the following instruction:

mov eax, ecx

There are two ways to assemble it to machine code:

8b c1 and 89 c8.

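First, look at the binary representation, split into the fields opcode (6 bits), d (1 bit), w (1 bit), mod (2 bits), reg (3 bits) and r/m (3 bits):

8b c1 = 1000 1011 1100 0001 = 100010 1 1 11 000 001

89 c8 = 1000 1001 1100 1000 = 100010 0 1 11 001 000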

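Let us analyze the differences within the binary representation. The 6-bit opcode prefix is the same (100010). So are the w bit that follows the direction bit (1, meaning full-size operands, 32-bit here) and the two mod bits (11, meaning both operands are registers). Two things differ. The first is the direction bit (the “1” in the first encoding and the “0” in the second). The second is the order of the two 3-bit register fields, which are swapped: 000 001 in the first encoding and 001 000 in the second, where 000 is eax and 001 is ecx. The direction bit says whether the first register field (reg) is the destination (1) or the source (0), so flipping the bit and swapping the two fields gives the same instruction.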
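To check the field split by hand, here is a minimal Python sketch of a decoder for this one instruction form. The function name decode_mov_reg_reg and the REG32 table are made up for this post, and the sketch assumes 32-bit code with no prefixes and only the register-to-register form (mod = 11).

```python
# Register numbers 0..7 as used in the reg and r/m fields (32-bit names).
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def decode_mov_reg_reg(code):
    """Split a two-byte register-to-register MOV into its bit fields.

    Hypothetical helper: handles only opcodes 89 and 8b with mod = 11.
    """
    opcode_byte, modrm = code[0], code[1]
    if opcode_byte >> 2 != 0b100010:
        raise ValueError("not a MOV with the 100010 opcode prefix")
    d = (opcode_byte >> 1) & 1   # direction bit
    w = opcode_byte & 1          # operand size bit
    mod = modrm >> 6             # addressing mode, 11 = register direct
    reg = (modrm >> 3) & 7       # first register field
    rm = modrm & 7               # second register field
    if mod != 0b11 or w != 1:
        raise ValueError("this sketch only handles 32-bit register operands")
    # d = 1: reg is the destination; d = 0: reg is the source.
    dst, src = (reg, rm) if d else (rm, reg)
    return {"d": d, "w": w, "mod": mod, "reg": reg, "rm": rm,
            "asm": "mov %s, %s" % (REG32[dst], REG32[src])}
```

Calling it on bytes.fromhex("8bc1") and on bytes.fromhex("89c8") gives different d, reg and rm values, but the "asm" entry is "mov eax, ecx" in both cases.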

Both byte sequences have the same meaning, and both disassemble back to the same assembly line “mov eax, ecx”. The same goes for every instruction that has a two-register form with a direction bit, such as add, sub, or, and, xor, etc. (but not for mul, div, in, out, etc., which have no such form).
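As an illustration, the two encodings can be generated for a few of these instructions. This is a rough Python sketch, assuming 32-bit registers and no prefixes. The names both_encodings, BASE_OPCODE and REG32 are invented here, and the table lists only the opcodes for the direction-bit-0 forms of the instructions named above.

```python
# Register numbers as used in the reg and r/m fields (32-bit names).
REG32 = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
         "esp": 4, "ebp": 5, "esi": 6, "edi": 7}

# Opcode byte with direction bit 0 and w = 1 (32-bit operands).
BASE_OPCODE = {"add": 0x01, "or": 0x09, "and": 0x21,
               "sub": 0x29, "xor": 0x31, "mov": 0x89}

def both_encodings(mnemonic, dst, src):
    """Return the two byte sequences for 'mnemonic dst, src' (registers only)."""
    base = BASE_OPCODE[mnemonic]
    d, s = REG32[dst], REG32[src]
    # Direction bit 0: reg field holds the source, r/m holds the destination.
    first = bytes([base, 0xC0 | (s << 3) | d])
    # Direction bit 1 (0x02 set in the opcode): the two fields swap roles.
    second = bytes([base | 0x02, 0xC0 | (d << 3) | s])
    return first, second
```

For example, both_encodings("mov", "eax", "ecx") returns the bytes 89 c8 and 8b c1, and both_encodings("add", "eax", "ecx") returns 01 c8 and 03 c1.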

The fact that the x86 assembler is not injective (two different byte sequences can stand for the same assembly line) is important to remember the next time you are searching for a specific instruction in a binary file, writing an anti-virus program, or writing any other program that looks for specific byte sequences. Additionally, looking at the binary representation of an instruction helps you understand it better.
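For the searching case, a minimal Python sketch might look like the following. The function name find_mov_eax_ecx is made up for this post. It is a plain byte search that knows nothing about instruction boundaries, so it can also report matches that fall in the middle of some other instruction or in data.

```python
def find_mov_eax_ecx(data):
    """Return sorted offsets where either encoding of 'mov eax, ecx' occurs."""
    patterns = (bytes.fromhex("8bc1"), bytes.fromhex("89c8"))
    hits = []
    for pattern in patterns:
        start = data.find(pattern)
        while start != -1:
            hits.append(start)
            # Continue the search one byte past the last match.
            start = data.find(pattern, start + 1)
    return sorted(hits)
```

Searching only for 8b c1 would miss every place where the assembler happened to pick 89 c8, which is the practical consequence of the two encodings.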

About accessomat

technical blog about things i do and working on
