As part of one of my courses at university, I've been learning a bit of x86 Assembly lately, as well as how actual machine code corresponds to it. I wanted to try and apply this to some "real" code, and since I'm currently half-working on Jon Erickson's "Hacking - The Art of Exploitation", I decided to try and convert a piece of shellcode back to its mnemonic asm form. I really wanted to do this also because I want to know what exactly the shellcode does, as this wasn't explained (at this point in the book, anyway).
Here's the raw shellcode:
\x31\xc0\x31\xdb\x31\xc9\x99\xb0\xa4\xcd\x80\x6a\x0b\x58\x51\x68\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x51\x89\xe2\x53\x89\xe1\xcd\x80Looks like nothing much to me. Anyway, I found a pretty decent resource for looking up the corresponding mnemonics for opcodes, however putting it all together was still at times tricky for me.
Let's take the first instruction for instance: 0x31. This reveals that 31 represents the XOR r/m16/32, r16/32 instruction. Here I ran into the problem of not knowing what registers were being operated on. Googling was difficult, since I didn't even know what question I was really asking, but luckily I got some help in my freshman class Facebook group, and was pointed to this page. As it turns out, the registers are specified by the next command: xC0.
How does identifying the registers work? Each register has a sort of id to reference it, and here the xC0 contains the applicable ids. Below all ids are listed:
eax = 0 = 000The next operand (here xC0) is organized such that the bits 3-5 represent the source and bits 0-2 represent the operation's destination. In binary, xC0 = 1100 0000, so using our table above, we see that both operands are in fact register eax, making the first operation XOR %EAX, %EAX, which as far as I can see just clears the register. Similarly, the next two operations clear registers ebx and ecx.
ecx = 1 = 001
edx = 2 = 010
ebx = 3 = 011
esp = 4 = 100
ebp = 5 = 101
esi = 6 = 110
edi = 7 = 111
We also work with constant values sometimes, we call them immediates. The above shellcode contains a few instances of this, the first starting at \0xb0, which is the MOV r8, imm8 operation. Again, the register (8 bits only this time) is encoded by its id, but this time in the opcode directly, like this: b0 to b7 each represent the same operation, and the second 4 bits represent the register to use. Here, this is register 0, which our table above says is eax, but since we#re working with 8 bit registers, it's gonna be al. The next byte, 0xA4 is the immediate, making this whole operation: MOV $0xA4, %AL.
Then there was a tricky part which I'm not 100% sure I got right, at least I can't see what exactly it does. It starts at 0x68, which specifies the operation PUSH imm16/32. What confused me was whether only the next 2 or the next 4 codes would belong to this operation. I was suspecting 4 (making it 32 bit), since pushing registers such as eax would also be 32bit, but I wasn't entirely sure. Anyway, I then found this in a disassembly of some C program:
PUSH $0x68732f2fBelow is the full shellcode, decoded. I'm still not sure how it works exactly, in the book it is used in a buffer overflow to spawn a shell.