Pages

Sunday, February 5, 2012

De-coding Shellcode

Disclaimer: I don't know if I did any of this correctly, do not rely on this. Just writing about my own learning process.


As part of one of my courses at university, I've been learning a bit of x86 Assembly lately, as well as how actual machine code corresponds to it. I wanted to try and apply this to some "real" code, and since I'm currently half-working on Jon Erickson's "Hacking - The Art of Exploitation", I decided to try and convert a piece of shellcode back to its mnemonic asm form. I really wanted to do this also because I want to know what exactly the shellcode does, as this wasn't explained (at this point in the book, anyway).

Here's the raw shellcode:
\x31\xc0\x31\xdb\x31\xc9\x99\xb0\xa4\xcd\x80\x6a\x0b\x58\x51\x68\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x51\x89\xe2\x53\x89\xe1\xcd\x80
Looks like nothing much to me. Anyway, I found a pretty decent resource for looking up the corresponding mnemonics for opcodes, however putting it all together was still at times tricky for me.

Let's take the first instruction for instance: 0x31. This reveals that 31 represents the XOR r/m16/32, r16/32 instruction. Here I ran into the problem of not knowing what registers were being operated on. Googling was difficult, since I didn't even know what question I was really asking, but luckily I got some help in my freshman class Facebook group, and was pointed to this page. As it turns out, the registers are specified by the next command: xC0.

How does identifying the registers work? Each register has a sort of id to reference it, and here the xC0 contains the applicable ids. Below all ids are listed:
eax = 0 = 000
ecx = 1 = 001
edx = 2 = 010
ebx = 3 = 011
esp = 4 = 100
ebp = 5 = 101
esi  = 6 = 110
edi  = 7 = 111

The next operand (here xC0) is organized such that the bits 3-5 represent the source and bits 0-2 represent the operation's destination. In binary, xC0 = 1100 0000, so using our table above, we see that both operands are in fact register eax, making the first operation XOR %EAX, %EAX, which as far as I can see just clears the register. Similarly, the next two operations clear registers ebx and ecx.

We also work with constant values sometimes, we call them immediates. The above shellcode contains a few instances of this, the first starting at \0xb0, which is the MOV r8, imm8 operation. Again, the register (8 bits only this time) is encoded by its id, but this time in the opcode directly, like this: b0 to b7 each represent the same operation, and the second 4 bits represent the register to use. Here, this is register 0, which our table above says is eax, but since we#re working with 8 bit registers, it's gonna be al. The next byte, 0xA4 is the immediate, making this whole operation: MOV $0xA4, %AL.

Then there was a tricky part which I'm not 100% sure I got right, at least I can't see what exactly it does. It starts at 0x68, which specifies the operation PUSH imm16/32. What confused me was whether only the next 2 or the next 4 codes would belong to this operation. I was suspecting 4 (making it 32 bit), since pushing registers such as eax would also be 32bit, but I wasn't entirely sure. Anyway, I then found this in a disassembly of some C program: 
So it seemed that all four codes belonged to this instruction:
PUSH  $0x68732f2f
Below is the full shellcode, decoded. I'm still not sure how it works exactly, in the book it is used in a buffer overflow to spawn a shell.


2 comments:

Johannes said...

Have you looked trough disassemblys yet? The Code first pops 3 Bytes it hasnt pushed, and then pushes some Data for the shell. The interesting part in here is the EAX, because here the starting point for the interruppt call is placed.
(look it up here or in wikipedia)

The other Parameters pushed to the stack are maybe some data to provide to the interrupt and/or to start the shell (the two constants look quite like the constant startaddress of a shell (if you google it you can find about 4k results)

what exactly the first interrupt does i dont know.


But to understand this code i made some notifications.
1. INT 0x80 Interrupts to the code in EAX
2. The two constants used here are known to google :)

So thats the trick i think.

BTW: I'm studying at TUM too.

Philipp Dowling said...

Awesome, thanks. That clears up a few things. I spent all last night deciphering this (look at the time I posted this) and didn't get a chance to look more at what happens in detail.