Philipp Dowling: 2012

Saturday, February 25, 2012

Quick- and Mergesort in one line of Python

One-liners are always fun, so as a challenge I decided to implement both Quicksort and Mergesort in one line of Python, one line meaning one expression, no semicolons and no eval(). Quicksort was relatively straightforward, since it only really requires one operation applied recursively:

Mergesort however was more of a challenge. The problem is the merging of two sorted lists - this requires a loop, which is not trivial to implement inside of another recursive function in just one expression. Luckily though, the the process of merging can be achieved through recursion, and so we can pack it into a lambda expression that is called recursively.
We then still have the problem of having a lambda expression call itself inside of another lambda expression - it took some searching, but eventually I found a solution: a Z-combinator.

(lambda f: (lambda x: f(lambda *args: x(x)(*args)))(lambda x: f(lambda *args: x(x)(*args))))

This expression will take as an argument a function that is to be called recursively, and do exactly that. It allows for recursion with anonymous function. So now, all we need to do is pass the merge function to the Z combinator, and then call the resulting expression with the (recursively) mergesorted first and second half of our list. Here'e the code:

Looks complicated, but you can see that the first lambda is just the Z combinator, followed by a rather long lambda on the variable m, this is our merge function. The reason for all the or and and's is the they are used to replace (if ... else ...) statements, since we can't nest these arbitrarily in lambdas. Also, the (r.append(a[0]) or a.remove(a[0]) or m(a,b,r)) statements simply return the third element (the recursive function call), since append() and remove() simply return none, and (none or [list] ) returns [list] in Python. This is a neat little hack that can be used to not only combine expressions but also to add return values to functions that otherwise return None.

Note that the Mergesort function here is highly recursive, I could only sort lists of around 330 elements before reaching maximum recursion depth. It is still reasonably fast however, taking around 0.004 seconds to sort 330 elements. Quicksort is much faster though, with 0.0002 seconds for the same list.

Wednesday, February 22, 2012

pybfc, A BF to C Compiler written in Python

So instead of studying for my exams, I decided to be less than productive and wrote a simple brainfuck to C compiler, in Python. I didn't initially mean for this to come out as hacky as it did, but I eventually decided I wanted to buff it down to 10 lines maximum.

Compiling brainfuck to C is pretty simple really, since all instructions in bf can be seen as operations on a single pointer variable in C, so all that needs to be done is replace each character with the appropriate C statement, which I did through a dictionary. There's a bit of code needed to initialize the program, but from then on it's all just one-to-one replacing. Finally, the script calls gcc to compile the code to be executable, using an optional command line argument as the output filename.

To compile bf code, save this as (for example) pybfc, and then run:

./pybfc .bf outputfilename

You can then execute the generated program just by running ./outputfilename.

Here's the source for a "Hello World!" program in brainfuck:

++++++++++[>+++++++>++++++++++>+++>+<<<<-]>++.>+.+++++++..+++.>++.<<+++++++++++++++.>.+++.------.--------.>+.>.

This compiles to some rather lengthy C code, I won't post it here, but the compiler will actually generate the C code into a file ("out.c") as well.

Tuesday, February 21, 2012

Shellcode, continued

I posted some decoded x86 machine code here last week, however I didn't really explain what it did (also because I didn't actually know yet). I found a cleaned up version of the corresponding Assembler code, complete with comments, which makes the whole thing pretty understandable.

Basically, the code consists of two parts: gaining root privileges, and then spawning a shell. Each is accomplished through a syscall (this is what the int 0x80 instructions are for). I reality, code like this might be deployed through a buffer overflow - for instance, one could flood an unchecked text input field with NOP instructions (0x90), overriding the following memory with instructions to do nothing at all (this is called a NOP slide) , and then placing the shellcode at the end of this slide, causing it to be executed and you to be prompted with a root shell.
Now, it's not quite as simple as that, after all the string (and shellcode) will only be copied into the buffer, and even if it overflows it, the code's execution is not guaranteed. For this, some function will have to actually have to jump or return to an address that lies somewhere on our NOP slide. If you want to find out in detail how to do this, I suggest you read this very excellent article on the subject.

Thursday, February 9, 2012

Balancing Graphs in O(n*log(n))

The Barnes-Hut algorithm is used to model N-body systems more efficiently, at the expense of precision. However, the improvement in performance is tremendous: from O( n² ) to O( n*log(n) ). For reference, this is what that looks like:

Since our force-based approach to computing graph layout is essentially nothing other than an N-body simulation, and we don't require great precision anyway, this algorithm seems perfect for us to use. The way it works is basically this:

The set of nodes is sorted into a quadtree, which is essentially a binary tree, but with an added dimension
Each of the quadtree sectors keeps track of its mass and center of mass
To compute the forces, the tree is traversed and for points that are "far" from a certain center of mass, only the force from the node to that center of mass is calculated, as opposed to each force between all nodes

This leads to a significant decrease in overall calculations that needs to be done, since distant nodes no longer need to interact directly. There is some overhead on creating the quadtree every frame, but the growth of this is also O(n*log(n)). If you want to learn more, here is a very good explanation of the algorithm complete with pseudocode (in German, but the code is understandable anyway).

I haven't had time to implement this algorithm in our Qt application to graph Wikipedia, however I managed to code up the basic algorithm in Python. The improvement is tremendous, before I was never able to reasonably run a simulation with more than 80 or so nodes in Python, now even systems up to 350 nodes run at a reasonable framerate.

Illustration of the quadtree (unbalanced)

The same graph after balancing

The algorithm can also be parallelized very well, so the potential of this is tremendous, with good implementation and appropriate hardware we could realistically graph all of the German Wikipedia (~1.3 million articles), maybe more.

The full Python code is here.

Sunday, February 5, 2012

De-coding Shellcode

Disclaimer: I don't know if I did any of this correctly, do not rely on this. Just writing about my own learning process.

As part of one of my courses at university, I've been learning a bit of x86 Assembly lately, as well as how actual machine code corresponds to it. I wanted to try and apply this to some "real" code, and since I'm currently half-working on Jon Erickson's "Hacking - The Art of Exploitation", I decided to try and convert a piece of shellcode back to its mnemonic asm form. I really wanted to do this also because I want to know what exactly the shellcode does, as this wasn't explained (at this point in the book, anyway).

Here's the raw shellcode:

\x31\xc0\x31\xdb\x31\xc9\x99\xb0\xa4\xcd\x80\x6a\x0b\x58\x51\x68\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x51\x89\xe2\x53\x89\xe1\xcd\x80

Looks like nothing much to me. Anyway, I found a pretty decent resource for looking up the corresponding mnemonics for opcodes, however putting it all together was still at times tricky for me.

Let's take the first instruction for instance: 0x31. This reveals that 31 represents the XOR r/m16/32, r16/32 instruction. Here I ran into the problem of not knowing what registers were being operated on. Googling was difficult, since I didn't even know what question I was really asking, but luckily I got some help in my freshman class Facebook group, and was pointed to this page. As it turns out, the registers are specified by the next command: xC0.

How does identifying the registers work? Each register has a sort of id to reference it, and here the xC0 contains the applicable ids. Below all ids are listed:

eax = 0 = 000
ecx = 1 = 001
edx = 2 = 010
ebx = 3 = 011
esp = 4 = 100
ebp = 5 = 101
esi = 6 = 110
edi = 7 = 111

The next operand (here xC0) is organized such that the bits 3-5 represent the source and bits 0-2 represent the operation's destination. In binary, xC0 = 1100 0000, so using our table above, we see that both operands are in fact register eax, making the first operation XOR %EAX, %EAX, which as far as I can see just clears the register. Similarly, the next two operations clear registers ebx and ecx.

We also work with constant values sometimes, we call them immediates. The above shellcode contains a few instances of this, the first starting at \0xb0, which is the MOV r8, imm8 operation. Again, the register (8 bits only this time) is encoded by its id, but this time in the opcode directly, like this: b0 to b7 each represent the same operation, and the second 4 bits represent the register to use. Here, this is register 0, which our table above says is eax, but since we#re working with 8 bit registers, it's gonna be al. The next byte, 0xA4 is the immediate, making this whole operation: MOV $0xA4, %AL.

Then there was a tricky part which I'm not 100% sure I got right, at least I can't see what exactly it does. It starts at 0x68, which specifies the operation PUSH imm16/32. What confused me was whether only the next 2 or the next 4 codes would belong to this operation. I was suspecting 4 (making it 32 bit), since pushing registers such as eax would also be 32bit, but I wasn't entirely sure. Anyway, I then found this in a disassembly of some C program:

So it seemed that all four codes belonged to this instruction:

PUSH $0x68732f2f

Below is the full shellcode, decoded. I'm still not sure how it works exactly, in the book it is used in a buffer overflow to spawn a shell.

Friday, February 3, 2012

The Westermann Programming language!

My CS1 professor is known for repeating the ever-living hell out of a set of buzzwords he has, in fact a friend of mine and I wrote a little program that uses speech recognition to listen for them and gives you an alert whenever he says one - the idea is that this would make an awesome drinking game. And yes, we can drink during our lectures, and it's awesome. Proof:

Buzzword bingo has also been proposed. Anyway, during one particularly boring lecture (on language syntax and regular expressions) I decided it would be fun to create a programming language out of these buzzwords, inspired by Ook!, which simply uses variants of the "Ook!" to represent each instruction in Brainfuck - probably the simplest way to create a programming language!

I managed to put together a buzzword-to-Brainfuck compiler in a few minutes. It's really simple actually, all one needs to do is decide what buzzword to map to which Brainfuck command, replace each sting with that command and then feed the result to a Brainfuck compiler/interpreter. Amazingly, there's even a Brainfuck interpreter written in Python - in just one line! It's very obfuscated and unreadable, but seems to work as intended: http://www.cs.princeton.edu/~ynaamad/misc/bf.htm.

Here's the "Hello World!" program in the Westermann (name of the prof) language:

"schlicht und ergreifend schlicht und ergreifend schlicht und ergreifend schlicht und ergreifend schlicht und ergreifend schlicht und ergreifend schlicht und ergreifend schlicht und ergreifend schlicht und ergreifend schlicht und ergreifend ist klar einfach schlicht und ergreifend schlicht und ergreifend schlicht und ergreifend schlicht und ergreifend schlicht und ergreifend schlicht und ergreifend schlicht und ergreifend einfach schlicht und ergreifend schlicht und ergreifend schlicht und ergreifend schlicht und ergreifend schlicht und ergreifend schlicht und ergreifend schlicht und ergreifend schlicht und ergreifend schlicht und ergreifend schlicht und ergreifend einfach schlicht und ergreifend schlicht und ergreifend schlicht und ergreifend einfach schlicht und ergreifend also also also also konkret letztendlich einfach schlicht und ergreifend schlicht und ergreifend ganz einfach schlicht und ergreifend ganz schlicht und ergreifend schlicht und ergreifend schlicht und ergreifend schlicht und ergreifend schlicht und ergreifend schlicht und ergreifend schlicht und ergreifend ganz ganz schlicht und ergreifend schlicht und ergreifend schlicht und ergreifend ganz einfach schlicht und ergreifend schlicht und ergreifend ganz also also schlicht und ergreifend schlicht und ergreifend schlicht und ergreifend schlicht und ergreifend schlicht und ergreifend schlicht und ergreifend schlicht und ergreifend schlicht und ergreifend schlicht und ergreifend schlicht und ergreifend schlicht und ergreifend schlicht und ergreifend schlicht und ergreifend schlicht und ergreifend schlicht und ergreifend ganz einfach ganz schlicht und ergreifend schlicht und ergreifend schlicht und ergreifend ganz konkret konkret konkret konkret konkret konkret ganz konkret konkret konkret konkret konkret konkret konkret konkret ganz einfach schlicht und ergreifend ganz einfach ganz "

Writing the actual converter was trivial, I'll just leave you the code here:

That code includes functions to convert between the "languages", and also the source code for a "Hello World!" program. I'm calling it the Westermann programming language because that's the name of my prof. Also, yes the words are German, I go to school in Munich.
Finally, I managed to cram the entire converter/interpreter into one line, so saving this next piece of code as wmInterpret.py and running

python wmInterpret.py yourprogramsourcefile

will run the specified program. You could replace any of the words with whatever you want and just like that you've got yourself a programming language - although probably a pretty incomprehensible one. Here's the one-line interpreter:

Anyway, have fun with this, make your own little language if you feel like it. Have a good weekend!

Thursday, January 26, 2012

A Chapter in Python List Comprehension Abuse

Apparently Facebook held the qualification to this years Hackercup last weekend, I just came across some of the problems and decided to write a quick solution for one of them. And since Python has this awesome thing called list comprehensions, I managed to cram the solution into a single (ugly) line.
The task is basically this: Given a string, find how many times the string "HACKERCUP" can be formed from its letters without resusing any letters. (Full description here) Seems simple enough, lets see..
First, a little bit on list comprehensions. Usually, to define a list you do the following:

To fill a list, we could to this: But, it turns out there's an easier way - Python lets you create lists "implicitly", so to say. You can iterate over range(10) inside the list and create elements dynamically:

Looks interesting, and there's a lot you can do with this, because there are very few limitations on what you can put inside a list comprehension. You can even nest them to create matrices or do other things. The following line will flatten a list of lists:
For my solution to the problem, I actually read input inside a list comprehension, and then generate a list of that input to work with.

Here's the full code:

EXAMPLE INPUT

5
WELCOME TO FACEBOOK HACKERCUP
CUP WITH LABEL HACKERCUP BELONGS TO HACKER
QUICK CUTE BROWN FOX JUMPS OVER THE LAZY DOG
MOVE FAST BE BOLD
HACK THE HACKERCUP


EXAMPLE OUTPUT

Case #1: 1
Case #2: 2
Case #3: 1
Case #4: 0
Case #5: 1

It's a bit obfuscated, but should still be understandable. The print("\n".join(......)) part prints each element in the following list into a new line, so the output is in the required format. After that, we have the actual generation of our results.
In the last part - [raw_input() for x in range(int(raw_input))] - I read the input. range() takes the number of lines that follow, and then reads that many lines. This list is worked with by the outer list comprehension, which applies a formula to it to count how many times "HACKERCUP" can be formed out of each of the input strings' letters (tC means test case, n is the index of the test case we're on).
The actual formula is this:

We count how many times each letter (the variable c) in "HACKERCUP" occurs in the test case string, and divide by how often it appears in "HACKERCUP" in the first place (rounding down). We store each result in a list, and then take the minimum of that list, giving us our final result.

So, the code is pretty messy and arguably unpythonic, but it demonstrates pretty well how powerful list comprehensions are in Python. Wrap them up in a while(true) loop and you can even create whole mini-games in one line of code!

Wednesday, January 25, 2012

ICPC @ TUM and a Quick State of the Union

Guess what's coming to TUM:

Alright, you didn't actually have to guess.
The ICPC is a yearly programming contest at universities around the world, so naturally I was very excited today to find out that TUM is also participating. Teams can have between two and three members, and I've already recruited Robert for my team, which is definitely something to be excited about aswell, given that he's been coding since he was about 12 years old.
I've been doing a lot of problems on Project Euler lately, in fact I've recently hit 50 solved problems, but competing in a live event will be a very different feeling I expect.
The competition is Saturday in a week (04/02/2012), and will go on for 5 hours as far as I know, during which everyone needs to try and solve as many out of 10 problems as they can. On first glance I was hesitant to spend all Saturday at uni, but then I found out there's gonna be pizza so that was that.

If you're studying at TUM and are looking for a team to join for ICPC, drop me a comment here or send me an email! You can find more info on ICPC here: http://cm.baylor.edu/welcome.icpc and here: http://icpc.in.tum.de/index.php/Main_Page

By the way, Robert is also the guy who's working on the WikiGraph (still unsure of that name) project with me. He's been talking about starting a Blog, so I'll be sure to post a link on here once that happens. Seriously, the guy has skills.
Speaking of Wikigraph, there are some news on what's going on with that as well:

We're getting good graphs for networks of up to fifteen thousand or so nodes, but beyond that our computational capacities don't allow for completing graphs in a reasonable amount of time.
I've talked to two of my professors, and it looks like we may be able to get time on a 30-40 core cluster to run our program, which might allow us to graph the entire German Wikipedia (1.2 million articles). This would be pretty amazing, but we're definitely going to have to work on properly scaling the algorithm so we don't get buggy as the network gets huge.
I'm also working on a way to read the input from a database (specifically the Wikipedia dumps of all pages and pagelinks, which is available at http://dumps.wikimedia.org/), but I'll post some technical stuff on that later. The gist of it is that I have to combine the pages dump (320MB) and pagelinks dump (2.6GB) to get a full description of all the connections. I'm still unsure of what the best way to do it is, and I don't want to waste time trying to do it inefficiently when dealing with these huge amounts of data. Finding disconnected subgraphs may also present itself to be a problem.
Different datasets are also being worked on, one thing I'm particularly interested in is mapping Darknet/Meshnet, so I'm looking for a way to actually extract the entire set of nodes and their connections. I'm not sure if it's actually possible either, I'll have to look more closely into it.
Finally, I used some of the source code from this project to create a simulation of planets/particles interacting through gravity - a planetary system, basically. It's almost more fun to play around with, in fact. Also more on that soon.

Thursday, January 19, 2012

9868 nodes in WikiGraph

Ha! I located one of the bugs, apparently the setting of the repulsion strength parameter in the balancing algorithm caused the flying off into infinity of nodes. I fixed it, and right away managed to graph almost 10,000 articles centered on Mathematics! I could probably go larger, but each of these Graphs is taking up to half an hour to fully balance on my computer. Let's see, maybe TUM has a nice multicore machine we can use for a bit...

A little Preview

So, there haven't been many posts on here the last few days, but certainly not because I've gotten lazy. Robert and I have been coding like crazy on the WikiGraph project this week, with a few sessions of 6+ hours at times.
We're coming along quite well, we've managed to implement multithreading on all of the force calculations, which has given us a speedup factor of 4 on my machine, but the implementaton allows for an arbitrary number of threads and automatically creates as many as the machine it runs on has cores (or virtual cores, with hyperthreading). Along with that comes the improved algorithm, which again about doubles performance.
This allows us to display some really large graphs - by really large, I mean up to around 6000 nodes at the moment, and that limit is mostly due to some unexpected behavior (nodes flying off to infinity), which we suspect is a result of the parameters we use in our algorithm. I expect that by the time we're releasing a beta version (which at current development speed would be sometime next week), we can display graphs of maybe 15000 nodes, which is around 3 levels deep of pagelinks if all links are loaded.
OpenGL support is also something we're implementing at the moment, since CPU rendering of these graphs actually takes up more resources and time than calculating a single iteration of forces, so outsourcing that to a graphics card will definitely help performance.
Apart from that, other changes will mostly be in UI and design, adding options for users to alter the graph and such.

Even though we aren't ready to put out a stable and usable release yet, we've already created some pretty neat visualizations:

Close up of an earlier version, 2 levels centered around Quicksort

3 levels deep (German Wikipedia), also centerd around Quicksort I believe. We're actually displaying over 5000 nodes here.

3 levels around Mathematics, over 6000 nodes. This one is not quite stable, the Mathematics article needs to be fixed otherwise it flies off. Looking into why it does that, all other nodes seem to behave as expected.

Sunday, January 15, 2012

An improved algorithm for N-body simulations

TL;DR: A simple algorithm that speeds up force calculations for n-body systems by a factor of nearly 2.

So the last post on here was about graph balancing using a spring-electrical model, in which each node exerts a repulsion on each other node according to Coulomb's law, and each edge attracts adjacent nodes according to the spring equation.
This is a good example of an N-body problem - the most well-known one is that of planetary/gravitational systems - and one unfortunate thing about these is that modeling them can be quite slow.
Intuitively, the first algorithm that comes to mind is simply iterating through all the bodies, and for each one iterate through each other body and calculating the force they exert on each other. For n bodies (nodes), this results in (n-1)^2 for the whole system - (n-1) because the body does not exert a force on itself. This is of course is a growth rate of O(n^2), not very efficient.
But there's an easy improvement that can improve on this quite a bit making only minimal changes to the algorithm:

We start with the first body, and iterate through all the other body, calculating each force and adding it to the nodes net force
Since the forces between each two bodies will be equal and opposite, we now subtract this force (or add it negatively) from the other bodies' net forces, and then add a reference to the original body to a list (or in Qt a QSet, which is faster) that keeps track of all the bodies that have already been calculated
We go to the next body, and again iterate through all other bodies, except now we skip any bodies already in our list of calculated bodies. This results in one less calculation being performed with each body we iterate through

And that's about it. With each node, we have one less calculation to perform, so our total number of operations is now:

$\sum_{k = 0}^{n}(n-1)= \frac{n^2-n}{2}$

This is significantly better, in fact:

$\lim_{n\to \infty }\frac{(n-1)^2}{\frac{1}{2}(n^2-n)}=2$

For very large systems, this algorithm approaches to be twice as fast.

I implemented this algorithm in our Wikipedia grapher, and it's working quite nicely. Here's one of the graphs I made with it today(554 nodes, zoomed out):

There are also n log(n) algorithms for these systems, but frankly I don't understand how they work well enough to write about or implement one yet, so I'll call it a night for now and get to those another time.

Tuesday, January 10, 2012

Graphing wikipedia - update

I fixed up the crosslinking on our wiki grapher, so now each expanding node checks the list of existing nodes and connects them rather than creating a new node for every page it finds. A next step would be to use Wikipedias incoming links feature and also check node links in the other direction, however this would probably impact performance as a full web request needs to be made for every single node at creation, not just after it is clicked/expanded. Anyway, this fix already makes the resulting graph a lot more interesting:

By the way, this is centered around Mergesort and outgoing links are limited to 10 for testing purposes. I might keep it that way and then decide on a way to display the 10 (or so) most important links, although it'll be difficult to decide how to judge link importance.

Visualizing Pagelinks

In this blog's last post I talked about force graph balancing and some of the things it can be used for, more specifically visualizing networks such as links between webpages. Robert Brünig and I are currently working on a small projects that does just that, by scraping pages (currently Wikipedia articles) for links and then representing them graphically. It's written in C++ using the Qt framework. This would've been fairly simple in Python, however for larger networks the program becomes very computationally intensive, and we wanted to get good performance. Besides, using C++/Qt allows for much more stable deployment, and Qt has a lot to offer in terms of modules (widgets, graphics, web access and such). Screenshot below.

As you can see, the graph quickly grows quite large for wikipedia pages due to the high number of pagelinks on each article. We're thinking of focusing on Youtube instead, visualzing a gives clips related videos - perhaps we might even be able to embed the videos in the nodes, so that they can be watched directly in the program.

There are also still some issues with the graph balancing algorithm, we still need to find a good combination of attraction/repulsion/weighting parameters that produces a smooth and stable graph that balances quickly.

Pages