Before I start this post, let me first explain what MCSJ actually is. MCSJ stands for ‘My Computer Science Journey’, and basically, I am planning to write a blog post once in a while (hopefully at least once a month) about what I learned in my CS classes. I feel that this will help me reaffirm my understanding of the topics, and also, a lot of people have been telling me about how important it is to have an ‘online portfolio’. I’m pretty sure writing a blog post about CS isn’t exactly what they mean by an online portfolio, but I thought this would be a good idea to start with. So, here goes nothing!
Last month, I learnt a lot about assembly code in my Systems class. I have to say, it was really mind-boggling at first. All the registers and different operations seemed totally confusing.
Assignment 4: Buffer
As the title suggests, this assignment involved a lot of buffer overflow attacks. Obviously, the point was not to encourage us to start attacking programs with exploit strings, but to develop an understanding of stack organization and the x86-64 instruction set.
There were 5 phases to this assignment, with increasing difficulty. At the core of it all, here is the main part of the program:
Basically, Gets() reads a string from standard input and stores it into buf. However, there’s a problem here. Gets() has no idea how big or how small buf is, and just blindly copies the entire string into buf. If the string happens to be longer than the allocated size, which in this case is 32 bytes, well then you’re basically screwed. The extra bytes overwrite whatever sits next to buf on the stack, including the saved return address, and cause the program to go wrong. That is exactly what we were supposed to do, though. We had to compose an exploit string to feed into buf, thus causing the program to do things that it’s not supposed to do.
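To make the overflow concrete, here is a minimal sketch in C. It is not the assignment's code: fake_frame and unchecked_copy are names I made up, and the layout (32 bytes of buf directly followed by an 8-byte return address) is an assumption based on the description above. The copy is capped at the size of the fake frame so the demo itself stays well-defined, which is exactly the kind of check the real Gets() does not do for buf.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BUF_SIZE 32

/* Hypothetical model of getbuf()'s frame: buf, then the saved return address. */
struct fake_frame {
    unsigned char buf[BUF_SIZE];
    uint64_t return_address;
};

/* Copies len bytes starting at buf without checking len against BUF_SIZE.
   The only cap is the size of the whole fake frame, to keep the demo safe. */
static void unchecked_copy(struct fake_frame *frame,
                           const unsigned char *input, size_t len) {
    assert(frame != NULL && input != NULL);
    if (len > sizeof *frame) {
        len = sizeof *frame;
    }
    memcpy(frame, input, len);
}
```

With 32 bytes of input or fewer, return_address is untouched. Anything longer spills into it, which is the whole trick behind the assignment.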
Here are a couple of essential things I learnt from this assignment:
#1: Organization of the stack frame
I think the most important thing I learnt from this assignment is the organization of the stack frame. The first phase of the assignment is called ‘Goomba’. When getbuf() is called, instead of letting it return to wherever it is supposed to return, we feed it an exploit string so that it ends up in the function goomba(). In order to do this, we must first understand how the program stack works. In this particular assignment, there is a function called test() that calls getbuf(). So under the stack frame for test(), there will be the return address, and underneath that, there will be the stack frame for getbuf(). Since getbuf() allocates 32 bytes for buf, we can visualize it as 4 lines in the stack frame, where each line consists of 8 bytes (hence, 8 x 4 = 32 bytes). I gathered that in order for getbuf() to end up in goomba() instead of returning to test(), we need to overwrite the saved return address with the address of goomba(). Thus, once Gets() has read the exploit string and getbuf() returns, execution jumps to whatever address is now stored there, which will be goomba()! Hence, problem solved.
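The layout described above can be turned into a small payload builder. This is only a sketch under my own assumptions: build_payload is a hypothetical helper, the filler byte is arbitrary, and the return address is assumed to sit immediately after the 32 bytes of buf. The real address of goomba() has to come from the disassembly. Since x86-64 is little-endian, the address goes in lowest byte first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BUF_SIZE 32
#define PAYLOAD_SIZE (BUF_SIZE + 8)

/* Writes 32 filler bytes followed by target in little-endian byte order.
   Returns the number of bytes written. */
static size_t build_payload(unsigned char out[PAYLOAD_SIZE], uint64_t target) {
    assert(out != NULL);
    memset(out, 0x41, BUF_SIZE);            /* fills buf itself with 'A' */
    for (int i = 0; i < 8; i++) {
        /* byte i of the address, least significant first */
        out[BUF_SIZE + i] = (unsigned char)(target >> (8 * i));
    }
    return PAYLOAD_SIZE;
}
```

One thing to watch out for: a gets-style read stops at a newline, so neither the filler nor the address can contain the byte 0x0a.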
#2: In times of doubt, check the disassembled version of your code
The data a function works with can also live in the stack frame. (On x86-64, the first six integer or pointer arguments are normally passed in registers, but anything they point to may still sit on the stack.) The second phase, called ‘Scoreboard’, is similar to the first phase, but this time, we also need to change the arguments for scoreboard(). Here’s some code for more information.
The tricky part with this level is knowing how to change the argument for scoreboard(). I figured I had to overwrite the location of rs somehow, but I was clueless at first. Then, I looked at the disassembled version of buffer (to disassemble the program, you can type: objdump -d buffer > obj.txt and it will save a disassembled version of buffer into obj.txt). Through the disassembled code, I found out the location of points_per_race, and then overwrote it with an arbitrary number (I figured it should work as long as it’s bigger than 9). Voila! Problem solved. But really though, understanding the disassembled version of the code helps a lot. It gives you more insight into how the program really works, because sometimes the program doesn’t run exactly the same way as you thought it would (ahem, code optimization, I’m talking to you).
#3: GDB is your friend
The fourth phase of the assignment is called ‘Bullet Bill’. Long story short, we basically needed to change the value of a global variable. Here is the code for reference:
At first, I was really confused as to how I could accomplish this task, but it turned out to be easier than I thought. Global variables are unique in the sense that their address stays the same every time the program runs (at least for this binary). Therefore, we can just use GDB to find the address of first_place, and move the value of our cookie to that address. Thanks GDB! Obviously, there are a lot of other uses for GDB besides finding the addresses of global variables, so this is just a glimpse of what GDB can actually do. I definitely used GDB a couple of times to run through this assignment and to see exactly what the program does. Looking at disassembled code can get a little confusing with all the weird instructions, so I would usually step through the program with GDB too, to get an even deeper understanding of it. In completing this assignment, I learned a new GDB command: stepi, otherwise known as si. Instead of executing one C statement, it executes one machine instruction, such as movq or addq.
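In C terms, moving the cookie to that address boils down to a store through a raw pointer. Here is a rough sketch, not the exploit itself: write_cookie_at is a hypothetical helper, and first_place below is just a stand-in for the assignment's global so the example can run on its own. In the real exploit, the address would be the one GDB reports and the store would be a single instruction such as movq.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the assignment's global variable. */
static unsigned long first_place = 0;

/* Stores cookie at the raw address addr, much like a movq to a fixed address. */
static void write_cookie_at(uintptr_t addr, unsigned long cookie) {
    assert(addr != 0);
    *(unsigned long *)addr = cookie;
}
```

Calling write_cookie_at((uintptr_t)&first_place, cookie) changes the global without ever naming it at the point of the store, which is the same idea as hard-coding its address in the exploit.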
To be honest, this assignment turned out to be easier than I thought it would be. I didn’t completely understand what the professor was talking about in class, and the whole concept of assembly code seemed really foreign to me. However, I thought it turned out to be a really fun and interesting assignment. Buffer definitely has to be one of my favorite assignments so far.
If you would like to see the assignment handout, please head over to the CS033 course page.