Lab 4: Stack Smashing
In this lab, we will use our knowledge of the stack and procedure calls to hack programs and make them do unexpected things.
Before starting the lab, print out a copy of the Question Sheet. You need to answer questions on this sheet as you go through the lab!
Part 1: Hacking via Pointer
- Log in to your Linux machine and pull your Git repo. cd to the lab4 directory.
- Examine part1.c using vim.
- To compile the code, use make part1. To run, use ./part1. To debug, use gdb ./part1.
Invoke the surprise function without calling it
You may notice that the function surprise is never explicitly called in the program. Our goal in this part is to manipulate the stack space and make the program invoke the function without calling it.
How to do it?
In short, we can overwrite the return address of the add function, which is stored on the stack, with the address of the surprise function. As a result, when add finishes running and tries to go back to main by fetching its return address from the stack, the modified return address will direct it to run surprise instead.
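The idea can be sketched safely with a simulated stack frame rather than the real one. The block below is only an illustration: the slot layout, the distance between the local variable and the saved return address, and the two "addresses" are all invented for this sketch, and the pointer here is a uint64_t* rather than the int* used in part1.c. In the lab you measure the real layout in GDB.

```c
#include <stdint.h>

/* Hypothetical "code addresses" used only by this simulation. */
#define ADDR_BACK_IN_MAIN 0x1000u
#define ADDR_SURPRISE     0x2000u

/* Invented frame layout: one local, a saved frame pointer, and the
 * saved return address. The real distances must be found in GDB. */
enum { SLOT_LOCAL_X = 0, SLOT_SAVED_FP = 1, SLOT_SAVED_RA = 2, FRAME_SLOTS = 3 };

/* Simulates add(): takes a pointer to the local, optionally steps it up
 * to the saved-return-address slot and overwrites that slot. Returns the
 * address the simulated function would "return" to. */
uint64_t simulated_add(int smash)
{
    uint64_t frame[FRAME_SLOTS];
    frame[SLOT_LOCAL_X]  = 42;
    frame[SLOT_SAVED_FP] = 0;
    frame[SLOT_SAVED_RA] = ADDR_BACK_IN_MAIN;

    uint64_t *xp = &frame[SLOT_LOCAL_X];   /* pointer to a local variable */
    if (smash) {
        xp += SLOT_SAVED_RA - SLOT_LOCAL_X; /* walk up to the return address */
        *xp = ADDR_SURPRISE;                /* redirect the "return" */
    }
    return frame[SLOT_SAVED_RA];
}
```

In this model, simulated_add(0) returns the address back in main, while simulated_add(1) returns the address of surprise, because the only thing that changed is the saved slot the function reads on its way out.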
The questions in Part 1 of the Question Sheet guide you through this implementation. First, run the code in GDB and set breakpoints at the following places to observe some key information:
- Right before calling the add function in main: Line 25
- At the beginning of the add function: Line 12
- After declaring and initializing int* xp in add: Line 16
Follow the questions/instructions in the Question Sheet to finish the rest of Part 1.
Part 2: Hacking via Buffer Overflow
Hacking by modifying the source code, as we did in Part 1, is not real hacking. But Part 1 does show how you can modify the return address of a function to do unexpected things. In this part, we will apply the same hacking technique (stack smashing) but with a more realistic approach, i.e., without changing the source code.
First, go to your lab4 folder and open part2.c via vim or emacs.
Notice that main calls doIt, which calls copyIntoBuffer.
Also, note that copyIntoBuffer copies the input argument string into a buffer (array) located in the stack frame of doIt.
Consider the ret instruction at the end of main. This instruction normally branches back to the appropriate place in the code of the kernel (the operating system program that controls the computer, and which called main).
How is this done? The kernel executed a bl main instruction, which branches to main and also places the return address in the lr register.
However, main must overwrite the lr register, as it needs to call doIt with bl doIt. Thus, the return address back to the kernel must be backed up in main's stack frame, and then restored to lr (x30) before main's ret instruction.
Idea: We will smash the stack frame of the main function by overwriting the backed-up return address to the kernel, replacing it with the address of holyGrail.
To do this, we will overflow the array in doIt's stack frame by inputting more data than it can hold. More specifically, we will overflow it far enough to overwrite the backed-up return address in main, carefully (and maliciously) replacing it with the address of holyGrail.
Then, when main finishes, instead of going back to the kernel, the program will execute the function holyGrail.
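The overflow can be modeled with a plain byte array standing in for the stack. This is a sketch under stated assumptions, not the layout of part2.c: the buffer size, the offset of the backed-up return address, and both addresses are hypothetical, and the copy is clamped to the simulated region so the program itself stays well-defined.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout of a simulated stack region, lowest address first. */
enum {
    BUF_OFFSET     = 0,   /* buffer in doIt's frame */
    BUF_SIZE       = 8,   /* what the buffer is meant to hold */
    MAIN_RA_OFFSET = 24,  /* backed-up return address in main's frame */
    STACK_SIZE     = 32
};

/* Hypothetical code addresses used only by this simulation. */
#define ADDR_KERNEL    0x401000ull
#define ADDR_HOLYGRAIL 0x402000ull

/* Sets up the simulated stack, copies the input into the buffer without
 * checking BUF_SIZE, and returns the return address main would then use. */
uint64_t simulate_overflow(const uint8_t *input, size_t len)
{
    uint8_t stack[STACK_SIZE] = {0};
    uint64_t ra = ADDR_KERNEL;
    memcpy(stack + MAIN_RA_OFFSET, &ra, sizeof ra);

    /* Clamp only to the simulated region, not to the buffer. */
    if (len > STACK_SIZE - BUF_OFFSET)
        len = STACK_SIZE - BUF_OFFSET;
    memcpy(stack + BUF_OFFSET, input, len);

    memcpy(&ra, stack + MAIN_RA_OFFSET, sizeof ra);
    return ra;
}

/* Fills out with filler bytes up to the saved return address, then the
 * target address. out must hold STACK_SIZE bytes. Returns the length. */
size_t build_payload(uint8_t *out, uint64_t target)
{
    memset(out, 'A', MAIN_RA_OFFSET);
    memcpy(out + MAIN_RA_OFFSET, &target, sizeof target);
    return MAIN_RA_OFFSET + sizeof target;
}
```

A short input that fits in the buffer leaves the saved address untouched, while the payload from build_payload is long enough to reach it and replace it. In the real lab, the corresponding distance has to be worked out from the addresses you observe in GDB.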
How: As you may notice, the string that copyIntoBuffer copies comes from the command-line input, i.e., something we type in. So all we need to do is give it a "sneaky" command-line input.
The Part 2 executable takes an input argument string. For example, to run the program with the input ABCD, you can type:
./part2 ABCD
To debug the part2 program with input ABCD in GDB, first type gdb ./part2 (with no argument). Inside GDB, you can run either start ABCD or run ABCD.
Notice that by default, the input argument is interpreted as ASCII text. To provide raw bytes in the argument, we can do the following.
To give the four bytes 0x41, 0x42, 0x43, 0x44 as the input, we can type:
./part2 $'\x41\x42\x43\x44'
Notice that the input must start with $ and must be wrapped in single quotes (' ').
Each byte is represented by \x followed by two hexadecimal digits. It is recommended to type this command into your terminal manually rather than copy it from the webpage, as the ' characters can be malformed when copied.
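When the bytes you want to pass are an address, their order matters. The sketch below assumes a little-endian target (the usual configuration for AArch64 Linux), so the low byte of the address comes first. The function name format_payload and its parameters are hypothetical helpers for illustration. The function stops before the high-order zero bytes and rejects addresses with a zero byte in the middle, because an argument string cannot contain a zero byte.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Writes a shell argument of the form $'AAAA\x..\x..' into out:
 * pad_len filler 'A' characters followed by the bytes of addr, low byte
 * first. High-order zero bytes are omitted. Returns 0 on success, or -1
 * if out is too small or addr has a zero byte before its highest byte. */
int format_payload(char *out, size_t out_size, size_t pad_len, uint64_t addr)
{
    int n = snprintf(out, out_size, "$'");
    if (n < 0 || (size_t)n >= out_size)
        return -1;
    size_t pos = (size_t)n;

    for (size_t i = 0; i < pad_len; i++) {
        if (pos + 1 >= out_size)
            return -1;
        out[pos++] = 'A';
    }

    while (addr != 0) {
        unsigned byte = (unsigned)(addr & 0xff);
        if (byte == 0)
            return -1;  /* a zero byte would end the argument early */
        n = snprintf(out + pos, out_size - pos, "\\x%02x", byte);
        if (n < 0 || (size_t)n >= out_size - pos)
            return -1;
        pos += (size_t)n;
        addr >>= 8;
    }

    if (pos + 2 > out_size)
        return -1;
    out[pos++] = '\'';
    out[pos] = '\0';
    return 0;
}
```

For example, with pad_len 4 and addr 0x41424344 the function produces $'AAAA\x44\x43\x42\x41', with the address bytes reversed relative to how the number is written. Whether the omitted high-order bytes are already zero on the stack is something to confirm in GDB.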
With the skills and knowledge from the previous part, you are ready to practice your hacking skills. We will give less hand-holding and leave more room for you to explore.
To answer the Part 2 questions in the Question Sheet, you need to find proper places to set breakpoints and figure out the answers from GDB.
Finishing the lab
- Scan and upload the signed Question Sheet to Gradescope.