Introduction
Let’s say that you would like to write a multi-threaded application, but your operating system doesn’t support threads (i.e. it just runs whole processes, which can’t have sub-parts that are scheduled separately). Are you doomed to a single threaded program - no! You can implement your own user-space (as opposed to kernel-space) threads. The process you’ll use is very similar to the way the OS itself handles threads. You will have full control over when these threads get to run, in which order, and when do they get taken off of the CPU.
You can look at the code in basic_threads_in_use_example.c
for an idea about
how our threading library will eventually look like.
Learning Objectives
At then end of this lab, you should be able to:
- Develop a basic multi-threading library built entirely in user space.
- Develop a scheduler that attempts to pick the next thread to run based on the set of available threads.
Getting the Source Code
For this lab, you will be using the native Linux virtual machine (or baremetal machine if you have one) and not the xv6 operation system. Please note that this lab might behave slightly differently if you are running it on Windows or MacOs; therefore, we highly recommend that you stick to using a Linux machine, either natively or via WSL2.
To obtain the starting code for this lab, navigate to the top level directory of
your csse332 class repository and cd
into the labs/lab09
directory as
follows:
$ cd /path/to/csse332/git/repository/
$ git pull
$ cd labs/lab09
The Idea
Recall that what program is currently running in the CPU is entirely determined by a few CPU registers (program counter, current stack pointer, and a few others). We usually adjust these registers, saving and restoring their state, to allow things like function calls to happen. That’s not the only way we can use them though! We can also save them, then switch them for entirely different values - essentially switching our program into a whole new state with a whole new stack. Then at some later time, we can restore the old values and from our code’s perspective it will seem like they never left. If we switch between these two codebases fast enough, the two threads of execution will appear to be running simultaneously.
Why would we want this?
Although the threads will appear to be happening simultaneously, actually they are only running one at a time on one CPU. Given that, we won’t get any performance increase by using our threads to solve an algorithm in parallel. But there are many more uses for threads than just writing parallel algorithms.
The most common use is event-based programming. Imagine you want some code to run when the mouse is clicked, while at the same time doing some tricky calculation. Without threads, your calculation code is going to be polluted in various places by checking the mouse. Or imagine data is periodically coming in from some source (say a network download) and at the same time other aspects of your program are running. You want the download data to be written to a file (say) but that’s not something the rest of you application should be constantly checking and doing. Even if the OS doesn’t support threads, with userspace threads we can easily handle all of this.
Another benefit is that userspace threads tend to be quite a bit more lightweight than kernel or OS-managed threads. So if you want to write code with a LOT of parallelism, usually using a kernel thread for each one uses too much memory and is too slow. The parallel languages Erlang, Elixir, and Go for example, make use of a mix of userspace and kernel threads to allow the creation of way more threads than could normally be supported. Using userspace threads also gives you complete control over scheduling the threads, which means you can implement optimization that the OS can’t know about.
How to Achieve That
So what we need to start with is a way to save the current execution thread running on the CPU into memory, switch to a different execution thread, and then later restore that old process. We could write all that in assembly, but luckily there are some handy unix functions that do that kind of thing.
Function | Description |
---|---|
getcontext |
gets the current state into a ucontext variable, which you can then modify |
makecontext |
modifies a context, making it include a function call of your choice |
swapcontext |
stores the current context in a variable, and switches execution to the given new context |
Of course that’s not very specific, to get more information, you can always look
for the documentation on these functions using the man
utility, something like
the following:
man getcontext
Starting with an Example
Take a look at example1.c
, it provides a good example on how to start using
the getcontext
and makecontext
and swapcontext
functions.
You can compile this code using
$ make example1.bin
When this program runs, you should see the execution switching between the parent and the child threads. Something like:
$ ./example1.bin
Creating child thread
Switching to child thread
child thread running 1 of 10
doing parent stuff and then switching back to child
child thread running 2 of 10
doing parent stuff and then switching back to child
child thread running 3 of 10
doing parent stuff and then switching back to child
child thread running 4 of 10
doing parent stuff and then switching back to child
child thread running 5 of 10
doing parent stuff and then switching back to child
child thread running 6 of 10
doing parent stuff and then switching back to child
child thread running 7 of 10
doing parent stuff and then switching back to child
child thread running 8 of 10
doing parent stuff and then switching back to child
child thread running 9 of 10
doing parent stuff and then switching back to child
child thread running 10 of 10
doing parent stuff and then switching back to child
Child thread exiting
Child done
Child thread returned and stack freed
I strongly recommend that you take the time to understand this code completely before moving on to the next step.
What you Have to Do
Your job in this lab is to turn this basic approach in example1.c
into a
threading library. You would want to implement all of the function outlined in
basic_threads.h
. We have provided you with detailed description about each
function in basic_threads.c
.
Running the Test Cases
The files that you will be editing define a library, and thus have no main
function. However, we have provided you with a bunch of test cases that you can
run depending on which step you are in this lab.
To compile the test cases, you can use make tests.bin
and then run the tests
for steps 1 and 2 (for example) using:
$ ./tests.bin 2
Test 1: Just One Thread
In this first test, we will only consider having a single thread. This pretty
much revolve around making the code that is in example1.c
become part of our
library. It should only be a matter of just copying and pasting the various
parts of example1.c
into the appropriate functions of basic_threads.c
with
slight modifications. Note that to do this, you must really spend some time
understanding what the different parts of example1.c
are doing.
For now, feel free to add the parent
and child
variables from example1.c
as globals. In test 2, we will replace those with references a threads array,
but let’s just get through test 1 first.
To run the test, simply compile and run the tests.bin
binary and run it with
argument 1 as follows: ./tests.bin 1
.
Test 2: A Bunch of Threads
Make sure you have test 1 passing before you move on to this part.
The basic idea we have here is that we would like to have an array of
ucontext_t
variables. As new threads get created, we will slot them into that
array, in the next available slot. After that, the schedule_threads()
function
will switch between all the active threads in that array.
A few things to pay attention to:
-
Just looking at the threads array by itself, it’s going to be impossible to tell if a particular entry has a valid context or just garbage data that hasn’t been initialized yet. To address this issue, you will probably want an array of booleans that indicate if a particular entry in the array contains a valid context or not.
Don’t forget to initalize the array of booleans itself.
-
When a thread yields, it is important to know what the index of that currenlty running thread is. We need that index value to find out which entry in the array the yielding thread occupied so that we can replace the correct context when swapping between threads. An easy way to keep track of this is via using a global index variable.
When you are ready, compile the tests.bin
binary and run it for step 2 using
./tests.bin 2
. Get all the tests passing before moving on to step 3.
Test 3: Reclaiming a Finished Thread’s Slot
Depending on how you implemented your code so far, this test case may work for you out of the box. But here is the gist. There are two main ideas that we want to address here:
- An active thread should be able to create new threads into the system.
- When a particular thread finishes, its corresponding entry in the threads array becomes available for reuse by future threads. Therefore, although our current implementation can have at most 5 active threads at a time, we can actually support much more threads over the lifetime of the programs as threads finish and get replaced by new ones.
Modify your code so you can reclaim the entry of a thread that has completed
execution. Once you are ready, compile the tests.bin
binary and run it on the
third set of tests using ./tests.bin 3
.
Note that so far, we are still not worrying about freeing what we malloced for each thread, we only care about reclaiming the finished thread’s spot in the threads array.
Test 4: Thread Parameters
Up until now, all the functions that we passing as threading functions take no
parameters. This is very unusual; most of the time thread functions need
paramters because we often want to start the same function in parallel multiple
times with different parameters (as we have done multiple times when using
thread functions in pthreads
).
Since we need to support thread functions that accept different types and number
of arguments, our best bet is to rely on C void pointers, just like we do when
using pthreads
. If you need a refresher, take a look at the test cases for a
quick reminder on using void pointers as thread function arguments.
Caveat
Remember the case we mentioned in class about being careful how to pass parameters to threads. We must make sure that the memory location we are passing as a parameter to a thread function is still going to be allocated at the time the data is used.
Here is an exampe to illustrate this. This is a subtle memory corruption bug if you pass your parameters to the threaeding function as follows:
void runs_as_thread()
{
int value;
create_new_parameterized_thread(other_function, &value);
finish_thread();
}
Programming in C is all about cultivating an appropriate amount of paranoia!
To solve this problem, we need to make sure that value
always exists as long
as the other created threads is alive. One solution would be to use something
like a thread join function, similar to pthread_join()
.
void runs_as_thread()
{
int value;
thread_id id = create_new_parameterized_thread(other_function, &value);
join_thread(id);
finish_thread();
}
For simplicity, we are not gonna go through the trouble of implementing something like this in this lab.
Implementation
Your job in this task is to implement create_new_parameterized_thread
. First
start by taking a look at the manpage for makecontext
. You will see that you
can pass an arbitrary number of parameters to the makecontext
function (in
that way, it is similar to C’s printf
function. That is represented by the
...
keyword in the function signature). You just have to make sure that the
third argument is the number of extra parameters you want to pass to
makecontext
.
The code for this step otherwise will be exactly like create_new_thread
.
A few minor notes and wrinkles:
-
The function parameter to
makecontext
is still specified asvoid(*)()
, that is a pointer to a function that take no parameters and returns nothing. That’s because there is now way to say something like “a pointer to a function that takes an arbitrary number of parameters and returns nothing”. Therefore, to make this work, you will have to cast our function parameter to that type. Something like the following:c void(*cast_ptr)() = (void(*)()) fun_ptr;
-
It should irk you that
create_new_thread
andcreate_new_parameterized_thread
have basically the same code with minor differences. Will we allow this code duplication? Hell no! It turns out that fixing this is easy though. We can just makecreate_new_thread
callcreate_new_parameterized_thread
with a null parameter. It might be good practice to take a moment and think why would this be a safe approach? Make sure you fix this slight issue as our graders do not duplicated code.
As usual, compile your tests.bin
binary and run it using ./run_tests.bin 4
and make sure you pass all the tests.
Test 5: Removing the Need for finish_thread
.
If you’ve experimented with writing your own test thread functions, you may have
noticed how super-bad news it is if you write a threaded function that doesn’t
call finish_thread
when it returns. Your program instantly and errorlessly
terminates, and even judicious use of a debugger can’t identify the problem
(because this is considered a “natural” exit, not an error).
If you haven’t seen this, try running Test 5 without implementing any code so you see what that looks like.
We could make the error more obvious, but rather than that, it would be better
if the thread function returning just called finish_thread
implicitly. The way
to do this is to add a new helper function that makecontext
calls (instead
of the actual thread function). This function will take 2 parameters, the
actual thread function pointer and the void pointer parameter to pass it.
Then the helper function will call the actual thread function, and once it
returns, call finish_thread
. If we wanted to, we could also add some
initialization that occurs before the function call - not needed quite yet, but
it will be quite handy once we have preemption in a future assignment.
As usual, compile your tests.bin
binary and run it using ./run_tests.bin 5
and make sure you pass all the tests.
Test 6: Freeing What We Have Malloced
Finally, we have to take care of freeing everything that we have malloced for
each thread once it finishes. Your first instinct might be that the helper
function we implemented in the previous step would be a great place to call
free
to free the thread stack allocation. This is a terrible idea! (stop
for a second and see if you can figure out why it is a terrible idea without
reading my solution)
This is particularly bad news because most of the time this code will probably work, because the stack remains in use for such a short time the OS will probably not repurpose its page. But then 1 out of 1000 runs, you’ll spontaneously get a segmentation fault - good luck tracking that down. Remember that an appropriate amount of paranoia is what is necessary.
Fixing the Issue
The free
call we need cannot really happen while we are running the thread,
thus calling it from finish_thread
or the helper function is a not a good
idea. Therefore, the only place we can actually call free
is the scheduler.
Once a thread is finished, we need a way to tell our scheduler that that thread
is done and that we can free its stack. Your job in this step is to implement a
way to inform the scheduler of when it must free a thread’s stack.
Testing Your Code
To accurately test your implementation, we need to make that every byte of
memory that you malloc must be freed. One impressive tool that can help us test
that if valgrind
. It is real easy to use.
First, make sure ou have valgrind
installed. If not, then you can install
using your favorite Linux package manager. Assuming you’re on Ubuntu, that would
look something like sudo apt install -y valgrind
or maybe using snap
.
Second, you can run valgrind
on our final set of test cases. As usual, compile
your code to generated the updated tests.bin
binary and then run that binary
in valgrind
as follows:
$ valgrin ./tests.bin 6
==17524== Memcheck, a memory error detector
==17524== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==17524== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==17524== Command: ./memtry 6
==17524==
==17524== Warning: client switching stacks? SP change: 0x1ffefffc78 --> 0x5202818
==17524== to suppress, use: --max-stackframe=137336181856 or greater
==17524== Warning: client switching stacks? SP change: 0x52027c8 --> 0x1ffefffc80
==17524== to suppress, use: --max-stackframe=137336181944 or greater
==17524== Warning: client switching stacks? SP change: 0x1ffefffc78 --> 0x5212858
==17524== to suppress, use: --max-stackframe=137336116256 or greater
==17524== further instances of this message will not be shown.
........
OK (8 tests)
==17524==
==17524== HEAP SUMMARY:
==17524== in use at exit: 0 bytes in 0 blocks
==17524== total heap usage: 54 allocs, 54 frees, 2,238,125 bytes allocated
==17524==
==17524== All heap blocks were freed -- no leaks are possible
==17524==
==17524== For counts of detected and suppressed errors, rerun with: -v
==17524== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
If your code is correct, you should see something like the output above. The warning you see in there about switching stacks are okay since we are indeed switching stacks.
Possible Buggy Outcome: Memory Leaks
If your code does not free what it mallocs (i.e., you have memory leaks), then your output might look something like the following:
valgrind ./run_tests.bin 6
==17632== Memcheck, a memory error detector
==17632== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==17632== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==17632== Command: ./memtry 6
==17632==
doing a malloc
doing a malloc
==17632== Warning: client switching stacks? SP change: 0x1ffefffc78 --> 0x5202c58
==17632== to suppress, use: --max-stackframe=137336180768 or greater
==17632== Warning: client switching stacks? SP change: 0x5202c08 --> 0x1ffefffc80
==17632== to suppress, use: --max-stackframe=137336180856 or greater
==17632== Warning: client switching stacks? SP change: 0x1ffefffc78 --> 0x5212c98
==17632== to suppress, use: --max-stackframe=137336115168 or greater
==17632== further instances of this message will not be shown.
........
OK (8 tests)
==17632==
==17632== HEAP SUMMARY:
==17632== in use at exit: 2,228,224 bytes in 34 blocks
==17632== total heap usage: 54 allocs, 20 frees, 2,238,125 bytes allocated
==17632==
==17632== LEAK SUMMARY:
==17632== definitely lost: 1,835,008 bytes in 28 blocks
==17632== indirectly lost: 65,536 bytes in 1 blocks
==17632== possibly lost: 131,072 bytes in 2 blocks
==17632== still reachable: 196,608 bytes in 3 blocks
==17632== suppressed: 0 bytes in 0 blocks
==17632== Rerun with --leak-check=full to see details of leaked memory
==17632==
==17632== For counts of detected and suppressed errors, rerun with: -v
==17632== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Notice that the output above is telling that we have leaked over 1 MB in our 8 tests, that is far from ideal!
Possible Buggy Outcome: Illegal Memory Access
Another type of memory bug that might occur is that of an illegal access to a
(freed) memory page. In that case, you valgrind
output might look something
like the following:
$ valgrind ./run_tests.bin 6
==17746== Memcheck, a memory error detector
==17746== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==17746== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==17746== Command: ./memtry 6
==17746==
==17746== Warning: client switching stacks? SP change: 0x1ffefffc78 --> 0x5202818
==17746== to suppress, use: --max-stackframe=137336181856 or greater
==17746== Warning: client switching stacks? SP change: 0x52027c8 --> 0x1ffefffc80
==17746== to suppress, use: --max-stackframe=137336181944 or greater
==17746== Warning: client switching stacks? SP change: 0x1ffefffc78 --> 0x5212858
==17746== to suppress, use: --max-stackframe=137336116256 or greater
==17746== further instances of this message will not be shown.
==17746== Invalid write of size 8
==17746== at 0x4C2E10B: free (vg_replace_malloc.c:530)
==17746== by 0x10901D: finish_thread (in /home/hewner/Private/play/threading/memtry)
==17746== by 0x108D75: thread_run_helper (in /home/hewner/Private/play/threading/memtry)
==17746== by 0x4E8307F: ??? (in /usr/lib/libc-2.26.so)
==17746== Address 0x52027a8 is 65,400 bytes inside a block of size 65,536 free'd
==17746== at 0x4C2E10B: free (vg_replace_malloc.c:530)
==17746== by 0x10901D: finish_thread (in /home/hewner/Private/play/threading/memtry)
==17746== by 0x108D75: thread_run_helper (in /home/hewner/Private/play/threading/memtry)
==17746== by 0x4E8307F: ??? (in /usr/lib/libc-2.26.so)
==17746== Block was alloc'd at
==17746== at 0x4C2CEDF: malloc (vg_replace_malloc.c:299)
==17746== by 0x108E52: create_new_parameterized_thread (in /home/hewner/Private/play/threading/memtry)
==17746== by 0x108D95: create_new_thread (in /home/hewner/Private/play/threading/memtry)
==17746== by 0x10960C: test_5 (in /home/hewner/Private/play/threading/memtry)
==17746== by 0x109DFB: CuTestRun (in /home/hewner/Private/play/threading/memtry)
==17746== by 0x10A53B: CuSuiteRun (in /home/hewner/Private/play/threading/memtry)
==17746== by 0x109828: main (in /home/hewner/Private/play/threading/memtry)
==17746==
***MANY MANY MORE ERRORS OMITTED HERE***
........
OK (8 tests)
==17746==
==17746== HEAP SUMMARY:
==17746== in use at exit: 0 bytes in 0 blocks
==17746== total heap usage: 54 allocs, 54 frees, 2,238,125 bytes allocated
==17746==
==17746== All heap blocks were freed -- no leaks are possible
==17746==
==17746== For counts of detected and suppressed errors, rerun with: -v
==17746== ERROR SUMMARY: 136 errors from 36 contexts (suppressed: 0 from 0)
As you can see above, even though the tests seem to run fine, there were actually plenty of illegal memory accesses here.
Submitting Your Code
Submit all assignment source files (c files, h files) but not binaries (example1, run_tests, .o files) via Gradescope.
Submission Checklist
- My code compiles and generates the right executables.
- I ran all of the test cases and all the tests pass.
- I ran
valgrind
on my code and no memory bugs were detected. - I submitted all my source files to Gradescope for grading.
Grading Rubric
Part | Point Value |
---|---|
Test 1: Scheduling a single thread once | 40 |
Test 2(a): Scheduling 2 threads that each run the same code | 16 |
Test 2(b): Scheduling 3 threads where 2 run the same code and 1 runs different code | 12 |
Test 2(interleave): Scheduling 2 threads that interleave their execution | 12 |
Test 3: Scheduling a thread that creates many quick threads | 40 |
Test 4: Scheduling threads that each runs a function that takes a parameter | 20 |
Test 5: Scheduling threads that run functions that don’t explicitly call finish_thread | 20 |
Test 6: Running Test 5 with valgrind to look for memory leaks and other errors | 40 |