Introduction
The fork()
system call in xv6, and as we have seen it so far in class, copies
all of the parent’s address space into the child’s address space. This is okay
if the process’s address space is small, but if the parent is large, then this
becomes very slow. Also, most often, a lot of the pages that are shared between
the parent and the child are never written to. So keeping two copies of the same
page that is only being read is very wasteful. Furthermore, copies of the full
address space might be wasteful since calls to fork()
are often followed by
calls to exec()
in the child, which completely replaces the address space of
the child. In that case, the copy that we did on fork()
was very wasteful.
In this lab, we will implement an optimization to fork()
referred to as
copy-on-write (COW).
Learning Objectives
At the end of this lab, you should be able to:
- Implement cow forking in the xv6 operating system.
- Implement reference counting as a way to track shared pages before freeing them.
- Manipulate process page tables in the xv6 operating system.
Overview
Unlike our regular fork()
, a COW fork()
will defer the allocation and
copying of physical memory frame for the child until the copies are really
needed, if ever.
COW fork()
creates just a page table for the child, which PTEs point to the
parent’s physical frames that are now shared between the parent and the child.
However, COW fork()
marks all PTEs, in both the parent and the child, as
read-only. In other words, neither the parent nor the child can write to
their pages after a fork()
system call.
When either the parent or the child attempt to write to a page, a page fault occurs and the kernel is called upon to handle the exception. The kernel will detect a COW protected page and will then allocate a frame of physical memory for the faulting process, copy the content of the original page to the new one, map the new frame in the faulty process’s page table, and unmaps the old page appropriately. However, when mapped in the page table, the kernel marks the corresponding PTE as writable now, and thus the faulty process can continue execution normally after that.
The COW fork()
makes freeing physical frame trickier. A frame of memory may be
mapped in multiple processes’s page tables, and thus cannot be really free’d
back to the kernel until the last mapping is remove. We will explore this tricky
step in this lab.
Getting the Source Code
To obtain the source code for this lab, follow the instructions below.
1. Commit and push your changes to your current branch
First, make sure all your changes to your current branch are pushed to your
repo. Recall that you can use git branch
to check which branch you are
currently on.
Follow the standard git add
, git commit
, and git push
workflow to push
your changes to your own private repo.
If at any point, you get permission issues, this most likely means that you are
trying to push to the class repo, which you do not have access for. To push to
your own main
branch, you can use:
$ git push origin main
2. Fetch the changes from our repo
From your Linux terminal, fetch our changes using:
$ git fetch upstream
Then, make sure you can see this lab’s branch, namely upstream/cow
.
$ git branch -a
* clab_solution
main
remotes/origin/clab_solution
remotes/origin/main
remotes/upstream/clab
remotes/upstream/heapmm
remotes/upstream/main
remotes/upstream/buddy
remotes/upstream/cow
You might see more branches locally (under origin
) depending on what you have
done, but you should be good if remotes/upstream/cow
shows up.
3. Get the code in a new branch
Next, let’s checkout the cow
branch and create a new local branch for its
solution.
$ git checkout -b cow_solution upstream/cow
branch 'cow_solution' set up to track 'upstream/cow'.
Switched to a new branch 'cow_solution'
4. Push the changes to your repo
Finally, push your changes to your own repo to make sure the code is there and you can start editing.
$ git push --set-upstream origin cow_solution
Enumerating objects: 119, done.
Counting objects: 100% (119/119), done.
Delta compression using up to 56 threads
Compressing objects: 100% (64/64), done.
Writing objects: 100% (111/111), 42.50 KiB | 42.50 MiB/s, done.
Total 111 (delta 42), reused 111 (delta 42), pack-reused 0
remote: Resolving deltas: 100% (42/42), completed with 8 local objects.
remote:
remote: Create a pull request for 'cow_solution' on GitHub by visiting:
remote: https://github.com/user/csse332-labs-noureddi/pull/new/cow_solution
remote:
To github.com:user/csse332-labs-noureddi.git
* [new branch] cow_solution -> cow_solution
branch 'cow_solution' set up to track 'origin/cow_solution'.
Step 0: Making Sure Tests Fail
In this first step, we just want to make sure that your code fails the first
simple test case of our tests. We have provided you with a set of test cases
that tests various fork()
behavior. The first of those tests, called simple
,
allocates more than half of the available physical memory for one process, and
then calls fork()
. This fork will fail because there is not enough space in
the system to create a copy of the parent process, with all of its address
space.
To run this, first compile xv6 using make qemu
, and then run the cowtest
executable using:
$ cowtest
simple: fork() failed
Your output should show exactly the output shown above. Once that is done, you are ready to move on to Step 1.
Step 0.5: Identifying Segmentation Faults
The default behavior of xv6 when it encounters a segmentation fault is that it
will kill the faulty process. What really happens behind the scenes is that a
user trap is encountered when accessing an unmapped page in the page table, and
the code in usertrap
in kernel/trap.c
is triggered.
Specifically, when a page fault happens, the hardware will trigger an exception.
In RISC-V, the hardware will record the cause of the exception in the scause
register, save the faulty PC in sepc
, and the value of the pointer that caused
the page fault in the stval
register. Then, the hardware will trap into the
kernel to handle the page fault, which will then call on usertrap
to decipher
what happens, since this is an exception that happened from user space.
Take a look at the code in kernel/trap.c
, specifically at usertrap()
for a
sample of what happens. You can see for example that if the trap is caused by a
system call issued from the user, the value of the scause
register will be 8
as shown in the following snippet of code:
if(r_scause() == 8) {
// system call
...
}
As it currently stands, xv6 does not have a handler for segmentation faults.
Instead, the kernel will kill the process if it is caused by an unhandled trap.
In order to figure out what happens when a page fault occurs, let’s create one.
In the user directory, we have provided you with a file called user/fault.c
that forces a page fault as follows:
int
main(int argc, char **argv)
{
char *p = (char *)0xdeadbeef;
/* Try to access an invalid pointer and check what happens. */
*p = 0xff;
printf("The ptr %p points to %c", p, *p);
exit(0);
}
To run this program, compile the xv6 kernel using make qemu
and run the
fault
executable as follows:
$ fault
usertrap(): unexpected scause 0x000000000000000f pid=3
sepc=0x0000000000000010 stval=0x00000000deadbeef
You can see that for a page fault, the scause
register contains the value 15
(0xf in hex), and the stval
register contains the address that caused the
fault (0xdeadbeef
in our case). We can therefore modify the source code for
usertrap()
to print a segmentation fault as follows:
else if((which_dev == devintr()) != 0) {
// ok
} else if(r_scause() == 15) {
printf("Segmentation fault from process %d at address %p\n", p->pid, r_stval());
p->killed = 1
} else {
...
}
Now compiling this code using make qemu
and running the fault
executable
results in:
$ fault
Segmentation fault from process 3 at address 0x00000000deadbeef
$
We will return to this handler later on in Step 2.
Step 1: Modifying fork
Behavior
Understanding the Copying fork
Behavior
First, take a look at the implementation of the fork()
system call. You can
find the source code in kernel/proc.c
. The most important snippet of code we
care about is the following:
// Copy user memory from parent to child.
if(uvmcopy(p->pagetable, np->pagetable, p->sz) < 0){
freeproc(np);
release(&np->lock);
return -1;
}
np->sz = p->sz;
This code will copy the parent process (called p
)’s address space and create a
copy of each page in the child process’s address space. uvmcopy
copies all of
the mapped pages in the parent’s address space and creates corresponding pages
in the child’s address space, and maps them in the child’s page table. You can
find the source code for uvmcopy
in kernel/vm.c
.
Specifically, we care about the following loop in uvmcopy
:
for(i = 0; i < sz; i += PGSIZE){
if((pte = walk(old, i, 0)) == 0)
panic("uvmcopy: pte should exist");
if((*pte & PTE_V) == 0)
panic("uvmcopy: page not present");
pa = PTE2PA(*pte);
flags = PTE_FLAGS(*pte);
if((mem = kalloc()) == 0)
goto err;
memmove(mem, (char*)pa, PGSIZE);
if(mappages(new, i, PGSIZE, (uint64)mem, flags) != 0){
kfree(mem);
goto err;
}
}
This code will first walk the parent’s page table for each virtual page number in the parent’s address space. It does some error checking and then allocates a page in the child’s address space using:
if((mem = kalloc()) == 0)
goto err;
and then copies the content of the old page into the new page using:
memmove(mem, (char*)pa, PGSIZE);
Finally, we create a mapping for the new page in the child’s page table using:
if(mappages(new, i, PGSIZE, (uint64)mem, flags) != 0){
kfree(mem);
goto err;
}
It is absolutely crucial that you understand how this function works for you to be able to continue on with this lab, so please ask questions about it before you move on.
Implementation: Mapping the Parent’s Pages Without Copy
Your first task in this lab is to modify the uvmcopy
function to create
non-writable pages that are shared between the parent and the child. In other
words, you will NOT be creating new pages for the child, instead all the
mappings in the parent’s page table will be the same in the child’s page table,
and all the pages in the shared address space will be marked as non-writable.
The goal of making those pages non-writable is to cause segmentation faults to occur when either the parent or the child try to write to these pages. Then, it is only at that point that we create new pages, copy the content of the old pages, and remap those pages in the corresponding process’s page table.
However, this creates a small conundrum. What if a page was originally marked as
non-writable? At that point, we do not want the kernel to copy and remap that
page in the page table, we would like to keep that page non-writable. Therefore,
to achieve a distinction between pages that are non-writable because of
the copy-on-write mechanism, and those that are non-writable by default, we will
make use of an additional bit in each page table entry (PTE). This bit is called
the RSW bit (where RSW stands for Reserved for SoftWare). Therefore, we will set
that bit to 1 whenever a page is marked non-writable because of the cow
mechanism. We have already defined that bit for you in the kernel/riscv.h
definitions as follows:
#define PTE_RSW (1L << 5) // reserved for software
Task: Change uvmcopy
Your task here is to change the behavior of uvmcopy
to share the pages
between the parent process and the child process. In other words, we do not
want to create new pages (or frames) for the child that are copies of each
other, we want the parent and the child to share the same pages, and thus
the same mappings.
However, for all those pages (in both parent and child), we would need to unset
the Write flag (i.e., disable writing to hose pages from both processes).
To unset a flag from a page table entry (say pte
), we would perform a bitwise
operation as follows:
*pte &= ~PTE_W;
Additionally, for those pages that were not originally write-protected, we
would like to set the PTE_RSW
bit. We can do so using *pte |= PTE_RSW;
.
Note that per our discussion above, not all pages should receive this bit.
At the end of this task, the parent’s page table and the child’s page table should be exact copies of each other, with no page marked as writable, and those pages that were not write-protected now having the reserved for software bit.
Testing
At this point, your xv6 kernel will break. Recall that fork()
is an essential
system call in everything the operating system does. And as of now, when we
fork a process, all of the pages for both the parent and child are marked as
non-writable, which will cause our operating system to break.
When I tried to run my kernel using make qemu
, I get something like the
following:
xv6 kernel is booting
hart 1 starting
hart 2 starting
init: starting sh
usertrap(): unexpected scause 0x000000000000000c pid=1
sepc=0x0000000000000394 stval=0x0000000000000394
panic: init exiting
Your output might look slightly different, but basically the init
process
should fail since we cannot fork()
properly. We will take care of that in the
next step.
Note that it is okay if your kernel doesn’t fail out of the box, you might
sometimes need to press <Enter>
before the kernel breaks; that is perfectly
acceptable for this step.
Step 2: Handling Segmentation Faults
It is now time to go back to our page fault handler to make sure that we create new pages and copy then when we encounter a page fault caused by a cow mapping. What we would like to do in this step is to modify the page fault handler that we created in Step 0.5 to handle cow mapping.
Task: Add your custom handler
Your task now is to handle page faults that are due to the copy-on-write mechanism. Before you start writing code, make sure you answer the following question:
What is a page considered to be a cow page?
You need to have a clear answer to the question above before being able to implement this task.
Please note that if you detect a page fault on a non cow page, that you don’t need to do anything else, simply kill the process as any other unhandled page.
For those cow pages, now is the time to do the copying. You will need to copy the page, unmap it from the process’s page table, and then map it again with the copy with appropriate permission flags.
Some hints
-
You will most likely need to walk the process’s page table. You do not have to rewrite this function, you can use the
walk
function call fromkernel/vm.c
. To do so, add the following line at the top of your file, right below theextern char trampoline[] ....
line:extern pte_t *walk(pagetable_t pagetable, uint64 va, int alloc);
-
Look at the functions in
kernel/vm.c
to checkout ways to manipulate the page table (create new mappings, remove existing one). Make sure you understand what a function does exactly before you use it.
Testing
Implementing this step does not completely solve all of our problems, but let’s try to test a few things out so far. The main problem that we have not handled yet is what happens when a process dies, so by definition our tests will fail. We will take care of that in the next step.
However, we have provided you with a partial test that would tell you if your
implementation so far is correct. Checkout the code in user/simplefork.c
. This
is very similar to the simple
test in cowtest.c
, however it does not call
the wait
system call since we haven’t handled yet what happens when a process
dies.
To test things so far, compile the xv6 kernel using make qemu
and run the
simplefork
executable using:
$ simplefork
simple: ok
usertrap(): unexpected scause 0x000000000000000c pid=1
sepc=0x0000000000000fa4 stval=0x0000000000001000
panic: init exiting
Note that the test passes but then the kernel panics. This is due to the fact that we do not clean up after ourselves when doing COW. Let’s take care of that in the next step.
Step 3: Adding Reference Counting
A big problem arises when we share pages between processes, and that is that of cleaning up allocated memory. Under normal circumstances, when a process dies, it will unmapp and free all of its allocated pages. However, under COW, that page that we are freeing might still be shared with a child process, and that is problematic if the child process still needs access to it.
This is a very similar problem to one that might arise when dealing with garbage collection in many high level languages. For each physical frame in our system, we must keep track of a reference count that indicates how many processes are still actively trying to access that frame. As long as the reference count is greater than 0, we must not free that frame. It is only when the reference count reaches 0 (i.e., no further processes need that frame) that we can completely free the frame.
Your job in this step is to implement reference counting for each physical frame
in the xv6 system. You will need to modify the code in kernel/kalloc.c
to keep
a reference count for each allocate frame in the system. The design of xv6 makes
things easy for us since we already know before hand the maximum number of
physical frames that can be allocated in our system. Check out the code in
kernel/memlayout.h
, specifically,
// the kernel expects there to be RAM
// for use by the kernel and user pages
// from physical address 0x80000000 to PHYSTOP.
#define KERNBASE 0x80000000L
#define PHYSTOP (KERNBASE + 128*1024*1024)
The macro PHYSTOP
indicates that largest physical address that we can reach.
The macro KERNBASE
is the address where the kernel’s memory is allocated.
Therefore the total number of frames that can be allocated for the users is
(PHYSTOP - KERNBASE) / PGSIZE
.
Task: Implement reference counting
Your next task is to add reference counting to your cow mechanism implementation. Since the number of frames in xv6 is fixed and will not change, we suggest that you use a simple array to keep track of the reference count for every frame in physical memory.
Whenever a frame is allocated (using kalloc
), its reference count should be
increased. Similary, anytime you share a frame between two or more processes,
the refence count for that frame should be increased.
Contrarily, when a frame is free’d (using kfree
), its reference count must be
decremented. It is only after the reference count for that frame reached 0
that it would be actually free’d and released back to the kernel.
Some hints
-
You will need to modify code
kernel/kalloc.c
. -
You will need to modify your
uvmcopy
function to make use of reference counting.Hint: To expose a function from
kernel/kalloc.c
to your other files, add that function inkernel/kalloc.c
and then use extern to expose it.For example, if I create a function called
void my_function(uint64 pa)
inkernel/kalloc.c
, then if I need it anywhere else in the kernel, you can use:extern void my_function(uint64);
-
To find the frame number for a given physical address, you can use the following:
#define FRINDEX(pa) ((uint64)pa - KERNBASE) / PGSIZE
-
Since xv6 runs on two processors by default, some synchronization must be done. Since we have not talked about that yet, you can turn that off using
make CPUS=1 qemu
.Alternatively, it is a really simple solution, anytime you need to access the reference count array, you must have a lock on your hands. For me, I did the following:
acquire(&kmem.lock); // do stuff with the reference count array. release(&kmem.lock);
Testing
Now we can test things out a bit more. To do so, uncomment the lines in
user/simplefork.c
that are marked with UNCOMMENT THIS FOR STEP 3
(there is
one in the simpletest()
function and one in main
). Then, compile the kernel
using make qemu
and run the simplefork
executable as follows:
$ simplefork
simple: ok
simple: ok
$
Both simple tests should pass at this point. We can also make more extensive
tests using cowtest
, however not all tests will pass. Specifically, simple
and three
will pass, but file
will fail, something that looks like the
following:
$ cowtest
simple: ok
simple: ok
three: ok
three: ok
three: ok
file: eerrorerrorr:r orrea: readd : rfaeafdi laedifalielde
d
$
Note that your error might look a bit different, but it will fail regardless.
We will fix the filetest
in the next and final step.
Step 4: Modifying copyout
The final case we would like to take care of is the case of copyout
in
kernel/vm.c
. copyout
is a function that copies bytes of data from the kernel
address space to the user’s address space. When a process calls copyout
on a
shared cow page, it must first allocate a new page and copy it before it can
write to it. However, it will not go through the same page fault mechanism
that we implemented in the previous steps since copyout
is usually called in
kernel space, and thus has a different trap handler. To simplify things, we
will directly modify copyout
’s source code to handle cow pages.
To complete this lab, modify the source code for copyout
to handle cow pages
before writing to them. Your code will look very much similar to the code you
used in step 2. In a nutshell, the source code of copyout
is copying data
from the kernel space to the user space page by page. For each page to copy,
the code will first find pa0
, which is the address of the physical frame to
write to, and then calls memmove
to copy the data from the kernel frames to
the user’s frame. Your code must reside after the line va0 =
PGROUNDDOWN(dstva);
and it should set the appropriate value for pa0
.
Hint: In the case where the virtual page to copy to is not a cow page, you simply need to set
pa0 = walkaddr(pagetable, va0);
.
Hint: You will need to fail
copyout
if the user runs out of space. In other words, ifva0
is greater thanMAXVA
, then you need to return -1.
Testing
Now, all cowtest
tests should pass:
$ cowtest
simple: ok
simple: ok
three: ok
three: ok
three: ok
file: ok
ALL COW TESTS PASSED
Running the full grading script
Once you are done implementing all the above programs, run the grading script using
$ make grade
from your Linux terminal (not your xv6 terminal window).
Submitting Your Code
From the Linux terminal, issue the command (make sure you are in the xv6-riscv
directory in your repository):
./create_submission.sh <username>
and replace <username>
with your RHIT username (without the <
and >
). For
example, for me, that would be:
./create_submission.sh noureddi
If you get a message saying that you don’t have permission to run
./create_submission.sh
, then issue the following command first
chmod +x ./create_submission.sh
Here’s the output as it shows up on my end:
Cleaning up xv6 directory...
Process started: writing temporaries to /tmp/dab9bfedf8d508c2a1c3f1c95e6ba1fc.txt
Found the following modified files:
./user/rhmalloc.c
Creating the submission zip file.
adding: user/rhmalloc.c (deflated 54%)
Done...
################################################################
submission_noureddi.zip has been created.
Please submit THIS FILE AND THIS FILE ONLY to Gradescope.
################################################################
This will generate a single file called submission-username.zip
(for me, it
would be submission-noureddi.zip
). That is all you need to upload to
Gradescope.
Submission Checklist
- My code compiles and generates the right executables.
- I ran
make grade
to double check the test cases for all of my code. - I ran the submission script to generate my
zip
file. - I submitted the
zip
file to Gradescope.
If you notice any typos or inaccuracies, please open a GitHub issue on this repository.