Lab1: Assembler I
Objectives
This is not a list of tasks for you to do. It is a list of skills you will have or things you will know after you complete the lab.
Following completion of this lab you should be able to:
- Assemble RISC-V R, I, and S types
- Understand the relationship between immediates and the fields of an instruction
- Explain why the size of fields in instructions is important
Guidelines
- This lab should be completed by each student, but you will work in collaboration with a partner assigned to you by your instructor.
- Read the lab instructions completely before beginning.
- Don't hesitate to ask for help.
- Write both your name and your partner's name on the lab question sheet.
- You will upload your question sheet and code to gradescope and github upon completion.
- Make sure both lab partners have a copy of the code in their github repo.
Your Tasks
1 Install Python
Install Python 3.x. If you already have Python installed from another class or project, you should be good to go. Otherwise, you can download an installer here.
2 Install VSCode
- We'll be using VSCode as the IDE for this class. You can download and install it from here. (Read the next step before you click that link.)
- During install you may be prompted for which languages you want the IDE to support. If so, select Python and it should install the basic Python extensions.
  If that does not happen you will need to install the extensions manually. Once VSCode opens, select the Extensions button on the left (it looks like 4 squares with one of them detached). In the search bar type "python", then select and install the Python extension from Microsoft and the Python Debugger extension from Microsoft.
- Next, in the search bar at the top of the VSCode window type
  >terminal
  and select "Python: Create Terminal". It will open a terminal in the lower part of the screen.
- In that terminal type
  python
  to launch Python and make sure it works. Check the version number to ensure you're running Python 3 or newer. You may need to specify a version of Python (e.g. type python3.12) to get the right version running in the terminal. Type exit() to close the Python interpreter.
- In the terminal (outside of Python) run the following command:
  python -m pip install gradescope-utils
  Replace python with the appropriate version/path for the Python interpreter you want to use. This installs a testing library that we'll use for the test cases.
3 Get your Assembler repository
To get your git repo, you'll need the git tools installed. If you are using Windows and need to install git, see the git scm website. If you use a different OS, you should use the install method best for your operating system.
- We'll be using Github For Education to manage repositories for this course. Read all of these instructions before you begin.
  - You'll need to create a github account. Your account name should be recognizable as your RHIT username.
  - Use the pattern: "rhit-username", for example Robert's account name would be: "rhit-williarj".
  - Even if you already have your own personal Github account, please make a course-work specific account. We don't want to have to figure out who XxXSuperLuigi2022XxX is when we're grading. If we can't figure out who you are from the username we'll assume you didn't submit the assignment.
  - You should set up SSH keys for github access. Scroll to the bottom of this page and follow the directions in "Fixing GitHub Authentication Issues". You can read details about that here.
  - Go to this url to create your repository.
  - You may be prompted to grant Github Education access to your account. Grant access if asked.
  - Select your Rose username from the list of students when you create your account.
  - If your username is not listed select the "Skip to next step" link above the list of names. Tell the instructor you did this!
  - Select "Accept this assignment" on the following page. This will create your repository. Copy the link on the next page; it will look something like this: https://github.com/rhit-csse232/csse232-2425a-assembler-joestudent
  - You can click your custom link to view your repository on github.com. The github.com web page will display a green "Code" button with instructions on how to clone your repo.
- Check out a copy of your repository.
  - Open a terminal window. For Windows, the program Git Bash will serve as your terminal. Right click in any folder and select Git Bash Here to start up the terminal.
  - Navigate to where you want your repo. For example:
    cd Desktop
  - Use this command to get your repo, adjusting for your url from above.
    git clone git@github.com:rhit-csse232/rhit-csse232-2425a-assembler-joestudent.git
4 Running code and tests in VSCode
Running code and editing arguments
- To run your assembler you need to set up the debugger. Your repo should contain a .vscode folder that has a launch.json file inside it. This file specifies the arguments to give the program when it runs.
- Select the Run and Debug button on the left side of the screen (it looks like a play button with a bug on top of it). Then near the top you should see a green play button with "Python Debugger: Current File with Arguments". If you have your chosen Python file open when you press this button it will start running.
- You can change the arguments given to your assembler by opening launch.json and editing the "args" list.
  - Try adding "--help" as an item to this list when you run the assembler.py file below to see its effect.
  You can add to and edit this list to change the behavior of the program. This is really just for debugging, though. If you leave it with the options provided you can simply edit the contents of test.asm to test different instructions for this assembler. By default it will output the assembled machine code into the file out.txt, which will appear in the folder.
Debugging in VSCode
- You should get familiar with and use the VSCode debugger. We will show you some tips in the R-type implementation part of this lab, but generally:
  - You can set breakpoints by clicking to the left of the line number in a file. (1)
  - Use the buttons that appear at the top of the window to navigate through the code as you're debugging. (2)
  - The pane on the left will show you variables in the code as you debug. You can also hover over variables in your code to quickly peek at their values. (3)
Running Unit Tests
- We're using the Python unittest library for our test cases. Each test is preceded by a decorator that looks like: @weight(N) where N is the number of points the test is worth during grading.
- The provided settings.json file should have the testing framework ready for you. Click the flask icon on the left to open the Testing window. (Note, sometimes I have to click or open a file before the button will appear.)
- The window will list all the testing files and individual tests. It will look something like this:
  When you hover over an individual test, or the header for a group of tests, three buttons appear to the right of the name. The first one lets you run the test(s).
- The second button is the debugger. You can put breakpoints in the tests or your code and use the debugger while the test cases are running to help you find what causes tests to fail.
- You should run the tests as soon as you get your repo to make sure that everything is hooked up correctly. You should see output that looks something like this:
  Each test is listed and then the status is printed. Yours will say "ERROR" instead of "ok" to start with.
  However, VSCode sometimes defaults to a different Python testing framework. If the output mentions pytest you should close VSCode and reopen it, and it will likely use the correct tests. The tests will still give output. The incorrect output will look something like this:
  Notice the pytest-8.2.2 under the first header, and the fact that the individual tests are not mentioned, just the test file name.
Reading test errors
When a test fails it can be very intimidating. Here are the two main types of failures you'll see. First, when your code translates an instruction incorrectly you'll see something like this:
This error box is pointing to the test that failed (test_R_types_add). In the box it shows the expected and actual result, just left of the orange circle number 1. The top binary string is always the expected (i.e. "right") answer, and the bottom is the result from calling your assembler.
Each test group runs several inputs through your assembler. To figure out which exact input caused the failure you need to look at the traceback. Look for the item that refers to the test name that failed. Here it is just left of orange circle number 2. We can see that the input at line 34 caused the failure, so I can go look at it.
Next I would set a breakpoint at that line and press the test debugger button to trace the code.
The second kind of error you'll see will occur when your code does not raise an exception when it should. Here is an example:
This box similarly points to the test that failed (test_R_types_arguments). At the top it tells us an exception was not raised when the test expected one. Here at orange circle 1 we can see that the test was expecting a BadArguments exception but did not get one. At orange circle 2 we can see the line number of the offending input so we can debug the problem.
This lab provides several different kinds of exceptions for different cases. Some of them may be ambiguous, e.g. when is it a BadArgument vs a BadImmediate. Go with your gut and match the test cases later. The test cases are set up to minimize the number of places in the code you need to check for errors and raise exceptions, rather than detecting errors as soon as possible.
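To make the second kind of failure concrete, here is a minimal sketch of how a unittest test can require an exception. The BadArguments class and the toy_assemble_r_type function below are hypothetical stand-ins written for this example; the lab's own exception classes and test code may look different.

```python
import unittest


class BadArguments(Exception):
    """Stand-in for the lab's exception of the same name (an assumption)."""


def toy_assemble_r_type(cmd, operands, line_num):
    # Hypothetical toy function: it only validates the operand count.
    if len(operands) != 3:
        raise BadArguments(
            f"line {line_num}: {cmd} expects 3 operands, got {len(operands)}"
        )
    return "ok"


class ToyArgumentTest(unittest.TestCase):
    def test_too_many_operands(self):
        # This test passes only if the call inside the with-block raises
        # BadArguments. If nothing is raised, unittest reports a failure.
        with self.assertRaises(BadArguments):
            toy_assemble_r_type("add", ["t0", "t1", "t2", "t3"], 1)


# Run the test case programmatically and report whether it passed.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ToyArgumentTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())
```

If the length check were removed from toy_assemble_r_type, the test would fail because the expected exception was never raised, which is the situation described in the second error box.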
Build an Assembler
You've been given a partial implementation of a 32-bit RISC-V assembler.
Your job for this lab is to implement the parts of the assembler needed to make R-, I-, and S- types work.
You only need to edit methods in the lab that are listed below and that have a line like:
TODO: Lab 1
in their body. You can ignore the Lab 2 TODOs for now, and do not make changes to any other
methods or classes. You are free to add your own new helper methods and classes as you see fit.
Before you start writing code you should look at the lab worksheet and read the Grading Rubric section below.
0 Tips and Hints
You should open up assembler.py
and review the general code structure (we'll walk you through it in a bit more detail in the next section). You can look through the docs/assembler.html
file in your repo for an easy to read list of functions and classes. I recommend you keep this open so you can look at any helper methods at a glance.
Some helper methods are implemented for you, and you are free to use them. They are not tested, so you are free to change them as you see fit.
Python implements a few ways to convert integers between bases. First, the int() function takes a second argument that defines the base:
int("101", 2) -> 5
int("11", 2) -> 3
int("101", 16) -> 257
int("11", 16) -> 17
Additionally the bin() function converts an integer into a binary string:
bin(5) -> '0b101'
bin(3) -> '0b11'
bin(-3) -> '-0b11'
Note that it does not use two's complement for negative numbers, and the strings always include a '0b' prefix (after the minus sign for negative numbers).
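Because bin() keeps the sign and the '0b' prefix, producing a fixed-width instruction field takes one more step. Here is a minimal sketch, assuming a hypothetical helper named to_twos_complement. It is not one of the lab's helpers, and the provided dec_to_bin may behave differently.

```python
def to_twos_complement(value: int, width: int) -> str:
    """Return value as a width-bit two's-complement binary string."""
    lowest = -(1 << (width - 1))
    highest = (1 << (width - 1)) - 1
    if not lowest <= value <= highest:
        raise ValueError(f"{value} does not fit in {width} signed bits")
    # Masking maps a negative value onto its two's-complement bit pattern,
    # and the format spec pads the result with leading zeros.
    return format(value & ((1 << width) - 1), f"0{width}b")


print(to_twos_complement(5, 12))   # 000000000101
print(to_twos_complement(-3, 12))  # 111111111101
```

The range check matters because a value that is too large for the field would otherwise be silently truncated by the mask.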
1 A Tour of the Assembler
This file is BIG and has a bunch of code. This section will take you on a little tour of the code, explaining the overall structure of the assembler. Don't forget that you can open up an html version of the documentation for this file by opening docs/assembler.html
in your Lab repo.
Near the top of the file is the assembler_asm() method. This method does all the heavy lifting to take in a text file of assembly instructions and break it down into binary one instruction at a time. It does this in 4 big passes: 1) remove comments, 2) process pseudoinstructions, 3) process labels, and 4) translate individual lines into machine code. For this lab you only need to worry about step 4.
Below this method is a section where each "pass" of the assembler is broken up into helper methods. You do not need to worry about these for now.
The next section (starting at around line 100) is where the core assembler methods are defined. The heart of the assembler is the Assemble()
method. This takes a single instruction and returns its binary representation. You'll be writing this method later, it will do most of its work by calling the Assemble_*
helper methods below.
Each type of instruction in RISC-V has at least one helper (e.g. Assemble_I_Type). Each of these methods will process one instruction type and return the binary representation of a given instruction. These methods are where you will write most of the code for this lab.
The next section (starting at around line 355) has some helper methods. Some of these are implemented for you and some you will need to implement in Lab 2. You should look at these methods and consider when you can use them. Some of these helpers also show you how to manipulate strings in Python in helpful ways.
Below this is the output()
method, which is used to output the final result to a file. You do not need to edit this code.
Next is the "Utilities" section, where the different types of instructions and fields are defined. Here you'll find a FieldData class, which is a simple struct to hold the data for different instructions for our use later. There are several other helpers defined in this section which you should look over and make use of.
Finally, below this are the definitions for the different custom exception types that are set up for the assembler. Not all of these are tested. You should use them as you see fit to make debugging easier for you.
2 Assembling R-type instructions
First we'll implement assembling R-type instructions and see how to use the test bench to find errors in our code.
- Open up the Testing window (the flask icon), expand the Base_assembler_test.py and then the TestRType groups. Hover over the test_R_types_add test until you see the run button on the right, and click it. You just ran a single set of tests for the add instruction. The test should fail, and the traceback will indicate that a NotImplementedError was raised.
- Go to the Assemble_R_Type method in the assembler file and replace the raise NotImplementedError with this:
  return "0000 0000 0000 0000 0000 0000 0000 0000"
- We just made this method turn every R-type into all 0s in binary. This isn't correct, but let's re-run the tests and see how it looks now. Re-run the test_R_types_add tests and look at the output. You will see an AssertionError showing the expected output and the received output of all 0s just below it.
- Okay, now we can get down to business. Let's start by figuring out which R-type we are translating. There is a helper for just that; add this code to the method:
  field_data = inst_to_fields[cmd]
  You should look at the docs for details on this dictionary and the objects in it. Looking up an instruction returns a small object that gives us all the binary for the opcode, funct3, and funct7 fields.
- Next let's get the register info out of the instruction. We can do that using another helper and the operands list:
  rd = get_register_bin(operands[1])
  rs1 = get_register_bin(operands[1])
  rs2 = get_register_bin(operands[2])
- Finally we need to put all these pieces together. You can do this however you want, but there is a helper built in to make this a bit easier. I start by combining all the parts of the instruction into a list and then calling the helper:
inst_field_list = [field_data.func7,
rs2,
rs1,
field_data.func3,
rd,
field_data.opcode]
return join_inst_fields_bin(inst_field_list)
- The code above has an error. We're going to run the tests and see how to track down the problem. Re-run the R-type tests and you should see something like this:
  Note that part of the expected and received result is "underlined" with - and + symbols. If we look at that binary it seems like there is a problem with rd. We could go fix it now, but instead we'll use the debugger to help us find the exact location of the error.
  Put a breakpoint on line 34 of the base_assembler_test.py file, where the first test is called. Then hover over the test_R_types_add entry in the test navigator on the left, and press the Debug test button (it looks like a play button with a bug on it). That will open up the debugger and pause it on the line where we put the breakpoint. Press the step into button in the little control panel at the top of your screen until the code reaches the assembler.py file (it should take 7 step into clicks). At this point you should see the variable explorer on the left, something like this:
  Here we see the instruction add is passed in as the cmd argument, and the operands are all x1. Step through the code until field_data, rd, rs1, and rs2 have all been assigned. You may want to use the step over button to not go into the helper method code.
  Now the variable explorer will show the values of each register ID field. They are all set to '00001', which seems right to me! Continue to step through this code until you are taken back to the test file. This test passes, so the error wasn't here (which we would have noticed if we had read the error message more closely; notice it says it failed on line 38). Let's continue stepping through the next test for "add t0, s0, sp". Step through to the point where all the register fields are assigned in the method, and look at the value of rd. Is it right? You should see this in the variables pane:
  Both rd and rs1 have the same value, but they are different registers in the operands list! The correct value for t0 is "00101", so there must be something wrong with rd. Fix the call to get_register_bin for rd so that it uses the correct operand.
- Re-run the add tests now and you should see they work. And because of the magic of dictionaries, if you run the other R-types they should nearly all pass! You can run the full group by pressing the run button while hovering over the TestRType group. Because the only difference between R-types is the opcode, funct3, and funct7 fields, the inst_to_fields dictionary already pulls the right data for each type for us.
- One test should have failed, the test_R_types_arguments test. It checks that errors happen when the instruction is misformatted, e.g. with too many operands or the wrong type of operands. Look at the test and you'll see that the first test is checking what happens when 4 operands are passed in. The tests are expecting a BadOperands exception to be raised. Back in the assembler, let's check for this error right at the start of the R-type method. Add this code:
  if len(operands) != 3:
      raise BadOperands("Incorrect number of operands found in R Type on line %s with args:\n\t%s %s\n" % (line_num, cmd, operands))
  You can edit the message in the BadOperands exception above to have more or less info, whatever will be helpful for you during debugging. Re-run the tests and all the R-type tests should pass now!
- There are a few other edge cases you should consider even though the tests here don't check for them. What happens if this method is called with a bad instruction name (e.g. Assemble_R_Type('rob', 't0, t1, t2', 0))? You should probably raise a BadInstruction exception if this happens. What if the translated number of bits is wrong? Or the number of fields is wrong after translation? You may want to check for these errors and raise BadFormat or BadField errors. The tests won't check for this, but these checks will help you while you are debugging, so you may want to add logic for them now.
- You should now follow a similar procedure to implement the other instruction types for this lab.
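The dictionary lookup and field packing from the walkthrough can be sketched on their own. This is only an illustration under stated assumptions: RFields, R_TYPE_TABLE, and encode_r_type are hypothetical names invented here, not the lab's FieldData, inst_to_fields, or join_inst_fields_bin, and the lab's expected output format may differ (for example in spacing). The bit patterns themselves are the standard RV32I encodings.

```python
from collections import namedtuple

# Hypothetical stand-in for the lab's per-instruction field data.
RFields = namedtuple("RFields", ["funct7", "funct3", "opcode"])

# Standard RV32I encodings for three R-type instructions.
R_TYPE_TABLE = {
    "add": RFields("0000000", "000", "0110011"),
    "sub": RFields("0100000", "000", "0110011"),
    "and": RFields("0000000", "111", "0110011"),
}


def encode_r_type(cmd, rd, rs1, rs2):
    """Pack an R-type instruction; rd, rs1, rs2 are 5-bit binary strings."""
    fields = R_TYPE_TABLE[cmd]
    # R-type layout, most significant bits first:
    # funct7 | rs2 | rs1 | funct3 | rd | opcode
    bits = fields.funct7 + rs2 + rs1 + fields.funct3 + rd + fields.opcode
    if len(bits) != 32:
        raise ValueError(f"expected 32 bits, got {len(bits)}")
    return bits


# add t0, s0, sp  ->  rd = x5, rs1 = x8, rs2 = x2
print(encode_r_type("add", "00101", "01000", "00010"))
# 00000000001001000000001010110011
```

Only the table entry changes between add, sub, and and. The packing code is shared, which is why one working R-type tends to make the rest pass.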
1 Assemble()
The heart of the assembler is the Assemble() method. This takes a single instruction and returns its binary representation. At its core though, this method simply figures out the type of the instruction and then calls the appropriate helper function (which you will write below). Try to make this function do as little work as possible; it will make your code easier to edit in the future.
You should not try to write this function all in one go. I recommend you add to it as you expand your assembler to support more types, adding support for one instruction type at a time.
To start, in Assemble we need to call the appropriate helper function based on the type. For now we only have R-type instructions, so let's just call that one directly. The helper functions take 3 arguments: the instruction name, the list of operands, and the line number. So, we have to do a little bit of work to split the instruction up. Put this code into the body of the Assemble method, replacing the raise NotImplementedError line:
split_inst = inst.strip().replace(",", " ").split()
cmd = split_inst[0]
args = split_inst[1:]
result = Assemble_R_Type(cmd, args, line_num)
return result
This breaks the instruction up and then calls the R-Type helper. If the method is invoked like this: Assemble("add t0, t0, t0")
then Assemble
would call Assemble_R_Type("add", ["t0", "t0", "t0"])
and return the result.
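To see exactly what the splitting step produces, here is a small standalone sketch. The function name split_instruction is hypothetical and exists only for this example; the string operations are the same ones used in the snippet above.

```python
def split_instruction(inst):
    """Split one line of assembly into (mnemonic, operand list)."""
    # Commas become spaces, then split() breaks on any run of whitespace.
    parts = inst.strip().replace(",", " ").split()
    return parts[0], parts[1:]


print(split_instruction("add t0, t0, t0"))   # ('add', ['t0', 't0', 't0'])
print(split_instruction("  lw t0,8(sp)  "))  # ('lw', ['t0', '8(sp)'])
```

Note that a base-offset operand such as 8(sp) stays in one piece, so it still has to be taken apart later. An empty line would raise an IndexError at parts[0], which is one reason to handle blank input before this point.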
Open up the file called test.asm in the repo. This file has two test R-types for assembling. If you run the assembler.py file now it should try to translate the code from test.asm into binary and put it into a file called out.txt. Run the code by going to the debug panel and pressing the run button (make sure you have selected Python Debugger: Current File with Arguments), then take a look at the output. Try adding a new R-type instruction to the test.asm file and seeing how the output changes. Make sure you tab back into the assembler.py file before you run, otherwise VSCode will try to run test.asm as a Python file!
As you are working you can add instructions and comments into test.asm
for your debugging purposes.
You should add to the Assemble()
method each time you implement one of the new types (or sub types) of instructions in the next section. Do not try and implement it all at once.
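One way to keep the top-level function small as types are added is to have it do nothing except pick a handler. The sketch below is a rough illustration, not the lab's structure: the instruction sets, the placeholder handlers, and the lowercase function names are all hypothetical, and a real version would raise one of the lab's custom exceptions instead of ValueError.

```python
# Hypothetical instruction groups; the lab's helpers (e.g. is_core_inst)
# may organize these differently.
R_TYPES = {"add", "sub", "and", "or"}
I_TYPES = {"addi", "andi", "ori"}


def assemble_r(cmd, operands, line_num):
    return f"R:{cmd}"  # placeholder standing in for a real encoder


def assemble_i(cmd, operands, line_num):
    return f"I:{cmd}"  # placeholder standing in for a real encoder


def assemble(inst, line_num=0):
    """Split the instruction, pick a handler by mnemonic, and delegate."""
    parts = inst.strip().replace(",", " ").split()
    cmd, operands = parts[0], parts[1:]
    if cmd in R_TYPES:
        return assemble_r(cmd, operands, line_num)
    if cmd in I_TYPES:
        return assemble_i(cmd, operands, line_num)
    raise ValueError(f"line {line_num}: unknown instruction {cmd!r}")


print(assemble("add t0, t1, t2"))  # R:add
print(assemble("addi t0, t1, 5"))  # I:addi
```

Supporting a new type then means adding one branch here and writing one new handler, which matches the advice to grow the function one instruction type at a time.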
A few tips as you work on these:
- You may find this helper method useful: is_core_inst()
- The line_num argument is only used for debugging for this lab.
- Raise exceptions when you encounter instructions that cannot be assembled.
There are several example exceptions raised throughout the template. Take a look at dec_to_bin for a couple of examples (and consider why these are raised here).
This program provides a suite of custom exceptions: BadImmediate, BadArguments, BadInstruction, BadRegister, BadField, BadFormat, BadLabel.
You can check the code (or the assembler.html file) for a general description of each. Your assembler should probably raise every one of these in at least one place when it is complete. Not all of these are tested, but they are very useful as you debug. For this lab you will not need the BadLabel exception.
2 Assemble Individual Types
You will implement the methods for each individual type. Note that the I-types are broken up into separate methods for each slightly different format. Write the code to implement each of these:
Assemble_R_Type
Assemble_I_Type
Assemble_I_Type_shift
Assemble_I_Type_base_offset
Assemble_S_Type
After you finish each of these methods (or pause working) you must commit and push your repo with a meaningful commit message. As you debug and fix errors you should consider doing more commits. You must have at least 5 commits in your repo for this lab (one for each of these methods).
You should edit the Assemble()
function to call the correct one of these for a given instruction.
Your assembler only needs to support decimal immediates, assume all numbers passed as operands to an instruction are in decimal. As you work through the test cases you may want to consider the binary or hex representation of the numbers used in the tests.
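Since the size of each field limits which decimal values are legal, it can help to check the range before encoding. The sketch below uses two hypothetical helpers, fits_signed and fits_unsigned, that are not part of the lab template. It assumes the standard RV32I field sizes: a 12-bit signed immediate for I- and S-types and a 5-bit unsigned shift amount for shift immediates.

```python
def fits_signed(value, width):
    """True if value is representable as a width-bit two's-complement number."""
    return -(1 << (width - 1)) <= value <= (1 << (width - 1)) - 1


def fits_unsigned(value, width):
    """True if value is representable as a width-bit unsigned number."""
    return 0 <= value < (1 << width)


for text in ["2047", "2048", "-2048", "-1"]:
    value = int(text)  # operands arrive as decimal strings
    ok = fits_signed(value, 12)
    # Masking to 12 bits shows the bit pattern the field would hold, in hex.
    pattern = format(value & 0xFFF, "03x") if ok else "n/a"
    print(f"{text:>6}  fits 12-bit signed: {ok}  hex: {pattern}")

print(fits_unsigned(31, 5), fits_unsigned(32, 5))  # True False
```

Looking at the hex pattern for a negative value such as -1 (fff) is a quick way to compare against the binary strings in the test cases.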
The helper get_register_bin() will be useful for these methods.
I-types are complicated. You may want to consider using is_shift_immediate_inst(), parse_base_offset(), reverse_string(), and dec_to_bin() in these.
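The S-type format is where the relationship between an immediate and the instruction fields is least direct, because the 12-bit immediate is split across two fields. The following is a sketch under stated assumptions: split_base_offset and encode_s_type are hypothetical names written for this example (the lab's parse_base_offset may have a different signature and return type), while the field layout and the sw encoding are the standard RV32I ones.

```python
import re


def split_base_offset(operand):
    """Split an operand such as '8(sp)' into (offset, base register name)."""
    match = re.fullmatch(r"\s*(-?\d+)\((\w+)\)\s*", operand)
    if match is None:
        raise ValueError(f"not a base-offset operand: {operand!r}")
    return int(match.group(1)), match.group(2)


def encode_s_type(funct3, opcode, rs2, rs1, offset):
    """Pack an S-type instruction; rs1 and rs2 are 5-bit binary strings."""
    if not -2048 <= offset <= 2047:
        raise ValueError(f"offset {offset} does not fit in 12 bits")
    # 12-bit two's complement; string index 0 holds immediate bit 11.
    imm = format(offset & 0xFFF, "012b")
    upper, lower = imm[:7], imm[7:]  # imm[11:5] and imm[4:0]
    # S-type layout: imm[11:5] | rs2 | rs1 | funct3 | imm[4:0] | opcode
    return upper + rs2 + rs1 + funct3 + lower + opcode


# sw t0, 8(sp)  ->  rs2 = x5 (t0), rs1 = x2 (sp), funct3 = 010
offset, base = split_base_offset("8(sp)")
print(offset, base)  # 8 sp
print(encode_s_type("010", "0100011", "00101", "00010", offset))
# 00000000010100010010010000100011
```

Notice that the register being stored (rs2) comes first in the assembly text but sits in the second field of the encoding, and the base register from inside the parentheses becomes rs1.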
End Goal
At the end of this lab, your code should pass all the tests in the base_assembler_test.py
file.
Grading Rubric
All the labs for CSSE232 have these general requirements:
General Requirements for all Labs
- The solution fits the need
- Aspects of performance are discussed
- The solution is tested for correctness
- The submission shows iteration and documentation
Some labs will hit some of these requirements more than others. But you should always be thinking about them.
Fill out the Lab Worksheet
In the worksheet, explain how you satisfy each of these items. Some guidelines:
- None of these answers should be more than 100 words.
- You will upload this sheet to gradescope. Make sure you indicate your partner when you upload.
Lab 1 Rubric items       Possible Points
Lab Worksheet            20
Implements R-Types       20
Implements I-Types       20
Implements S-Types       20
Autograder test cases    20
Total                    out of 100
- Submit your completed worksheet to gradescope, only 1 per team (make sure all team members' names are included). In gradescope you are able to add your team members' names to the submission; make sure you do so. You can find the gradescope link on the course moodle page.
TODO: Fix this once we set up group labs
- Lab code will be submitted to EVERY team member's git repository. You must include your name in a comment at the top of all files you submit. BOTH PARTNERS MUST SUBMIT CODE FOR ALL THE LABS.
TODO: GET THESE INSTRUCTIONS UPDATED
- Be sure to include your name and your partner's name in all of your files. TODO: FIX THIS
- Open a terminal window and navigate to your lab01/p3 folder.
- Add the changed file with:
  git add p3.asm
- Commit the changes with a commit comment:
  git commit -m "YOUR COMMIT MESSAGE"
- Send the changes to the server:
  git push
- Verify that your changes were pushed correctly:
- Something about looking on the github website ...
- ...
Fixing GitHub Authentication Issues
Here is a quick and dirty guide to setting up SSH keys if you have been having trouble accessing repos.
- Make an SSH Key
- Open "Git Bash"
- Change to the ~/.ssh directory (cd ~/.ssh)
- Generate an SSH key by typing ssh-keygen
- Hit "enter" any time it asks you a question (default key type, blank passphrase)
- Display the contents of the new public key (cat id_rsa.pub; if your version of ssh-keygen created an Ed25519 key instead of an RSA key, the file is id_ed25519.pub)
- Copy all the contents it displayed (right click to copy in Git Bash, ctrl+c won't work)
- Add the SSH key to your github account
- In your web browser, go to the github website.
- Open your account Settings on the github website (click the user icon in upper-right corner of the web page and choose "settings")
- Select "SSH and GPG Keys" on the left
- Click "New SSH key" green button
- Give it a name like "232 ssh key"
- Paste the key you copied into the large text box
- Click "Add SSH key" button
- Use the key to clone your repo
- In your web browser, open your new repository in the github web page (the URL it gave you when you first created it)
- Click the green "Code" button
- Note that it says "Clone with SSH": copy that URL (looks like git@github.com/....)
- Go back to Git Bash and git clone that copied URL