Practical 1: Assembler I

Objectives

This is not a list of tasks for you to do. It is a list of skills you will have or things you will know after you complete the practical.

Following completion of this practical you should be able to:

Guidelines

Your Tasks

1 Install Python

Install Python 3.x. If you already have Python installed from another class or project, you should be good to go. Otherwise, you can download an installer here.

2 Install VSCode

  1. We'll be using VSCode as the IDE for this class. You can download and install it from here. (Read the next step before you click that link.)

  2. During install you may be prompted for what languages you want the IDE to support. If so, select Python and it should install the basic Python extensions.

    If that does not happen, you will need to install the extensions manually. Once VSCode opens, select the Extensions button on the left (it looks like 4 squares with one of them detached). In the search bar type "python", then select and install the Python extension from Microsoft and the Python Debugger extension from Microsoft.

  3. Next, in the search bar at the top of the VSCode window, type >terminal and select "Python: Create Terminal". It will open a terminal in the lower part of the screen.

  4. In that terminal type python to launch Python and make sure it works. Check the version number to ensure you're running Python 3 or newer. You may need to specify a version of Python (e.g. type python3.12) to get the right version running in the terminal. Type exit() to close the Python interpreter.

  5. In the terminal (outside of python) run the following command:

    python -m pip install gradescope-utils

    Replace the python with the appropriate version/path for the python interpreter you want to use. This installs a testing library that we'll use for the test cases.
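If you are not sure which interpreter a terminal or VSCode is actually using, a tiny script can tell you. This is only a sketch: the helper name is_supported is made up for illustration and is not part of the practical.

```python
import sys


def is_supported(version_info=sys.version_info, minimum=(3, 0)):
    """Return True when the interpreter is at least the `minimum` version."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    # sys.executable is the path of the interpreter running this script.
    print(sys.executable)
    print(sys.version.split()[0], "OK" if is_supported() else "too old")
```

Run it with the same command you plan to use for pip (for example python or python3.12) so that the library is installed for the interpreter you checked.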

3 Get your Assembler repository

To get your git repo, you'll need the git tools installed. If you are using Windows and need to install git, see the git scm website. If you use a different OS, you should use the install method best for your operating system.

  1. We'll be using Github For Education to manage repositories for this course. Read all of these instructions before you begin.

    1. You'll need to create a github account. Your account name should be recognizable as your RHIT username.
      1. Use the pattern: "rhit-username", for example Robert's account name would be: "rhit-williarj".
      2. Even if you already have your own personal Github account please make a course-work specific account. We don't want to have to figure out who XxXSuperLuigi2022XxX is when we're grading. If we can't figure out who you are from the username we'll assume you didn't submit the assignment.
      3. You should set up SSH keys for github access. Scroll to the bottom of this page and follow the directions in "Fixing GitHub Authentication Issues". You can read details about that here.
    2. Go to this url to create your repository.
    3. You may be prompted to grant Github Education access to your account, grant access if asked.
    4. Select your Rose username from the list of students when you create your account.
      1. If your username is not listed select the "Skip to next step" link above the list of names. Tell the instructor you did this!
    5. Select "Accept this assignment" on the following page. Create a team name for you and your partner following this pattern: "csse232-username1-username2", where you replace "username1" and "username2" with your Rose usernames. This will create your repository. Copy the link on the next page; it will look something like this: https://github.com/rhit-csse232/practical-assembler-csse232-username1-username2
      • You can click your custom link to view your repository on github.com. The github.com web page will display a green "Code" button with instructions on how to clone your repo.
  2. Check out a copy of your repository.

    1. Open a terminal window. For Windows, the program Git Bash will serve as your terminal. Right click in any folder and select Git Bash Here to start up the terminal.
    2. Navigate to where you want your repo. For example:
      cd Desktop
    3. Use this command to get your repo, adjusting for your url from above.
      git clone git@github.com:rhit-csse232/rhit-csse232-2425a-assembler-joestudent.git

4 Running code and tests in VSCode

Running code and editing arguments

  1. To run your assembler you need to set up the debugger. Your repo should contain a .vscode folder that has a launch.json file inside it. This file specifies the arguments to give the program when it runs.

  2. Select the Run and Debug button on the left side of the screen (it looks like a play button with a bug on top of it). Then near the top you should see a green play button with "Python Debugger: Current File with Arguments". If you have your chosen python file open when you press this button it will start running.

  3. You can change the arguments given to your assembler by opening launch.json and editing the "args" list.

    • Try adding "--help" as an item to this list when you run the assembler.py file below to see its effect.

    You can add to and edit this list to change the behavior of the program. This is really just for debugging, though. If you leave it with the options provided, you can simply edit the contents of test.asm to test different instructions for this assembler. By default it will output the assembled machine code into the file out.txt, which will appear in the folder.
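It may help to see how an "args" list reaches a Python program. The sketch below assumes the program parses its arguments with the standard argparse module. The option names source and --output are invented for illustration, so check assembler.py (or run it with "--help") for the real ones.

```python
import argparse


def build_parser():
    # Hypothetical options; the real assembler.py may define different ones.
    parser = argparse.ArgumentParser(description="Toy assembler front end")
    parser.add_argument("source", help="assembly file to read, e.g. test.asm")
    parser.add_argument("-o", "--output", default="out.txt",
                        help="file that receives the machine code")
    return parser


# An "args" list such as ["test.asm", "-o", "out.txt"] in launch.json
# arrives in the program as exactly this list of strings.
args = build_parser().parse_args(["test.asm", "-o", "out.txt"])
print(args.source, args.output)
```

With argparse, passing "--help" prints the generated usage text and exits, which is a quick way to list the options a program accepts.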

Debugging and Testing in VSCode


Debugging in VSCode
  1. You should get familiar with and use the VSCode debugger. We will show you some tips in the `R-type` implementation part of this practical, but generally:
    • You can set breakpoints by clicking to the left of the line number in a file. (1)
    • Use the buttons that appear at the top of the window to navigate through the code as you're debugging. (2)
    • The pane on the left will show you variables in the code as you debug. You can also hover over variables in your code to quickly peek at their values. (3)
Running Unit Tests
  1. We're using the Python `unittest` library for our test cases. Each test is preceded by a decorator that looks like: `@weight(N)` where N is the number of points the test is worth during grading.
  2. The provided `settings.json` file should have the testing framework ready for you. Click the flask icon on the left to open the Testing window. (Note, sometimes I have to click or open a file before the button will appear.)
  3. The window will list all the testing files and individual tests. When you hover over an individual test, or the header for a group of tests, three buttons appear to the right of the name. The first one lets you run the test(s).
  4. The second button is the debugger. You can put breakpoints in the tests or your code and use the debugger while the test cases are running to help you find what causes tests to fail.
  5. You should run the tests as soon as you get your repo to make sure that everything is hooked up correctly. The output lists each test followed by its status. Yours will say "ERROR" instead of "ok" to start with. However, VSCode sometimes defaults to a different Python testing framework. If the output mentions `pytest`, close VSCode and reopen it, and it will likely use the correct tests. The incorrect framework still produces output, but it shows `pytest-8.2.2` under the first header and lists only the test file name, not the individual tests.
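To make the `@weight(N)` decorator less mysterious, here is a minimal, self-contained test file in the same style. The function to_upper and the class TestToUpper are invented for illustration. The import path is the one used by the gradescope-utils package, and a small stand-in is defined so the sketch still runs if that package is missing.

```python
import unittest

try:
    # Real decorator from the gradescope-utils package installed earlier.
    from gradescope_utils.autograder_utils.decorators import weight
except ImportError:
    def weight(points):
        """Stand-in so this sketch runs without gradescope-utils."""
        def decorate(test_func):
            test_func.__weight__ = points
            return test_func
        return decorate


def to_upper(text):
    """Hypothetical function under test."""
    return text.upper()


class TestToUpper(unittest.TestCase):
    @weight(2)  # this test would be worth 2 points during grading
    def test_simple_word(self):
        self.assertEqual(to_upper("add"), "ADD")


def run_tests():
    """Run the tests with per-test status lines, like the VSCode output."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestToUpper)
    return unittest.TextTestRunner(verbosity=2).run(suite)


if __name__ == "__main__":
    run_tests()
```

The decorator only attaches point values for the grader; the test itself runs like any other `unittest` test.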
Reading test errors

When a test fails it can be very intimidating. Here are the two main types of failures you'll see. First, when your code translates an instruction incorrectly you'll see something like this:

This error box is pointing to the test that failed (`test_R_types_add`). In the box it shows the expected and actual result, just left of the orange circle number 1. The top binary string is always the expected (i.e. "right") answer; the bottom is the result from calling your assembler.

Each test group runs several inputs through your assembler. To figure out which exact input caused the failure, you need to look at the traceback. Look for the item that refers to the name of the test that failed. Here it is just left of orange circle number 2. We can see that the input at line 34 caused the failure, so I can go look at it.

Next I would set a breakpoint at that line and press the test debugger button to trace the code.

The second kind of error you'll see will occur when your code does not raise an exception when it should. Here is an example:

This box similarly points to the test that failed (`test_R_types_arguments`). At the top it tells us an exception was not raised when the test expected one. Here at orange circle 1 we can see that the test was expecting a `BadArguments` exception but did not get one. At orange circle 2 we can see the line number of the offending input so we can debug the problem.

This practical provides several different kinds of exceptions for different cases. Some of them may be ambiguous, e.g. when is it a BadArguments vs. a BadImmediate? Go with your gut and match the test cases later. The test cases are set up to minimize the number of places in the code you need to check for errors and raise exceptions, rather than detecting errors as soon as possible.

Build an Assembler

You've been given a partial implementation of a 32-bit RISC-V assembler. Your job for this practical is to implement the parts of the assembler needed to make R-, I-, and S-types work. You only need to edit the methods that are listed below and that have a line like: TODO: Practical 1 in their body. You can ignore the Practical 2 TODOs for now, and do not make changes to any other methods or classes. You are free to add your own new helper methods and classes as you see fit.

Before you start writing code you should look at the practical worksheet and read the Grading Rubric section below.

0 Tips and Hints

You should open up assembler.py and review the general code structure (we'll walk you through it in a bit more detail in the next section). You can look through the docs/assembler.html file in your repo for an easy to read list of functions and classes. I recommend you keep this open so you can look at any helper methods at a glance.

Some helper methods are implemented for you, and you are free to use them. They are not tested, so you are free to change them as you see fit.

Python implements a few ways to convert integers between bases. First, the int() method takes a second argument that defines the base:

int("101", 2) -> 5

int("11", 2) -> 3

int("101", 16) -> 257

int("11", 16) -> 17

Additionally the bin() method converts a decimal integer into a binary string:

bin(5) -> '0b101'

bin(3) -> '0b11'

bin(-3) -> '-0b11'

Note that it does not use two's complement for negative numbers and the strings always start with '0b' (after the minus sign, if there is one).
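If you need a fixed-width two's-complement string, one common approach is to mask the value down to the field width before formatting it. The function below is a sketch with a made-up name; the dec_to_bin helper in the repo may take different arguments or handle errors differently.

```python
def to_twos_complement(value, bits):
    """Return `value` as a `bits`-wide two's-complement binary string."""
    low, high = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not low <= value <= high:
        raise ValueError(f"{value} does not fit in {bits} signed bits")
    # Masking maps a negative number onto its two's-complement bit pattern.
    return format(value & ((1 << bits) - 1), f"0{bits}b")


print(to_twos_complement(5, 12))   # 000000000101
print(to_twos_complement(-3, 12))  # 111111111101
```

Unlike bin(), the result has no '0b' prefix and is always exactly `bits` characters long.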

1 A Tour of the Assembler

This file is BIG and has a bunch of code. This section will take you on a little tour of the code, explaining the overall structure of the assembler. Don't forget that you can open up an html version of the documentation for this file by opening docs/assembler.html in your practical repo.

More detailed tour

Near the top of the file is the assembler_asm() method. This method does all the heavy lifting to take in a text file of assembly instructions and break it down into binary one instruction at a time. It does this in 4 big passes: 1) remove comments, 2) process pseudoinstructions, 3) process labels, and 4) translate individual lines into machine code. For this practical you only need to worry about step 4.

Below this method is a section where each "pass" of the assembler is broken up into helper methods. You do not need to worry about these for now.

The next section (starting at around line 100) is where the core assembler methods are defined. The heart of the assembler is the `Assemble()` method. This takes a single instruction and returns its binary representation. You'll be writing this method later. It will do most of its work by calling the `Assemble_*` helper methods below.

Each type of instruction in RISC-V has at least one helper (e.g. `Assemble_I_Type`). Each of these methods will process one instruction type and return the binary representation of a given instruction. These methods are where you will write most of the code for this practical.

The next section (starting at around line 355) has some helper methods. Some of these are implemented for you and some you will need to implement in practical 2. You should look at these methods and consider when you can use them. Some of these helpers also show you how to manipulate strings in Python in helpful ways.

Below this is the `output()` method, which is used to output the final result to a file. You do not need to edit this code.

Next is the "Utilities" section, where the different types of instructions and fields are defined. Here you'll find a `FieldData` class, which is a simple struct to hold the data for different instructions for our use later. There are several other helpers defined in this section which you should look over and make use of.

Finally, below this are the definitions for the custom exception types that are set up for the assembler. Not all of these are tested; you should use them as you see fit to make debugging easier for you.

2 Assembling R-type instructions

First we'll implement assembling R-type instructions and see how to use the test bench to find errors in our code.

  1. Open up the Testing window (the flask icon), then expand base_assembler_test.py and then the TestRType group. Hover over the test_R_types_add test until you see the run button on the right, and click it. You just ran a single set of tests for the add instruction. The test should fail, and the traceback will indicate that a NotImplementedError was raised.

  2. Go to the Assemble_R_Type method in the assembler file, replace the raise NotImplementedError with this:

    return "0000 0000 0000 0000 0000 0000 0000 0000"
  3. We just made this method turn every R-type into all 0s in binary. This isn't correct, but let's re-run the tests and see how it looks now. Re-run the test_R_types_add tests and look at the output. You will see an AssertionError showing the expected output and, just below it, the received output of all 0s.

  4. Okay, now we can get down to business. Let's start by figuring out which R-type we are translating. There is a helper dictionary for just that; add this code to the method:

    field_data = inst_to_fields[cmd]

    You should look at the docs for details on this dictionary and the objects in it, but in short the lookup returns a small object that gives us the binary for the opcode, funct3, and funct7 fields.

  5. Next let's get the register info out of the instruction. We can do that using another helper and the operands list:

    rd  = get_register_bin(operands[1])
    rs1 = get_register_bin(operands[1])
    rs2 = get_register_bin(operands[2])
  6. Finally we need to put all these pieces together. You can do this however you want, but there is a helper built in to make this a bit easier. I start by combining all the parts of the instruction into a list and then calling the helper:

    inst_field_list = [field_data.func7,
                       rs2,
                       rs1,
                       field_data.func3,
                       rd,
                       field_data.opcode]

    return join_inst_fields_bin(inst_field_list)
  7. The code above has an error. We're going to run the tests and see how to track down the problem. Re-run the R-type tests and you should see something like this:

Note that part of the expected and received result is "underlined" with - and + symbols. If we look at that binary, it seems like there is a problem with rd. We could go fix it now, but instead we'll use the debugger to help us find the exact location of the error.

Put a breakpoint on line 34 of the base_assembler_test.py file, where the first test is called. Then hover over the test_R_types_add entry in the test navigator on the left, and press the Debug test button, (it looks like a play button with a bug on it). That will open up the debugger and pause it on the line where we put the breakpoint. Press the step into button in the little control panel at the top of your screen until the code reaches the assembler.py file (it should take 7 step into clicks). At this point you should see the variable explorer on the left, something like this:

Here we see the instruction add is passed in as the cmd argument, and the operands are all x1. Step through the code until field_data, rd, rs1, and rs2, have all been assigned. You may want to use the step over button to not go into the helper method code.

Now the variable explorer will show the values of each register ID field. They are all set to '00001', which seems right to me! Continue to step through this code until you are taken back to the test file. This test passes, so the error wasn't here (which we would have noticed if we had read the error message more closely earlier: it says the failure was on line 38). Let's continue stepping through the next test, for "add t0, s0, sp". Step through to the point where all the register fields are assigned in the method, and look at the value of rd. Is it right? You should see this in the variables pane:

Both rd and rs1 have the same value, but they are different registers in the operands list! The correct value for t0 is "00101" so there must be something wrong with rd. Fix the call to get_register_bin for rd so that it uses the correct operand.
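To double-check a value like this by hand, it can help to rebuild the encoding outside the assembler. The sketch below hard-codes a few register numbers and the standard RISC-V field values for add. The names REGISTER_NUMBERS, ADD_FIELDS, register_bits, and encode_add are all made up here, and the repo's join_inst_fields_bin may format its result differently (for example with spaces between groups of bits).

```python
# Stand-in data: a few ABI register names and the fields for one instruction.
REGISTER_NUMBERS = {"x1": 1, "sp": 2, "t0": 5, "s0": 8}
ADD_FIELDS = {"funct7": "0000000", "funct3": "000", "opcode": "0110011"}


def register_bits(name):
    """Hypothetical stand-in for get_register_bin: 5-bit register number."""
    return format(REGISTER_NUMBERS[name], "05b")


def encode_add(operands):
    """Encode `add rd, rs1, rs2` as a 32-character bit string."""
    rd, rs1, rs2 = (register_bits(op) for op in operands)
    # R-type layout, most significant field first.
    fields = [ADD_FIELDS["funct7"], rs2, rs1,
              ADD_FIELDS["funct3"], rd, ADD_FIELDS["opcode"]]
    return "".join(fields)


bits = encode_add(["t0", "s0", "sp"])
print(bits, hex(int(bits, 2)))
```

Bits 11 through 7 of the result (the rd field) are 00101, matching the value for t0 mentioned above.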

  8. Re-run the add tests now and you should see they work. And because of the magic of dictionaries, if you run the other R-types they should nearly all pass! You can run the full group by pressing the run button while hovering over the TestRType group. Because the only differences between R-types are the opcode, funct3, and funct7 fields, the inst_to_fields dictionary already pulls the right data for each instruction for us.

  9. One test should have failed: the test_R_types_arguments test. It checks that errors happen when the instruction is misformatted, e.g. with too many operands or the wrong type of operands. Look at the test and you'll see that the first case checks what happens when 4 operands are passed in; the tests expect a BadOperands exception to be raised. Back in the assembler, let's check for this error right at the start of the R-type method. Add this code:

    if len(operands) != 3:
        raise BadOperands("Incorrect number of operands found in R Type on line %s with args:\n\t%s %s\n" % (line_num, cmd, operands))

You can edit the message in the BadOperands exception above to have more or less info, whatever will be helpful for you during debugging. Re-run the tests and all the R-type tests should pass now!

  10. There are a few other edge cases you should consider even though the tests here don't check for them. What happens if this method is called with a bad instruction name (e.g. Assemble_R_Type('rob', ['t0', 't1', 't2'], 0))? You should probably raise a BadInstruction exception if this happens. What if the translated number of bits is wrong? Or the number of fields is wrong after translation? You may want to check for these errors and raise BadFormat or BadField errors. The tests won't check for these cases, but the checks will help you while you are debugging, so you may want to add the logic now.

  11. You should now follow a similar procedure to implement the other instruction types for this practical.
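The extra checks suggested in the last few steps could look something like the sketch below. The three exception classes are stand-ins named after the ones mentioned above, and KNOWN_R_TYPES is a hypothetical subset. In the real assembler you would more likely test membership in inst_to_fields.

```python
class BadInstruction(Exception):
    """Stand-in for the practical's exception of the same name."""


class BadOperands(Exception):
    """Stand-in for the practical's exception of the same name."""


class BadFormat(Exception):
    """Stand-in for the practical's exception of the same name."""


KNOWN_R_TYPES = {"add", "sub", "and", "or"}  # hypothetical subset


def validate_r_type(cmd, operands, line_num):
    """Raise early when the instruction name or operand count is wrong."""
    if cmd not in KNOWN_R_TYPES:
        raise BadInstruction(f"line {line_num}: unknown R-type {cmd!r}")
    if len(operands) != 3:
        raise BadOperands(f"line {line_num}: {cmd} needs 3 operands, "
                          f"got {operands}")


def validate_encoding(bits, line_num):
    """Check that the result is exactly 32 binary digits (spaces ignored)."""
    compact = bits.replace(" ", "")
    if len(compact) != 32 or set(compact) - {"0", "1"}:
        raise BadFormat(f"line {line_num}: not a 32-bit string: {bits!r}")
    return compact
```

Calling validate_r_type at the top of the method and validate_encoding just before the return keeps the checks in two places, in the spirit of the note about minimizing where errors are detected.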

1 Assemble()

The heart of the assembler is the Assemble() method. This takes a single instruction and returns its binary representation. At its core, though, this method simply figures out the type of the instruction and then calls the appropriate helper function (which you will write below). Try to make this function do as little work as possible; it will make your code easier to edit in the future.

You should not try and write this function all in one go. I recommend you add to it as you expand your assembler to support more types, just add support for one instruction type at a time.

To start, in Assemble we need to call the appropriate helper function based on the type. For now we only have R-type instructions, so let's just call that one directly. The helper functions take 3 arguments: the instruction name, the list of operands, and the line number. So, we have to do a little bit of work to split the instruction up. Put this code into the body of the Assemble method, replacing the raise NotImplementedError line:

split_inst = inst.strip().replace(",", " ").split()
cmd = split_inst[0]
args = split_inst[1:]
result = Assemble_R_Type(cmd, args, line_num)
return result

This breaks the instruction up and then calls the R-Type helper. If the method is invoked like this: Assemble("add t0, t0, t0") then Assemble would call Assemble_R_Type("add", ["t0", "t0", "t0"]) and return the result.
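You can try the splitting expression on its own to see how it copes with uneven spacing. The wrapper name split_instruction is made up for this sketch.

```python
def split_instruction(inst):
    """Split 'add t0, t0, t0' into ('add', ['t0', 't0', 't0'])."""
    # Commas become spaces, then split() collapses any run of whitespace.
    parts = inst.strip().replace(",", " ").split()
    return parts[0], parts[1:]


print(split_instruction("add t0, t0, t0"))
print(split_instruction("  add t0,t1 ,  t2 "))
print(split_instruction("lw t0, 8(sp)"))
```

Note that an empty or whitespace-only string makes parts[0] raise an IndexError, so blank lines need to be filtered out before this point or handled here.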

Open up the file called test.asm in the repo. This file has two test R-types for assembling. If you run the assembler.py file now, it should try to translate the code from test.asm into binary and put it into a file called out.txt. Run the code by going to the debug panel and pressing the run button (make sure you have selected Python Debugger: Current File with Arguments), then take a look at the output. Try adding a new R-type instruction to the test.asm file and seeing how the output changes. Make sure you tab back into the assembler.py file before you run, otherwise VSCode will try to run test.asm as a Python file!

As you are working you can add instructions and comments into test.asm for your debugging purposes.

You should add to the Assemble() method each time you implement one of the new types (or sub types) of instructions in the next section. Do not try and implement it all at once.

A few tips as you work on these:

There are several example exceptions raised throughout the template, take a look at dec_to_bin for a couple examples (and consider why these are raised here).

This program provides a suite of custom exceptions: BadImmediate, BadArguments, BadInstruction, BadRegister, BadField, BadFormat, BadLabel

You can check the code (or the assembler.html file) for a general description of each. Your assembler should probably raise every one of these in at least one place when it is complete. Not all of these are tested, but they are very useful as you debug. For this practical you will not need the BadLabel exception.

2 Assemble Individual Types

You will implement the methods for each individual type. Note that the I-types are broken up into separate methods for each slightly different format. Write the code to implement each of these:

  1. Assemble_R_Type
  2. Assemble_I_Type
  3. Assemble_I_Type_shift
  4. Assemble_I_Type_base_offset
  5. Assemble_S_Type

After you finish each of these methods (or pause working) you must commit and push your repo with a meaningful commit message. As you debug and fix errors you should consider doing more commits. You must have at least 5 commits in your repo for this practical (one for each of these methods).

You should edit the Assemble() function to call the correct one of these for a given instruction.
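One way to keep Assemble() small as it grows is a table that maps each instruction name to its helper. This is a sketch of the idea only: the handlers are stubs that return a tag instead of machine code, the HANDLERS table is hypothetical, and ValueError stands in for whichever custom exception fits. In the real code the instruction's type may already be available from the data in inst_to_fields.

```python
def assemble_r(cmd, operands, line_num):
    return f"R:{cmd}"  # stub: a real helper returns the encoded bits


def assemble_i(cmd, operands, line_num):
    return f"I:{cmd}"  # stub


def assemble_s(cmd, operands, line_num):
    return f"S:{cmd}"  # stub


# Hypothetical table; grow it one instruction type at a time.
HANDLERS = {"add": assemble_r, "sub": assemble_r,
            "addi": assemble_i, "sw": assemble_s}


def assemble(inst, line_num=0):
    """Split the instruction, then hand it to the matching helper."""
    parts = inst.strip().replace(",", " ").split()
    cmd, operands = parts[0], parts[1:]
    handler = HANDLERS.get(cmd)
    if handler is None:
        raise ValueError(f"line {line_num}: unsupported instruction {cmd!r}")
    return handler(cmd, operands, line_num)


print(assemble("add t0, t1, t2"), assemble("sw t0, 8(sp)"))
```

Adding support for a new type then means writing one helper and adding entries to the table, with no change to the body of assemble.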

Your assembler only needs to support decimal immediates, assume all numbers passed as operands to an instruction are in decimal. As you work through the test cases you may want to consider the binary or hex representation of the numbers used in the tests.

The helper get_register_bin() will be useful for these methods.

I-types are complicated. You may want to consider using is_shift_immediate_inst(), parse_base_offset(), reverse_string(), and dec_to_bin() in these.
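For the base-offset format (operands such as 8(sp)), the offset and the register have to be pulled apart before either can be translated. The sketch below shows one way to do that with a regular expression. The name split_base_offset is made up, ValueError stands in for the custom exceptions, and the provided parse_base_offset() may have a different signature and return value, so read its documentation before relying on this.

```python
import re

# An optional minus sign and digits, then a register name in parentheses.
BASE_OFFSET = re.compile(r"^(-?\d+)\((\w+)\)$")


def split_base_offset(operand):
    """Split '8(sp)' into (8, 'sp'); raise ValueError on anything else."""
    match = BASE_OFFSET.match(operand.strip())
    if match is None:
        raise ValueError(f"not in offset(base) form: {operand!r}")
    return int(match.group(1)), match.group(2)


print(split_base_offset("8(sp)"))
print(split_base_offset("-4(s0)"))
```

The offset comes back as a decimal integer, ready to be range-checked and converted to binary, and the register name can go through the same register helper as any other operand.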

End Goal

At the end of this practical, your code should pass all the tests in the base_assembler_test.py file.

Grading Rubric

All the practicals for CSSE232 have these general requirements:

General Requirements for all practicals

  1. The solution fits the need
  2. Aspects of performance are discussed
  3. The solution is tested for correctness
  4. The submission shows iteration and documentation

Some practicals will hit some of these requirements more than others. But you should always be thinking about them.

Fill out the Practical Worksheet

In the worksheet, explain how you satisfy each of these items. Some guidelines:

  1. Submit your completed worksheet to gradescope, only 1 per team (make sure all team members' names are included). In gradescope you are able to add your team members' names to the submission; make sure you do so. You can find the gradescope link on the course moodle page.

  2. You must include your name in a comment at the top of all files you submit.

    1. Be sure to include your name and your partner's name in all of your files.
    2. In VS code, switch to the "Source Control" tab on the left side of the screen.
    3. Click the "+" next to each file you changed that you'd like to submit. Do not submit any temporary files or click the plus next to the "Changes" header (which adds everything). Be careful to only select the code files that you need to submit.
    4. Type a message in the "Message" box above the big blue Commit button.
    5. Click the big blue Commit button.
    6. Push your changes by clicking the three dots in the top of the Source control pane, then selecting "Push" from the menu that appears.
  3. Verify that your changes were pushed correctly:

    1. Visit your github classroom repository in your favorite web browser (Edge, Chrome, Firefox, Safari, etc.)
    2. Open one of the files you changed and make sure it looks as you expect.

Fixing GitHub Authentication Issues

Here is a quick and dirty guide to setting up SSH keys if you have been having trouble accessing repos.

  1. Make an SSH Key
    1. Open "Git Bash"
    2. Change to the ~/.ssh directory (cd ~/.ssh)
    3. Generate an SSH key by typing ssh-keygen
      1. Hit "enter" any time it asks you a question (default RSA type, blank passphrase)
    4. Display the contents of the new public key (cat id_rsa.pub)
    5. Copy all the contents it displayed (right click to copy in GitBash, ctrl+c won't work)
  2. Add the SSH key to your github account
    1. In your web browser, go to the github website.
    2. Open your account Settings on the github website (click the user icon in upper-right corner of the web page and choose "settings")
    3. Select "SSH and GPG Keys" on the left
    4. Click "New SSH key" green button
    5. Give it a name like "232 ssh key"
    6. Paste the key you copied into the large text box
    7. Click "Add SSH key" button
  3. Use the key to clone your repo
    1. In your web browser, open your new repository in the github web page (the URL it gave you when you first created it)
    2. Click the green "Code" button
    3. Note that it says "Clone with SSH": copy that URL (looks like git@github.com/....)
    4. Go back to Git Bash and git clone that copied URL