Lab 5 Single Cycle Processor

Objectives

This section is not a list of tasks for you to do. It is a list of skills you will have or things you will know after you complete the lab.

Following completion of this lab you should be able to:

Instantiate and combine modules in verilog.
Read and interpret simulated waveforms.
Implement and test a limited instruction set single-cycle processor (R, I, and S types) in ModelSim.
Use waveform diagrams and verilog test benches to debug a processor implementation.
Discuss clocking strategies for a set of sequential logic that must happen in a specific order.

Guidelines

Because you will be iteratively adding functionality to one processor module, we strongly recommend that you periodically add and commit your progress to git as a backup.

Your Tasks

Follow this sequence of instructions to complete the lab.

0 Obtain your `C-group` git repo

TODO: instructions for getting the repo

1 Implement R-type Instructions

You will begin by implementing a processor that only implements R-Type RISC-V instructions.

On the lab worksheet, there's a datapath drawing.
1. Using the RTL from class, trace the wires used for the add instruction.
2. Label each traced wire with a name. For example, consider the wire coming out of the right side of Instruction Memory; you could label this wire inst to be consistent with the RTL.
3. As you trace through blocks of logic (register file, muxes, etc) circle the names of any control bits necessary for the add instruction.

We've provided you with the start of a verilog control unit, Control.v, and a test bench, tb_Control.v. The control unit doesn't currently do much, but it has all the inputs and outputs you'll need.

Implement Control
1. Open up your C-group repo in VS Code (or your favorite text editor)
2. Edit Control.v
3. In the always @(opcode) block of the control unit, look for the case block for R_OPCODE. Between the begin and end, add value assignments for all the control signals you will need to set to make add happen.
  - You can see Figure 4.26 in your textbook (should be page 281) to verify which signals and which values matter for "R-format" instructions.
  - HINT: ALUOp is a two-bit bus, unlike in the textbook table, which treats each bit separately. Do not set each bit separately; instead, assign a two-bit binary value.
  - Note: you should also include every "write" control signal and explicitly disable them (set to zero) for things you don't want to be written (for example, memWrite).
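As a rough illustration of step 3, the sketch below shows what one case branch might look like. It is not the provided Control.v: the module name `ControlSketch` and the port names are assumptions taken from the signal names in the textbook's Figure 4.26, and the `ALUOp` value assumes the provided ALUCtl.v follows the textbook encoding. Check both against your own files.

```verilog
// Sketch only: port names follow the textbook's Figure 4.26 and may not
// match the ports declared in the provided Control.v.
module ControlSketch(
    input  wire [6:0] opcode,
    output reg        ALUSrc,
    output reg        MemtoReg,
    output reg        RegWrite,
    output reg        MemRead,
    output reg        MemWrite,
    output reg        Branch,
    output reg  [1:0] ALUOp
);
    // RISC-V opcode shared by all R-format instructions (add, sub, ...)
    localparam R_OPCODE = 7'b0110011;

    always @(opcode) begin
        case (opcode)
            R_OPCODE: begin
                ALUSrc   = 1'b0;   // second ALU operand comes from the register file
                MemtoReg = 1'b0;   // write back the ALU result, not memory data
                RegWrite = 1'b1;   // R-format instructions write rd
                MemRead  = 1'b0;
                MemWrite = 1'b0;   // explicitly disabled: nothing is stored
                Branch   = 1'b0;   // explicitly disabled: PC advances normally
                ALUOp    = 2'b10;  // one two-bit value (assumes textbook encoding)
            end
            default: begin
                // Unknown opcode: make sure nothing gets written
                ALUSrc   = 1'b0;
                MemtoReg = 1'b0;
                RegWrite = 1'b0;
                MemRead  = 1'b0;
                MemWrite = 1'b0;
                Branch   = 1'b0;
                ALUOp    = 2'b00;
            end
        endcase
    end
endmodule
```

Assigning every output in every branch, including the default, keeps the block purely combinational and avoids accidentally inferring latches.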
Test Control

Once you've coded control for R-types, you need to make sure it emits the right signals!
1. In VS Code, open the tb_Control.v file and review the test we provide for R-type instructions.
2. Open ModelSim, and create a new Lab5 project file in your C-group git repo.
3. Add all the verilog files in the repo to your ModelSim project.
4. Compile all the verilog files, fixing any errors in Control.v and tb_Control.v. Don't worry about errors in the other modules for now.
5. Start simulation for the tb_Control module and run -all. The tests should pass; if they don't, check your work.
6. You could add another test, but since all R-format instructions have the same opcode, there's no need.

When you get control working, consider doing a quick git pull, add, commit, and push to save your work.

Implement the Datapath for R-types

In this part you will use your control unit and design a datapath around it to execute R-type instructions from memory.
1. Examine the Processor.v file. We're providing you with a processor module that has a few component instances in it, but they're not hooked up. Compare this set of instances with the components you traced on the worksheet.
2. Add new instances of components you will need to make your processor execute only R-types.
  There's no need to connect them yet, just make instances. Here are some hints and suggestions:
  - We are providing you with working implementations of ALU.v and ALUCtl.v. They should already be in your C-group repository.
  - We've also included copies of DP_Memory.v and Register.v from Lab 4.
  - There are lots of suggestions and tips in the comments in the files provided, make use of these.
  - Instruction and data memory will be the same component. We're using both halves of the DP_Memory module. See the comments in Processor.v for details.
  - Don't make multiplexers. You can make them with raw verilog later.
  - There's no register file in your git repo so you will have to make your own!
    - SUGGESTION: Since you created and tested this in the previous lab, one member of your team should copy their implementation into your C-group repo so you can use it for this lab.
    - Don't forget to add this file to your ModelSim project after you've made it.
3. Create wires to connect the components. Create an instance of each wire you labeled on the lab worksheet's datapath; use the same name you wrote on the worksheet.
  - Some of your wires will need to be 32 bits: wire [31:0] myWideWire;
  - Put the wire declarations inside the Processor module before you declare the other major components. This will ensure they're ready to use by any component that needs them.
4. Attach the wires to your component's input or output pins
  - This is as simple as writing the wire name between the parentheses next to the input or output where you want to attach it. In this example, the output of instance A is connected to the input of instance B:
```
  wire [4:0] myWire;
  Thingy A(
      .InputPin1(),
      .OutputPin(myWire)  // <-- attach one end here
  );
  Thingy B(
      .InputPin1(myWire),  // <-- attach the other end here
      .OutputPin()
  );
```
  - STRONG SUGGESTION: Because you are not implementing immediates, memory, or branch instructions yet, ignore three muxes for now: the one on the ALU's second input (from ImmGen), the one for branches (from the adder), and the one for data memory (output of memory). Assume the wire goes straight through each mux. For example, you might have a wire B that goes directly from the register file into the second input on the ALU. You can add these muxes later.
  - A note about Memory: When you hook up memory you will need to adjust the address. RISC-V uses byte addressing, but the provided memory modules use word addressing. Because our words are 4 bytes long, every byte address is 4 times too big. For example, the second instruction in memory will be at byte address 0x0004 but word address 0x0001. We can simply shift the byte address right by 2 to divide by 4 and convert it to a word address. Additionally, the provided memory only has 10-bit word addresses (equivalent to 2^12 byte addresses) because of the limited amount of memory on the FPGA we are simulating. Feed in the 10 least significant bits of your address after right shifting to account for this (we're essentially cutting off the top of memory when we do this). As you debug, keep in mind that the addresses you see when you look at memory will be 4 times smaller than the byte addresses you would "expect".
5. Connect the reset input on Processor to all components that have a reset (most likely the PC, and RegFile). When you "reset" the processor, you want to clear out all the registers and make sure the PC goes back to the beginning of your code.
6. Be sure your Processor compiles in ModelSim before moving on. Make sure you've added all the files used in your Processor or the tests to the ModelSim project.
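The byte-to-word address conversion described in the note about Memory can be sketched as a simple bit selection. This is a minimal stand-alone example; the names `AddrSketch`, `byteAddr`, and `wordAddr` are made up for illustration and do not come from Processor.v.

```verilog
// Sketch only: converts a 32-bit byte address to a 10-bit word address.
module AddrSketch;
    reg  [31:0] byteAddr;
    wire [9:0]  wordAddr;

    // Dropping bits [1:0] divides by 4 (a right shift by 2);
    // keeping only bits [11:2] leaves the 10 low bits of the word address.
    assign wordAddr = byteAddr[11:2];

    initial begin
        byteAddr = 32'h0000_0004;   // second instruction in memory
        #1 $display("byte address %h -> word address %h", byteAddr, wordAddr);
        byteAddr = 32'h0000_0010;   // fifth instruction in memory
        #1 $display("byte address %h -> word address %h", byteAddr, wordAddr);
    end
endmodule
```

Here byte address 4 maps to word address 1 and byte address 16 maps to word address 4. In your processor you would typically write the bit selection directly in the port connection for the memory's address input.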
Test with a few R-types
1. Examine memory-R.txt. This is the first set of instructions you will test on your processor. It has an add and a sub that have been assembled and put into memory in this order.
2. In the tb_Processor_R.v file, you can see a start at a test bench for your processor. It loads memory-R.txt into memory, then resets your processor, then allows the clock to cycle and checks the changes to the register file after each cycle.
  - Modify the test to properly inspect the contents of your register file. Follow the instructions in the code to tell the test bench how to look at your registers' values.
3. In ModelSim, run the tb_Processor_R tests!
  - If they don't pass, consider building a waveform with control and all your labeled wires to inspect what it's doing.
  - If you build a waveform, be sure to save it! We recommend calling it something similar to the test bench where it is useful (tb_Processor_R_waves.do for example).
4. Once the two instructions are succeeding, assemble a few more R-type instructions and add them to the memory file, then add tests to the verilog test bench.
  - HINT: you can use your assembler from labs 1/2, or RARS to quickly assemble instructions.
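If you want to double-check an assembled R-type instruction by hand, the fields can be concatenated in verilog itself. The sketch below encodes `add x3, x1, x2`; the module name `EncodeSketch` is hypothetical, and whether your memory file expects hex or binary text depends on how the provided test bench loads it, so check that before pasting values in.

```verilog
// Sketch only: builds the 32-bit encoding of "add x3, x1, x2" field by field.
module EncodeSketch;
    localparam [6:0] FUNCT7 = 7'b0000000;  // add (sub uses 7'b0100000)
    localparam [4:0] RS2    = 5'd2;        // x2
    localparam [4:0] RS1    = 5'd1;        // x1
    localparam [2:0] FUNCT3 = 3'b000;      // add/sub
    localparam [4:0] RD     = 5'd3;        // x3
    localparam [6:0] OPCODE = 7'b0110011;  // R-format

    // R-format layout: funct7 | rs2 | rs1 | funct3 | rd | opcode
    wire [31:0] inst = {FUNCT7, RS2, RS1, FUNCT3, RD, OPCODE};

    initial begin
        #1 $display("add x3, x1, x2 = %h", inst);  // prints 002081b3
    end
endmodule
```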
Once you've got R-types working, save your progress in git!
- Be sure to add, commit, and push only the verilog files, changed memory file, and any waveform .do file you edited or created.
- SUGGESTION: Have each member of your group create their own Lab5.mpf project file in ModelSim. Don't commit these to git. This will save you some time.

2 Implement Basic I-type Instructions (skip `lw` and `jalr` for now)

On the lab worksheet:
1. Using the RTL from class, trace the wires used for the addi instruction.
2. Label each traced wire with a name.
3. As you trace through blocks of logic (register file, muxes, etc) circle the names of any control bits necessary for the instruction.

Update Control.v and tb_Control.v for I-type instructions (but not lw or jalr; don't do those yet).
- The table in your textbook doesn't have the set of control values for these, but they are very similar to R-format.
Test Control
1. In the control test bench add a test or two for I-format instructions.
2. Run your tests and verify they pass.
Update the Datapath for I-types
1. Add to your Processor verilog module any components you traced for I-types on the lab worksheet datapath.
2. Create instances for any wires you traced.
3. Attach the wires to input/output ports of your components.
  - HINT: there's a mux controlling the second input of the ALU. Now you need to implement that mux! There are many ways to do this and we'll show you three of them.
    
    First, you can "conditionally connect" a wire to the input. For example, this code connects wire A to the input when "ACONTROL" is 1, and otherwise it connects B:
```
  wire [4:0] A;
  wire [4:0] B;
  Thingy UNIT(
      .InputPin1( ACONTROL ? A : B ),
      .OutputPin()
  );
```
    Another way is to create a third wire from the output of the mux, let's call it "choice", then conditionally assign that wire:
```
  wire [4:0] A;
  wire [4:0] B;
  wire [4:0] choice;
  assign choice = ACONTROL ? A : B;
  Thingy UNIT(
      .InputPin1( choice ),
      .OutputPin()
  );
```
    A final way is to use an always block to recompute choice when the inputs change. Note that choice is a reg type here:
```
  wire [4:0] A;
  wire [4:0] B;
  reg [4:0] choice;

  // Combinational logic: list every input and use blocking assignments (=)
  always @(A,B,ACONTROL) begin
      if (ACONTROL === 1) choice = A;
      else                choice = B;
  end

  Thingy UNIT(
      .InputPin1( choice ),
      .OutputPin()
  );
```
    These all effectively do the same thing.
  - Don't forget to update any components you hooked up for R-types that may need new connections (for example, the Read 2 port on the register file).
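One component you will likely trace for addi is the immediate generator. If your repo does not already provide one, a minimal sketch for I-type immediates only might look like the following. The module and port names (`ImmGenISketch`, `inst`, `imm`) are assumptions for illustration.

```verilog
// Sketch only: sign-extends the 12-bit I-type immediate to 32 bits.
module ImmGenISketch(
    input  wire [31:0] inst,
    output wire [31:0] imm
);
    // I-type: the immediate lives in inst[31:20]; inst[31] is its sign bit,
    // so replicate it 20 times to fill the upper bits.
    assign imm = {{20{inst[31]}}, inst[31:20]};
endmodule
```

For example, `addi x1, x0, -1` assembles to 0xFFF00093, and this module would turn it into the immediate 0xFFFFFFFF. Other instruction formats place their immediate bits differently, so this will need to grow in later parts of the lab.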
Test your I-type processor
1. Make a copy of the tb_Processor_R.v file called tb_Processor_I.v.
2. Create a memory_I.txt file with some assembled I-type instructions (much like the memory_R.txt file).
  - Add at least five tests for I-types.
3. Inside the new tb_Processor_I.v file, replace the R-type tests with ones for the instructions you've put in the new memory file. You may need to add more tests.
  - IMPORTANT: be sure your new tb_Processor_I test bench loads the new memory file!
  - These will be very similar to the R-type tests!
4. Run your tests!
  - SUGGESTION: make another waveform for this new test bench and save the config to tb_Processor_I_waves.do or similar.
Be sure the R-type tests still work with your updated Processor.v.
- Re-run your R-type test bench (tb_Processor_R) after you get the I-type tests working.
Once you've got I-types working, save your progress in git!
- Be sure to add, commit, and push only the verilog files, changed memory file, and any waveform .do file you edited or created.

3 Implement Memory Instructions

Now you will repeat the same steps for lw and sw. This should be faster than the first two parts.

Trace lw and sw on the worksheet datapath.
There is now a critical series of more than two clocked things that must happen in sequence during one clock cycle:
- (a) Read instruction from memory
- (b) Load data from memory
- (c) Put in register file
The processor cannot put the value into the register file until it has been loaded from memory.
And it cannot do any of this until it has read the instruction (a). The clock only has two edges, rising and falling edge, so we need a strategy to handle this!

If your register file has async reads, this makes it much easier; we can write the data into the register file while the next instruction is getting read from memory. Both (a) and (c) can happen on the rising edge, and the memory read (b) can happen between the other two.
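For reference, "async reads" means the read outputs follow the read addresses immediately, with no clock edge involved, while writes still wait for the rising edge. The sketch below shows that shape; it is not your Lab 4 register file, and the module name, port names, and synchronous reset are all assumptions.

```verilog
// Sketch only: register file with asynchronous reads and a clocked write.
module RegFileSketch(
    input  wire        clk,
    input  wire        reset,
    input  wire        writeEnable,
    input  wire [4:0]  readReg1,
    input  wire [4:0]  readReg2,
    input  wire [4:0]  writeReg,
    input  wire [31:0] writeData,
    output wire [31:0] readData1,
    output wire [31:0] readData2
);
    reg [31:0] regs [0:31];
    integer i;

    // Asynchronous reads: no clock, outputs change as soon as the address does
    assign readData1 = regs[readReg1];
    assign readData2 = regs[readReg2];

    // Synchronous write: happens only on the rising clock edge
    always @(posedge clk) begin
        if (reset) begin
            for (i = 0; i < 32; i = i + 1) regs[i] <= 32'b0;
        end else if (writeEnable && writeReg != 5'd0) begin
            regs[writeReg] <= writeData;  // x0 is never written, so it stays zero
        end
    end
endmodule
```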
1. On the lab worksheet, complete the timing diagram. Assume two load word instructions execute one after the other, and each must do all three steps above, requiring execution of these steps:
  - 1a: first load instruction fetched from memory
  - 1b: first load instruction gets data from memory
  - 1c: first load instruction writes data to reg file
  - 1d: first load instruction updates PC (to be PC+4)
  - 2a: second load instruction fetched from memory
  - 2b: second load instruction gets data from memory
  - 2c: second load instruction writes data to reg file
  - 2d: second load instruction updates PC (to be PC+4)
  Label the clock signal edges in the worksheet with the instruction steps above each clock edge where they should happen. Multiple steps from one instruction or steps from both instructions may need to happen simultaneously. 1a is given for you.
Update Control for the new instructions. Note that lw and sw each have a unique opcode, so you'll want to create a control case for each.
Update the control tests to also test lw and sw.
Update your datapath for lw and sw.
- IMPORTANT: Instead of adding a new memory component for data memory, use the "B" half of the DP_Memory block that is already in your code.
  - The "B" half should operate on the falling edge of the clock so it happens after the instruction is loaded (at the rising edge). But you don't want to change the memory module, so instead just invert the clock (use ~CLK) when you connect it to the B half's clk_b input. This will make the negative edge "look like" a positive edge.
- SUGGESTION: You'll need to put a mux between the output of the ALU and the input of the register file. For this mux, DO NOT use the ternary operator (q ? a : b). You will want to grow the mux later, so it's best to do it with an always block:
```
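  // This reg is driven by the mux and connects to the write data port on the reg file.
  // A and B are placeholders for the two mux inputs. With the textbook's control
  // values, MemtoReg = 1 selects the data read from memory (A here).
  reg [31:0] aluOutputOrMemData;

  always @(A,B,MemtoReg) begin
      if (MemtoReg === 1) aluOutputOrMemData = A;
      else                aluOutputOrMemData = B;
  end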
```
- It is OK to completely ignore the MemRead control signal. Our memory is always reading.
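The inverted-clock hookup described above can be sketched with a stand-in memory port. `TinyMemPort` below is a hypothetical placeholder, not the provided DP_Memory module, and it uses two separate memories rather than one dual-port memory; only the way the clock is connected is the point.

```verilog
// Hypothetical stand-in for one port of a synchronous (clocked) memory.
module TinyMemPort(
    input  wire        clk,
    input  wire [9:0]  addr,
    output reg  [31:0] q
);
    reg [31:0] mem [0:1023];
    always @(posedge clk) q <= mem[addr];  // read happens on this port's rising edge
endmodule

// Sketch only: one port clocked by CLK, the other by ~CLK.
module InvertedClockSketch(
    input  wire        CLK,
    input  wire [9:0]  instAddr,
    input  wire [9:0]  dataAddr,
    output wire [31:0] inst,
    output wire [31:0] memData
);
    // Instruction side: reads on the rising edge of CLK
    TinyMemPort instPort(.clk(CLK),  .addr(instAddr), .q(inst));

    // Data side: ~CLK rises when CLK falls, so this read happens
    // on the falling edge of CLK, after the instruction is available
    TinyMemPort dataPort(.clk(~CLK), .addr(dataAddr), .q(memData));
endmodule
```

In your processor the same idea applies to the single DP_Memory instance: connect CLK to the instruction half and ~CLK to clk_b, and leave the memory module itself unchanged.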
Test your updated processor
1. Make a copy of the tb_Processor_I.v file called tb_Processor_S.v.
2. Create a memory_S.txt file with some assembled load and store instructions (much like the memory_I.txt file, but with only loads and stores in it).
3. Inside the new test bench file, replace the I-type tests to test your loads and stores from the new memory file.
  - IMPORTANT: be sure your new tb_Processor_S test bench loads the new memory file!
4. Run your tests!
  - SUGGESTION: make another waveform for this new test bench and save the config to tb_Processor_S_waves.do or similar.
Be sure to run old tests and new tests!
Add, Commit, Push

4 Implement Branches

Now you will repeat the same steps for beq and then add the three other branch flavors.

Trace beq on the worksheet datapath.
1. Update Control and Tests for beq
2. Update Processor (datapath + control)
  - HINT: for the mux, implement it like the one for memory. Use an always block to sometimes assign the input to the PC to be PC+4 and sometimes to be the computed branch target.
3. Create a memory_B.txt file with some assembled beq instructions (you can also use other instructions that you've tested).
  - You want to test both a beq that doesn't get taken and one that does, so you may need to skip over some instructions.
  - HINT: beq x0, x0, LABEL compares a register to itself, so it is always taken!
  - If you put a nonzero value into a register and compare it to zero, the branch won't be taken.
4. Make a test bench (copy one of your others) called tb_Processor_B.v to test branches.
  - Update this to use your memory_B.txt file and write tests to make sure the PC goes to the right place.
  - HINT: your test bench can inspect the value of the PC by "digging into" the unit you are testing. Assuming your test bench has an instance of Processor called UUT and the processor has an instance of Register called PC, you can do this:
```
  Processor UUT(.CLK(CLK), .reset(reset));

  // ... your tasks go here

  initial begin
      //... setup code here...

      // Look at the PC's output to see if it is correct
      VU.ASSERT_INT_EQUAL(UUT.PC.q, 32'h00004444);
  end
```
    You can inspect all your `wire` instances inside the `Processor` module in a similar fashion.
5. Run your tests!
  - You are likely to find that your tests don't pass initially. Take a careful look at when and how PC changes. Investigate your waveform and look at the timing diagram you made for lw/sw on the worksheet. You will likely need to change the timing for branches to work. Make sure after you make any changes that the old tests you wrote still work (don't assume they do, actually run them). If you want another hint, read the one below after trying to work through it yourself for a bit:
    - Hint: You need to move only one component to work on the negative clock edge.
6. Be sure to run old tests to make sure they still work
7. Add, Commit, Push your new files and updates. Now is a great time to save your changes.
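For the branch mux hint in step 2, a minimal sketch of the beq-only case might look like the following. The names `BranchSketch`, `pc`, `Branch`, `zero`, and `nextPC` are assumptions, and this version folds the textbook's separate "shift left 1" block into the immediate itself by appending a zero bit. It does not address the timing issue mentioned in step 5.

```verilog
// Sketch only: computes the branch target and chooses the next PC for beq.
module BranchSketch(
    input  wire [31:0] pc,
    input  wire [31:0] inst,
    input  wire        Branch,   // control signal: this instruction is a branch
    input  wire        zero,     // ALU zero detector (rs1 - rs2 == 0)
    output reg  [31:0] nextPC
);
    // B-type immediate: bits are scattered in the instruction and bit 0 is always 0
    wire [31:0] branchImm =
        {{19{inst[31]}}, inst[31], inst[7], inst[30:25], inst[11:8], 1'b0};

    wire [31:0] pcPlus4      = pc + 32'd4;
    wire [31:0] branchTarget = pc + branchImm;

    // Combinational mux written as an always block so it can grow later
    always @(pcPlus4, branchTarget, Branch, zero) begin
        if (Branch && zero) nextPC = branchTarget;  // beq taken
        else                nextPC = pcPlus4;       // fall through
    end
endmodule
```

Writing the mux as an always block rather than a ternary leaves room to add more conditions to the decision later.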
Next, add bne.

BNE is nearly identical to BEQ, but you want the opposite of the zero detector. There's no change to control for bne because it has the same opcode as beq.
1. Update the processor to support BNE.
  - The only modifications you should make will be to your "branch mux". Change the always block to also sense changes in the funct3 field of the instruction. Now you can differentiate between bne and beq.
  - Change the conditional logic in your branch mux to first look at the funct3, then look at the zero detector from the ALU when deciding whether or not to take the branch.
2. Add some tests for bne to your memory_B.txt code and to tb_Processor_B.v
Add bge and blt
1. bge and blt are a little harder because the zero detector is not very useful for these instructions. Instead we care if A - B is positive or not. When A - B < 0 then A must be less than B. The most significant bit of A - B will tell us if A < B or not.
2. In the lab worksheet, complete the truth table to help guide you in constructing some verilog that will correctly choose branch target when a branch of various types should be taken.
3. Update Processor.v to support the last two types of branches.
  - Update the "branch mux" logic to support the two new funct3 values (bge and blt).
  - HINT: You need to add something new to the always block's sensitivity list.
  - HINT: use a case statement to make a decision differently based on the type of comparison you want to do.
  - IMPORTANT: Be sure to have a default case in the case statement in case an unexpected funct3 value shows up.
4. Update your tb_Processor_B.v test bench and the memory text file to have a bge and blt test.
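The sign-bit idea behind blt and bge can be checked in isolation before you touch the branch mux. The sketch below is a stand-alone simulation with made-up names (`SignBitSketch`, `a`, `b`, `diff`); it only demonstrates reading the most significant bit of A - B and is not the branch logic itself.

```verilog
// Sketch only: shows that the top bit of (a - b) indicates a < b.
module SignBitSketch;
    reg  [31:0] a, b;
    wire [31:0] diff       = a - b;
    wire        aLessThanB = diff[31];  // 1 when a - b is negative

    initial begin
        a = 32'd3;         b = 32'd7;
        #1 $display("3 < 7  ? %b", aLessThanB);   // 1
        a = 32'd7;         b = 32'd3;
        #1 $display("7 < 3  ? %b", aLessThanB);   // 0
        a = 32'd5;         b = 32'd5;
        #1 $display("5 < 5  ? %b", aLessThanB);   // 0
        a = 32'hFFFF_FFFE; b = 32'd1;             // a is -2 in two's complement
        #1 $display("-2 < 1 ? %b", aLessThanB);   // 1
    end
endmodule
```

One caveat: when A - B overflows (for example, a very large positive A minus a negative B), the sign bit alone gives the wrong answer. The lab's approach ignores that case, so choose test values that stay well within range.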
Add, Commit, Push any changes. DO NOT commit everything, only the important files (do not commit the 'work' directory.)

Turn It In

Grading Rubric

General Requirements for all Labs:

fits the need
discuss performance
tests for correctness
iteration and documentation

Fill out the Lab Worksheet

In the worksheet, explain how you satisfy each of these items. Some guidelines:

None of these answers should be more than 100 words.
For item 1, ??
For item 2, explain how this implementation might have a bunch of "unused time" during a clock cycle.
For item 3, describe the most important thing to test when implementing branches.
For item 4, describe whether implementing the branches gradually in three steps was a good idea, and explain why or why not.

Lab 5 Rubric items	Possible Points
Lab Worksheet	20
R-Type and tests	15
I-Type and tests	20
Memory insts and tests	20
Branch insts and tests	20
Extra points	5
Total out of	100

For extra points, you could:
- Optimize the branch logic from part 4 to be simpler than a case statement. Document with comments in your code how you did this.
- ... etc

Submit your completed Lab Worksheet to gradescope.
Lab code will be submitted to your C-group git repository as new files and committed modifications to the repo we provided you. You must include your name and your teammates' names in a comment at the top of all files you submit.