Practical 9 Pipelined hazard resolution

Objectives

This section is not a list of tasks for you to do. It is a list of skills you will have or things you will know after you complete the practical.

Following completion of this practical you should be able to:

Implement data hazard resolution in a pipelined processor by employing write-before-read, data forwarding, and stalls.
Implement control hazard resolution by flushing the pipeline.
Use waveform diagrams to debug a pipelined processor implementation with instructions in all stages of the pipe.
Use Verilog test benches and a testing framework to test a processor implementation.

Guidelines

Because you will be iteratively adding functionality to one processor module, we strongly recommend that you periodically add and commit your progress to git as a backup.

Your Tasks

Follow this sequence of instructions to complete the practical. All work for this practical will be done in your practical-pipe repository.

1 Run the hazards tests

During this practical, you will gradually be fixing data hazards for R-types until you've fixed them all; then you will look at other types of hazards to fix.

To begin, open the file test_asm/datahaz/test_datahaz_x2.asm and read the test code (and comments) provided.
On the worksheet, identify potential dependencies in the code, create a pipeline diagram for the code, and mark how you intend to solve the data hazards (forwarding and write-before-read).
Open the tb_Pipe_hazards.v test bench and scroll to the bottom. Notice that there is a sequence of test tasks, most of them commented out, much like in the last practical; the first one (test_no_hazard_detection()) is the only one uncommented.
Scroll up to the implementation of test_no_hazard_detection() and observe that it (like many other tasks) simply checks that the final states of the registers are correct. Answer the question in the worksheet about these tests.
Review check_data_hazard_general() to ensure it will work with your pipelined processor implementation. Here "work" does not mean "pass"; it means that you understand the code and can see where it will fail in your current implementation. It uses the same shortcuts in pipeline_test_tools.vh that you may have edited for Practical 8, so hopefully there won't be much to change.
Open the ModelSim project you created for Practical 8 and add tb_Pipe_hazards.v to the project. Compile it and simulate this test bench. Fix any bugs or errors until test_no_hazard_detection() passes. (Note: you may not pass this test if you have already implemented the write-then-read behavior. Consider your answer to question 1.4 on the worksheet.)

2 Write then read

Once you've passed the test_no_hazard_detection() test, comment it out in the test bench's main initial block.
In that same initial block, uncomment the test_write_then_read_hazard_detection() task and the call to CLEAR_PIPE() that follows it. (See comments in that block)
Compile and run the test bench in ModelSim. It might fail if you've not implemented write-before-read in your datapath. That's ok!
Figure out how to make your reg file write before it reads.
- hint: consider when you should write to the register file so it can be read at the right time (but before the pipeline stage registers get written).
Once you get this test to pass, answer the next question on the worksheet: Describe the process you plan to follow to incrementally address data hazards in your pipeline for R-type instructions. If you’re not sure what process to follow, review the comments in the ASM file (test_datahaz_x2.asm) and the Test Bench (tb_Pipe_hazards.v).
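
One common way to get write-before-read is to write the register file mid-cycle. The sketch below is only an illustration under assumptions, not the required design: the module name, port names, and 32-bit width are hypothetical, and it assumes your pipeline stage registers capture on the rising clock edge. An internal bypass inside the register file is an equally valid alternative.

```verilog
// Hypothetical register file sketch; names and widths are assumptions.
module RegFile_wbr_sketch (
    input  wire        clk,
    input  wire        reg_write,   // write enable coming from WB
    input  wire [4:0]  rs1,
    input  wire [4:0]  rs2,
    input  wire [4:0]  rd,
    input  wire [31:0] write_data,
    output wire [31:0] read_data1,
    output wire [31:0] read_data2
);
    reg [31:0] regs [0:31];
    integer i;

    // Start from a known state so simulation does not show X values.
    initial begin
        for (i = 0; i < 32; i = i + 1)
            regs[i] = 32'b0;
    end

    // Write on the falling edge (mid-cycle). Assuming stage registers
    // capture on the rising edge, the value written by WB is visible to
    // the combinational read in ID before ID_EX latches it.
    always @(negedge clk) begin
        if (reg_write && rd != 5'd0)
            regs[rd] <= write_data;
    end

    // Combinational reads; x0 always reads as zero.
    assign read_data1 = (rs1 == 5'd0) ? 32'b0 : regs[rs1];
    assign read_data2 = (rs2 == 5'd0) ? 32'b0 : regs[rs2];
endmodule
```

If your existing register file already has a different interface, keep that interface and change only the write timing.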

3 Data forwarding

Uncomment the next test in the test bench (test_WB_to_EX_fwd()) and compile then run the test bench again.
On the worksheet, write some pseudocode that describes how you will detect the need to forward data to one of the two register operands (A or B) when an instruction in EX needs data from WB.
Create a forwarding unit module and add it to your Processor.
- SUGGESTION: connect some outputs from the MEM_WB pipeline stage register and from the ID_EX pipeline stage register to determine whether the hazard exists, then create an output that controls a mux to select either the forwarded data (from MEM_WB) or the standard data in the EX stage.
- Get the first forwarding (WB -> EX) working before you try to address the other conditions.
Handle WB -> EX forwarding
- uncomment the test for this in the test bench, and update your forwarding unit and Processor accordingly.
Handle MEM -> EX forwarding
- uncomment the test for this in the test bench, and update your forwarding unit and Processor accordingly.
At the end of this step, your test bench should run and pass the following tests, in sequence:
- test_write_then_read_hazard_detection()
- test_WB_to_EX_fwd()
- test_MEM_to_EX_fwd()
Add, commit, and push your code changes to git. Be sure to add your assembled versions of the asm files.
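
The forwarding conditions described in this section can be sketched as a small combinational module. This is a minimal sketch under assumptions: the module name, port names, and the 2-bit mux encoding are hypothetical and should be adapted to your own stage register outputs. It checks MEM -> EX last so that the most recent result wins when both stages hold the same destination register.

```verilog
// Hypothetical forwarding unit sketch; port names and the mux encoding
// (00 = ID_EX value, 01 = MEM_WB value, 10 = EX_MEM value) are assumptions.
module ForwardingUnit_sketch (
    input  wire [4:0] id_ex_rs1,
    input  wire [4:0] id_ex_rs2,
    input  wire [4:0] ex_mem_rd,
    input  wire       ex_mem_reg_write,
    input  wire [4:0] mem_wb_rd,
    input  wire       mem_wb_reg_write,
    output reg  [1:0] forward_a,
    output reg  [1:0] forward_b
);
    always @(*) begin
        // Default: use the operands read in ID.
        forward_a = 2'b00;
        forward_b = 2'b00;

        // WB -> EX: the instruction in WB writes a register that EX reads.
        if (mem_wb_reg_write && mem_wb_rd != 5'd0) begin
            if (mem_wb_rd == id_ex_rs1) forward_a = 2'b01;
            if (mem_wb_rd == id_ex_rs2) forward_b = 2'b01;
        end

        // MEM -> EX: evaluated last so the newer value takes priority.
        if (ex_mem_reg_write && ex_mem_rd != 5'd0) begin
            if (ex_mem_rd == id_ex_rs1) forward_a = 2'b10;
            if (ex_mem_rd == id_ex_rs2) forward_b = 2'b10;
        end
    end
endmodule
```

The x0 checks matter because an instruction whose destination is x0 must never appear to produce a forwarded value.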

4 `lw` stall

Examine the test asm file test_datahaz_lw.asm, then assemble it.
Construct a hazard unit module.
Handle the stall when lw is in EX and the next instruction will use its rd value (see page 322 in the textbook).
- hint: disable writing to the IF_ID register and PC when stalling.
- hint: you need to insert a bubble in ID_EX (add a flush capability to the stage register that puts all zeroes into its control bits and any instruction data you're carrying).
- hint: special case for UJ and U types that follow a lw: they don't use register sources!
Uncomment and run our tests (test_lw_stall()).
Now test sw: read and assemble test_datahaz_sw.asm.
Uncomment and run our tests (test_sw_forwarding()).
Fix any errors that you need to make those tests pass. (You may have to update your forwarding unit.)
On the worksheet, answer the question about forwarding from lw to sw.
Now is a great time to commit your changes to git. Include any assembled versions of the asm files.
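
The load-use detection described in the hints above might look roughly like the following. Treat it as a sketch under assumptions: all port names are hypothetical, and the if_id_uses_rs1 / if_id_uses_rs2 flags are an assumed decode output that is 0 for formats without that register source (for example UJ and U types). It stalls on every source match; whether some cases can be handled by forwarding instead is left to your design.

```verilog
// Hypothetical hazard unit sketch for the lw stall; names are assumptions.
module HazardUnit_sketch (
    input  wire       id_ex_mem_read,   // 1 when the instruction in EX is a load
    input  wire [4:0] id_ex_rd,
    input  wire [4:0] if_id_rs1,
    input  wire [4:0] if_id_rs2,
    input  wire       if_id_uses_rs1,   // assumed decode flag
    input  wire       if_id_uses_rs2,   // assumed decode flag
    output wire       pc_write,         // 0 holds the PC
    output wire       if_id_write,      // 0 holds IF_ID
    output wire       id_ex_flush       // 1 inserts a bubble into ID_EX
);
    // A load in EX whose destination is read by the instruction in ID.
    wire load_use =
        id_ex_mem_read && (id_ex_rd != 5'd0) &&
        ((if_id_uses_rs1 && (if_id_rs1 == id_ex_rd)) ||
         (if_id_uses_rs2 && (if_id_rs2 == id_ex_rd)));

    assign pc_write    = ~load_use;
    assign if_id_write = ~load_use;
    assign id_ex_flush =  load_use;
endmodule
```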

5 Flushing the pipe

On the worksheet, draw a pipeline diagram for the instructions given and indicate any data forwarding or stalls or flushes.
Examine the test asm file test_ctlhaz_beq.asm.
- NOTE: this test assumes the branch writes the PC in the memory (MEM) stage (see textbook). This means every instruction that follows it in the pipeline must be flushed.
- If your processor takes the branch earlier (in ID), you will need to edit the test asm file and test bench.
Figure out how to insert bubbles as instructions leave the EX, ID, and IF stages.
- hint: use the same trick used to insert a stall to flush EX_MEM, ID_EX, and IF_ID stage registers while the PC is being written to the branch target by the instruction in MEM.
Run our tests (test_beq_flush()) and fix any bugs in your processor.
We do not provide you with tests for jal/jalr, which also need to flush the pipeline when they jump.
- Write new tests for jal and jalr and run them.
- Add the asm and assembled code for these tests to git and push your commit!
- Be sure to add tasks to the tb_Pipe_hazards.v test bench to run your tests.
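
Both the stall and the flush rely on a stage register that can hold its contents or be cleared. A generic sketch is shown below; the module name, port names, and the choice to give flush priority over hold are assumptions. It also assumes that an all-zero value acts as a bubble in your design (no register write, no memory access, no branch).

```verilog
// Hypothetical pipeline stage register sketch with hold and flush.
module StageReg_sketch #(parameter WIDTH = 32) (
    input  wire             clk,
    input  wire             reset,
    input  wire             write_en,  // 0 keeps the current contents (stall)
    input  wire             flush,     // 1 replaces the contents with a bubble
    input  wire [WIDTH-1:0] d,
    output reg  [WIDTH-1:0] q
);
    always @(posedge clk) begin
        if (reset || flush)
            q <= {WIDTH{1'b0}};        // bubble: assumes zero means "do nothing"
        else if (write_en)
            q <= d;
        // otherwise hold q
    end
endmodule
```

With a branch resolved in MEM, the same flush signal could drive the IF_ID, ID_EX, and EX_MEM instances in the cycle where the PC is written with the branch target.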

6 Write and run a bigger test

Examine the following code:

// Array A's memory location is in x5
int[] A = {1, 2, 3, 4, 5};
int idx = 0;
while(idx < 5) {
    A[idx] = A[idx] + 1;
    idx = idx + 1;
}

Write the code for this on the worksheet (and put it in an .asm file in the test_asm folder that you add, commit, and push to git).
- To initialize the array, it is ok to pick an address in memory and put the integers in your assembled .txt file there. (You don't need to write RISC-V instructions to do that).
- To initialize x5 to have the address of A, load the address as an immediate (remember lui and addi? Or maybe you have an assembler that supports pseudoinstructions like li?) in your code.
- idx can be any register of your choice and does not need to be stored in memory.

Open tb_Processor_Program.v in VS Code and observe how it loads a .txt file and runs the program in that file.
Make a copy of the testProgramA() task in the test bench and modify the copy to run the code you wrote above.
- HINT: you can use CHECK_MEM() to check contents of memory in your test bench. Do this to see what the array values are after the program runs.
- HINT: testProgramA takes an argument and an expected result; you can remove those from your copy for this test.
For HW 10, you wrote a program that includes relPrime and gcd. Assemble your code for those procedures into something that your processor can run. Put that code in the test_asm folder in your git repo.
- Add, commit, and push your assembly (.asm file) and the assembled code (.txt file).
Make another copy of the testProgramA() task in the test bench and edit the copy to run your relPrime program.
- Use the task's argument (n) as the initial argument for relPrime, and the expected argument for the expected output.
On the worksheet, explain how you plan to test that relPrime works; specifically, how will you pass the input argument to your program from the test bench, and how will your test bench know when the program has finished running (so it can check the result)?
- There are many ways to do this; think about the Input/Output lecture from class for a few ideas, or think about how you could tell that the program is done by inspecting a register or the PC.
Test your relPrime program on your processor with many inputs, including at least these three:
- relPrime(6) = 5
- relPrime(5040) = 11
- relPrime(30030) = 17
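
One of the options mentioned above, watching the PC, can be sketched as a test bench task with a cycle limit. This is only an illustration: HALT_PC, the pc signal, and run_until_halt are hypothetical names, and a simple counter stands in for the processor so the file simulates on its own. In your test bench you would watch your processor's PC instead, and you would likely wait a few extra cycles after the halt so the pipeline can drain before checking results.

```verilog
`timescale 1ns/1ps
// Sketch only: a counter stands in for the processor. HALT_PC, pc, and
// run_until_halt are hypothetical names, not part of the provided files.
module tb_run_until_halt_sketch;
    localparam [31:0] HALT_PC = 32'h0000_0040;

    reg        clk = 1'b0;
    reg [31:0] pc  = 32'b0;
    integer    cycles;
    reg        timed_out;

    always #5 clk = ~clk;

    // Stand-in for the processor: advances by 4 and then parks at HALT_PC,
    // as a program ending in a jump-to-self would.
    always @(posedge clk)
        if (pc != HALT_PC) pc <= pc + 32'd4;

    // Run until the PC reaches HALT_PC or max_cycles have elapsed.
    task run_until_halt(input integer max_cycles);
        begin
            cycles    = 0;
            timed_out = 1'b0;
            while (pc !== HALT_PC && !timed_out) begin
                @(negedge clk);        // sample away from the active edge
                cycles = cycles + 1;
                if (pc !== HALT_PC && cycles >= max_cycles)
                    timed_out = 1'b1;
            end
        end
    endtask

    initial begin
        run_until_halt(1000);
        if (timed_out)
            $display("FAIL: no halt after %0d cycles", cycles);
        else
            $display("Halted after %0d cycles", cycles);
        $finish;
    end
endmodule
```

The cycle limit keeps a buggy program from hanging the simulation, and the cycle count is the same number you need later when comparing runtimes.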

7 Design a new instruction

Your last task is to design a new instruction and implement it in your pipeline. You need to provide clear documentation for how it will work, and justify its inclusion in the instruction set.

As you plan your design, you should consider inventing an instruction that makes relPrime run faster (this generally would combine multiple instructions into one new instruction).

Document the design (in the practical worksheet, +10 pts) and explain how you plan to add it to the pipeline.
- maybe add a stage to support extra work
- or stall the pipeline
- or add more hardware to existing stages
Explain how you expect the new instruction to impact the performance of your processor.
Implement your design.
Run relPrime with your new instruction (you'll have to rewrite relPrime; make sure you keep both versions in your repository).
Compare the two runtimes (number of cycles for each run).

8 Bonus - add MMIO

We discussed I/O in class; one way of implementing I/O is memory-mapped I/O (MMIO). For extra points on this practical you can implement MMIO. You will need to write a test bench to show that it works. If you do this, you need to do the following:

Add a datapath drawing to the worksheet which shows the modifications for MMIO.
Put a Test Plan (following the format from previous practicals) together to show that I/O works.
Include a clear screenshot of a waveform in your worksheet that shows that the I/O succeeded. You should annotate this waveform to indicate key events (e.g., point an arrow at a signal when an input number gets into a register).

Full credit will only be awarded if you communicate how this works sufficiently in your worksheet. The graders will not look at your code for this problem.

This is a challenge problem, so there is less support for it; you are expected to take ownership if you want to complete this challenge.
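
As a starting point only, the core idea of MMIO is an address decode in front of data memory. The sketch below is hypothetical throughout: the module name, port names, and the IO_ADDR value are assumptions, and a real design may map several addresses or separate input and output devices.

```verilog
// Hypothetical MMIO address decode sketch; IO_ADDR and all names are assumptions.
module MmioDecode_sketch (
    input  wire [31:0] addr,
    input  wire        mem_read,
    input  wire        mem_write,
    input  wire [31:0] ram_read_data,
    input  wire [31:0] io_in,          // value presented by the input device
    output wire        ram_write_en,
    output wire        io_write_en,
    output wire [31:0] read_data
);
    localparam [31:0] IO_ADDR = 32'hFFFF_0000;

    wire is_io = (addr == IO_ADDR);

    // Stores to IO_ADDR go to the device; all other stores go to RAM.
    assign ram_write_en = mem_write && !is_io;
    assign io_write_en  = mem_write &&  is_io;

    // Loads from IO_ADDR return the device value instead of RAM data.
    assign read_data = (mem_read && is_io) ? io_in : ram_read_data;
endmodule
```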

Turn It In

Grading Rubric

General Requirements for all Practicals:

The solution fits the need
Aspects of performance are discussed
The solution is tested for correctness
The submission shows iteration and documentation

Fill out the Practical Worksheet

In the worksheet, explain how you satisfy each of these items. Some guidelines:

Practical 9 Rubric items	Possible Points
Practical Worksheet	60
Write-Then-Read RegFile	5
Forwarding unit	10
lw stall (Hazard Unit)	10
branch/jump flushing	5
New Instruction (impl)	10
Extra points (MMIO)	10
Total out of	100

Submit your completed Practical Worksheet to gradescope.
Practical code will be submitted to your D git repository as new files and committed modifications to the repo we provided you. You must include your name and your teammates' names in a comment at the top of all files you submit.