Name: Date:

HW16 solution

We've seen the lw_inc rd, imm(rs1) instruction on previous homework. This I-type instruction loads a word from memory into rd while also incrementing the value in rs1 by 4. We now want to add this instruction to our pipelined processor.
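As a reminder of what the instruction does, here is a minimal behavioral sketch in Python. It is not a hardware description. The function name, the dict-based register file and memory, and the register names are illustrative. It also makes two assumptions the text above does not settle: the address is computed from the value of rs1 before the increment, and if rd and rs1 name the same register, the loaded word wins.

```python
MASK32 = 0xFFFFFFFF  # keep values in 32 bits

def lw_inc(regs, mem, rd, rs1, imm):
    """Return a new register dict after executing lw_inc rd, imm(rs1)."""
    base = regs[rs1]                   # read rs1 once, before any update
    addr = (base + imm) & MASK32       # effective address uses the old rs1 (assumption)
    result = dict(regs)                # leave the caller's registers untouched
    result[rs1] = (base + 4) & MASK32  # the "inc" half: rs1 <- rs1 + 4
    result[rd] = mem[addr]             # the "lw" half: rd <- Mem[addr]; wins if rd == rs1 (assumption)
    return result

regs = {"x5": 0x100, "x6": 0}
mem = {0x108: 42}
after = lw_inc(regs, mem, rd="x6", rs1="x5", imm=8)
```

The point to notice is that one instruction produces two register writes, which is what every design below has to accommodate.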

  1. (9 points) Design a way to add this instruction to the pipelined datapath from class (we'll work from the one with branch logic in the Mem stage; see the images on the next page). There are three main approaches to adding instructions to a pipelined processor: A) add new hardware to existing stages, B) add new stages, C) add new control to reuse existing stages (i.e. stalls).

    a. (3 points) Explain how approach A could be used to add this instruction to the processor in just a few sentences. (Give the high-level overview here, you do not need to name wires, list bus widths, etc.)

    We could add another adder for the +4 to the X or M stage. Alternatively, we could add a couple of muxes so that the branch/jump target adder does the +4. Either way, we'll need a second write port (address, data, and write enable) on the register file for writing to rs1.

    b. (3 points) Explain how approach B could be used to add this instruction to the processor.

    We could add a second eXecute stage immediately after the M stage, for a second step of math operations (here, rs1 + 4). We'll need a second write port on the register file for writing to rs1.

    c. (3 points) Explain how approach C could be used to add this instruction to the processor.

    We could stall the processor when lw_inc is in the X stage: in the first cycle it does the base+offset math, and in the second cycle it does rs1 + 4. The instruction can do the lw part in the M stage while the increment part is happening in the X stage. The instruction will then pass through the WB stage twice, writing one of the two registers each time, so the register file does not need to be modified. This will make control a little more complicated, as we will have two sets of X/M/WB signals to pass forward through the pipe.
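The timing of approach C can be sketched with a small Python model. This is only an illustration under stated assumptions: a five-stage F/D/X/M/WB pipe with no other hazards, the first instruction reaching X in cycle 3, and lw_inc treated as two back-to-back passes through X/M/WB. The function name `schedule` and the row labels are hypothetical.

```python
def schedule(program, first_x_cycle=3):
    """List (label, {"X": c, "M": c + 1, "WB": c + 2}) for each pass through X/M/WB."""
    rows = []
    x_cycle = first_x_cycle
    for name in program:
        if name == "lw_inc":
            # First pass: base + offset in X, then the load in M, then write rd.
            rows.append(("lw_inc (load)", x_cycle))
            # Second pass: rs1 + 4 in X one cycle later, then write rs1.
            rows.append(("lw_inc (rs1+4)", x_cycle + 1))
            x_cycle += 2  # the next instruction is stalled one cycle
        else:
            rows.append((name, x_cycle))
            x_cycle += 1
    return [(label, {"X": x, "M": x + 1, "WB": x + 2}) for label, x in rows]

timeline = schedule(["lw_inc", "add"])
for label, stages in timeline:
    print(f"{label:15s} {stages}")
```

In this model the load pass is in M during the same cycle that the increment pass is in X, and the two passes reach WB in consecutive cycles, so a single write port is enough.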

  1. (10 points) Choose one of the three approaches you described above and add the instruction to the following datapath using that approach. Indicate your chosen approach above the datapath. If you are adding a new stage, clearly indicate which of the existing stages it goes between, draw it neatly below the datapath, and draw the pipeline stage registers before and after it, indicating all I/O ports that are relevant. Note that this datapath has the branch logic in the Mem stage. You do not need to worry about hazards caused by other instructions as you implement this; just worry about the "normal" execution of the instruction.

    Solution depends on the chosen implementation. The solution should implement the new instruction, while not breaking any of the other core instructions.

    Chosen approach:

  1. (10 points) Choose a different one of the three approaches you described above and add the instruction to the following datapath using that approach. Indicate your chosen approach above the datapath. See the note above about new stages.

    Chosen approach:

  1. (10 points) Which of the two approaches you designed is better for this instruction, and why? You should discuss performance implications in your justification. You may also want to consider data forwarding and other hazards that your design could cause.

    Each approach has benefits and drawbacks; your answer is graded based on the contrast you make.

    A) This adds more hardware we have to pay for (either an adder or some muxes, plus control signals for them). We also have to modify the register file to add extra writing capability. It should not impact performance: cycle time is not affected if we do the incrementing in parallel with some other stage.

    B) This is expensive, because we add a new stage register and an ALU. The changes to control are very easy: the only edit is a single new control signal, "RegWrite2", for use on the second register file write port. This will not impact performance: cycle time is still limited by the stages with memory, and one instruction will finish every cycle. It also gives us the flexibility to expand to more complex instructions that do math after accessing memory.

    C) This change will make control more complicated: we'll need a way of sending control bits for both uses of X/M/WB as the instruction goes through the pipe. We will also need to make the hazard detection unit stall appropriately for the instruction. However, no major new pieces of hardware are needed, so this is a very cheap solution. It will impact performance, because every time a lw_inc instruction runs it occupies the X stage for two cycles instead of one, so the processor takes 2x as long to finish that instruction. If this is a rare instruction, this may be worth it. It is also worth noting that this instruction can cause forwarding of both the rd and rs1 data, so the forwarding unit will need to monitor the rs1 register as if it were an rd.
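The "is it rare?" question for approach C can be made concrete with a rough CPI estimate. The sketch below assumes an ideal base CPI of 1 with all other hazards ignored, and one stall cycle per lw_inc. The function name `effective_cpi` and the sample fractions are illustrative, not measured data.

```python
def effective_cpi(lw_inc_fraction, stall_cycles=1, base_cpi=1.0):
    """Average cycles per instruction when each lw_inc adds stall_cycles."""
    return base_cpi + lw_inc_fraction * stall_cycles

# Hypothetical instruction mixes, for illustration only.
for fraction in (0.0, 0.25, 0.5):
    print(f"lw_inc fraction {fraction}: CPI = {effective_cpi(fraction)}")
```

Under the same assumptions, approaches A and B stay at the base CPI regardless of the mix, so the comparison comes down to hardware cost versus how often lw_inc actually appears.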