Computer Architecture I @ ShanghaiTech University
Getting Started
Download the files for lab8 first.
In this lab, you will build an 8-input signed adder tree in Logisim Evolution - first as a pure combinational circuit (Exercise 1), then pipelined at every level (Exercise 2). The tree reduces eight narrow samples into one 11-bit sum, which is then sign-extended to the native XLEN = 64 register width of an RV64I machine and written into a sum register - the same convention that lb / lh / mulw / addw use when writing a narrow result back into a 64-bit GPR. It is also your first taste of the latency vs. throughput tradeoff that shows up everywhere in Project 2.
Tip.
addw/mulware RV64-only instructions. They are mentioned here, together withlb/lh, only as examples of the same pattern: a narrow signed result is sign-extended before being written back to a full-width register.In this lab, the adder tree output is an 11-bit signed value, so it must be sign-extended to the 64-bit
sumregister.
Exercises
Exercise 1: Non-pipelined Adder Tree
Load the given starter file ex1.circ into Logisim and implement an 8-input signed adder tree as a purely combinational block feeding a single output register. Show this completed circuit to your TA (remember to save!).
Functionality
Parallel Summation: The circuit takes 8 signed 8-bit operands \(S_0, S_1, \ldots, S_7\) at the same time and produces the signed sum
\(P = \sum_{i=0}^{7} S_i\)
using a balanced binary adder tree (4 adders at level 1, 2 adders at level 2, 1 adder at level 3).
Width Growth: To avoid overflow, each level must widen its operands by 1 bit before adding. The tree output is therefore an 11-bit signed value, since \(8 + \log_2 8 = 11\).
Sign Extension to XLEN = 64 bits: The 11-bit tree output is sign-extended to 64 bits before being written into the
sumregister. Think of the adder tree as a narrow functional unit sitting inside a 64-bit datapath: its result must be promoted to the full XLEN = 64 register width before write-back, exactly like theaddw/mulwfamily does in RV64I. Zero extension would silently corrupt negative sums oncesumis consumed by a signed 64-bit ALU.Registered Output Only: There are no pipeline registers inside the tree. On every rising edge of
clk, the current combinational sum is latched into the 64-bitsumregister.
Input Ports
- Operands (
S0-S7): Eight signed 8-bit inputs supplied via a singleSbus and a splitter, as shown in the starter file. - Clock Signal (
clk): Drives the 64-bit output register. On every rising edge the register captures the current sign-extended sum.
Output Ports
- Sum (
sum): A 64-bit signed value holding the most recently latched tree output.
Details
- You must use the
Addercomponent from theArithmeticlibrary. Do not useBit Adder, and do not collapse the 7 adders into fewer, wider ones - we are explicitly practicing the tree topology. - At every level, widen both operands by 1 bit using a Bit Extender (
Arithmetic -> Bit Extender, type =sign). The three levels therefore use adders of width 9, 10, 11. - Use a Bit Extender with type =
signand output width64to extend the 11-bit tree output to 64 bits before the register. - The register must be 64-bit wide. Leave the enable pin tied high (or unconnected - Logisim defaults to enabled) and keep the reset pin tied low unless the starter file says otherwise.
- Use a
Tunnelforclk(and any other signals you want to keep from crossing the schematic) to keep things readable. Your final layout should look roughly like this:
11 -> 64"] SX --> REG["sum
64b register"] REG --> OUT["sum"]
Check-off
- Show
ex1.circto your TA, and explain:- Why each level of the tree needs one more bit of width than the previous level.
- Why sign extension (not zero extension) is used on the 11-bit tree output.
- What the critical path of this circuit is, in terms of number of adders in series.
- Advance the clock one tick at a time with
Ctrl+TorF9- do not useCtrl+K(Ticks Enabled) - and set a few operand combinations with the Poke tool. After a single tick,sumshould already reflect the current inputs. Tell your TA what this implies for the maximum clock frequency compared to Exercise 2.
Exercise 2: Pipelined Adder Tree
Load the given starter file ex2.circ into Logisim. Starting from your Exercise 1 circuit, insert pipeline registers so that every adder level is followed by a register, turning the tree into a 3-stage pipeline that also feeds the 64-bit sum register. Show this completed circuit to your TA (remember to save!).
Functionality
Pipelined Summation: The circuit computes the same signed sum
\(P = \sum_{i=0}^{7} S_i\)
but breaks the combinational path into 3 pipeline stages, one per tree level, plus the existing output register. The final sign-extended result appears in
sum4 cycles after its inputs are applied.Throughput = 1 sample / cycle: On every rising edge the circuit accepts a new set of 8 operands and, once the pipeline is full, produces one valid
sumper cycle - matching the instruction throughput you would expect from a fully pipelined functional unit in a 64-bit datapath.Latency = 4 cycles: 3 cycles through the tree stages + 1 cycle into the output register. Stage widths are 9, 10, 11 bits respectively, matching Exercise 1.
Uniform Clock Domain: All pipeline registers share the same
clk. There is no per-stage enable - data moves forward every cycle, just like a textbook 5-stage RV64 pipeline.
Input Ports
- Operands (
S0-S7): Eight signed 8-bit inputs, identical to Exercise 1. - Clock Signal (
clk): Drives all pipeline registers and the output register via a singleclktunnel, so every stage shares one synchronous clock domain.
Output Ports
- Sum (
sum): The 64-bit signed sum corresponding to the operands applied 4 cycles earlier.
Details
- Insert registers at the following points (see diagram below):
- After level 1: 4 x 9-bit registers (
R1A,R1B,R1C,R1D). - After level 2: 2 x 10-bit registers (
R2A,R2B). - After level 3: 1 x 11-bit register (
R3). - The existing 64-bit
sumregister stays as the final stage.
- After level 1: 4 x 9-bit registers (
- Every register must be driven by the same
clktunnel. Do not introduce separate clocks - in this course, all pipeline stages are synchronous to a single clock domain. - The sign extender (11 -> 64) sits between
R3and thesumregister, exactly as in Exercise 1. Do not widen pipeline registers to 64 bits - that wastes area without changing behavior. - Do not reorder the tree or merge levels. The whole point of this exercise is that each pipeline stage contains exactly one adder delay.
11 -> 64"] --> REG["sum | 64b"] end R1A --> B1 R1B --> B1 R1C --> B2 R1D --> B2 R2A --> C1 R2B --> C1 R3 --> SX
Check-off
- Show
ex2.circto your TA, and explain:- The latency (in cycles) from applying
S0-S7to observing the corresponding value onsum. - The throughput of the circuit, and why inserting registers does not reduce it.
- The new critical path, and why this lets a pipelined version run at a higher clock frequency than Exercise 1 even though both compute the same function.
- The latency (in cycles) from applying
- Drive the circuit with a changing input bus, advancing the clock one tick at a time with
Ctrl+TorF9(do not useCtrl+K). Verify with your TA that:- The first valid
sumappears exactly 4 cycles after the first valid input - i.e. the pipelined version's outputs are shifted by 4 cycles relative to the non-pipelined version in Exercise 1. - Stopping the clock mid-computation freezes all intermediate register values, and resuming produces the correct result without any data loss.
- The first valid
The following TA(s) are responsible for this lab: Chaofan Li <lichf2025@shanghaitech.edu.cn>