Hw2

Computer Architecture I (CS110 / CS110P) Document
Reference: CS110 Course Page

Homework 2: FP16 Multiplication Simulator

Computer Architecture - CS110 @ ShanghaiTech University

Overview

In this homework, you will implement a software simulator for IEEE 754 half-precision floating-point (FP16) multiplication. Instead of merely computing the final arithmetic result, your program is required to accurately trace and output the intermediate states of the hardware multiplication pipeline.

Through this assignment, you are expected to:

  • Deeply understand the bit-level representation of IEEE 754 floating-point numbers.
  • Practice extracting and manipulating mantissas, including the proper handling of implicit hidden bits.
  • Implement fractional multiplication and the dynamic bit-shifting required for result normalization.
  • Comprehend and enforce the strict Round-to-+∞\infty (Up) rounding mode, specifically handling its counter-intuitive behaviors when applied to negative numbers. (Note: The default rounding mode is typically Round-to-Nearest-Ties-to-Even, but for simplicity of implementation, we choose Round-to-±∞\pm\infty.)
  • Correctly manage complex edge cases, including Underflow, Overflows, NaNs, and Infinities.

Before you start, you have to join the GitHub Classroom and accept the assignment via this link. A repository containing the starter code will be generated to you. You can then start on the assignment by cloning it to your Ubuntu system. To submit the assignment, you should push your local clone to GitHub and turn in your work on Gradescope by connecting to your GitHub repo.

Important Restriction: You must simulate the hardware logic using purely unsigned/signed integer types (e.g., uint16_t, uint32_t) and bitwise operations. Type-casting to float or double, or using standard library math/rounding functions is STRICTLY FORBIDDEN. Your code will be audited both automatically and manually.

Input & Output Specification

The C program should read from stdin line by line until EOF is reached. Each line contains two FP16 floating-point numbers expressed in 4-digit lowercase hex. For each pair of operands, your program must output exactly 5 lines detailing the pipeline steps.

Floating-Point Format (FP16)

The format comprises 16 bits: 1 sign bit, 5 exponent bits, and 10 explicit mantissa (fraction) bits. The exponent bias is 15.

 MSB                                LSB
+----+----------------+---------------+
| 15 | 14   ...    10 | 9    ...    0 |
+----+----------------+---------------+
 Sign      Exponent       Mantissa
Exponent Mantissa = 0 Mantissa != 0
0x00 ±\pm zero subnormal
0x1F ±\pm infinity NaN
otherwise normal normal

Rounding Mode: Round-to-+∞\infty (Up)

The objective of this rounding mode is to round the result to the smallest representable floating-point number that is greater than or equal to the exact mathematical result.

When truncating the intermediate 22-bit product to fit the 10-bit target fraction, the discarded low-order bits are summarized by three bits to determine the loss of precision:

  • G (Guard bit): The first truncated bit immediately to the right of the target fraction's least significant bit.
  • R (Round bit): The bit immediately to the right of the Guard bit.
  • S (Sticky bit): The logical OR of all remaining truncated bits to the right of the Round bit.

Let Inexact=G∨R∨SInexact = G \lor R \lor S. Note that the Sticky bit (SS) must represent the logical OR of all discarded bits. This includes not only the lower bits of the raw 22-bit product, but crucially, any 1 bits that are shifted out during normalization or underflow right-shifts. Therefore, Inexact==1Inexact == 1 if any precision is lost at any step.

  • If Inexact==0Inexact == 0: The result is exact. Truncate GRS bits.
  • If Inexact==1Inexact == 1 AND Sign==0Sign == 0 (Positive): The exact mathematical result is greater than the truncated result. You must round up (add 1 to the lowest bit of the mantissa).
  • If Inexact==1Inexact == 1 AND Sign==1Sign == 1 (Negative): The exact mathematical result (e.g., −1.55-1.55) is greater (closer to +∞+\infty) than the rounded-down magnitude (e.g., −1.6-1.6). Therefore, you must truncate (do nothing to the mantissa and discard GRS).

Overflow Rules:

  • Positive Overflow: Rounds to Positive Infinity (+Inf).
  • Negative Overflow: Rounds to the Maximum Negative Normal Number (-MaxNormal, which is 0xFBFF), NOT -Inf.

Output Format Details

Line 1 & 2: Unpack Op[1|2]: S=<sign> E=<unbiased_exp> M=<implicit>.<fraction_3hex> <type>

  • <sign>: The sign bit (0 or 1).
  • <unbiased_exp>: The actual exponent value in decimal (e.g., 0, -10). For zeros and subnormals, output the implicit exponent -14. For Inf/NaN, output 0.
  • <implicit>: The hidden leading bit shown in the unpack trace. 1 for normal/inf/nan, 0 for subnormal/zero.
  • <fraction_3hex>: The 10-bit explicit fraction left-adjusted to 12 bits, formatted as 3 lowercase hex digits. (e.g., fraction 0000000001 becomes 004).
  • <type>: Must be one of "normal", "subnormal", "zero", "inf", "nan".,

Line 3: Raw Product Raw: <22_bit_binary> E_raw=<raw_exp>

  • <22_bit_binary>: The pure integer multiplication of the two 11-bit mantissas (including hidden bits).
  • <raw_exp>: The decimal sum of the two unbiased exponents.

Line 4: Normalize & Round Norm: E_norm=<e_norm> Fraction=<10_bit_binary> G=<g> R=<r> S=<s> Action=<Up|Truncate>

  • <e_norm>: The decimal unbiased exponent after normalization or underflow alignment, but before rounding logic is applied.
  • <10_bit_binary>: The 10-bit fraction before rounding.
  • <g>, <r>, <s>: The computed Guard, Round, and Sticky bits (0 or 1).
  • <Action>: Determine the rounding action (Up or Truncate).

Note: If a special value(nan, inf) bypasses Raw/Norm steps, output Raw: N/A and Norm: N/A for those lines.

Special invalid operation rule: 0 × inf (in either operand order, with any sign) must produce nan.

Line 5: Final Result Result: <4_digit_hex_lower_case>

Note: If nan bypasses Result steps, output Result: N/A for those lines.

An Example Run

Input:

3e00 3e00
bc01 3c01
1400 1400

Output:

Op1: S=0 E=0 M=1.800 normal
Op2: S=0 E=0 M=1.800 normal
Raw: 1001000000000000000000 E_raw=0
Norm: E_norm=1 Fraction=0010000000 G=0 R=0 S=0 Action=Truncate
Result: 4080
Op1: S=1 E=0 M=1.004 normal
Op2: S=0 E=0 M=1.004 normal
Raw: 0100000000100000000001 E_raw=0
Norm: E_norm=0 Fraction=0000000010 G=0 R=0 S=1 Action=Truncate
Result: bc02
Op1: S=0 E=-10 M=1.000 normal
Op2: S=0 E=-10 M=1.000 normal
Raw: 0100000000000000000000 E_raw=-20
Norm: E_norm=-14 Fraction=0000010000 G=0 R=0 S=0 Action=Truncate
Result: 0010

Compilation Standard

Your code will be evaluated using the following strict compilation flags:

gcc -std=c11 -Wall -Wextra -Wpedantic -Werror -O2 -ggdb fpmul.c -o fpmul

Warnings are treated as errors. No memory leaks are allowed.

Academic Integrity

You are expected to submit your own original work. Plagiarism, including but not limited to copying code from peers, downloading from online repositories, or submitting AI-generated code, is a serious academic offense and will not be tolerated. Penalties may include a failing grade for the course.