Floating-point arithmetic

Integer arithmetic is fairly straightforward, and most engineers understand it well. Floating-point arithmetic is a very different story.

Addition and subtraction

To add or subtract floating-point (fp) numbers, we first need to represent them with the same exponent: $$\begin{align}2.5 \cdot 10^3 + 3.1 \cdot 10^2 = 2.5 \cdot 10^3 + 0.31 \cdot 10^3\end{align}$$

Once the numbers have the same exponent, we can add the significands: $$\begin{align}(2.5 + 0.31) \cdot 10^3 = 2.81 \cdot 10^3 \end{align}$$

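In some cases, we then need to normalize the result so that it is still in standard scientific notation, with exactly one nonzero digit before the decimal point. For example, $$9.5 \cdot 10^3 + 0.7 \cdot 10^3 = 10.2 \cdot 10^3$$ is normalized to $$1.02 \cdot 10^4$$.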
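The align, add, normalize sequence can be sketched in Python. This is only an illustration, not how hardware does it: numbers are modeled as `(significand, exponent)` pairs in base 10, the standard `decimal` module keeps the decimal digits exact, and `normalize` and `fp_add` are hypothetical helper names introduced here. No rounding is applied at this stage.

```python
from decimal import Decimal


def normalize(significand, exponent):
    """Rescale so that 1 <= |significand| < 10 (zero is returned as-is)."""
    if significand == 0:
        return significand, 0
    while abs(significand) >= 10:
        significand /= 10
        exponent += 1
    while abs(significand) < 1:
        significand *= 10
        exponent -= 1
    return significand, exponent


def fp_add(a, b):
    """Add two base-10 (significand, exponent) pairs."""
    (sa, ea), (sb, eb) = a, b
    # Step 1: align both operands to the larger exponent.
    e = max(ea, eb)
    sa = sa.scaleb(ea - e)  # scaleb shifts the decimal point
    sb = sb.scaleb(eb - e)
    # Step 2: add the significands, then normalize the result.
    return normalize(sa + sb, e)


print(fp_add((Decimal("2.5"), 3), (Decimal("3.1"), 2)))  # (Decimal('2.81'), 3)
print(fp_add((Decimal("9.5"), 3), (Decimal("7"), 2)))    # (Decimal('1.02'), 4)
```

Subtraction follows the same path with a negated second significand.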

With the approach above, a problem arises when we add numbers of drastically different magnitude while having only a limited number of digits. To demonstrate, let's assume we can store only 5 digits of the significand after the decimal point.
Let's add $$2.345 \cdot 10^5 + 1.312 \cdot 10^{-2}$$:

e= 5  s=2.345
e=-2  s=1.312

Step 1: shift

e=5  s=2.3450000000
e=5  s=0.0000001312

Step 2: add

e=5  s=2.3450001312

Step 3: round and normalize

e=5  s=2.34500

As you can see, we lost precision when adding the two numbers. In fact, in this example the result is equal to the first number: the second operand was absorbed completely.
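The same loss can be reproduced in Python. The sketch below assumes that a `decimal` context with 6 significant digits (one before the decimal point, five after) is a fair stand-in for the limited significand; the variable names are illustrative only. The last lines show the same effect with ordinary binary doubles.

```python
from decimal import Decimal, localcontext

big = Decimal("2.345E5")
small = Decimal("1.312E-2")

with localcontext() as ctx:
    ctx.prec = 6               # 6 significant digits in total
    limited = big + small      # the sum is rounded to fit the context

exact = big + small            # default context: 28 significant digits

print(limited == big)          # True: the small operand was absorbed
print(exact)                   # 234500.01312

# Binary doubles behave the same way once the magnitudes differ enough.
print(1e16 + 1.0 == 1e16)      # True
```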

To alleviate this and similar problems with fp arithmetic, implementations use a guard bit, a round bit, and a sticky bit. We will cover those another time.


Multiplication

To multiply fp numbers, we multiply their significands and add their exponents: $$\begin{align}(2.5 \cdot 10^3) \cdot (3.1 \cdot 10^2) = (2.5 \cdot 3.1) \cdot 10^{3+2} = 7.75 \cdot 10^5\end{align}$$

Another example:

e= 5  s=2.345
e=-2  s=7.246

Step 1: add exponents, multiply significands

1) e = 5 + (-2) = 3  
2) s = 2.345 * 7.246 = 16.99187

e=3  s=16.99187

Step 2: normalize

e=4  s=1.699187

Step 3: round to nearest, keeping 5 digits after the decimal point

e=4  s=1.69919
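A minimal sketch of the multiplication steps in Python, using the standard `decimal` module. `fp_mul`, `digits` and `rounding` are hypothetical names introduced here, and the operands are assumed to be nonzero with normalized significands. The rounding mode is left as a parameter because it changes the last digit: for the example above, rounding to nearest gives 1.69919, while truncation gives 1.69918.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_EVEN


def fp_mul(a, b, digits=5, rounding=ROUND_HALF_EVEN):
    """Multiply two base-10 (significand, exponent) pairs.

    Assumes nonzero operands with 1 <= |significand| < 10.
    """
    (sa, ea), (sb, eb) = a, b
    # Step 1: multiply significands, add exponents.
    s = sa * sb
    e = ea + eb
    # Step 2: normalize. The product lies in [1, 100), so one shift suffices.
    if abs(s) >= 10:
        s = s.scaleb(-1)
        e += 1
    # Step 3: round to `digits` places after the decimal point.
    quantum = Decimal(1).scaleb(-digits)
    s = s.quantize(quantum, rounding=rounding)
    # Rounding can carry up to 10.00000, which needs one more shift.
    if abs(s) >= 10:
        s = s.scaleb(-1).quantize(quantum, rounding=rounding)
        e += 1
    return s, e


x = (Decimal("2.345"), 5)
y = (Decimal("7.246"), -2)
print(fp_mul(x, y))                       # (Decimal('1.69919'), 4)
print(fp_mul(x, y, rounding=ROUND_DOWN))  # (Decimal('1.69918'), 4)
```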


Division

To divide fp numbers, we divide their significands and subtract their exponents: $$\begin{align}\frac{4.2 \cdot 10^3}{2.1 \cdot 10^2} = \frac{4.2}{2.1} \cdot 10^{3-2} = 2 \cdot 10^1\end{align}$$

Another example:

e= 5  s=8.2
e=-2  s=2.5

Step 1: subtract exponents, divide significands

1) e = 5 - (-2) = 7  
2) s = 8.2 / 2.5 = 3.28

e=7  s=3.28

Step 2: round and normalize

e=7  s=3.28
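Division can be sketched the same way. `fp_div` and `digits` are hypothetical names, the operands are assumed to be nonzero with normalized significands, and round-to-nearest is an assumption rather than something the walkthrough fixes. Unlike multiplication, the quotient of two normalized significands can drop below 1, so normalization may shift in the other direction.

```python
from decimal import Decimal, ROUND_HALF_EVEN


def fp_div(a, b, digits=5):
    """Divide two base-10 (significand, exponent) pairs.

    Assumes nonzero operands with 1 <= |significand| < 10.
    """
    (sa, ea), (sb, eb) = a, b
    # Step 1: divide significands, subtract exponents.
    # (Decimal first rounds the quotient to its default 28 digits.)
    s = sa / sb
    e = ea - eb
    # Step 2: normalize. The quotient lies in (0.1, 10).
    if abs(s) < 1:
        s = s.scaleb(1)
        e -= 1
    # Step 3: round to `digits` places after the decimal point.
    quantum = Decimal(1).scaleb(-digits)
    s = s.quantize(quantum, rounding=ROUND_HALF_EVEN)
    # Rounding can carry up to 10.00000, which needs one more shift.
    if abs(s) >= 10:
        s = s.scaleb(-1).quantize(quantum, rounding=ROUND_HALF_EVEN)
        e += 1
    return s, e


print(fp_div((Decimal("8.2"), 5), (Decimal("2.5"), -2)))  # (Decimal('3.28000'), 7)
print(fp_div((Decimal("2.1"), 2), (Decimal("4.2"), 3)))   # (Decimal('5.00000'), -2)
```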

TODO: add diagrams



24 Mar 2022 - Hasan Al-Ammori