Floating point arithmetic
Integer arithmetic is fairly straightforward and most engineers understand it well. Floating point arithmetic is a very different story.
Addition and subtraction
To add or subtract floating point (fp) numbers, we first need to represent them with the same exponent: $$\begin{align}2.5 \cdot 10^3 + 3.1 \cdot 10^2 = 2.5 \cdot 10^3 + 0.31 \cdot 10^3\end{align}$$
Once the numbers have the same exponent, we can add their significands: $$\begin{align}(2.5 + 0.31) \cdot 10^3 = 2.81 \cdot 10^3 \end{align}$$
In some cases, the resulting significand no longer lies between 1 (inclusive) and 10 (exclusive), so we need to normalize it to stay in standard scientific notation, for example: $$\begin{align}9.5 \cdot 10^3 + 0.8 \cdot 10^3 = 10.3 \cdot 10^3 = 1.03 \cdot 10^4\end{align}$$
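The align, add and normalize steps can be sketched with a toy base-10 representation: a `(significand, exponent)` pair whose value is `significand * 10**exponent`. This is a minimal sketch, assuming Python's standard `decimal.Decimal` for exact decimal significands and unlimited digits for now; the names `FP`, `normalize` and `fp_add` are hypothetical, chosen for illustration.

```python
from decimal import Decimal

# Toy base-10 fp number: (significand, exponent), value = significand * 10**exponent.
# FP, normalize and fp_add are hypothetical names used only for this sketch.
FP = tuple[Decimal, int]


def normalize(sig: Decimal, exp: int) -> FP:
    """Shift the significand until 1 <= |sig| < 10, adjusting the exponent to compensate."""
    if sig == 0:
        return Decimal(0), 0
    while abs(sig) >= 10:
        sig /= 10
        exp += 1
    while abs(sig) < 1:
        sig *= 10
        exp -= 1
    return sig, exp


def fp_add(a: FP, b: FP) -> FP:
    (sig_a, exp_a), (sig_b, exp_b) = a, b
    # Make `a` the operand with the larger exponent.
    if exp_a < exp_b:
        (sig_a, exp_a), (sig_b, exp_b) = (sig_b, exp_b), (sig_a, exp_a)
    # Align: rewrite b with a's exponent by shifting its significand to the right.
    sig_b = sig_b.scaleb(exp_b - exp_a)
    # Add the significands, then normalize the sum.
    return normalize(sig_a + sig_b, exp_a)


print(fp_add((Decimal("2.5"), 3), (Decimal("3.1"), 2)))  # (Decimal('2.81'), 3)
print(fp_add((Decimal("9.5"), 3), (Decimal("8"), 2)))    # (Decimal('1.03'), 4)
```

Subtraction works through the same function by passing a negative significand.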
With the approach above, a problem arises when we add numbers of drastically different magnitudes while having a limited number of digits. To demonstrate it, let’s assume we have only 5 digits to store the significand.
Let’s add $$\begin{align}2.345 \cdot 10^5 + 1.312 \cdot 10^{-2}\end{align}$$
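Step 1: shift $$\begin{align}2.345 \cdot 10^5 + 1.312 \cdot 10^{-2} = 2.345 \cdot 10^5 + 0.0000001312 \cdot 10^5\end{align}$$
Step 2: add $$\begin{align}(2.345 + 0.0000001312) \cdot 10^5 = 2.3450001312 \cdot 10^5\end{align}$$
Step 3: round and normalize $$\begin{align}2.3450001312 \cdot 10^5 \to 2.3450 \cdot 10^5\end{align}$$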
As you can see, we lost precision when adding the two numbers. In fact, in this example the result equals the first number.
To alleviate this and similar problems in fp arithmetic, guard, round and sticky bits are used. We will cover them another time.
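The 5-digit addition example can be reproduced with Python's standard `decimal` module, whose context lets us choose the significand width (`prec`) and which rounds half-to-even by default. A minimal sketch; `add_with_precision` is a hypothetical helper name. The last lines show the same effect with ordinary binary64 floats, where adjacent representable values near 1e16 are 2 apart.

```python
from decimal import Decimal, localcontext


def add_with_precision(a: str, b: str, digits: int) -> Decimal:
    """Add two decimal numbers and round the sum to `digits` significant digits."""
    with localcontext() as ctx:
        ctx.prec = digits  # width of the significand, in decimal digits
        return Decimal(a) + Decimal(b)


result = add_with_precision("2.345E5", "1.312E-2", digits=5)
print(result)                        # 2.3450E+5
print(result == Decimal("2.345E5"))  # True: the smaller addend vanished

# Same effect with binary64: near 1e16 neighbouring doubles are 2 apart,
# so 1e16 + 1.0 is a tie that rounds (to even) back to 1e16.
print(1e16 + 1.0 == 1e16)  # True
```

With a wider context (for example `digits=15`) the small addend survives, which is the precision trade-off described above.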
Multiplication
To multiply fp numbers, we multiply their significands and add their exponents: $$\begin{align}(2.5 \cdot 10^3) \times (3.1 \cdot 10^2) = (2.5 \times 3.1) \cdot 10^{3+2} = 7.75 \cdot 10^5\end{align}$$
Another example:
Step 1: add exponents, multiply significands
Step 2: normalize
Step 3: round
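The three multiplication steps can be sketched with the same toy `(significand, exponent)` representation, now with a limited significand. This is a sketch under stated assumptions: `DIGITS = 5` mirrors the addition example, rounding is half-to-even, and `FP`, `normalize`, `round_significand` and `fp_mul` are hypothetical names. Because rounding can carry into a new digit (9.99996 becomes 10.0000), the sketch normalizes once more after rounding.

```python
from decimal import Decimal, ROUND_HALF_EVEN

# Toy base-10 fp number: (significand, exponent), value = significand * 10**exponent.
# FP, DIGITS, normalize, round_significand and fp_mul are hypothetical names.
FP = tuple[Decimal, int]
DIGITS = 5  # assumed significand width, as in the addition example


def normalize(sig: Decimal, exp: int) -> FP:
    """Shift the significand until 1 <= |sig| < 10, adjusting the exponent to compensate."""
    if sig == 0:
        return Decimal(0), 0
    while abs(sig) >= 10:
        sig /= 10
        exp += 1
    while abs(sig) < 1:
        sig *= 10
        exp -= 1
    return sig, exp


def round_significand(sig: Decimal) -> Decimal:
    """Round a normalized significand to DIGITS significant digits (DIGITS - 1 after the point)."""
    return sig.quantize(Decimal(1).scaleb(1 - DIGITS), rounding=ROUND_HALF_EVEN)


def fp_mul(a: FP, b: FP) -> FP:
    (sig_a, exp_a), (sig_b, exp_b) = a, b
    # Step 1: add exponents, multiply significands.
    sig, exp = sig_a * sig_b, exp_a + exp_b
    # Step 2: normalize; a product of two significands in [1, 10) lies in [1, 100).
    sig, exp = normalize(sig, exp)
    # Step 3: round; a carry out of rounding needs one more normalization.
    sig, exp = normalize(round_significand(sig), exp)
    return round_significand(sig), exp


print(fp_mul((Decimal("2.5"), 3), (Decimal("3.1"), 2)))        # (Decimal('7.7500'), 5)
print(fp_mul((Decimal("9.8765"), 3), (Decimal("4.3210"), 2)))  # (Decimal('4.2676'), 6)
```

In the second call the exact product of the significands is 42.67635650, so the sketch normalizes it to 4.267635650 with exponent 6 and then rounds it to 4.2676.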
Division
To divide fp numbers, we divide their significands and subtract the divisor’s exponent from the dividend’s: $$\begin{align}\frac{4.2 \cdot 10^3}{2.1 \cdot 10^2} = \frac{4.2}{2.1} \cdot 10^{3-2} = 2 \cdot 10^1\end{align}$$
Another example:
Step 1: subtract exponents, divide significands
Step 2: round and normalize
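Division follows the same pattern. A minimal sketch with the toy `(significand, exponent)` representation, assuming a 5-digit significand and half-to-even rounding; `FP`, `DIGITS`, `normalize`, `round_significand` and `fp_div` are hypothetical names. Note that `Decimal` division itself already rounds a non-terminating quotient to the default context's 28 digits, which is far wider than the 5 digits kept at the end.

```python
from decimal import Decimal, ROUND_HALF_EVEN

# Toy base-10 fp number: (significand, exponent), value = significand * 10**exponent.
# FP, DIGITS, normalize, round_significand and fp_div are hypothetical names.
FP = tuple[Decimal, int]
DIGITS = 5  # assumed significand width


def normalize(sig: Decimal, exp: int) -> FP:
    """Shift the significand until 1 <= |sig| < 10, adjusting the exponent to compensate."""
    if sig == 0:
        return Decimal(0), 0
    while abs(sig) >= 10:
        sig /= 10
        exp += 1
    while abs(sig) < 1:
        sig *= 10
        exp -= 1
    return sig, exp


def round_significand(sig: Decimal) -> Decimal:
    """Round a normalized significand to DIGITS significant digits (DIGITS - 1 after the point)."""
    return sig.quantize(Decimal(1).scaleb(1 - DIGITS), rounding=ROUND_HALF_EVEN)


def fp_div(a: FP, b: FP) -> FP:
    (sig_a, exp_a), (sig_b, exp_b) = a, b
    if sig_b == 0:
        raise ZeroDivisionError("division by a zero significand")
    # Step 1: subtract exponents, divide significands.
    sig, exp = sig_a / sig_b, exp_a - exp_b
    # Step 2: normalize (a quotient of two significands in [1, 10) lies in (0.1, 10)),
    # round, and normalize again in case rounding carried into a new digit.
    sig, exp = normalize(sig, exp)
    sig, exp = normalize(round_significand(sig), exp)
    return round_significand(sig), exp


print(fp_div((Decimal("4.2"), 3), (Decimal("2.1"), 2)))  # (Decimal('2.0000'), 1)
print(fp_div((Decimal("1"), 0), (Decimal("3"), 0)))      # (Decimal('3.3333'), -1)
```

The second call needs both steps: the quotient 0.333... is normalized to 3.333... with exponent -1 and then rounded to 3.3333.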
TODO: add diagrams
Sources:
University of Maryland: Floating Point Arithmetic Unit by Dr A. P. Shanti
George Mason University: Floating Point Arithmetic
Drexel University: Systems Architecture Lecture: Floating Point Arithmetic
Wikipedia: Floating-point arithmetic
24 Mar 2022 - Hasan Al-Ammori