Franz J. Vesely > CompPhys Tutorial > Appendix


Appendix

A1 Machine Errors

Typically, a computer stores a real number in single precision ($32$ bits) as a sign bit, an exponent and a mantissa:

$ \begin{array}{|c|c|c|} \hline \pm & e \;\, \mbox{(exponent; 8 bits)} & m \;\, \mbox{(mantissa; 23 bits)} \\ \hline \end{array} $

or, in a more usual notation,

$ x=\pm \, m \, \cdot \, 2^{\textstyle e-e_{0}} $

The mantissa $m$ is normalized, i.e. shifted to the left as far as possible, such that there is a $1$ in the first position; each left-shift by one position makes the exponent $e$ smaller by $1$. (Since the leftmost bit of $m$ is then known to be $1$, it need not be stored at all, permitting one further left-shift and a corresponding gain in accuracy; $m$ then has an effective length of 24 bits.)
The bias $e_{0}$ is a fixed, machine-specific integer number to be added to the "actual" exponent $e-e_{0}$, such that the stored exponent $e$ remains positive.

EXAMPLE: With a bias of $e_{0}=151$ (and keeping the high-end bit of the mantissa) the internal representation of the number $0.25$ is, using $1/4=(1\cdot2^{22})\cdot2^{-24}$ and $-24+151=127$,

$ \frac{\textstyle 1}{\textstyle 4} = \begin{array}{|c|c|c|} \hline + & 127 & 1 \; 0 \; 0 \; \dots \; 0 \; 0 \\ \hline \end{array} $
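The storage scheme of the example can be tried out with a few lines of Python. This is only a minimal sketch under the assumptions of the example (bias $e_{0}=151$, a $23$-bit mantissa whose high-end bit is kept, surplus bits simply cut off); the names `encode` and `decode` are made up for illustration, and real hardware following IEEE 754 uses a different bias and a hidden leading bit.

```python
import math

E0 = 151         # bias e0 taken from the example above
MANT_BITS = 23   # mantissa length; the leading 1 is stored explicitly here

def encode(x):
    """Split x into (sign, e, m) such that x ~ sign * m * 2**(e - E0).

    m is a normalized 23-bit integer (leftmost bit equal to 1);
    bits that do not fit are truncated, not rounded (an assumption).
    """
    if x == 0:
        raise ValueError("zero has no normalized mantissa")
    sign = -1 if x < 0 else 1
    frac, exp = math.frexp(abs(x))     # abs(x) = frac * 2**exp, 0.5 <= frac < 1
    m = int(frac * 2**MANT_BITS)       # keep the 23 leading bits
    e = exp - MANT_BITS + E0           # stored (biased) exponent
    return sign, e, m

def decode(sign, e, m):
    """Inverse of encode: rebuild the number from its three fields."""
    return sign * m * 2.0**(e - E0)

print(encode(0.25))             # (1, 127, 4194304), i.e. m = 2**22
print(decode(*encode(0.25)))    # 0.25
```

The printed fields for $0.25$ agree with the table above: sign $+$, stored exponent $127$, and a mantissa consisting of a leading $1$ followed by zeros.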

Before any addition or subtraction the exponents of the two arguments must be equalized; to this end the smaller exponent is increased, and the respective mantissa is right-shifted (decreased). All bits of the mantissa that are thus "expelled" at the right end are lost to the accuracy of the result. The resulting error is called roundoff error.
By machine accuracy we denote the smallest number that, when added to $1.0$, produces a result $\neq 1.0$. In the above example the number $2^{-22} \approx 2.38 \cdot 10^{-7}$, when added to $1.0$, would just produce a result $\neq 1.0$, while the next smaller power of two, $2^{-23} \approx 1.19 \cdot 10^{-7}$, would leave not a rack behind:

$ \begin{array}{r|c|c|c|} 1.0 & + & 129 & 1 \; 0 \; 0 \; \dots \; 0 \; 0 \\ +2^{-22} & + & 107 & 1 \; 0 \; 0 \; \dots \; 0 \; 0 \\ = & + & 129 & 1 \; 0 \; 0 \; \dots \; 0 \; 1 \end{array} $

but:

$ \begin{array}{r|c|c|c|} 1.0 & + & 129 & 1 \; 0 \; 0 \; \dots \; 0 \; 0 \\ +2^{-23} & + & 106 & 1 \; 0 \; 0 \; \dots \; 0 \; 0 \\ = & + & 129 & 1 \; 0 \; 0 \; \dots \; 0 \; 0 \end{array} $
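The two additions can be replayed on integer mantissas. The following sketch assumes the same toy format as the example (stored exponent with bias $151$, $23$-bit mantissa with the leading bit kept) and assumes that expelled bits are simply truncated; `add` and `normalize` are illustrative names, not part of any library, and only positive numbers are handled.

```python
MANT_BITS = 23   # mantissa length of the toy format (leading 1 kept)

def normalize(e, m):
    """Shift m until its leftmost (23rd) bit is 1, adjusting e accordingly."""
    while m and m < (1 << (MANT_BITS - 1)):
        m <<= 1          # each left-shift ...
        e -= 1           # ... makes the exponent smaller by 1
    while m >= (1 << MANT_BITS):
        m >>= 1          # overflow of the mantissa: shift right instead
        e += 1
    return e, m

def add(a, b):
    """Add two positive numbers given as (stored exponent, mantissa) pairs."""
    (ea, ma), (eb, mb) = a, b
    if ea < eb:                      # make a the operand with the larger exponent
        (ea, ma), (eb, mb) = (eb, mb), (ea, ma)
    mb >>= ea - eb                   # equalize exponents; expelled bits are lost
    return normalize(ea, ma + mb)

one      = (129, 1 << 22)   # 1.0
two_m22  = (107, 1 << 22)   # 2**-22
two_m23  = (106, 1 << 22)   # 2**-23

print(add(one, two_m22))    # (129, 4194305): last mantissa bit is set
print(add(one, two_m23))    # (129, 4194304): identical to 1.0
```

In the second call the mantissa of $2^{-23}$ is shifted right by $23$ positions and vanishes entirely, which is the loss described in the text.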

A particularly dangerous situation arises when two almost equal numbers have to be subtracted. For example:

$ \begin{array}{r|c|c|c|} & + & 35 & 1 \; 1 \; 1 \; \dots \; 1 \; 1 \; 1 \\ - & + & 35 & 1 \; 1 \; 1 \; \dots \; 1 \; 1 \; 0 \\ = & + & 35 & 0 \; 0 \; 0 \; \dots \; 0 \; 0 \; 1 \\ = & + & 13 & 1 \; 0 \; 0 \; \dots \; 0 \; 0 \; 0 \end{array} $

Note that in the last (normalization) step the $23$-bit mantissa is left-shifted by $22$ positions, which lowers the exponent from $35$ to $13$, and is arbitrarily filled up with zeros; the uncertainty of the result is $50\%$.

An important application:
There is an everyday task in which such small differences may arise: solving the quadratic equation $ax^{2}+bx+c=0$. The usual formula

$ x_{1,2} = \frac{\textstyle -b \pm \sqrt{b^{2}-4ac}}{\textstyle 2a} $ (8.91)

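will yield inaccurate results whenever $ac \ll b^{2}$: then $\sqrt{b^{2}-4ac} \approx |b|$, and for one of the two signs the numerator is the difference of two almost equal numbers. Since in writing a program one must always provide for the worst possible case, it is recommended to use the equivalent but less error-prone formula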

$ x_{1} = \frac{\textstyle q}{\textstyle a} \, , \;\;\; x_{2} = \frac{\textstyle c}{\textstyle q} $ (8.92)

with

$ q \equiv - \frac{\textstyle 1}{\textstyle 2} \left[ b+{\rm sgn}(b)\,\sqrt{b^{2}-4ac}\,\right] $ (8.93)
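The difference between the two formulae is easy to observe in double precision. The sketch below is a minimal illustration, assuming real roots, $a \neq 0$ and $q \neq 0$; the function names and the test coefficients are chosen here for illustration only, and $sgn(0)$ is taken as $+1$.

```python
import math

def roots_textbook(a, b, c):
    """Usual formula (8.91); suffers from cancellation when ac << b**2."""
    d = math.sqrt(b * b - 4.0 * a * c)
    return (-b + d) / (2.0 * a), (-b - d) / (2.0 * a)

def roots_stable(a, b, c):
    """Formulae (8.92) and (8.93): b and sgn(b)*sqrt(...) have equal signs,
    so the bracket is a sum of like-signed terms and nothing cancels."""
    d = math.sqrt(b * b - 4.0 * a * c)
    q = -0.5 * (b + math.copysign(d, b))   # copysign(d, b) = sgn(b) * d
    return q / a, c / q

# Example with ac << b**2: the roots are close to -1e8 and -1e-8.
a, b, c = 1.0, 1.0e8, 1.0
small_textbook = roots_textbook(a, b, c)[0]   # root of small magnitude
small_stable = roots_stable(a, b, c)[1]

exact_small = -1.0e-8    # accurate to about 16 digits for these coefficients
print(abs(small_textbook - exact_small) / abs(exact_small))  # large relative error
print(abs(small_stable - exact_small) / abs(exact_small))    # near machine accuracy
```

The root of large magnitude comes out well with either formula; only the small root, which the usual formula obtains as a difference of nearly equal numbers, is affected.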

EXERCISE: Assess the machine accuracy of your computer by trying various negative powers of $2$, each time adding and subtracting the number $1.0$ and checking whether the result is zero.
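One possible way to set up this experiment is sketched below. It is only a starting point: the function name `machine_accuracy` is an assumption of this sketch, and since Python floats are double precision numbers with round-to-nearest arithmetic, the result differs from the $2^{-22}$ found for the single-precision example above.

```python
import sys

def machine_accuracy():
    """Return (k, 2**-k) for the smallest power of two that still changes 1.0.

    Each trial adds 2**-(k+1) to 1.0, subtracts 1.0 again and checks
    whether the result is zero, as suggested in the exercise.
    """
    k = 0
    while (1.0 + 2.0 ** -(k + 1)) - 1.0 != 0.0:
        k += 1
    return k, 2.0 ** -k

k, eps = machine_accuracy()
print(k, eps)
print(eps == sys.float_info.epsilon)   # compare with the value Python reports
```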

vesely 2006