Next: Bibliography Up: III. SELECTED APPLICATIONS Previous: 8.4 Direct Simulation Monte


Appendix

Machine Errors

Typically, in a computer real numbers are stored as follows:
\begin{displaymath}
\begin{tabular}[b]{|l|l|l|}
\hline
$\pm$ (sign; $1$ bit) & $e$ (exponent; $8$ bits) & $m$ (mantissa; $23$ bits) \\
\hline
\end{tabular}
\end{displaymath}

or, in a more usual notation,

\begin{displaymath}
x = \pm\, m \cdot 2^{\,e-e_{0}}
\end{displaymath}



EXAMPLE: With a bias of $e_{0}=151$ (and keeping the high-end bit of the mantissa) the internal representation of the number $0.25$ is, using $1/4=(1\cdot2^{22})\cdot2^{-24}$ and $-24+151=127$,

\begin{displaymath}
\frac{1}{4} = \quad
\begin{tabular}[b]{|l|l|l|}
\hline
$+$ & $127$ & $1\,0\,0\,\dots\,0\,0$ \\
\hline
\end{tabular}
\end{displaymath}
Before any addition or subtraction the exponents of the two arguments must be equalized; to this end the smaller exponent is increased, and the respective mantissa is right-shifted (decreased). All bits of the mantissa that are thus ``expelled'' at the right end are lost to the accuracy of the result. The resulting error is called roundoff error. By machine accuracy we denote the smallest number that, when added to $1.0$, produces a result $\neq 1.0$. In the above representation the number $2^{-22} \approx 2.38 \cdot 10^{-7}$, when added to $1.0$, just produces a result $\neq 1.0$, while the next smaller representable number, $2^{-23} \approx 1.19 \cdot 10^{-7}$, would leave not a rack behind:

\begin{displaymath}
1.0 \quad
\begin{tabular}[b]{|l|l|l|}
\hline
$+$ & $129$ & $1\,0\,0\,\dots\,0\,0$ \\
\hline
\end{tabular}
\end{displaymath}
\begin{displaymath}
+\;2^{-22} \quad
\begin{tabular}[b]{|l|l|l|}
\hline
$+$ & $107$ & $1\,0\,0\,\dots\,0\,0$ \\
\hline
\end{tabular}
\end{displaymath}
\begin{displaymath}
= \quad
\begin{tabular}[b]{|l|l|l|}
\hline
$+$ & $129$ & $1\,0\,0\,\dots\,0\,1$ \\
\hline
\end{tabular}
\end{displaymath}
but

\begin{displaymath}
1.0 \quad
\begin{tabular}[b]{|l|l|l|}
\hline
$+$ & $129$ & $1\,0\,0\,\dots\,0\,0$ \\
\hline
\end{tabular}
\end{displaymath}
\begin{displaymath}
+\;2^{-23} \quad
\begin{tabular}[b]{|l|l|l|}
\hline
$+$ & $106$ & $1\,0\,0\,\dots\,0\,0$ \\
\hline
\end{tabular}
\end{displaymath}
\begin{displaymath}
= \quad
\begin{tabular}[b]{|l|l|l|}
\hline
$+$ & $129$ & $1\,0\,0\,\dots\,0\,0$ \\
\hline
\end{tabular}
\end{displaymath}
A particularly dangerous situation arises when two almost equal numbers have to be subtracted. Such a case is depicted in Figure 8.4.
\begin{displaymath}
\begin{tabular}[b]{|l|l|l|}
\hline
$+$ & $35$ & $1\,1\,1\,\dots\,1\,1\,1$ \\
\hline
\end{tabular}
\end{displaymath}
\begin{displaymath}
-\quad
\begin{tabular}[b]{|l|l|l|}
\hline
$+$ & $35$ & $1\,1\,1\,\dots\,1\,1\,0$ \\
\hline
\end{tabular}
\end{displaymath}
\begin{displaymath}
=\quad
\begin{tabular}[b]{|l|l|l|}
\hline
$+$ & $35$ & $0\,0\,0\,\dots\,0\,0\,1$ \\
\hline
\end{tabular}
\end{displaymath}
\begin{displaymath}
=\quad
\begin{tabular}[b]{|l|l|l|}
\hline
$+$ & $14$ & $1\,0\,0\,\dots\,0\,0\,0$ \\
\hline
\end{tabular}
\end{displaymath}

Subtraction of two almost equal numbers

In the last (normalization) step the mantissa is arbitrarily filled up with zeros; the uncertainty of the result is $50\%$.

There is an everyday task in which such small differences may arise: solving the quadratic equation $ax^{2}+bx+c=0$. The usual formula

\begin{displaymath}
x_{1,2}=\frac{-b\pm\sqrt{b^{2}-4ac}}{2a}
\end{displaymath} (8.91)

will yield inaccurate results whenever $ac \ll b^{2}$. Since in writing a program one must always provide for the worst possible case, it is recommended to use the equivalent but less error-prone formula
\begin{displaymath}
x_{1}=\frac{q}{a}\, ,\hspace{60pt}x_{2}=\frac{c}{q}
\end{displaymath} (8.92)

with
\begin{displaymath}
q\equiv - \frac{1}{2}\left[ b+\mathrm{sgn}(b) \sqrt{b^{2}-4ac} \right]
\end{displaymath} (8.93)
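A sketch of formulas 8.92-8.93 in Python; the test case $x^{2}+10^{8}x+1=0$ (so that $ac \ll b^{2}$) is an assumption chosen here purely for illustration:

```python
import math

def quadratic_roots(a, b, c):
    """Roots of a x^2 + b x + c = 0 via the cancellation-free
    formulas x1 = q/a, x2 = c/q, with
    q = -(1/2) [ b + sgn(b) sqrt(b^2 - 4 a c) ]."""
    sgn = 1.0 if b >= 0.0 else -1.0
    q = -0.5 * (b + sgn * math.sqrt(b * b - 4.0 * a * c))
    return q / a, c / q

# ac << b^2: the small root (about -1e-8) would suffer severe
# cancellation in formula 8.91, but is obtained accurately here.
x1, x2 = quadratic_roots(1.0, 1.0e8, 1.0)
```

Adding $\mathrm{sgn}(b)$ before the square root guarantees that the two terms inside the bracket never nearly cancel; the second root then follows from the exact relation $x_{1}x_{2}=c/a$.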



EXERCISE: Assess the machine accuracy of your computer by trying various negative powers of $2$: add each to $1.0$, subtract $1.0$ again, and check whether the result is zero.
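This probe can be sketched in Python as follows; note that Python floats are IEEE 754 double precision, so the loop terminates at $2^{-52}$ rather than at the single-precision value discussed above:

```python
def machine_accuracy():
    """Return the smallest power of 2 that, added to 1.0,
    still yields a result different from 1.0."""
    eps = 1.0
    while (1.0 + eps / 2.0) != 1.0:
        eps /= 2.0
    return eps

print(machine_accuracy())  # 2**-52, i.e. about 2.22e-16, for IEEE 754 doubles
```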

Discrete Fourier Transformation

Fundamentals

We use the convention
\begin{displaymath}
\tilde{f}(\nu)=\int \limits_{-\infty}^{\infty}f(t)\, e^{2\pi i \nu t}\, dt\, ,
\hspace{40pt}
f(t)=\int \limits_{-\infty}^{\infty}\tilde{f}(\nu)\, e^{-2\pi i \nu t}\, d\nu
\end{displaymath} (8.94)

Assume that the function $f(t)$ is given only at discrete, equidistant values of its argument:
\begin{displaymath}
f_{k}\equiv f(t_{k})\equiv f(k\, \Delta t)\, , \;\;\; k=\dots, -2, -1, 0, 1, 2, \dots
\end{displaymath} (8.95)

The reciprocal of the time increment $\Delta t$ is called the sampling rate. The higher the sampling rate, the more details of the given function $f(t)$ are captured by the table of discrete values $f_{k}$. This intuitively evident fact is made quantitative by Nyquist's theorem: if the Fourier spectrum of $f(t)$,
\begin{displaymath}
\tilde{f}(\nu)\equiv \int \limits_{-\infty}^{\infty}f(t)
e^{\textstyle 2\pi i\nu t} dt
\end{displaymath} (8.96)

is negligible for frequencies beyond the critical (or Nyquist) frequency
\begin{displaymath}
\pm \nu_{0} \equiv \pm \frac{1}{2 \Delta t}
\end{displaymath} (8.97)

then $f(t)$ is called a band-limited process. Such a process is completely determined by its sampled values $f_{k}$. The formula that permits the reconstruction of $f(t)$ from the sampled data reads
\begin{displaymath}
f(t)=\sum_{k=-\infty}^{\infty}f_{k}
\frac{\sin[2\pi \nu_{0}(t-k\Delta t)]}{2\pi \nu_{0}(t-k\Delta t)}
\end{displaymath} (8.98)

(In contrast, if $f(t)$ is not band-limited, sampling with finite time resolution results in ``mirroring in'' the outlying parts of the spectrum from beyond $\pm \nu_{0}$, superposing them on the correct spectrum. In signal processing this effect is known as ``aliasing''.)
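A quick numerical illustration of aliasing (a sketch assuming $\Delta t=1$, hence $\nu_{0}=1/2$): a cosine of frequency $0.9$, well beyond $\nu_{0}$, produces exactly the same samples as its alias at $1-0.9=0.1$, since $\cos[2\pi(k-0.1k)]=\cos[2\pi\cdot 0.1 k]$ for integer $k$:

```python
import math

dt = 1.0  # sampling interval; Nyquist frequency nu_0 = 1/(2 dt) = 0.5
for k in range(32):
    above = math.cos(2.0 * math.pi * 0.9 * k * dt)  # nu = 0.9 > nu_0
    alias = math.cos(2.0 * math.pi * 0.1 * k * dt)  # mirrored-in nu = 0.1
    assert abs(above - alias) < 1e-9  # indistinguishable at the sample points
```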

Let us assume now that a finite set of sampled values is given:

\begin{displaymath}
f_{k} ,\;\;k=0,1, \dots N-1
\end{displaymath} (8.99)

and let $N$ be an even number. Define discrete frequencies by
\begin{displaymath}
\nu_{n}\equiv\frac{n}{N\Delta t} ,\;\;\;
n=-\frac{N}{2},\dots ,0, \dots ,\frac{N}{2}
\end{displaymath} (8.100)

(The $\nu_{n}$ pertaining to $n=N/2$ is again the Nyquist frequency.) Then the Fourier transform of $f(t)$ at some frequency $\nu_{n}$ is given by
\begin{displaymath}
\tilde{f}(\nu_{n}) \approx \Delta t \sum_{k=0}^{N-1}f_{k}\,
e^{\textstyle 2\pi i \nu_{n} t_{k}} =
\Delta t \sum_{k=0}^{N-1}f_{k}\, e^{\textstyle 2\pi i kn/N}
\end{displaymath} (8.101)

Thus it makes sense to define the discrete Fourier transform as

\fbox{\begin{minipage}{393pt}
\begin{displaymath}
F_{n} \equiv \sum_{k=0}^{N-1}f_{k}\, e^{\textstyle 2\pi i kn/N}
\end{displaymath}
\hspace{60pt}with $N$ even, and $n=0, \pm 1, \dots, \pm N/2$
\end{minipage}}

According to 8.101 the Fourier transform proper is just $\tilde{f}(\nu_{n}) \approx \Delta t   F_{n}$.

From the definition of $F_{n}$ it follows that $F_{-n}=F_{N-n}$. We make use of this periodicity to renumber the $F_{n}$ such that $n$ runs from $0$ to $N-1$ (instead of $-N/2$ to $N/2$):


          

\begin{displaymath}
\begin{array}{lccccccc}
n: & 0, & \dots & \frac{N}{2}-1, & \pm\frac{N}{2}, & -\frac{N}{2}+1, & \dots & -1 \\
\Longrightarrow & 0, & \dots & \frac{N}{2}-1, & \frac{N}{2}, & \frac{N}{2}+1, & \dots & N-1
\end{array}
\end{displaymath}




With this indexing convention the back transformation may be conveniently written
\begin{displaymath}
f_{k} =\frac{1}{N} \sum_{n=0}^{N-1}F_{n}e^{\textstyle -2\pi ikn/N}
\end{displaymath} (8.102)
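Written out directly, the pair looks as follows (an $O(N^{2})$ sketch in Python, using this text's sign convention: $e^{+2\pi ikn/N}$ forward, $e^{-2\pi ikn/N}$ back):

```python
import cmath

def dft(f):
    """F_n = sum_k f_k exp(+2 pi i k n / N),  n = 0 .. N-1."""
    N = len(f)
    return [sum(f[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N))
            for n in range(N)]

def idft(F):
    """f_k = (1/N) sum_n F_n exp(-2 pi i k n / N),  k = 0 .. N-1."""
    N = len(F)
    return [sum(F[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N)) / N
            for k in range(N)]

# Round trip: transforming back and forth recovers the table values.
f = [1.0, 2.0, 0.0, -1.0]
assert all(abs(x - y) < 1e-12 for x, y in zip(idft(dft(f)), f))
```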


Fast Fourier Transform (FFT)

If we were to use the above definition of $F_{n}$ ``as is'' to calculate the discrete Fourier transform, we would have to perform some $N^{2}$ operations. Cooley and Tukey (and before them Danielson and Lanczos; see [PRESS 86]) demonstrated how, by smart handling of the data, the number of operations may be pushed down to $\approx N\, \log_{2}N$. Note that for $N=1000$ this is an acceleration of $100:1$. Indeed, many algorithms of modern computational physics hinge on this possibility of rapidly transforming long tables of function values back and forth.

In the following it is always assumed that $N=2^{m}$. If $N$ is not a power of $2$, simply ``pad'' the table with $f_{k}=0$ up to the next power of two. Defining

\begin{displaymath}
W_{N}\equiv e^{\textstyle 2\pi i/N}
\end{displaymath} (8.103)

we realize that $W_{N}^{2}=W_{N/2}$ etc. The discrete Fourier transform is therefore
$\displaystyle F_{n}$ $\textstyle =$ $\displaystyle \sum_{k=0}^{N-1}W_{N}^{nk}f_{k}$ (8.104)
  $\textstyle =$ $\displaystyle \sum_{l=0}^{N/2-1}W_{N/2}^{nl}f_{2l}
+W_{N}^{n}\sum_{l=0}^{N/2-1}W_{N/2}^{nl}f_{2l+1}$ (8.105)
  $\textstyle \equiv$ $\displaystyle \quad F_{n}^{e}\quad+\quad W_{N}^{n} F_{n}^{o}$ (8.106)

where the indices $e$ and $o$ stand for ``even'' and ``odd''. Next we treat each of the two terms on the right-hand side of 8.106 in the same manner, finding
$\displaystyle F_{n}^{e}$ $\textstyle =$ $\displaystyle F_{n}^{ee}+W_{N/2}^{n}F_{n}^{eo}$ (8.107)
$\displaystyle F_{n}^{o}$ $\textstyle =$ $\displaystyle F_{n}^{oe}+W_{N/2}^{n}F_{n}^{oo}$ (8.108)

By iterating this procedure $m=\log_{2} N$ times we finally arrive at terms $F_{n}^{(...)}$ that are identical to the given table values $f_{k}$.

EXAMPLE: Putting $N=4$ we have $W_{4} \equiv \exp[2\pi i/4]$ and
$\displaystyle F_{n}$ $\textstyle =$ $\displaystyle \sum_{k=0}^{3}W_{4}^{nk}f_{k} \;\;n=0,\dots 3$ (8.109)
  $\textstyle =$ $\displaystyle \sum_{l=0}^{1}W_{2}^{nl}f_{2l}+W_{4}^{n}\sum_{l=0}^{1}W_{2}^{nl}f_{2l+1}$ (8.110)
  $\textstyle \equiv$ $\displaystyle \hspace{2em}F_{n}^{e}\;\;\;+\;\;\;W_{4}^{n}F_{n}^{o}$ (8.111)
  $\textstyle =$ $\displaystyle F_{n}^{ee}+W_{2}^{n}F_{n}^{eo}
+W_{4}^{n}\left[ F_{n}^{oe}+W_{2}^{n}F_{n}^{oo}\right]$ (8.112)
  $\textstyle =$ $\displaystyle f_{0}+W_{2}^{n}f_{2} +W_{4}^{n}
\left[f_{1}+W_{2}^{n}f_{3} \right]$ (8.113)

Thus the correspondence between the table values $f_{k}$ and the terms $F_{n}^{ee}$ etc. is as follows:

     

\begin{displaymath}
\begin{array}{lcccc}
 & ee & eo & oe & oo \\
\longrightarrow & 0 & 2 & 1 & 3
\end{array}
\end{displaymath}



EXERCISE: Demonstrate that a similar analysis as above leads for $N=8$ to the correspondences

         

\begin{displaymath}
\begin{array}{lcccccccc}
 & eee & eeo & eoe & eoo & oee & oeo & ooe & ooo \\
\longrightarrow & 0 & 4 & 2 & 6 & 1 & 5 & 3 & 7
\end{array}
\end{displaymath}

It is easy to see that this correspondence is reproduced by the following rule: 1) put $e\leftrightarrow0$ and $o\leftrightarrow 1$, such that $eeo \leftrightarrow 001$ etc.; 2) reverse the bit pattern thus obtained and interpret the result as an integer: $eeo \leftrightarrow 001 \leftrightarrow 100 = 4$. In other words, arrange the table values $f_{k}$ in bit-reversed order. (For example, $k=4$ goes to position $1$, since $4=100 \longrightarrow 001=1$.)
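The bit-reversal rule can be sketched as a small Python helper (the function name is hypothetical; $N=2^{m}$ is assumed):

```python
def bit_reverse(k, m):
    """Reverse the m-bit pattern of k, e.g. m = 3: 001 -> 100, i.e. 1 -> 4."""
    r = 0
    for _ in range(m):
        r = (r << 1) | (k & 1)  # shift the lowest bit of k into r
        k >>= 1
    return r

# For N = 8 (m = 3) this reproduces the ordering 0 4 2 6 1 5 3 7:
print([bit_reverse(k, 3) for k in range(8)])  # [0, 4, 2, 6, 1, 5, 3, 7]
```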

The correctly arranged $f_{k}$ are now combined in pairs according to 8.114. The rule to follow in performing this ``decimation'' step is sketched in Fig. 8.4.


         

\begin{displaymath}
\begin{array}{lcccccccc}
 & 0 & 4 & 2 & 6 & 1 & 5 & 3 & 7 \\
m=1 & a & b & a & b & a & b & a & b \\
m=2 & \multicolumn{2}{c}{a} & \multicolumn{2}{c}{b} & \multicolumn{2}{c}{a} & \multicolumn{2}{c}{b} \\
m=3 & \multicolumn{4}{c}{a} & \multicolumn{4}{c}{b}
\end{array}
\end{displaymath}

Decimation for $N=8$
On each level ($m$) the terms $a,b$ are combined according to
\begin{displaymath}
a+W_{2^{m}}^{n}b
\end{displaymath} (8.114)

It is evident that the number of operations is of order $N   \log_{2}N$.
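The even/odd splitting 8.104-8.106 translates directly into a short recursive routine (a sketch, assuming $N=2^{m}$ and using $W_{N}=e^{2\pi i/N}$ as in 8.103; for the upper half of the output one uses $W_{N}^{n+N/2}=-W_{N}^{n}$):

```python
import cmath

def fft(f):
    """Recursive Cooley-Tukey evaluation of F_n = sum_k W_N^{nk} f_k,
    for N a power of two."""
    N = len(f)
    if N == 1:
        return list(f)
    Fe = fft(f[0::2])  # transform of the even-indexed table values
    Fo = fft(f[1::2])  # transform of the odd-indexed table values
    W = [cmath.exp(2j * cmath.pi * n / N) for n in range(N // 2)]
    return ([Fe[n] + W[n] * Fo[n] for n in range(N // 2)] +   # n = 0 .. N/2-1
            [Fe[n] - W[n] * Fo[n] for n in range(N // 2)])    # n = N/2 .. N-1
```

The recursion depth is $m=\log_{2}N$, with $O(N)$ work per level; a production version would instead use bit-reversed ordering and combine the values in place, as described above.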

Further details of the method, plus sample programs in Pascal, Fortran, or C are given in [PRESS 86].

EXERCISE: Sketch the pattern of Figure 8.4 for $N=4$ and perform the ``decimation''. Compare your result to equ. 8.114.


Franz J. Vesely Oct 2005
See also:
"Computational Physics - An Introduction," Kluwer-Plenum 2001