
1.3 Fundamentals of Statistics

By statistics we denote the investigation of regularities in apparently non-deterministic processes. An important basic quantity in this context is the ``relative frequency'' of an ``event''. Let us consider a repeatable experiment - say, the throwing of a die - which in each instance leads to one of several possible results $e$ - say, $e \equiv \{number of points = 4\}$. Now repeat this experiment $n$ times under equal conditions and register the number of cases in which the specific result $e$ occurs; call this number $f_{n}(e)$. The relative frequency of $e$ is then defined as the ratio $f_{n}(e)/n$.

Following R. von Mises we denote as the ``probability'' of an event the expected value of the relative frequency in the limit of infinitely many experiments:

$\begin{displaymath} {\cal P}(e) \equiv \lim_{n \rightarrow \infty} \frac{f_{n}(e)}{n} \end{displaymath}$

(1.24)

EXAMPLE: Game die; 100-1000 trials; $e \equiv \{no. of points = 6\}$ , or $e \equiv \{no. of points$ $\leq 3\}$ .
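The die example can be tried out numerically. The following is a minimal sketch, assuming Python's standard pseudo-random generator as a stand-in for a fair die; the function name `relative_frequency` and the fixed seed are illustrative choices, not part of the original text.

```python
import random

def relative_frequency(event, n, rng):
    """Throw a simulated fair die n times and return f_n(e)/n for the given event."""
    hits = sum(1 for _ in range(n) if event(rng.randint(1, 6)))
    return hits / n

rng = random.Random(12345)  # arbitrary fixed seed, only for reproducibility
for n in (100, 1000, 100000):
    r_six = relative_frequency(lambda pts: pts == 6, n, rng)
    r_low = relative_frequency(lambda pts: pts <= 3, n, rng)
    print(f"n = {n:6d}   f_n/n for 'six': {r_six:.4f}   for '<= 3': {r_low:.4f}")
```

With growing $n$ the printed relative frequencies should settle near $1/6$ and $1/2$, with scatter of order $1/\sqrt{n}$.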

Now, this definition does not seem very helpful. It implies that we have already done some experiments to determine the relative frequency, and it tells us no more than that we should expect more or less the same relative frequencies when we go on repeating the trials. What we want, however, is a recipe for the prediction of ${\cal P}(e)$ .

To obtain such a recipe we have to reduce the event to so-called ``elementary events'' $\epsilon$ that obey the postulate of equal a priori probability. Since the probability of any particular one among $K$ possible elementary events is just ${\cal P}(\epsilon_{i}) = 1/K$, we may then derive the probability of a compound event by applying the rules

$\displaystyle {\cal P}(e = \epsilon_{i} \; or \; \epsilon_{j})$	$\textstyle =$	$\displaystyle {\cal P}(\epsilon_{i}) +{\cal P}(\epsilon_{j})$	(1.25)
$\displaystyle {\cal P}(e = \epsilon_{i} \; and \; \epsilon_{j})$	$\textstyle =$	$\displaystyle {\cal P}(\epsilon_{i}) \cdot{\cal P}(\epsilon_{j})$	(1.26)

Thus the predictive calculation of probabilities reduces to the counting of the possible elementary events that make up the event in question.

EXAMPLE: The result $\epsilon_{6} \equiv \{no. of points =6\}$ is one among $6$ mutually exclusive elementary events with equal a priori probabilities ($=1/6$). The compound event $e \equiv \{ no. of points \leq 3\}$ consists of the elementary events $\epsilon_{i} =\{1, 2 or 3 \}$; its probability is thus ${\cal P}(1\vee 2\vee 3)= 1/6 + 1/6 + 1/6 = 1/2$.
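The counting recipe can be written down in a few lines. This is only a sketch under the assumption of $K=6$ equally probable elementary events; the helper name `probability` is hypothetical.

```python
from fractions import Fraction

K = 6                          # number of elementary events of a fair die
elementary = range(1, K + 1)   # possible numbers of points

def probability(event):
    """Count the elementary events that make up `event` and divide by K."""
    favourable = sum(1 for eps in elementary if event(eps))
    return Fraction(favourable, K)

p_six = probability(lambda eps: eps == 6)   # a single elementary event
p_low = probability(lambda eps: eps <= 3)   # compound event {1, 2 or 3}
print(p_six, p_low)                         # prints: 1/6 1/2
```

Exact fractions are used so that the sum rule 1.25 is reproduced without rounding.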

How might this apply to statistical mechanics? - Let us assume that we have $N$ equivalent mechanical systems with possible states $s_{1}, s_{2}, \dots s_{K}$. A relevant question is then: what is the probability of a situation in which

$\begin{displaymath} e \equiv \{ k_{1} systems are in state s_{1}, \; k_{2} systems in state s_{2}, etc. \} \end{displaymath}$

(1.27)

EXAMPLE: $N=60$ dice are thrown (or one die $60$ times!). What is the probability that $10$ dice each have numbers of points $1, 2, \dots 6$? What, in contrast, is the probability that all dice show a ``one''?

The same example, but with more obviously physical content:
Let $N=60$ gas atoms be contained in a volume $V$, which we imagine to be divided into $6$ equal partial volumes. What is the probability that at any given time we find $k_{i}=10$ particles in each subvolume? And how probable is the particle distribution $(60,0,0,0,0,0)$? (Answer: see below under the heading ``multinomial distribution''.)

We can generally assume that both the number $N$ of systems and the number $K$ of accessible states are very large - in the so-called ``thermodynamic limit'' they are actually taken to approach infinity. This gives rise to certain mathematical simplifications.

Before advancing into the field of physical applications we will review the fundamental concepts and truths of statistics and probability theory, focussing on events that take place in number space, either $\cal R$ (real numbers) or $\cal N$ (natural numbers).

DISTRIBUTION FUNCTION
Let $x$ be a real random variate in the region $(a,b)$. The distribution function

$\begin{displaymath} P(x_{0}) \equiv {\cal P} \{x < x_{0}\} \end{displaymath}$

(1.28)

is defined as the probability that some $x$ is smaller than the given value $x_{0}$. The function $P(x)$ is monotonically increasing and has $P(a)=0$ and $P(b)=1$. The distribution function is dimensionless: $\left[ P(x) \right] = 1$.

The simplest example is the equidistribution, for which

$\begin{displaymath} P(x_{0})= \frac{x_{0}-a}{b-a} \end{displaymath}$

(1.29)

Another important example, with $a=- \infty$ , $b=\infty$ , is the normal distribution

$\begin{displaymath} P(x_{0})= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x_{0}} dx e^{-x^{2}/2} \end{displaymath}$

(1.30)

and its generalization, the Gaussian distribution

$\begin{displaymath} P(x_{0})= \frac{1}{\sqrt{2\pi \sigma^{2}}} \int_{-\infty}^{x_{0}} dx e^{-(x- \langle x \rangle)^{2}/2 \sigma^{2}} \end{displaymath}$

(1.31)

where the parameters $\langle x \rangle$ and $\sigma^{2}$ define an ensemble of functions.
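The integrals 1.30 and 1.31 have no elementary closed form, but they can be expressed by the error function. A minimal sketch, assuming the identity $P(x_{0}) = \frac{1}{2}[1+{\rm erf}((x_{0}-\langle x \rangle)/\sigma \sqrt{2})]$; the function names are illustrative.

```python
import math

def uniform_cdf(x0, a, b):
    """Distribution function 1.29 of the equidistribution, valid for a <= x0 <= b."""
    return (x0 - a) / (b - a)

def gauss_cdf(x0, mean=0.0, sigma=1.0):
    """Distribution function 1.31; mean=0, sigma=1 gives the normal case 1.30."""
    return 0.5 * (1.0 + math.erf((x0 - mean) / (sigma * math.sqrt(2.0))))

for x0 in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(f"x0 = {x0:5.1f}   P(x0) = {gauss_cdf(x0):.6f}")
```

Both functions rise monotonically from $0$ to $1$, as required of any distribution function.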

DISTRIBUTION DENSITY
The distribution or probability density $p(x)$ is defined by

$\begin{displaymath} p(x_{0}) dx \equiv {\cal P} \{ x \in [x_{0}, x_{0}+dx] \} \equiv dP(x_{0}) \end{displaymath}$

(1.32)

In other words, $p(x)$ is just the differential quotient of the distribution function:

$\begin{displaymath} p(x)= \frac{dP(x)}{dx} , \;\;\; {\rm i.e.} \;\; P(x_{0})= \int_{a}^{x_{0}} p(x) dx \end{displaymath}$

(1.33)

$p(x)$ has a dimension; it is the reciprocal of the dimension of its argument $x$:

$\begin{displaymath} \left[ p(x) \right] = \frac{1}{[x]} \end{displaymath}$

(1.34)

For the equidistribution we have

$\begin{displaymath} p(x)= 1/(b-a), \end{displaymath}$

(1.35)

and for the normal distribution

$\begin{displaymath} p(x)= \frac{1}{\sqrt{2\pi}} e^{-x^{2}/2}. \end{displaymath}$

(1.36)

If $x$ is limited to discrete values $x_{\alpha}$ with a step $\Delta x_{\alpha} \equiv x_{\alpha+1}-x_{\alpha}$ one often writes

$\begin{displaymath} p_{\alpha} \equiv p(x_{\alpha}) \Delta x_{\alpha} \end{displaymath}$

(1.37)

for the probability of the event $x=x_{\alpha}$. This $p_{\alpha}$ is by definition dimensionless, although it is related to the distribution density $p(x)$ for continuous arguments. The definition 1.37 includes the special case that $x$ is restricted to integer values $k$; in that case $\Delta x_{\alpha} = 1$.

MOMENTS OF A DENSITY
By this we denote the quantities

$\begin{displaymath} \langle x^{n} \rangle \equiv \int_{a}^{b} x^{n} p(x) dx \;\;\; {\rm or, \; in \; the \; discrete \; case, \;\;\;} \equiv \sum_{\alpha} p_{\alpha} x_{\alpha}^{n} \end{displaymath}$

(1.38)

The first moment $\langle x \rangle$ is also called the expectation value or mean value of the distribution density $p(x)$, and the second moment $\langle x^{2} \rangle$ is related to the variance and the standard deviation: variance $\sigma^{2} = \langle x^{2} \rangle - \langle x \rangle ^{2}$ (standard deviation $\sigma$ = square root of variance).

EXAMPLES:
a) For an equidistribution with $x \in [0, 1]$ we have $\langle x \rangle = 1/2$, $\langle x^{2} \rangle =1/3$ and $\sigma^{2}= 1/12$.
b) For the normal distribution we find $\langle x \rangle =0$ and $\langle x^{2} \rangle =\sigma^{2}=1$.
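The moments quoted in these examples can be checked by direct numerical integration of 1.38. The sketch below assumes a simple midpoint rule and cuts the normal density off at $\pm 10$; both choices, like the function names, are illustrative.

```python
import math

def moment(density, n, a, b, steps=20000):
    """n-th moment of `density` on [a, b], evaluated with the midpoint rule."""
    h = (b - a) / steps
    total = 0.0
    for i in range(steps):
        x = a + (i + 0.5) * h
        total += x**n * density(x)
    return total * h

def uniform_density(x):
    return 1.0                      # p(x) = 1/(b-a) with a=0, b=1

def normal_density(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

mean_u = moment(uniform_density, 1, 0.0, 1.0)
var_u = moment(uniform_density, 2, 0.0, 1.0) - mean_u**2
mean_n = moment(normal_density, 1, -10.0, 10.0)   # tails beyond |x|=10 are negligible
var_n = moment(normal_density, 2, -10.0, 10.0) - mean_n**2
print(mean_u, var_u, mean_n, var_n)
```

The four numbers should agree with $1/2$, $1/12$, $0$ and $1$ up to the small discretization error of the quadrature.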

SOME IMPORTANT DISTRIBUTIONS

$\bullet$ Equidistribution: Its great significance stems from the fact that this distribution is central both to statistical mechanics and to practical numerics. In the theory of statistical-mechanical systems, one of the fundamental assumptions is that all states of a system that have the same energy are equally probable (axiom of equal a priori probability). And in numerical computing the generation of homogeneously distributed pseudo-random numbers is relatively easy; to obtain differently distributed random variates one usually ``processes'' such primary equidistributed numbers.

$\bullet$ Gauss distribution: This distribution pops up everywhere in the quantifying sciences. The reason for its ubiquity is the ``central limit theorem'': Every random variate that can be expressed as a sum of many independent random variates, each distributed (almost) arbitrarily but with finite variance, will in the limit of many summation terms be Gauss distributed. For example, when we have a complex measuring procedure in which a number of individual errors (or uncertainties) add up to a total error, then this error will be nearly Gauss distributed, regardless of how the individual contributions may be distributed. In addition, several other physically relevant distributions, such as the binomial and multinomial densities (see below), approach the Gauss distribution under certain - quite common - circumstances.

**Figure 1.9:** Equidistribution and normal distribution functions and densities

$\bullet$ Binomial distribution: This discrete distribution describes the probability that in $n$ independent trials an event that has a single trial probability $p$ will occur exactly $k$ times:

$\displaystyle p_{k}^{n}$	$\textstyle \equiv$	$\displaystyle {\cal P} \{ k \;\; {\rm times\; e \; in \; n \; trials}\}$
	$\textstyle =$	$\displaystyle {n \choose k} p^{k} (1-p)^{n-k}$	(1.39)

For the first two moments of the binomial distribution we have $\langle k \rangle =np$ (not necessarily integer) and $\sigma^{2}= np(1-p)$ (i.e. $\langle k^{2} \rangle - \langle k \rangle ^{2}$ ).

**Figure 1.10:** Binomial distribution density $p_{k}^{n}$

APPLICATION: Fluctuation processes in statistical systems are often described in terms of the binomial distribution. For example, consider a particle freely roaming a volume $V$. The probability to find it at some given time in a certain partial volume $V_{1}$ is $p(V_{1})=V_{1}/V$. Considering now $N$ independent particles in $V$, the probability of finding just $N_{1}$ of them in $V_{1}$ is given by

$\begin{displaymath} p_{N_{1}}^{N} = {N \choose N_{1}} (V_{1}/V)^{N_{1}} (1-V_{1}/V)^{N-N_{1}} \end{displaymath}$

(1.40)

The average number of particles in $V_{1}$ and its standard deviation are

$\begin{displaymath} \langle N_{1} \rangle = N V_{1}/V \;\; {\rm and} \; \; \sigma \equiv \sqrt{\langle (\Delta N_{1})^{2} \rangle} = \sqrt{N (V_{1} / V) (1-V_{1}/V)}. \end{displaymath}$

(1.41)

Note that for $Np \approx 1$ (with $p \equiv V_{1}/V \ll 1$) we have for the variance $\sigma^{2} \approx \sigma \approx \langle N_{1} \rangle \approx 1$, meaning that the population fluctuations in $V_{1}$ are then of the same order of magnitude (namely, $1$) as the mean number of particles itself.
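The mean and the variance of the occupation number can be verified by summing over the binomial density 1.40. A minimal sketch, assuming the illustrative values $N=100$ and $V_{1}/V=0.01$ (so that $Np=1$); none of these names or numbers appear in the original.

```python
import math

def binomial_pmf(k, n, p):
    """Binomial density 1.39: probability of exactly k successes in n trials."""
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)

N, p = 100, 0.01   # assumed example: 100 particles, partial volume V1 = V/100
pmf = [binomial_pmf(k, N, p) for k in range(N + 1)]

norm = sum(pmf)
mean = sum(k * w for k, w in enumerate(pmf))
var = sum(k * k * w for k, w in enumerate(pmf)) - mean**2
print(f"norm = {norm:.6f}  <N1> = {mean:.6f}  sigma^2 = {var:.6f}")
```

Here $\langle N_{1} \rangle = Np = 1$ and $\sigma^{2} = Np(1-p) = 0.99$, so the fluctuations are indeed as large as the mean.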

For large $n$ such that $np \gg 1$ the binomial distribution approaches a Gauss distribution with mean $\langle k \rangle = np$ and variance $\sigma^{2}=npq$ (theorem of Moivre-Laplace):

$\begin{displaymath} p_{k}^{n} \Rightarrow p_{G}(k)=\frac{1}{\sqrt{2 \pi npq}} \exp{[-(k-np)^{2}/2npq]} \end{displaymath}$

(1.42)

with $q \equiv 1-p$ .

If $n \rightarrow \infty$ and $p \rightarrow 0$ such that their product $np \equiv \lambda$ remains finite, the density 1.39 approaches

$\begin{displaymath} p_{n}(k) = \frac{\lambda^{k}}{k!} e^{-\lambda} \end{displaymath}$

(1.43)

which goes by the name of Poisson distribution.
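Both limiting forms can be compared with the exact binomial density. The sketch below assumes the arbitrary test cases $n=1000$, $p=1/2$ for the Gauss limit 1.42 and $n=10^{4}$, $\lambda=3$ for the Poisson limit 1.43; the function names are hypothetical.

```python
import math

def binomial_pmf(k, n, p):
    # Exact density 1.39; the integer coefficient is converted to float here,
    # so this simple form only works for moderate n.
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)

def gauss_density(k, n, p):
    var = n * p * (1.0 - p)                       # sigma^2 = npq
    return math.exp(-((k - n * p) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def poisson_pmf(k, lam):
    return lam**k / math.factorial(k) * math.exp(-lam)

# Largest pointwise deviation from the Gauss limit (n = 1000, p = 1/2)
dev_gauss = max(abs(binomial_pmf(k, 1000, 0.5) - gauss_density(k, 1000, 0.5))
                for k in range(1001))
# Largest pointwise deviation from the Poisson limit (n = 10000, np = 3)
dev_poisson = max(abs(binomial_pmf(k, 10000, 3e-4) - poisson_pmf(k, 3.0))
                  for k in range(31))
print(dev_gauss, dev_poisson)
```

In both cases the deviation is small compared to the peak height of the respective density (about $0.025$ and $0.22$).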

An important element in the success story of statistical mechanics is the fact that with increasing $n$ the sharpness of the distribution 1.39 or 1.42 becomes very large. The relative width of the maximum, i.e. $\sigma/\langle k \rangle = \sqrt{(1-p)/np}$, decreases as $1/\sqrt{n}$. For $p=1/2$ and $n=10^{4}$ the width of the peak is no more than $1 \%$ of $\langle k \rangle$, and for ``molar'' orders of particle numbers $n \approx 10^{24}$ the relative width $\sigma/\langle k \rangle$ is already $\approx 10^{-12}$. Thus the density approaches a ``delta distribution''. This, in turn, renders the calculation of averages particularly simple:

$\begin{displaymath} \langle f(k) \rangle = \sum_{k} p_{k}^{n}f(k) \rightarrow f(\langle k \rangle ) \end{displaymath}$

(1.44)

$\begin{displaymath} \langle f(k) \rangle \approx \int dk \delta (k-\langle k \rangle) f(k) = f(\langle k \rangle ) \end{displaymath}$

(1.45)
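The quoted relative widths follow directly from $\langle k \rangle = np$ and $\sigma^{2}=np(1-p)$. A small sketch, assuming $p=1/2$ (the value for which the numbers in the text come out exactly); `relative_width` is an illustrative name.

```python
import math

def relative_width(n, p):
    """sigma/<k> = sqrt(n p (1-p)) / (n p) for the binomial distribution."""
    return math.sqrt((1.0 - p) / (n * p))

for n in (1e2, 1e4, 1e12, 1e24):
    print(f"n = {n:.0e}   sigma/<k> = {relative_width(n, 0.5):.1e}")
```

For smaller $p$ the width is larger by a factor $\sqrt{(1-p)/p}$, but the $1/\sqrt{n}$ decrease is the same.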

$\bullet$ Multinomial distribution: This is a generalization of the binomial distribution to more than 2 possible results of a single trial. Let $e_{1},e_{2},\dots ,e_{K}$ be the (mutually exclusive) possible results of an experiment; their probabilities in a single trial are $p_{1},p_{2},\dots, p_{K}$, with $\sum_{i=1}^{K}p_{i}=1$. Now do the experiment $n$ times; then

$\begin{displaymath} p_{n}(k_{1},k_{2},\dots, k_{K}) = \frac{n!}{k_{1}! k_{2}! \dots k_{K}!} \, p_{1}^{k_{1}} p_{2}^{k_{2}} \dots p_{K}^{k_{K}} \;\;\;({\rm with} \;\;k_{1}+k_{2}+ \dots +k_{K} = n) \end{displaymath}$

(1.46)

is the probability to have the event $e_{1}$ just $k_{1}$ times, $e_{2}$ accordingly $k_{2}$ times, etc.

We get an idea of the significance of this distribution in statistical physics if we interpret the possible events as ``states'' that may be taken on by the particles of a system (or, in another context, by the systems in an ensemble of many-particle systems). The above formula then tells us the probability to find $k_{1}$ among the $n$ particles in state $e_{1}$, etc.

EXAMPLE: A die is cast $60$ times. The probability to find each number of points just $10$ times is

$\begin{displaymath} p_{60}(10,10,10,10,10,10) = \frac{60!}{(10!)^{6}} \left( \frac{1}{6}\right)^{60} = 7.457 \cdot 10^{-5} . \end{displaymath}$

(1.47)

To compare, the probabilities of two other cases: $p_{60}(10,10,10,10,9,11) = 6.778 \cdot 10^{-5}$, $p_{60}(15,15,15,5,5,5) = 4.406 \cdot 10^{-8}$. Finally, for the quite improbable case $(60,0,0,0,0,0)$ we have $p_{60}=2.046 \cdot 10^{-47}$.
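For $n=60$ the multinomial probabilities can still be evaluated exactly with integer arithmetic. This is a sketch, not the author's procedure; the function name is hypothetical, and the partitionings $(10,10,10,10,9,11)$ and $(60,0,0,0,0,0)$ are assumptions chosen because they reproduce the quoted values.

```python
import math
from fractions import Fraction

def multinomial_probability(counts, probs):
    """Exact evaluation of 1.46 using integers and fractions, returned as a float."""
    coeff = math.factorial(sum(counts))
    for k in counts:
        coeff //= math.factorial(k)      # each intermediate quotient is an integer
    weight = Fraction(1)
    for k, p in zip(counts, probs):
        weight *= Fraction(p) ** k
    return float(coeff * weight)

fair = [Fraction(1, 6)] * 6
cases = ([10] * 6, [10, 10, 10, 10, 9, 11], [15, 15, 15, 5, 5, 5], [60, 0, 0, 0, 0, 0])
for counts in cases:
    print(counts, f"{multinomial_probability(counts, fair):.3e}")
```

Moving a single die from one face to another lowers the probability only by the factor $10/11$, whereas strongly uneven partitionings are suppressed by many orders of magnitude.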

Due to its large number of variables we cannot give a graph of the multinomial distribution. However, it is easy to derive the following two important properties:

Approach to a multivariate Gauss distribution: just as the binomial distribution approaches, for large $n$, a Gauss distribution, the multinomial density approaches an appropriately generalized - ``multivariate'' - Gauss distribution.

Increasing sharpness: if $n$ and $k_{1}\dots k_{K}$ become very large (multiparticle systems; or ensembles of $n \rightarrow \infty$ elements), the function $p_{n}(k_{1},k_{2},\dots, k_{K}) \equiv p_{n} (\vec{k})$ has an extremely sharp maximum for a certain partitioning $\vec{k}^{*}\equiv \{ k_{1}^{*}, k_{2}^{*}, \dots, k_{K}^{*}\}$, namely $\{ k_{i}^{*} =n p_{i}; i=1, \dots, K \}$. This particular partitioning of the particles to the various possible states is then ``almost always'' realized, and all other allotments (or distributions) occur very rarely and may safely be neglected.

This is the basis of the method of the most probable distribution which is used with great success in several areas of statistical physics.[2.2]

STIRLING'S FORMULA
For large values of $m$ the evaluation of the factorial $m!$ is difficult. A handy approximation is Stirling's formula

$\begin{displaymath} m! \approx \sqrt{2 \pi m} (m/e)^{m} \end{displaymath}$

(1.48)

EXAMPLE: (Near most pocket calculators' limit): $69! = 1.7112 \cdot 10^{98}$ ; $\sqrt{2 \pi \cdot 69} (69/e)^{69} = 1.7092 \cdot 10^{98}$ .

The same name Stirling's formula is often used for the logarithm of the factorial:

$\begin{displaymath} \ln m! \approx m (\ln m -1) + \ln \sqrt{2 \pi m} \approx m (\ln m -1) \end{displaymath}$

(1.49)

(The term $\ln \sqrt{2 \pi m}$ may usually be neglected in comparison to $m (\ln m -1)$ .)

EXAMPLE 1: $\ln 69! = 226.1905$ ; $69(\ln 69 - 1) + \ln \sqrt{2 \pi \cdot 69} = 223.1533 + 3.0360 = 226.1893$ .
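The two numerical examples are easy to reproduce. A minimal sketch, assuming `math.lgamma(m + 1)` as the reference value for $\ln m!$; the function names `stirling` and `ln_stirling` are illustrative.

```python
import math

def stirling(m):
    """Stirling's formula 1.48 for m!."""
    return math.sqrt(2.0 * math.pi * m) * (m / math.e) ** m

def ln_stirling(m, keep_sqrt_term=True):
    """Logarithmic form 1.49, with or without the term ln sqrt(2 pi m)."""
    value = m * (math.log(m) - 1.0)
    if keep_sqrt_term:
        value += 0.5 * math.log(2.0 * math.pi * m)
    return value

m = 69
print(f"69!            = {float(math.factorial(m)):.4e}")
print(f"Stirling       = {stirling(m):.4e}")
print(f"ln 69!         = {math.lgamma(m + 1):.4f}")
print(f"1.49 (full)    = {ln_stirling(m):.4f}")
print(f"1.49 (reduced) = {ln_stirling(m, keep_sqrt_term=False):.4f}")
```

The relative error of 1.48 is close to $1/12m$, i.e. about $0.1 \%$ for $m=69$.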

EXAMPLE 2: The die is cast again, but now there are $120$ trials. When asked to produce $120!$ most pocket calculators will cancel their cooperation. So we apply Stirling's approximation:
The probability of throwing each number of points just $20$ times is

$\begin{displaymath} p_{120}(20,\dots,20) = \frac{120!}{(20!)^{6}} \left( \frac{1}{6}\right)^{120} \approx \sqrt{240 \pi} \left[ \left( \frac{20}{e} \right)^{20} \frac{1}{20!} \right]^{6} = 1.350 \cdot 10^{-5} , \end{displaymath}$

(1.50)

and the probability of the partitioning $(20,20,20,20,19,21)$ is $1.285 \cdot 10^{-5}$.
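As a cross-check that does not rely on Stirling's approximation, the same probabilities can be computed from logarithms of factorials. The sketch below uses `math.lgamma` instead of formula 1.48; the partitioning $(20,20,20,20,19,21)$ is an assumption chosen because it reproduces the second quoted value, and the function name is hypothetical.

```python
import math

def ln_multinomial(counts, probs):
    """Logarithm of the multinomial probability 1.46, using ln k! = lgamma(k+1)."""
    ln_p = math.lgamma(sum(counts) + 1)
    for k, p in zip(counts, probs):
        ln_p += k * math.log(p) - math.lgamma(k + 1)
    return ln_p

fair = [1.0 / 6.0] * 6
p_equal = math.exp(ln_multinomial([20] * 6, fair))
p_shifted = math.exp(ln_multinomial([20, 20, 20, 20, 19, 21], fair))
print(f"{p_equal:.4e}  {p_shifted:.4e}  ratio = {p_shifted / p_equal:.5f}")
```

The results agree with the Stirling values to within about $0.1 \%$, and the ratio of the two probabilities is $20/21$.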

STATISTICAL (IN)DEPENDENCE
Two random variates $x_{1}, x_{2}$ are statistically mutually independent (uncorrelated) if the distribution density of the compound probability (i.e. the probability for the joint occurrence of $x_{1}$ and $x_{2}$) equals the product of the individual densities:

$\begin{displaymath} p(x_{1},x_{2}) = p(x_{1}) p(x_{2}) . \end{displaymath}$

(1.51)

EXAMPLE: In a fluid or gas the distribution density for a single component of the particle velocity is given by (Maxwell-Boltzmann)

$\begin{displaymath} p(v_{\alpha}) = \sqrt{\frac{m}{2 \pi kT}} \exp\{-\frac{m v_{\alpha}^{2}}{2 k T} \}, \;\; \alpha = x,y,z \end{displaymath}$

(1.52)

The degrees of freedom $\alpha = x,y,z$ are statistically independent; therefore the compound probability is given by

$\begin{displaymath} p(\vec{v}) \equiv p(v_{x},v_{y},v_{z}) = p(v_{x}) p(v_{y}) p(v_{z}) = \left( \frac{m}{2 \pi kT} \right)^{3/2} \exp\{-\frac{m}{2 kT} (v_{x}^{2}+v_{y}^{2}+v_{z}^{2}) \} \end{displaymath}$

(1.53)

By conditional distribution density we denote the quantity

$\begin{displaymath} p(x_{2} \vert x_{1}) \equiv \frac{p(x_{1},x_{2})}{p(x_{1})} \end{displaymath}$

(1.54)

(For uncorrelated $x_{1}, x_{2}$ we have $p(x_{2} \vert x_{1}) =$ $p(x_{2})$ ).

The density of a marginal distribution describes the density of one of the variables regardless of the specific value of the other one, meaning that we integrate the joint density over all possible values of the second variable:

$\begin{displaymath} p(x_{2}) \equiv \int_{a_{1}}^{b_{1}} p(x_{1},x_{2}) dx_{1} \end{displaymath}$

(1.55)

TRANSFORMATION OF DISTRIBUTION DENSITIES
From 1.32 we can immediately conclude how the density $p(x)$ will transform if we substitute the variable $x$. Given some bijective mapping $y=f(x); \; x=f^{-1}(y)$ the conservation of probability requires

$\begin{displaymath} \vert dP(y) \vert = \vert dP(x)\vert \end{displaymath}$

(1.56)

(The absolute value appears because we have not required $f(x)$ to be an increasing function.) This leads to

$\begin{displaymath} \vert p(y) dy \vert = \vert p(x) dx\vert \end{displaymath}$

(1.57)

$\begin{displaymath} p(y)= p(x) \left\vert \frac{dx}{dy}\right\vert = p[f^{-1}(y)] \left\vert \frac{df^{-1}(y)}{dy} \right\vert \end{displaymath}$

(1.58)

Incidentally, this relation is true for any kind of density, such as mass or spectral densities, and not only for distribution densities.

EXAMPLE 1: A particle of mass $m$ moving in one dimension is assumed to have any velocity in the range $\pm v_{0}$ with equal probability; so we have $p(v)=1/2v_{0}$. The distribution density for the kinetic energy $E=mv^{2}/2$ is then given by (see Figure 1.11)

$\begin{displaymath} p(E) = 2 p(v) \left\vert \frac{dv}{dE} \right\vert = \frac{1}{2 \sqrt{E_{0}}} \frac{1}{\sqrt{E}} \end{displaymath}$

(1.59)

in the limits $0 .. E_{0}$, where $E_{0}=mv_{0}^{2}/2$. (The factor $2$ in front of $p(v)$ comes from the ambiguity of the mapping $v \leftrightarrow E$: the velocities $+v$ and $-v$ yield the same energy.)
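The transformed density can be tested by sampling. The following sketch assumes reduced units $m=v_{0}=1$ and compares the sampled fraction of energies below $E$ with the distribution function $P(E)=\sqrt{E/E_{0}}$ obtained by integrating 1.59; the names and the seed are illustrative.

```python
import math
import random

def sample_energies(n, m, v0, rng):
    """Draw n velocities uniformly from [-v0, v0] and return the kinetic energies."""
    return [0.5 * m * rng.uniform(-v0, v0) ** 2 for _ in range(n)]

def energy_cdf(E, E0):
    """Integral of the density 1.59 from 0 to E."""
    return math.sqrt(E / E0)

m, v0 = 1.0, 1.0                  # assumed reduced units
E0 = 0.5 * m * v0**2
rng = random.Random(2005)         # arbitrary fixed seed
energies = sample_energies(100000, m, v0, rng)

for frac in (0.01, 0.25, 0.64):
    E = frac * E0
    empirical = sum(1 for e in energies if e < E) / len(energies)
    print(f"E/E0 = {frac:4.2f}   sampled: {empirical:.3f}   P(E): {energy_cdf(E, E0):.3f}")
```

Note that one tenth of all samples lie below $E_{0}/100$, which reflects the $1/\sqrt{E}$ divergence of the density at small energies.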

**Figure 1.11:** Transformation of the distribution density (see Example 1)

EXAMPLE 2: An object is found with equal probability at any point along a circular periphery; so we have $p(\phi) = 1/2 \pi$ for $\phi \in [ 0, 2 \pi ]$. Introducing cartesian coordinates $x = r \cos \phi$, $y = r \sin \phi$ we find for the distribution density of the coordinate $x$, with $x \in [-r, r]$, that

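$\begin{displaymath} p(x) = 2 p(\phi) \left\vert \frac{d \phi}{dx} \right\vert = \frac{1}{\pi} \frac{1}{\sqrt{r^{2}-x^{2}}} \end{displaymath}$

(1.60)

(see Figure 1.12). As in Example 1, the factor $2$ comes from the ambiguity of the mapping: the two angles $\phi$ and $2 \pi - \phi$ yield the same $x$.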
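Again the result can be checked by sampling. This sketch assumes $r=1$ and uses the distribution function $P(x_{0}) = 1 - \arccos(x_{0}/r)/\pi$, whose derivative is the density $1/\pi\sqrt{r^{2}-x^{2}}$; function names and seed are illustrative.

```python
import math
import random

def sample_x(n, r, rng):
    """Draw n angles uniformly from [0, 2 pi] and return x = r cos(phi)."""
    return [r * math.cos(rng.uniform(0.0, 2.0 * math.pi)) for _ in range(n)]

def x_cdf(x0, r):
    """Probability that x < x0 for a point placed uniformly on the circle."""
    return 1.0 - math.acos(x0 / r) / math.pi

r = 1.0
rng = random.Random(112)          # arbitrary fixed seed
xs = sample_x(100000, r, rng)

for x0 in (-0.9, -0.5, 0.0, 0.5, 0.9):
    empirical = sum(1 for x in xs if x < x0) / len(xs)
    print(f"x0 = {x0:5.2f}   sampled: {empirical:.3f}   P(x0): {x_cdf(x0, r):.3f}")
```

The steep rise of $P(x_{0})$ near $x_{0}=\pm r$ corresponds to the divergence of the density at the rim of the circle.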

Problems equivalent to these examples:
a) A homogeneously blackened glass cylinder - or a semitransparent drinking straw - held sideways against a light source: absorption as a function of the distance from the axis?
b) Distribution of the $x$-velocity of a particle that can move randomly in two dimensions, keeping its kinetic energy constant.

$\Rightarrow \;$ Simulation 1.6: Stadium Billiard. Distribution of the velocity component $v_{x}$ . [Code: Stadium]

c) Distribution of the velocity of either of two particles moving arbitrarily in one dimension, keeping only the sum of their kinetic energies constant.

**Figure 1.12:** Transformation of the distribution density (see Example 2)

For the joint probability density of several variables the transformation formula is a direct generalization of 1.58, viz.

$\begin{displaymath} p(\vec{y})= p(\vec{x}) \left\vert \frac{d\vec{x}}{d\vec{y}}\right\vert \end{displaymath}$

(1.61)

Here we write $\left\vert d\vec{x}/d\vec{y}\right\vert$ for the functional determinant (or Jacobian) of the mapping $\vec{x} = \vec{x}(\vec{y})$ ,

$\begin{displaymath} \left\vert d(x_{1},x_{2}, \dots)/d(y_{1},y_{2},\dots)\right\vert = \left\vert \begin{array}{ccc} dx_{1}/dy_{1} & dx_{1}/dy_{2} & \dots \\ dx_{2}/dy_{1} & dx_{2}/dy_{2} & \dots \\ \vdots & & \ddots \end{array}\right\vert \end{displaymath}$

(1.62)

EXAMPLE 3: Again, let $\vec{v} \equiv \{v_{x},v_{y},v_{z}\}$ , and $p(\vec{v})$ as in equ. 1.53. Now we write $\vec{w}$ $\equiv \{ v, \phi, \theta \}$ , with

$\begin{displaymath} v_{x} = v \sin \theta \cos \phi, \;\; v_{y} = v \sin \theta \sin \phi, \;\; v_{z} = v \cos \theta . \end{displaymath}$

(1.63)

The Jacobian of the mapping $\vec{v} = \vec{v}(\vec{w})$ is

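$\begin{displaymath} \left\vert d(v_{x},v_{y},v_{z})/d(v,\phi,\theta)\right\vert= \left\vert \begin{array}{ccc} \sin \theta \cos \phi & -v \sin \theta \sin \phi & v \cos \theta \cos \phi \\ \sin \theta \sin \phi & v \sin \theta \cos \phi & v \cos \theta \sin \phi \\ \cos \theta & 0 & -v \sin \theta \end{array}\right\vert = - v^{2} \sin \theta \end{displaymath}$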

(1.64)

Therefore we have for the density of the modulus of the particle velocity

$\displaystyle p(v)$	$\textstyle \equiv$	$\displaystyle \int_{0}^{2 \pi} d \phi \int_{0}^{\pi} d \theta \, v^{2} \sin \theta \left( \frac{m}{2 \pi kT} \right)^{3/2} \exp\{- \frac{m}{2 kT} v^{2} \}$	(1.65)
	$\textstyle =$	$\displaystyle 4 \pi \left( \frac{m}{2 \pi kT} \right)^{3/2} v^{2} \exp\{- \frac{m}{2 kT} v^{2} \}$	(1.66)
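A quick numerical check of 1.66: the speed density should be normalized, and its first moment should equal the known mean speed $\sqrt{8kT/\pi m}$. The sketch assumes reduced units $m=kT=1$, a midpoint rule, and a cutoff at $v=12$; all names are illustrative.

```python
import math

def speed_density(v, m=1.0, kT=1.0):
    """Density 1.66 of the modulus of the particle velocity."""
    prefactor = 4.0 * math.pi * (m / (2.0 * math.pi * kT)) ** 1.5
    return prefactor * v * v * math.exp(-m * v * v / (2.0 * kT))

def integrate(f, a, b, steps=20000):
    """Midpoint rule on [a, b]."""
    h = (b - a) / steps
    return sum(f(a + (i + 0.5) * h) for i in range(steps)) * h

# The upper limit 12 (in units of sqrt(kT/m)) cuts off a negligible tail.
norm = integrate(speed_density, 0.0, 12.0)
mean_speed = integrate(lambda v: v * speed_density(v), 0.0, 12.0)
print(f"norm = {norm:.6f}   <v> = {mean_speed:.6f}   sqrt(8/pi) = {math.sqrt(8.0 / math.pi):.6f}")
```

The factor $4 \pi$ in 1.66 is just the integral of $\sin \theta$ over the full solid angle; with it the density integrates to one.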


Franz Vesely
2005-01-25