< >
 Ch. 2 Sec. 3

Actually an exact method, but a multi-step approach.

Task: Solve $A \cdot x = b$. This may be interpreted as a minimization problem:

Let

f(x) \equiv \frac{\textstyle 1}{\textstyle 2} \, \left| A \cdot x - b \right|^{2} \, ,

Then the $N$-vector $x$ which minimizes $f(x)$ (with the minimum value $f=0$) is the desired solution.

Minimization of a quadratic function of $N$ variables:

 Conjugate gradients: $\vec{g}_{0}=-\vec{\nabla} f(P_{0})$... direction of steepest descent at $P_{0}$, etc.; $\vec{h}_{1}$ ... direction of the gradient conjugate to $\vec{g}_{0}$.

Warm-up: Cauchy's Steepest Descent Method
Assume $f$ to depend on two variables $\vec{x}=(x_{1},x_{2})$ only. The lines of equal elevation of a quadratic function are ellipses that may be very elongated (see the figure).

• Start from $P_{0}$ with position vector $\vec{x}_{0}$ and follow the local gradient $\vec{g}_{0} = - \nabla f(P_{0})$. This direction will not lead directly to the extremal point of $f$.
• Find the lowest point $P_{1}$ along this path. Determine once more the local gradient $\vec{g}_{1}= - \nabla f(P_{1})$ by construction it must be perpendicular to the previous direction $\vec{g}_{0}$.
• Iterate this procedure to arrive, after many mutually orthogonal bends, at the bottom of the channel (or top of the hill).

Now for the serious work: Conjugate Gradients
From point $P_{1}$ take the direction $\vec{h}_{1}$ instead of $\vec{g}_{1}$.
BUT: How to find $\vec{h}_{1}$?

Require that along $\vec{h}_{1}$ the change of the gradient of $f$ has no component parallel to $\vec{g}_{0}$. (In contrast: along $\vec{g}_{1}$, the gradient of $f$ has initially no $\vec{g}_{0}$ -component). As a consequence, a gradient in the direction of $\vec{g}_{0}$ will not develop immediately (on quadratic surfaces never!).

$\Longrightarrow$ This requirement leads to the following prescription.

 Conjugate gradient technique: Let $P_{0}\,$ (with the position vector $\vec{x}_{0}$) be the starting point of the search; the local gradient at $P_{0}\,$ is $\vec{g}_{0} \equiv - \nabla f(\vec{x}_{0}) = - A^{T} \cdot \left[ A \cdot \vec{x}_{0} - \vec{b} \right]$ The next "low point" $P_{1}$ is then situated at $\vec{x}_{1} = \vec{x}_{0} + \lambda _{1} \, \vec{g}_{0}$ with $\lambda_{1} = \frac{\textstyle \left| \vec{g}_{0} \right|^{2}}{\textstyle \left| A \cdot \vec{g}_{0} \right|^{2}} \, .$ From $P_{1}$ we proceed not along the local gradient $\vec{g}_{1} = - A^{T} \cdot \left[ A \cdot \vec{x}_{1} - \vec{b} \right]$ but along the gradient conjugate to $\vec{g}_{0}$, i.e. $\vec{h}_{1} = \vec{g}_{1} - \frac{\textstyle ( A \cdot \vec{g}_{1}) \cdot ( A \cdot \vec{g}_{0})}{\textstyle \left| A \cdot \vec{g}_{0} \right|^{2}} \, .$ The low point along this path is at $\vec{x}_{2} = \vec{x}_{1} + \lambda_{2} \, \vec{h}_{1} \, ,$ with $\lambda_{2} = \frac{\textstyle \left| \vec{g}_{1} \cdot \vec{h}_{1} \right|}{\textstyle \left| A \cdot \vec{h}_{1} \right|^{2}}$

For dimension $N=2$ this is all there is; $\vec{x} = \vec{x}_{2}$ is already the solution vector. For systems of higher dimension one has to go on from $\vec{x}_{2}$ in the direction

\vec{h}_{2} = \vec{g}_{2} - \frac{\textstyle (A \cdot \vec{g}_{2}) \cdot (A \cdot \vec{h}_{1})}{\textstyle \left| A \cdot \vec{h}_{1} \right|^{2}} \, .

until the next "low point" is reached at

\vec{x}_{3} = \vec{x}_{2} + \lambda_{3} \, \vec{h}_{2} \, ,

with

\lambda_{3} = \frac{\textstyle \left| \vec{g}_{2} \cdot \vec{h}_{2} \right|}{\textstyle \left| A \cdot \vec{h}_{2} \right|^{2}} \, .

Next we determine $\vec{x}_{4}$ and so forth, until we arrive at the solution $\vec{x} = \vec{x}_{N}$.

Advantage of CG: No matrix inversion!

But: Several multiplications $A \cdot \vec{x}$; therefore most efficient for sparse matrices.

EXAMPLE: To see the mechanics of the method, assume

A=\left( \begin{array}{cc}3&1\\ \\2&4\end{array} \right) \, ; \; \vec{b}=\left( \begin{array}{r} 3 \\ \\2 \end{array} \right) \, ; \;\; and \; \; \vec{x}_{0} = \left( \begin{array}{r} 1.2 \\ \\0.2 \end{array} \right)

The gradient vector at $\vec{x}_{0}$ is

\vec{g}_{0} = - A^{T} \cdot \left[ A \cdot \vec{x}_{0} - \vec{b}\right] = - \left( \begin{array}{r} 4.8 \\ \\5.6 \end{array} \right)

and

\lambda_{1} = \frac{\textstyle \left| \vec{g}_{0} \right|^{2}}{\textstyle \left| A \cdot \vec{g}_{0} \right|^{2}} = 0.038

such that

\vec{x}_{1}=\vec{x}_{0}+\lambda_{1}\vec{g}_{0}=\left( \begin{array}{r} 1.017 \\ \\-0.014 \end{array} \right)

Similarly,

\vec{g}_{1}=\left( \begin{array}{r} -0.063 \\ \\0.054 \end{array} \right) \, , \;\; \vec{h}_{1}=\left( \begin{array}{r} -0.0635 \\ \\0.0532 \end{array} \right) \, , \;\; \lambda_{2}=0.262 \, ,

and thus

\vec{x}_{2}= \left( \begin{array}{r} 1.000 \\ \\0.000 \end{array} \right)

which is the desired solution vector in this 2-dimensional case. If $A$ were a $N \times N$ matrix, $N$ such steps would be necessary.

vesely 2005-10-10

 < >