Implemented 12/12/2021. A number of deprecated warnings appeared.

DeprecationWarning: scipy.arange is deprecated and will be removed in SciPy 2.0.0, use numpy.arange instead. Line [3] [17].

DeprecationWarning: scipy.vander is deprecated and will be removed in SciPy 2.0.0, use numpy.vander instead. Line [5].

DeprecationWarning: scipy.linspace is deprecated and will be removed in SciPy 2.0.0, use numpy.linspace instead. Lines [8], [9], [18], [19].

DeprecationWarning: scipy.array is deprecated and will be removed in SciPy 2.0.0, use numpy.array instead. Line [10] [11] [12] [19], [29], [30], [31], [38], [40], [43], [49], [50], [51], [60], [65].

DeprecationWarning: scipy.sqrt, scipy.exp, scipy.log are deprecated and will be removed in SciPy 2.0.0, use numpy. [11]

Also need to update hstack and vstack. [13, [14], [52].

DeprecationWarning: scipy.ones is deprecated and will be removed in SciPy 2.0.0, use numpy.ones instead. [27], [45], [47], [48], [49].

DeprecationWarning: scipy.diag. [32], [47], [49]

DeprecationWarning: scipy.eye. [61]

DeprecationWarning: scipy.cos [39].

DeprecationWarning: scipy.rand and scipy.floor are deprecated. [44] Use np.random.rand.

DeprecationWarning: scipy.outer and scipy.dot. [62], [64]

Up through line [69]. More after that.

Example 2.1.1

We create two vectors for data about the population of China. The first has the years of census data, the other has the numbers of millions of people.

It's convenient to measure time in years since 1980.

Now we have four data points $(t_1,y_1),\dots,(t_4,y_4)$, so $n=4$ and we seek an interpolating cubic polynomial. We construct the associated Vandermonde matrix:

To solve for the vector of polynomial coefficients, we use solve (from scipy.linalg):

The algorithms used by the backslash operator are the main topic of this chapter. For now, observe that the coefficients of the cubic polynomial vary over several orders of magnitude, which is typical in this context. By our definitions, these coefficients are given in descending order of power in $t$.

We can use the resulting polynomial to estimate the population of China in 2005:

The official figure is 1297.8, so our result is not bad.

We can visualize the interpolation process. First, we plot the data as points. Then we add a plot of the interpolant, taking care to shift the $t$ variable back to actual years.

Let's redo it, this time continuing the curve outside of the original date range.

While the interpolation is plausible, the extrapolation to the future is highly questionable! As a rule, extrapolation more than a short distance beyond the original interval is not reliable.

Example 2.2.1

While Python (specifically NumPy) does have distinct representations for matrices and 2D arrays, use of the explicit matrix class is officially discouraged. We follow this advice here and use arrays to represent matrices and vectors.

A vector is created using square brackets and commas to enclose and separate its entries.

To construct a matrix, you nest the brackets to create a "vector of vectors". The inner vectors are the rows.

In the text we default to all vectors being equivalent to matrices with a single column. That isn't quite true in Python, because even an $n\times1$ array has two dimensions, unlike a vector.

You can also concatenate arrays with compatible dimensions, not just numbers, using hstack and vstack.

Transposing a matrix is done by appending .T to it. For matrices with complex values, we often want instead the hermitian, which is .conj().T. They are the same for a real matrix.

There are many convenient shorthand ways of building vectors and matrices other than entering all of their entries directly or in a loop. To get a vector with evenly spaced entries between two endpoints, you have two options.

The practical difference between these is whether you want to specify the step size in arange or the number of points in linspace.

Accessing an element is done by giving one (for a vector) or two index values in square brackets. In Python, indexing always starts with zero, not 1.

The indices can be ranges, in which case a slice or block of the matrix is accessed. You build these using a colon in the form start:stop. However, the last value of this range is stop-1, not stop.

If start or stop is omitted, the range extends to the first or last index.

Notice in the last case above that even when the slice is in the shape of a column vector, the result is just a vector with one dimension and neither row nor column shape.

There are more variations on the colon ranges. A negative value means to count from the end rather than the beginning. And a colon by itself means to include everything from the relevant dimension.

Finally, start:stop:step means to step size or stride other than one. You can mix this with the other variations.

The matrix and vector senses of addition, subtraction, and scalar multiplication and division are all handled by the usual symbols. Two matrices of the same size (what Python calls shape) are operated on elementwise.

If one operand has a smaller number of dimensions than the other, Python tries to broadcast it in the "missing" dimension(s), and the operation proceeds if the resulting shapes are identical.

Matrix–matrix and matrix–vector products are computed using @ or dot.

$AB$ is undefined for these matrix sizes.

A square matrix raised to an integer power is the same as repeated matrix multiplication.

The multiplication operator * is reserved for elementwise multiplication. Both operands have to be the same size, after any potential broadcasts.

To raise to a power elementwise, use a double star. This will broadcast as well.

Because elementwise and matrix multiplication are different, so are the following.

Most of the mathematical functions, such as cos, sin, log, exp and sqrt, expecting scalars as operands will be broadcast to arrays.

Example 2.3.2

For a square matrix $A$, the command solve(A,B) is mathematically equivalent to $A^{-1}b$.

One way to check the answer is to compute a quantity known as the residual. It is (hopefully) close to machine precision, scaled by the size of the entries of the data. Notice that the matrix–vector multiplication here is performed by dot.

If the matrix $A$ is singular, you may get an error.

Detecting singularity is a lot like checking whether two floating point numbers are exactly equal: because of roundoff, it could be missed. We're headed toward a more robust way to fully describe the situation.

Example 2.3.3

It's easy to get just the lower triangular part of any matrix using the tril command.

We'll set up and solve a linear system with this matrix.

It's not clear what the error in this answer is. However, we can always check the residual.

Next we'll engineer a problem to which we know the exact answer.

Everything seems OK here. But another example, with a different value for $\beta$, is more troubling.

It's not so good to get four digits of accuracy after starting with sixteen! But the source of the error is not hard to track down. Solving for $x_1$ performs $(\alpha-\beta)+\beta$ in the first row. Since $|\alpha|$ is so much smaller than $|\beta|$, this a recipe for losing digits to subtractive cancellation.

Example 2.4.1

We create a 4-by-4 linear system with the matrix

and with the right-hand side

We define an augmented matrix by tacking $b$ on the end as a new column.

The goal is to introduce zeros into the lower triangle of this matrix. By using only elementary row operations, we ensure that the matrix $S$ always represents a linear system that is equivalent to the original. We proceed from left to right and top to bottom. The first step is to put a zero in the (2,1) location using a multiple of row 1:

We repeat the process for the (3,1) and (4,1) entries.

The first column has the zero structure we want. To avoid interfering with that, we no longer add multiples of row 1 to anything. Instead, to handle column 2, we use multiples of row 2. We'll also exploit the highly repetitive nature of the operations to write them as a loop.

We finish out the triangularization with a zero in the (4,3) place. It's a little silly to use a loop for just one iteration, but the point is to establish a pattern.

Recall that $S$ is an augmented matrix: it represents the system $Ux=z$, where

The solutions to this system are identical to those of the original system, but this one can be solved by backward substitution.

Example 2.4.2

We revisit the previous example using algebra to express the row operations on $A$.

We use the identity and its columns heavily.

The first step is to put a zero in the (2,1) location using a multiple of row 1:

We repeat the process for the (3,1) and (4,1) entries.

And so on, following the pattern as before.

Example 2.4.3

It's best to compare two floating-point quantities by taking their difference.

(Usually we can expect "zero" only up to machine precision. However, all the exact numbers in this example are also floating-point numbers.)

To solve a linear system, we no longer need the matrix $A$.

Example 2.5.3

Each of the loops implies a summation of flops. The total flop count for this algorithm is [ \sum{i=1}^n \sum{j=1}^n 2 = \sum_{i=1}^n 2n = 2n^2. ] Since the matrix $A$ has $n^2$ elements, all of which have to be involved in the product, it seems unlikely that we could get a flop count that is smaller than $O(n^2)$.

Let's run an experiment with the built-in matrix-vector multiplication. We assume that flops dominate the computation time and thus measure elapsed time.

The reason for doing multiple repetitions at each value of $n$ above is to avoid having times so short that the resolution of the timer is a factor.

Example 2.5.4

Let's repeat the experiment of the previous example for more, and larger, values of $n$.

Plotting the time as a function of $n$ on log-log scales is equivalent to plotting the logs of the variables, but is formatted more neatly.

You can see that while the full story is complicated, the graph is trending to a straight line of positive slope. For comparison, we can plot a line that represents $O(n^2)$ growth exactly. (All such lines have slope equal to 2.)

Example 2.5.5

We'll test the conclusion of $O(n^3)$ flops experimentally, using the built-in lu function instead of the purely instructive lufact.

We plot the timings on a log-log graph and compare it to $O(n^3)$. The result could vary significantly from machine to machine.

Example 2.6.1

Here is the previously solved system.

It has a perfectly good solution, obtainable through LU factorization.

If we swap the second and fourth equations, nothing essential is changed, and the built-in method still finds the solution.

However, LU factorization fails.

Example 2.6.2

Here is the system that "broke" LU factorization for us.

When we use the lu function with three outputs, we get the elements of the PLU factorization.

However, while our notation defines $A=P^TLU$, Python is using $A=PLU$. In order to use the factors to solve the linear system, we have to use $P^T$, which equals $P^{-1}$ for a permutation matrix.

If you have to solve many different linear systems for the same matrix, you can perform the computationally expensive factorization just once, and repeat only the much faster triangular solves for the different right-hand sides.

Example 2.7.1

The norm command computes vector norms.

Example 2.7.2

The default matrix norm is not the 2-norm. Instead you must provide the 2 explicitly.

You can get the 1-norm as well.

The 1-norm is equivalent to

Similarly, we can get the $\infty$-norm and check our formula for it.

Here we illustrate the geometric interpretation of the 2-norm. First, we will sample a lot of vectors on the unit circle in $\mathbb{R}^2$.

We can apply A to every column of x simply by using a matrix multiplication.

We superimpose the image of the unit circle with the circle whose radius is $\|A\|_2$, and display the plots side by side.

Example 2.8.1

The function cond to computes matrix condition numbers. By default, the 2-norm is used. As an example, the family of Hilbert matrices is famously badly conditioned. Here is the $7\times 7$ case.

Next we engineer a linear system problem to which we know the exact answer.

Now we perturb the data randomly with a vector of norm $10^{-12}$.

We solve the perturbed problem using built-in pivoted LU and see how the solution was changed.

Here is the relative error in the solution.

And here are upper bounds predicted using the condition number of the original matrix.

Even if we don't make any manual perturbations to the data, machine epsilon does when we solve the linear system numerically.

Because $\kappa\approx 10^8$, it's possible to lose 8 digits of accuracy in the process of passing from $A$ and $b$ to $x$. That's independent of the algorithm; it's inevitable once the data are expressed in double precision.

Larger Hilbert matrices are even more poorly conditioned.

Before we compute the solution, note that $\kappa$ exceeds 1/eps. In principle we therefore might end up with an answer that is completely wrong (i.e., a relative error greater than 100%).

We got an answer. But in fact the error does exceed 100%.

Example 2.9.1

Here is a matrix with both lower and upper bandwidth equal to one. Such a matrix is called tridiagonal.

We can extract the elements on any diagonal using the diag command. The "main" or central diagonal is numbered zero, above and to the right of that is positive, and below and to the left is negative.

We can also construct matrices by specifying a diagonal with the diag function.

Observe above that the lower and upper bandwidths of $A$ are preserved in the $L$ and $U$ results.

Example 2.9.2

We'll use a large banded matrix to observe the speedup possible in LU factorization. If we use an ordinary "dense" matrix, though, then there's no way to exploit a banded structure such as tridiagonality.

If instead we construct a proper "sparse" matrix, though, the speedup can be dramatic.

Example 2.9.3

We begin with a symmetric $A$.

Carrying out our usual elimination in the first column leads us to

But now let's note that if we transpose this result, we have the same first column as before! So we could apply again and then transpose back.

Using transpose identities, this is just

Now you can see how we proceed down and to the right, eliminating in a column and then symmetrically in the corresponding row.

Finally, we arrive at a diagonal matrix.

Example 2.9.4

A randomly chosen matrix is extremely unlikely to be symmetric. However, there is a simple way to symmetrize one.

Similarly, a random symmetric matrix is unlikely to be positive definite. The Cholesky algorithm always detects a non-PD matrix by quitting with an error.

It's not hard to manufacture an SPD matrix to try out the Cholesky factorization.