# import Pkg; Pkg.add("LaTeXStrings") # Add on first run
using LinearAlgebra
using Plots, LaTeXStrings
using Polynomials
include("FNC.jl")
Main.FNC
Here are 5-year averages of the worldwide temperature anomaly as compared to the 1951-1980 average (source: NASA).
t = 1955:5:2000
y = [ -0.0480, -0.0180, -0.0360, -0.0120, -0.0040,
0.1180, 0.2100, 0.3320, 0.3340, 0.4560 ];
scatter(t,y,label="data",
xlabel="year",ylabel="anomaly (degrees C)",leg=:bottomright)
A polynomial interpolant can be used to fit the data. Here we build one using a Vandermonde matrix. First, though, we express time as decades since 1950, as it improves the condition number of the matrix.
t = @. (t-1950)/10;
V = [ t[i]^j for i=1:length(t), j=0:length(t)-1 ]
c = V\y
10-element Vector{Float64}:
  -14.114000003106533
   76.3617381125826
 -165.45597226105033
  191.960566713249
 -133.27347225566922
   58.01557779282242
  -15.962888893158295
    2.6948063499477524
   -0.2546666667385165
    0.010311111114084734
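As a quick check of the claim about conditioning (an aside of ours), we can compare the condition number of this rescaled Vandermonde matrix with one built directly from the raw years; the latter should be far larger.
V_raw = [ Float64(yr)^j for yr in 1955:5:2000, j in 0:9 ]   # unscaled years
@show cond(V_raw);
@show cond(V);
Returning to the interpolant: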
p = Polynomial(c)
f = s -> p((s-1950)/10)
plot!(f,1955,2000,label="interpolant")
As you can see, the interpolant does represent the data, in a sense. However, it's a crazy-looking curve for the application. Trying too hard to reproduce all the data exactly is known as overfitting.
Here are the 5-year temperature averages again.
year = 1955:5:2000
t = year .- 1955;
y = [ -0.0480, -0.0180, -0.0360, -0.0120, -0.0040,
0.1180, 0.2100, 0.3320, 0.3340, 0.4560 ];
The standard best-fit line results from using a linear polynomial that meets the least squares criterion.
V = [ t.^0 t ] # Vandermonde-ish matrix
c = V\y
p = Polynomial(c)
f = s -> p(s-1955)
scatter(year,y,label="data",
xlabel="year",ylabel="anomaly (degrees C)",leg=:bottomright)
plot!(f,1955,2000,label="linear fit")
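As a cross-check (assuming a recent Polynomials.jl, whose fit function solves the same least-squares problem), the fitted coefficients should agree with those from the backslash solve above:
p_check = Polynomials.fit(t, y, 1)   # least-squares linear fit via the package
@show coeffs(p_check) ≈ c;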
If we use a global cubic polynomial, the points are fit more closely.
V = [ t[i]^j for i=1:length(t), j=0:3 ] # Vandermonde-ish matrix
p = Polynomial( V\y )
f = s -> p(s-1955)
plot!(f,1955,2000,label="cubic fit")
If we were to continue increasing the degree of the polynomial, the residual at the data points would get smaller, but overfitting would increase.
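We can make this trade-off concrete by computing the residual norm at the data points for a range of degrees (a small experiment of our own). The residual shrinks as the degree grows, even as the fitted curve becomes more oscillatory, and it is essentially zero at degree 9, where the fit interpolates all ten points.
s = t/10                   # rescale as before to keep the fitting matrices well-conditioned
for deg in 1:9
    Vdeg = [ s[i]^j for i in 1:length(s), j in 0:deg ]   # fitting matrix of degree deg
    cdeg = Vdeg \ y                                      # least-squares coefficients
    println("degree $deg: residual norm = ", norm(Vdeg*cdeg - y))
end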
Finding numerical approximations to $\pi$ has fascinated people for millennia. One famous formula is

$$ \frac{\pi^2}{6} = 1 + \frac{1}{2^2} + \frac{1}{3^2} + \cdots. $$

Say $s_k$ is the sum of the first $k$ terms of the series above, and $p_k = \sqrt{6s_k}$. Here is a compact way to compute these sequences.
a = [1/k^2 for k=1:100]
s = cumsum(a) # cumulative summation
p = @. sqrt(6*s)
plot(1:100,p,m=(:o,2),leg=:none,xlabel=L"k",ylabel=L"p_k",title="Sequence convergence")
This graph suggests that $p_k\to \pi$ but doesn't give much information about the rate of convergence. Let $\epsilon_k=|\pi-p_k|$ be the sequence of errors. By plotting the error sequence on a log-log scale, we can see a nearly linear relationship.
ep = @. abs(pi-p) # error sequence
scatter(1:100,ep,
leg=:none,xaxis=(:log10,L"k"),yaxis=(:log10,"error"),title="Convergence of errors")
This suggests a power-law relationship where $\epsilon_k\approx a k^b$, or $\log \epsilon_k \approx b (\log k) + \log a$.
k = 1:100
V = [ k.^0 log.(k) ] # fitting matrix
c = V \ log.(ep) # coefficients of linear fit
2-element Vector{Float64}:
 -0.18237524972830194
 -0.9674103233127926
In terms of the parameters $a$ and $b$ used above, we have
@show (a,b) = exp(c[1]),c[2];
(a, b) = (exp(c[1]), c[2]) = (0.8332885904225771, -0.9674103233127926)
It's tempting to conjecture that $b\to -1$ asymptotically. Here is how the numerical fit compares to the original convergence curve.
plot!(k,a*k.^b,l=:dash)
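There is a reason to expect $b = -1$: the tail of the series satisfies $\frac{\pi^2}{6}-s_k \approx \frac{1}{k}$, and linearizing $p_k=\sqrt{6s_k}$ about $s_k=\frac{\pi^2}{6}$ gives $\epsilon_k \approx \frac{3}{\pi k}$, i.e., $a = 3/\pi \approx 0.95$ and $b = -1$. Refitting using only the larger values of $k$, where this asymptotic behavior dominates, should move the estimates in that direction (a quick sketch of our own):
tail = 10:100                      # use only the tail of the error sequence
Vt = [ tail.^0 log.(tail) ]
ct = Vt \ log.(ep[tail])
a_tail, b_tail = exp(ct[1]), ct[2]
@show (a_tail, b_tail);
@show 3/pi;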
Because the functions $\sin^2(t)$, $\cos^2(t)$, and $1$ are linearly dependent, a matrix whose columns sample them should be ill-conditioned. We perturb the argument of the cosine slightly so that the columns are not exactly dependent:
t = range(0,stop=3,length=400)
A = [ sin.(t).^2 cos.((1+1e-7)*t).^2 t.^0 ]
kappa = cond(A)
1.8253225422971e7
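If we drop the perturbation, the identity $\sin^2(t)+\cos^2(t)=1$ makes the columns exactly dependent, and the computed condition number should jump to roughly the reciprocal of machine epsilon (a quick check of our own):
A_exact = [ sin.(t).^2 cos.(t).^2 t.^0 ]   # columns satisfy an exact linear dependence
cond(A_exact)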
Now we set up an artificial linear least squares problem with a known exact solution that actually makes the residual zero.
x = [1.,2,1]
b = A*x;
Using backslash to find the solution, we get a relative error that is comfortably below the rough bound of $\kappa$ times machine epsilon.
x_BS = A\b
@show observed_err = norm(x_BS-x)/norm(x);
@show max_err = kappa*eps();
observed_err = norm(x_BS - x) / norm(x) = 4.291303437118621e-11
max_err = kappa * eps() = 4.053030227651133e-9
If we formulate and solve via the normal equations, we get a much larger relative error. With $\kappa^2\approx 10^{14}$, we may not be left with more than about 2 accurate digits.
N = A'*A
x_NE = N\(A'*b)
@show observed_err = norm(x_NE-x)/norm(x)
@show digits = -log(10,observed_err)
observed_err = norm(x_NE - x) / norm(x) = 0.016675644280325516
digits = -(log(10, observed_err)) = 1.777917377682191
1.777917377682191
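The loss of accuracy is consistent with the conditioning of the normal equations: for a matrix of full column rank, $\kappa(A^TA) = \kappa(A)^2$. A quick check:
@show cond(N);           # condition number of the normal-equations matrix
@show kappa^2;           # square of the condition number of A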
Julia provides access to both the thin and full forms of the QR factorization.
A = rand(1.:9.,6,4)
@show m,n = size(A);
(m, n) = size(A) = (6, 4)
Here is a standard call:
Q,R = qr(A);
Q
6×6 LinearAlgebra.QRCompactWYQ{Float64, Matrix{Float64}}:
 -0.0836242   0.955638     0.059977   -0.248946  -0.111609   0.0416588
 -0.167248    0.0567385   -0.0112245   0.408331  -0.565018  -0.694769
 -0.418121    0.141846    -0.665818    0.502622   0.157327   0.290449
 -0.334497   -0.234249    -0.233057   -0.494378  -0.661083   0.312223
 -0.752618   -0.0924026    0.164053   -0.310138   0.42072   -0.353446
 -0.334497   -0.00243165   0.686829    0.421982  -0.171786   0.456939
R
4×4 Matrix{Float64}:
 -11.9583  -9.03141  -11.2893   -6.68994
   0.0      8.62749    2.90258   3.0809
   0.0      0.0        4.70399  -1.3748
   0.0      0.0        0.0       1.69197
If you look carefully, you see that we got a full $Q$ but a thin $R$. Moreover, the $Q$ above is not a standard matrix type. If you convert it to a true matrix, then it reverts to the thin form.
Matrix(Q)
6×4 Matrix{Float64}:
 -0.0836242   0.955638     0.059977   -0.248946
 -0.167248    0.0567385   -0.0112245   0.408331
 -0.418121    0.141846    -0.665818    0.502622
 -0.334497   -0.234249    -0.233057   -0.494378
 -0.752618   -0.0924026    0.164053   -0.310138
 -0.334497   -0.00243165   0.686829    0.421982
We can test that $\mathbf{Q}$ is orthogonal.
QTQ = Q'*Q
norm(QTQ - I)
1.2121972803000621e-15
The thin $Q$ cannot be an orthogonal matrix, because it is not even square, but it is still ONC (its columns are orthonormal).
Matrix(Q)'*Matrix(Q) - I
4×4 Matrix{Float64}:
 -1.11022e-16  -1.80246e-16   5.18364e-17  -4.05087e-17
 -1.80246e-16   6.66134e-16   2.62745e-17  -8.7609e-17
  5.18364e-17   2.62745e-17   2.22045e-16  -2.31731e-16
 -4.05087e-17  -8.7609e-17   -2.31731e-16  -3.33067e-16
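If an explicit full-size $Q$ is needed as an ordinary matrix, one option is to apply the compact factor to an identity matrix of the full dimension (a sketch; the details of the compact $Q$ type vary somewhat across Julia versions):
Q_full = Q*Matrix(1.0I,m,m)      # full m×m orthogonal factor as a plain matrix
@show size(Q_full);
@show norm(Q_full'*Q_full - I);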
We will use Householder reflections to produce a QR factorization of the matrix
A = rand(1.:9.,6,4)
m,n = size(A)
(6, 4)
Our first step is to introduce zeros below the diagonal in column 1. Define the vector
z = A[:,1];
Applying the Householder definitions gives us
v = z - norm(z)*[1;zeros(m-1)]
P = I - 2/(v'*v)*(v*v') # reflector
6×6 Matrix{Float64}:
 0.479808   0.479808    0.059976    0.359856    0.419832    0.479808
 0.479808   0.557441   -0.0553199  -0.33192    -0.38724    -0.442559
 0.059976  -0.0553199   0.993085   -0.0414899  -0.0484049  -0.0553199
 0.359856  -0.33192    -0.0414899   0.75106    -0.29043    -0.33192
 0.419832  -0.38724    -0.0484049  -0.29043     0.661165   -0.38724
 0.479808  -0.442559   -0.0553199  -0.33192    -0.38724     0.557441
(Julia automatically fills in an identity of the correct size for the I above.) By design, we can use the reflector to get the zero structure we seek:
P*z
6-element Vector{Float64}:
 16.67333200053307
 -1.7763568394002505e-15
 -2.220446049250313e-16
 -8.881784197001252e-16
 -1.3322676295501878e-15
 -1.7763568394002505e-15
Now we let
A = P*A
6×4 Matrix{Float64}:
 16.6733       13.6745      8.81647   12.415
  0.0          -3.00111     0.324543   3.85008
 -2.22045e-16   0.999862    6.79057    3.60626
 -8.88178e-16   2.99917     1.74341    2.63756
 -1.33227e-15  -3.00097     4.53398    0.243822
 -1.77636e-15  -0.00110618  -0.675457  -0.149917
We are set to put zeros into column 2. We must not use row 1 in any way, lest it destroy the zeros we just introduced. To put it another way, we can repeat the process we just did on the smaller submatrix
A[2:m,2:n]
5×3 Matrix{Float64}:
 -3.00111     0.324543   3.85008
  0.999862    6.79057    3.60626
  2.99917     1.74341    2.63756
 -3.00097     4.53398    0.243822
 -0.00110618  -0.675457  -0.149917
z = A[2:m,2];
v = z - norm(z)*[1;zeros(m-2)];
P = I - 2/(v'*v)*(v*v');
We now apply the reflector to the submatrix.
A[2:m,2:n] = P*A[2:m,2:n]
A
6×4 Matrix{Float64}:
 16.6733       13.6745       8.81647   12.415
  0.0           5.29218     -0.48395   -0.145457
 -2.22045e-16   3.48636e-16   6.88804    4.08797
 -8.88178e-16   9.47002e-16   2.03579    4.0825
 -1.33227e-15  -6.26198e-16   4.24142   -1.20198
 -1.77636e-15  -3.62763e-19  -0.675565  -0.15045
We need two more iterations of this process.
for j = 3:n
    z = A[j:m,j]                          # part of column j from the diagonal down
    v = z - norm(z)*[1;zeros(m-j)]
    P = I - 2/(v'*v)*(v*v')               # reflector for this column
    A[j:m,j:n] = P*A[j:m,j:n]             # apply it to the remaining submatrix
end
We have now reduced the original to an upper triangular matrix using four orthogonal Householder reflections:
R = A
6×4 Matrix{Float64}:
 16.6733       13.6745        8.81647       12.415
  0.0           5.29218      -0.48395       -0.145457
 -2.22045e-16   3.48636e-16   8.36873        3.76076
 -8.88178e-16   9.47002e-16  -9.95674e-16    4.54999
 -1.33227e-15  -6.26198e-16  -8.48914e-16   -2.22945e-16
 -1.77636e-15  -3.62763e-19   6.80235e-17   -2.17005e-16
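Packaging these steps into a function makes it easy to verify the whole factorization against the original matrix. Here is a minimal sketch of our own (the function name and the accumulation of $Q$ as a product of reflectors are our choices):
function householder_qr(A)
    m,n = size(A)
    R = float(copy(A))
    Q = Matrix(1.0I,m,m)                  # accumulates the product of reflectors
    for j = 1:n
        z = R[j:m,j]
        v = z - norm(z)*[1;zeros(m-j)]
        if norm(v) > 0                    # skip when the column already has the right form
            P = I - 2/(v'*v)*(v*v')       # reflector, as above
            R[j:m,j:n] = P*R[j:m,j:n]
            Q[:,j:m] = Q[:,j:m]*P         # right-multiply by the embedded reflector
        end
    end
    return Q,R
end
A_test = rand(1.:9.,6,4)
Q_h,R_h = householder_qr(A_test)
@show norm(Q_h*R_h - A_test);             # should be near roundoff level
@show norm(Q_h'*Q_h - I);                 # Q_h is numerically orthogonal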