# import Pkg; Pkg.add("LaTeXStrings")     # Add on first run
using LinearAlgebra
using Plots, LaTeXStrings
using Polynomials
include("FNC.jl")
Main.FNC
Here are 5-year averages of the worldwide temperature anomaly as compared to the 1951-1980 average (source: NASA).
t = 1955:5:2000
y = [ -0.0480, -0.0180, -0.0360, -0.0120, -0.0040,
    0.1180, 0.2100, 0.3320, 0.3340, 0.4560 ];
    
scatter(t,y,label="data",
    xlabel="year",ylabel="anomaly (degrees C)",leg=:bottomright)
A polynomial interpolant can be used to fit the data. Here we build one using a Vandermonde matrix. First, though, we express time as decades since 1950, as it improves the condition number of the matrix.
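That claim is easy to check (a quick comparison, not part of the original demo; V_raw and V_dec are names introduced here):
# Compare condition numbers: Vandermonde matrix from raw years vs. decades since 1950
# (t is still the raw years 1955:5:2000 at this point)
V_raw = [ Float64(t[i])^j for i=1:length(t), j=0:length(t)-1 ]
V_dec = [ ((t[i]-1950)/10)^j for i=1:length(t), j=0:length(t)-1 ]
@show cond(V_raw);    # so large that the computed value is itself unreliable
@show cond(V_dec);
The rescaled version is far better conditioned, so we proceed with decades.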
t = @. (t-1950)/10; 
V = [ t[i]^j for i=1:length(t), j=0:length(t)-1 ]
c = V\y
10-element Vector{Float64}:
  -14.114000003106533
   76.3617381125826
 -165.45597226105033
  191.960566713249
 -133.27347225566922
   58.01557779282242
  -15.962888893158295
    2.6948063499477524
   -0.2546666667385165
    0.010311111114084734
p = Polynomial(c)
f = s -> p((s-1950)/10)
plot!(f,1955,2000,label="interpolant")
As you can see, the interpolant does represent the data, in a sense. However, it's a crazy-looking curve for the application. Trying too hard to reproduce all the data exactly is known as overfitting.
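Indeed, the interpolant reproduces the data essentially exactly. A quick check, using the V, c, and f defined above:
@show norm(V*c - y);                        # residual at the data points: tiny compared to the data values
@show maximum(abs.(f.(1955:5:2000) .- y));  # same check, via the interpolant function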
Here are the 5-year temperature averages again.
year = 1955:5:2000
t = year .- 1955;
y = [ -0.0480, -0.0180, -0.0360, -0.0120, -0.0040,
    0.1180, 0.2100, 0.3320, 0.3340, 0.4560 ];
The standard best-fit line results from using a linear polynomial that meets the least squares criterion.
V = [ t.^0 t ]    # Vandermonde-ish matrix
c = V\y
p = Polynomial(c)
f = s -> p(s-1955)
scatter(year,y,label="data",
    xlabel="year",ylabel="anomaly (degrees C)",leg=:bottomright)
plot!(f,1955,2000,label="linear fit")
If we use a global cubic polynomial, the points are fit more closely.
V = [ t[i]^j for i=1:length(t), j=0:3 ]   # Vandermonde-ish matrix
p = Polynomial( V\y )
f = s -> p(s-1955)
plot!(f,1955,2000,label="cubic fit")
If we were to continue increasing the degree of the polynomial, the residual at the data points would get smaller, but overfitting would increase.
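The claim is easy to check with a short sketch (not part of the original demo; td, Vdeg, and cdeg are names introduced here). We rescale time to decades again to keep the matrices well conditioned, then print the least-squares residual at the data points for each degree; it shrinks as the degree grows and is essentially zero at degree 9, where the fit becomes interpolation.
# Least-squares residual at the data points as the polynomial degree increases
td = t/10    # decades since 1955, for better conditioning
for deg in 1:9
    Vdeg = [ td[i]^j for i=1:length(td), j=0:deg ]
    cdeg = Vdeg\y
    println("degree $deg: residual norm = ", norm(Vdeg*cdeg - y))
end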
Finding numerical approximations to $\pi$ has fascinated people for millennia. One famous formula is
$$ \frac{\pi^2}{6} = 1 + \frac{1}{2^2} + \frac{1}{3^2} + \cdots. $$
Say $s_k$ is the sum of the first $k$ terms of the series above, and $p_k = \sqrt{6s_k}$. Here is a compact way to compute both sequences.
a = [1/k^2 for k=1:100] 
s = cumsum(a)        # cumulative summation
p = @. sqrt(6*s)
plot(1:100,p,m=(:o,2),leg=:none,xlabel=L"k",ylabel=L"p_k",title="Sequence convergence")
This graph suggests that $p_k\to \pi$ but doesn't give much information about the rate of convergence. Let $\epsilon_k=|\pi-p_k|$ be the sequence of errors. By plotting the error sequence on a log-log scale, we can see a nearly linear relationship.
ep = @. abs(pi-p)    # error sequence
scatter(1:100,ep,
    leg=:none,xaxis=(:log10,L"k"),yaxis=(:log10,"error"),title="Convergence of errors")
This suggests a power-law relationship where $\epsilon_k\approx a k^b$, or $\log \epsilon_k \approx b (\log k) + \log a$.
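In matrix form, fitting $\log a$ and $b$ by least squares means solving
$$ \begin{bmatrix} 1 & \log 1 \\ 1 & \log 2 \\ \vdots & \vdots \\ 1 & \log 100 \end{bmatrix} \begin{bmatrix} \log a \\ b \end{bmatrix} \approx \begin{bmatrix} \log \epsilon_1 \\ \log \epsilon_2 \\ \vdots \\ \log \epsilon_{100} \end{bmatrix}, $$
which is exactly the overdetermined system handed to backslash below.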
k = 1:100
V = [ k.^0 log.(k) ]     # fitting matrix
c = V \ log.(ep)         # coefficients of linear fit
2-element Vector{Float64}:
 -0.18237524972830194
 -0.9674103233127926
In terms of the parameters $a$ and $b$ used above, we have
@show (a,b) = exp(c[1]),c[2];
(a, b) = (exp(c[1]), c[2]) = (0.8332885904225771, -0.9674103233127926)
It's tempting to conjecture that $b\to -1$ asymptotically. Here is how the numerical fit compares to the original convergence curve.
plot!(k,a*k.^b,l=:dash)
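A short informal estimate supports the conjecture. The tail of the series satisfies $\frac{\pi^2}{6} - s_k = \sum_{j>k} j^{-2} \approx \frac{1}{k}$, so
$$ p_k = \sqrt{6s_k} \approx \pi\sqrt{1 - \frac{6}{\pi^2 k}} \approx \pi - \frac{3}{\pi k}, $$
and hence $\epsilon_k \approx \dfrac{3}{\pi k}$; that is, $b \to -1$ and $a \to 3/\pi$ as ever larger values of $k$ dominate the fit.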
Because the functions $\sin^2(t)$, $\cos^2(t)$, and $1$ are linearly dependent, a matrix whose columns sample them should be ill-conditioned. In the matrix below, a tiny perturbation of the frequency in the second column keeps the columns from being exactly dependent.
t = range(0,stop=3,length=400)
A = [ sin.(t).^2 cos.((1+1e-7)*t).^2 t.^0 ]
kappa = cond(A)
1.8253225422971e7
Now we set up an artificial linear least squares problem with a known exact solution that actually makes the residual zero.
x = [1.,2,1]
b = A*x;
Using backslash to find the solution, we get a relative error that is about $\kappa$ times machine epsilon.
x_BS = A\b
@show observed_err = norm(x_BS-x)/norm(x);
@show max_err = kappa*eps();
observed_err = norm(x_BS - x) / norm(x) = 4.291303437118621e-11
max_err = kappa * eps() = 4.053030227651133e-9
If we formulate and solve via the normal equations, we get a much larger relative error. With $\kappa^2\approx 10^{14}$, we may not be left with more than about 2 accurate digits.
N = A'*A
x_NE = N\(A'*b)
@show observed_err = norm(x_NE-x)/norm(x)
@show digits = -log(10,observed_err)
observed_err = norm(x_NE - x) / norm(x) = 0.016675644280325516
digits = -(log(10, observed_err)) = 1.777917377682191
1.777917377682191
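The role of $\kappa^2$ can be checked directly (a quick look, not in the original demo): the condition number of the normal-equations matrix is roughly $\kappa^2$, and $\kappa^2$ times machine epsilon is within an order of magnitude of the error just observed.
@show cond(N);          # roughly kappa^2
@show kappa^2;
@show kappa^2*eps();    # heuristic size of the error made by the normal equations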
Julia provides access to both the thin and full forms of the QR factorization.
A = rand(1.:9.,6,4)
@show m,n = size(A);
(m, n) = size(A) = (6, 4)
Here is a standard call:
Q,R = qr(A);
Q
6×6 LinearAlgebra.QRCompactWYQ{Float64, Matrix{Float64}}:
 -0.0836242   0.955638     0.059977   -0.248946  -0.111609   0.0416588
 -0.167248    0.0567385   -0.0112245   0.408331  -0.565018  -0.694769
 -0.418121    0.141846    -0.665818    0.502622   0.157327   0.290449
 -0.334497   -0.234249    -0.233057   -0.494378  -0.661083   0.312223
 -0.752618   -0.0924026    0.164053   -0.310138   0.42072   -0.353446
 -0.334497   -0.00243165   0.686829    0.421982  -0.171786   0.456939
R
4×4 Matrix{Float64}:
 -11.9583  -9.03141  -11.2893   -6.68994
   0.0      8.62749    2.90258   3.0809
   0.0      0.0        4.70399  -1.3748
   0.0      0.0        0.0       1.69197
If you look carefully, you see that we got a full $Q$ but a thin $R$. Moreover, the $Q$ above is not a standard matrix type. If you convert it to a true matrix, then it reverts to the thin form.
Matrix(Q)
6×4 Matrix{Float64}:
 -0.0836242   0.955638     0.059977   -0.248946
 -0.167248    0.0567385   -0.0112245   0.408331
 -0.418121    0.141846    -0.665818    0.502622
 -0.334497   -0.234249    -0.233057   -0.494378
 -0.752618   -0.0924026    0.164053   -0.310138
 -0.334497   -0.00243165   0.686829    0.421982
We can test that $\mathbf{Q}$ is orthogonal.
QTQ = Q'*Q
norm(QTQ - I)
1.2121972803000621e-15
The thin $Q$ cannot be an orthogonal matrix, because it is not even square, but it is still ONC (its columns are orthonormal).
Matrix(Q)'*Matrix(Q) - I
4×4 Matrix{Float64}:
 -1.11022e-16  -1.80246e-16   5.18364e-17  -4.05087e-17
 -1.80246e-16   6.66134e-16   2.62745e-17  -8.7609e-17
  5.18364e-17   2.62745e-17   2.22045e-16  -2.31731e-16
 -4.05087e-17  -8.7609e-17   -2.31731e-16  -3.33067e-16
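As a further check (using the A, Q, and R computed above), the thin factors reproduce $\mathbf{A}$ up to roundoff:
@show size(Matrix(Q));           # the thin Q is 6×4
@show norm(Matrix(Q)*R - A);     # should be at roundoff level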
We will use Householder reflections to produce a QR factorization of the matrix
A = rand(1.:9.,6,4)
m,n = size(A)
(6, 4)
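For reference, here is the standard construction used below. Given a vector $\mathbf{z}$, let $\mathbf{e}_1$ be the first column of the identity and define
$$ \mathbf{v} = \mathbf{z} - \|\mathbf{z}\|\,\mathbf{e}_1, \qquad \mathbf{P} = \mathbf{I} - 2\,\frac{\mathbf{v}\mathbf{v}^T}{\mathbf{v}^T\mathbf{v}}. $$
Then $\mathbf{P}$ is symmetric and orthogonal, and because $2\mathbf{v}^T\mathbf{z} = \mathbf{v}^T\mathbf{v}$, we get $\mathbf{P}\mathbf{z} = \mathbf{z} - \mathbf{v} = \|\mathbf{z}\|\,\mathbf{e}_1$: every entry below the first is annihilated.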
Our first step is to introduce zeros below the diagonal in column 1. Define the vector
z = A[:,1];
Applying the Householder definitions gives us
v = z - norm(z)*[1;zeros(m-1)]
P = I - 2/(v'*v)*(v*v')   # reflector
6×6 Matrix{Float64}:
 0.479808   0.479808    0.059976    0.359856    0.419832    0.479808
 0.479808   0.557441   -0.0553199  -0.33192    -0.38724    -0.442559
 0.059976  -0.0553199   0.993085   -0.0414899  -0.0484049  -0.0553199
 0.359856  -0.33192    -0.0414899   0.75106    -0.29043    -0.33192
 0.419832  -0.38724    -0.0484049  -0.29043     0.661165   -0.38724
 0.479808  -0.442559   -0.0553199  -0.33192    -0.38724     0.557441
(Julia automatically fills in an identity of the correct size for the I above.) By design we can use the reflector to get the zero structure we seek:
P*z
6-element Vector{Float64}:
 16.67333200053307
 -1.7763568394002505e-15
 -2.220446049250313e-16
 -8.881784197001252e-16
 -1.3322676295501878e-15
 -1.7763568394002505e-15
Now we let
A = P*A
6×4 Matrix{Float64}:
 16.6733       13.6745       8.81647   12.415
  0.0          -3.00111      0.324543   3.85008
 -2.22045e-16   0.999862     6.79057    3.60626
 -8.88178e-16   2.99917      1.74341    2.63756
 -1.33227e-15  -3.00097      4.53398    0.243822
 -1.77636e-15  -0.00110618  -0.675457  -0.149917
We are set to put zeros into column 2. We must not involve row 1 in any way, lest we destroy the zeros we just introduced. To put it another way, we can repeat the process we just did on the smaller submatrix
A[2:m,2:n]
5×3 Matrix{Float64}:
 -3.00111      0.324543   3.85008
  0.999862     6.79057    3.60626
  2.99917      1.74341    2.63756
 -3.00097      4.53398    0.243822
 -0.00110618  -0.675457  -0.149917
z = A[2:m,2];
v = z - norm(z)*[1;zeros(m-2)];
P = I - 2/(v'*v)*(v*v');
We now apply the reflector to the submatrix.
A[2:m,2:n] = P*A[2:m,2:n]
A
6×4 Matrix{Float64}:
 16.6733       13.6745        8.81647   12.415
  0.0           5.29218      -0.48395   -0.145457
 -2.22045e-16   3.48636e-16   6.88804    4.08797
 -8.88178e-16   9.47002e-16   2.03579    4.0825
 -1.33227e-15  -6.26198e-16   4.24142   -1.20198
 -1.77636e-15  -3.62763e-19  -0.675565  -0.15045
We need two more iterations of this process.
for j = 3:n
    z = A[j:m,j]
    v = z - norm(z)*[1;zeros(m-j)]
    P = I - 2/(v'*v)*(v*v')
    A[j:m,j:n] = P*A[j:m,j:n]
end
We have now reduced the original to an upper triangular matrix using four orthogonal Householder reflections:
R = A
6×4 Matrix{Float64}:
 16.6733       13.6745        8.81647      12.415
  0.0           5.29218      -0.48395      -0.145457
 -2.22045e-16   3.48636e-16   8.36873       3.76076
 -8.88178e-16   9.47002e-16  -9.95674e-16   4.54999
 -1.33227e-15  -6.26198e-16  -8.48914e-16  -2.22945e-16
 -1.77636e-15  -3.62763e-19   6.80235e-17  -2.17005e-16
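To close the loop, here is a sketch that redoes the reduction on a fresh random matrix (the original A has been overwritten) while also accumulating the reflectors into an explicit orthogonal factor, so the factorization can be verified; the accumulated Q is introduced here and is not part of the original demo.
# Householder QR with an explicitly accumulated Q
A = rand(1.:9.,6,4)
m,n = size(A)
R = copy(A)
Q = Matrix{Float64}(I,m,m)
for j = 1:n
    z = R[j:m,j]
    v = z - norm(z)*[1; zeros(m-j)]
    P = I - 2/(v'*v)*(v*v')
    R[j:m,:] = P*R[j:m,:]        # zero out below the diagonal in column j
    Q[:,j:m] = Q[:,j:m]*P        # accumulate the product of reflectors
end
@show norm(Q*R - A);             # Q*R reproduces the original matrix
@show norm(Q'*Q - I);            # Q is orthogonal (to roundoff)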