%% CHEBFUN GUIDE 6: QUASIMATRICES AND LEAST-SQUARES
% Lloyd N. Trefethen, November 2009, revised February 2011
%% 6.1 Quasimatrices and "spy"
% A chebfun can have more than one column, or if it is transposed, it can
% have more than one row. In these cases we get a
% *quasimatrix*, a "matrix" in which one of the dimensions is
% discrete as usual but the other is continuous. Our default choice
% will be that of an "infinity by n" quasimatrix consisting of n columns,
% each of which is a chebfun. When it is important to specify the orientation
% we use the term *column quasimatrix* or *row quasimatrix*.
%%
% Here for example is the quasimatrix consisting of the first
% six powers of x on the interval [-1,1]. The command "size" can be
% used to identify the continuous dimension, and to find the numbers
% of rows or columns:
x = chebfun('x');
A = [1 x x.^2 x.^3 x.^4 x.^5];
size(A)
size(A,2)
%%
% Here is the third column of A evaluated at the point x=0.5:
A(0.5,3)
%%
% Here are the column sums, i.e., the integrals of 1, x,..., x^5
% from -1 to 1:
format short, sum(A), format long
%%
% And here is a plot of the columns:
plot(A), grid on
%%
% The term quasimatrix comes from [Stewart 1998], and the
% same idea appears with different terminology in [de Boor 1991] and
% [Trefethen & Bau 1997, pp. 52-54]. The idea is a natural one,
% since so much of applied linear algebra
% deals with discrete approximations to the continuous, but it seems not to have
% been discussed explicitly very much until the appearance of
% Chebfun [Battles & Trefethen 2004, Battles 2006].
%%
% If f and g are column chebfuns, then f'*g is a scalar, their inner product.
% For example, here is the inner product of x^2 and x^4 over [-1,1] (equal to 2/7):
A(:,3)'*A(:,5)
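%%
% The inner product f'*g is the same as the integral of the product fg
% over the interval. As a check (using only commands already introduced),
% the number above should match this alternative computation:
sum(A(:,3).*A(:,5))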
%%
% More generally, if A and B are column quasimatrices with m and n columns, respectively,
% then A'*B is the m x n matrix of inner products of those columns. Here is the
% 6x6 example corresponding to B=A:
format short, A'*A, format long
%%
% You can get an idea of the shape of a quasimatrix with the overloaded SPY command:
subplot(1,2,1), spy(A), title A
subplot(1,2,2), spy(A'), title('A''')
%% 6.2 Backslash and least-squares
%%
% In Matlab, the command c = A\b computes the solution to the system
% of equations Ac = b if A is a square matrix, whereas if A is rectangular,
% with more rows than columns, it computes the least squares solution,
% the vector c that minimizes norm(A*c-b). A quasimatrix is always rectangular, and
% "\" has accordingly been overloaded
% to carry out the appropriate continuous least-squares computation.
% (The actual Matlab command that handles backslash is called mldivide.)
%%
% For example, continuing with the same chebfun x and
% quasimatrix A as above, consider the following sequence:
f = exp(x).*sin(6*x);
c = A\f
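%%
% Behind the scenes Chebfun solves this problem by QR factorization
% (see Section 6.3), but conceptually the least-squares solution also
% satisfies the normal equations (A'*A)c = A'*f. As a check, solving
% them directly gives nearly the same coefficients (the difference
% reflects rounding and the conditioning of A'*A):
c2 = (A'*A)\(A'*f);
norm(c-c2)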
%%
% The vector c can be interpreted as the vector of
% coefficients of the least-squares fit to f by a linear combination
% of the functions 1, x,..., x^5. Here is a plot of f (in blue) and
% the least-squares approximation (in red), which we label ffit.
ffit = A*c;
clf, plot(f), grid on, hold on
plot(ffit,'r'), hold off
error = norm(f-ffit)
%%
% It is a general result that the least-squares approximation
% by a polynomial of degree n to a continuous function f must intersect
% f at least n+1 times in the interval of approximation.
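%%
% We can test this assertion for our degree-5 fit by counting the
% zeros of the error function with the Chebfun "roots" command
% (at least 6 sign changes, hence at least 6 roots, are expected):
length(roots(f-ffit))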
%%
% Here is quite a different quasimatrix whose columns can be used to fit f.
% The columns correspond to hat functions located at points equally
% spaced from -1 to 1, and they are realized as piecewise smooth chebfuns.
A2 = [];
for j = 0:8
xj = -1 + j/4;
A2 = [A2 max(0,1-4*abs(x-xj))];
end
plot(A2)
set(gca,'xtick',-1:.25:1)
%%
% A linear combination of these columns is a piecewise linear function
% with breakpoints at -0.75, -0.50,...,0.75. Here is the least-squares
% fit by such functions to f. Remember that although we happen to be
% fitting here by a function with a discrete flavor, all the operations
% are continuous ones involving integrals, not point evaluations.
c = A2\f;
ffit = A2*c;
plot(f), grid on, hold on
plot(ffit,'.-r'), hold off
set(gca,'xtick',-1:.25:1)
error = norm(f-ffit)
%% 6.3 QR factorization
%%
% Matrix least-squares problems are ordinarily solved by QR factorization,
% and in the quasimatrix case, they are solved by quasimatrix QR factorization.
% This is the technology underlying the backslash operator described in the
% last section.
%%
% A quasimatrix QR factorization takes this form:
%
% A = QR, where A is inf x n, Q is inf x n, R is n x n.
%
% The columns of A are arbitrary, the columns of Q are orthonormal, and
% R is an n x n upper-triangular matrix. This factorization corresponds
% to what is known in various texts as the "reduced", "economy size", "skinny",
% "abbreviated", or "condensed"
% QR factorization, since Q is rectangular rather than square and R is
% square rather than rectangular. In Matlab the syntax for computing such things is
% [Q,R] = qr(A,0), and the same command has been overloaded for chebfuns. The
% computation makes use of a quasimatrix analogue of Householder triangularization
% [Trefethen 2010]. Alternatively one can simply write [Q,R] = qr(A):
[Q,R] = qr(A);
plot(Q), grid on
%%
% The spy command confirms the shapes of these various matrices.
% One sees fewer dots in spy(R) than one might expect at first,
% the reason being that half the entries in its upper triangle should
% be zero, since the columns of A alternate between even and odd functions.
% In fact, because of rounding and truncation errors, not all those
% mathematical zeros are exactly zero on the computer, so the upper
% triangle of spy(R) appears as an imperfect checkerboard.
subplot(1,3,1), spy(A), title A
subplot(1,3,2), spy(Q), title Q
subplot(1,3,3), spy(R), title R
%%
% The plot shows *orthogonal polynomials*, namely the orthogonalizations of
% the monomials 1, x,...,x^5 over [-1,1]. These are the famous Legendre
% polynomials {P_k} [Abramowitz & Stegun 1972],
% except that the latter are conventionally normalized by the
% condition P_k(1) = 1 rather than by having norm 1. We can renormalize to
% impose this condition as follows:
for j = 1:size(A,2)
R(j,:) = R(j,:)*Q(1,j);
Q(:,j) = Q(:,j)/Q(1,j);
end
clf, plot(Q), grid on
%%
% (A slicker way to produce this plot in Chebfun would be
% simply to type plot(legpoly(0:5)).)
%%
% If A=QR, then A*inv(R) = Q, and here is inv(R):
format short, inv(R), format long
%%
% The kth column of inv(R) is the vector of coefficients for the
% expansion of the kth column of Q as a linear combination of the
% columns of A, that is, the monomials 1, x, x^2,.... In other words,
% the kth column of inv(R) is the vector of coefficients of the kth
% Legendre polynomial. For example, we see from the matrix
% that P_3(x) = 2.5x^3 - 1.5x.
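%%
% We can confirm this by comparing the fourth column of the renormalized
% Q directly with that polynomial (the difference should be zero up to
% rounding errors):
norm(Q(:,4) - (2.5*x.^3 - 1.5*x))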
%%
% Here is what the hat functions look like after orthonormalization:
[Q2,R2] = qr(A2);
plot(Q2)
%% 6.4 SVD, norm, cond
%%
% An m x n matrix A defines a map from R^n to R^m, and in particular, A maps the unit ball in
% R^n to a hyperellipsoid of dimension <=n in R^m. The (reduced, skinny, condensed,...)
% *SVD* or *singular value decomposition* exhibits this
% map by providing a factorization AV = US or equivalently
% A = USV', where U is m x n with orthonormal columns, S is diagonal with
% nonincreasing nonnegative diagonal entries known as the *singular values*,
% and V is n x n and orthogonal.
% A maps v_j, the jth column of V or jth *right singular vector*, to s_j times
% u_j, the jth column of U or jth *left singular vector*, which is the vector
% defining the jth largest semiaxis of the hyperellipsoid. See Chapters 4 and 5
% of [Trefethen & Bau 1997].
%%
% If A is an inf x n quasimatrix, everything is analogous:
%
% A = USV', where A is inf x n, U is inf x n, S is n x n, V is n x n.
%
% The image of the unit ball in R^n under A is still a hyperellipsoid of dimension <=n, which now
% lies within an infinite-dimensional function space.
% The columns of U are orthonormal functions, and S and V have the same properties
% as in the matrix case.
%%
% For example, here are the singular values of the matrix
% A defined earlier with columns 1,x,...,x^5:
s = svd(A,0)
%%
% The largest singular value is equal to the norm of the quasimatrix, which
% is defined by norm(A,2) = max_x norm(A*x)/norm(x).
norm(A,2)
%%
% (Note that we must include the argument "2" here: for reasons of speed,
% the default for quasimatrices, unlike the usual Matlab matrices, is
% the Frobenius norm rather than the 2-norm.)
% The SVD enables us to identify exactly what vectors are involved in achieving
% this maximum ratio. The optimal vector x is v1, the first right singular
% vector of A, which we can extract from the SVD:
[U,S,V] = svd(A);
%%
% First we use spy to confirm the shapes of the matrices. As with spy(R) earlier, here
% spy(V) should in principle show a checkerboard, but some of its blanks are turned into
% dots by rounding or truncation errors.
subplot(1,4,1), spy(A), title A
subplot(1,4,2), spy(U), title U
subplot(1,4,3), spy(S), title S
subplot(1,4,4), spy(V), title V
v1 = V(:,1)
%%
% Next we confirm that the norm of v1 is 1:
norm(v1)
%%
% This vector is mapped by A to the chebfun s1*u1:
u1 = U(:,1);
norm(u1)
%%
s1 = S(1,1)
%%
norm(A*v1)
%%
norm(A*v1-s1*u1)
%%
% Similarly, the minimal singular value and corresponding singular vectors
% describe the minimum amount that A can enlarge an input. The following
% commands plot the extreme functions A*v1 (blue) and A*vn (red). We can
% interpret these as the largest and smallest degree-5 polynomials, as
% measured in the 2-norm over [-1,1], whose coefficient vectors have 2-norm equal to 1.
clf, plot(A*v1), grid on, hold on
vn = V(:,end);
plot(A*vn,'r'), hold off
%%
% The ratio of the largest and smallest singular values -- the
% eccentricity of the hyperellipsoid -- is the condition number of A:
max(s)/min(s)
%%
cond(A)
%%
% The fact that cond(A) is a good deal greater than 1 reflects the ill-conditioning
% of the monomials 1,x,...,x^5 as a basis for polynomials in [-1,1]. The effect
% becomes rapidly stronger as we take more terms in the sequence:
cond([1 x x.^2 x.^3 x.^4 x.^5 x.^6 x.^7 x.^8 x.^9 x.^10])
%%
% By contrast a quasimatrix formed of suitably normalized Legendre polynomials has
% condition number 1, since they are orthonormal:
cond(legpoly(0:10,'norm'))
%%
% A quasimatrix of Chebyshev polynomials doesn't quite achieve
% condition number 1, but it comes close:
cond(chebpoly(0:10))
%%
% Chebyshev polynomials form an excellent basis for expansions on [-1,1], a fact
% that is the starting point of Chebfun.
%% 6.5 Other norms
%%
% The definition norm(A) = max_x norm(A*x)/norm(x)
% makes sense in other norms besides the 2-norm, and the particularly
% important alternatives are the 1-norm and the inf-norm. The 1-norm of
% a column quasimatrix is the "maximum column sum", i.e., the maximum of
% the 1-norms of its columns. In the case of our quasimatrix A,
% the maximum is attained by the first column, which has norm 2:
norm(A,1)
%%
% The inf-norm is the "maximum row sum", which for a column quasimatrix
% corresponds to the maximum of the chebfun obtained by adding the
% absolute values of the columns. In the case of A, the sum is
% 1+|x|+...+|x|^5, which attains its maximum value 6 at x=-1 and 1:
norm(A,inf)
%%
% The norms of row quasimatrices are analogous, with
% norm(A',inf) = norm(A,1) and norm(A',1) = norm(A,inf).
% Like Matlab itself applied to a rectangular matrix, Chebfun
% does not define cond(A,1) or cond(A,inf) if A is a quasimatrix.
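%%
% We can verify these identities for our quasimatrix A; the differences
% should be zero up to rounding errors:
norm(A',inf) - norm(A,1)
norm(A',1) - norm(A,inf)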
%%
% The Frobenius or Hilbert-Schmidt norm is equal to the square root of the sum of the
% squares of the singular values:
norm(A,'fro')
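%%
% Thus norm(A,'fro') should match the square root of the sum of the
% squares of the singular values computed earlier:
sqrt(sum(s.^2))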
%% 6.6 rank, null, orth, pinv
%%
% Chebfun also contains overloads for some further
% Matlab operations related to orthogonal matrix factorizations.
% Perhaps the most useful of these is rank(A), which computes the singular
% values of A and makes a judgement as to how many of them are significantly
% different from zero. For example, with x still defined as before, here is an
% example showing that the functions 1, sin(x)^2, and cos(x)^2 are linearly dependent:
B = [1 sin(x).^2 cos(x).^2];
rank(B)
%%
% Since B is rank-deficient, it has a nontrivial nullspace, and the command
% null(B) will find an orthonormal basis for it:
null(B)
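%%
% As a check, B applied to this null vector should be the zero function,
% up to rounding errors:
norm(B*null(B))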
%%
% Similarly the command orth(B) finds an orthonormal basis for the range of B,
% which in this case has dimension 2:
orth(B)
%%
% If A is an inf x n column quasimatrix, the command pinv(A) computes the pseudoinverse
% of A, an n x inf row quasimatrix such that pinv(A)*f = A\f for any column chebfun f.
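%%
% For example, applying pinv(A) to the function f defined earlier should
% reproduce the least-squares coefficients from Section 6.2 (the
% difference should be zero up to rounding errors):
norm(pinv(A)*f - A\f)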
%%
% Here is a summary of the dimensions of these objects:
subplot(1,3,1), spy(null(B)), title null(B)
subplot(1,3,2), spy(orth(B)), title orth(B)
subplot(1,3,3), spy(pinv(A)), title pinv(A)
%% 6.7 References
%
% [Abramowitz & Stegun 1972]
% M. Abramowitz and I. A. Stegun, eds., Handbook of Mathematical Functions with
% Formulas, Graphs, and Mathematical Tables, 9th printing, Dover, 1972.
%
% [Battles 2006] Z. Battles, Numerical Linear Algebra for
% Continuous Functions, DPhil thesis, Oxford University
% Computing Laboratory, 2006.
%
% [Battles & Trefethen 2004] Z. Battles and L. N. Trefethen,
% "An extension of Matlab to continuous functions and
% operators", SIAM Journal on Scientific Computing 25 (2004),
% 1743-1770.
%
% [de Boor 1991] C. de Boor, "An alternative approach to (the teaching
% of) rank, basis, and dimension", Linear Algebra and its Applications
% 146 (1991), 221-229.
%
% [Stewart 1998] G. W. Stewart, Afternotes Goes to Graduate School:
% Lectures on Advanced Numerical Analysis, SIAM, 1998.
%
% [Trefethen 2010] L. N. Trefethen, "Householder triangularization of
% a quasimatrix", IMA Journal of Numerical Analysis 30 (2010), 887-897.
%
% [Trefethen & Bau 1997] L. N. Trefethen and D. Bau, III, Numerical Linear
% Algebra, SIAM, 1997.