M371 - Alexiades
Review4b: for FINAL - Least Squares
Material
Least Squares approximation / fitting
Fourier expansions, FFT, signal processing
Concepts for LS
Least Squares fitting of m data points
(x_i, y_i), i = 1,...,m, to an
n-parameter model Φ(x; c_1,...,c_n):
find c = (c_1,...,c_n) to minimize the LS error:
E(c) = ∑_{i=1}^{m} [ y_i − Φ(x_i; c) ]²
= sum of squared deviations
( ≡ ∥y − Φ∥_2² ).
Least Squares approximation of a function f(x) on [a,b] by an n-parameter model Φ(x, c):
find c = (c_1,...,c_n) to minimize the LS error
E(c) = ∫_a^b | f(x) − Φ(x, c) |² dx
( ≡ ∥f − Φ∥_2² ).
It is a (generally nonlinear) optimization problem, and typically m>>n.
It can be viewed as a (generally nonlinear) system for c: ∇E(c) = 0 (normal equations) ,
typically ill-conditioned.
Linear Least Squares problem: fit to a "linear model"
Φ(x, c) = ∑_{k=1}^{n} c_k φ_k(x),
i.e. to a linear combination of n basis functions
{φ_k(x)}, k = 1,...,n .
e.g. choosing the powers x^k as basis
functions φ_k(x) = x^k, k = 0, 1,..., n−1,
amounts to fitting a polynomial of degree n−1 to the data (and leads to an ill-conditioned system!).
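For instance (a small numerical sketch with made-up data and an arbitrary choice n = 10): the matrix of the normal equations for the monomial basis is severely ill-conditioned even for moderate n:

```python
import numpy as np

# Hypothetical data: m = 50 points sampled from a smooth function plus noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
y = np.cos(3 * x) + 0.01 * rng.standard_normal(50)

# Monomial basis phi_k(x) = x^k, k = 0..n-1  ->  Vandermonde matrix A.
n = 10
A = np.vander(x, n, increasing=True)   # A[i, k] = x_i**k

# Conditioning of the normal-equations matrix A^T A degrades rapidly with n:
print(np.linalg.cond(A.T @ A))         # huge already for n = 10
```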
L²(a,b) space
= space of all square-integrable functions defined on (a,b)
with norm ∥f∥ = ( ∫_a^b |f(x)|² dx )^{1/2}
L²(a,b) inner product:
<f , g> = ∫_a^b f(x) g(x) dx ,
norm: ∥f∥ = <f , f>^{1/2}
Orthonormal set {φ_k}, k = 1,...,n,
in L²(a,b): <φ_k , φ_j> = δ_kj
( = 0 if k ≠ j , = 1 if k = j ).
Complete orthonormal set (orthonormal basis)
{φ_k}, k = 1, 2,..., in L²:
<f , φ_k> = 0 ∀k implies f = 0.
Most important orthogonal bases:
* the standard orthogonal polynomials (Legendre, Chebyshev, Laguerre, Hermite) in weighted spaces L²_w(a,b)
on appropriate intervals (a,b) with appropriate weights w.
* In L²(0,L):
{ cos(kπx/L) }, k = 0, 1, 2,... , and
{ sin(kπx/L) }, k = 1, 2,... .
* In L²(−L,L):
{ 1, cos(kπx/L) , sin(kπx/L) }, k = 1, 2,... ,
or { exp(i kπx/L) }, k = 0, ±1, ±2,...
* eigenfunctions of self-adjoint Sturm-Liouville problems
* orthogonal wavelets (of various types, e.g. Daubechies wavelets)
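Orthogonality of such a family is easy to verify numerically (not a proof, just a check); a sketch for the cosine family on (−L, L), with the arbitrary choice L = 2 and a trapezoidal-rule inner product:

```python
import numpy as np

# Numerical check of orthogonality for cos(k*pi*x/L) on (-L, L), L = 2.
L = 2.0
x = np.linspace(-L, L, 20001)
h = x[1] - x[0]

def inner(u, v):
    # trapezoidal approximation of the L^2(-L, L) inner product <u, v>
    w = u * v
    return h * (w.sum() - 0.5 * (w[0] + w[-1]))

c1 = np.cos(1 * np.pi * x / L)
c2 = np.cos(2 * np.pi * x / L)
print(inner(c1, c2))   # ~ 0: orthogonal
print(inner(c2, c2))   # ~ L: orthogonal but not orthonormal (norm^2 = L)
```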
Fundamental Theorem on (generalized) Fourier expansions:
For an orthonormal set {φ_k} in a Hilbert space H,
the following are equivalent:
1. The orthonormal set {φ_k} is complete (constitutes an orthonormal basis).
2. Each f ∈ H has a unique Fourier expansion w.r.t. {φ_k}: f = ∑_k c_k φ_k ,
with Fourier coefficients c_k = < f , φ_k > .
3. For each f ∈ H, ∑_k |c_k|² = ∥f∥² (Parseval equality, Pythagorean Thm).
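A numerical illustration of the Parseval equality, for the concrete (arbitrary) choice f(x) = x on (−π, π) with the orthonormal basis φ_k(x) = e^{ikx}/√(2π); the truncation |k| ≤ 200 and the quadrature grid are also arbitrary choices:

```python
import numpy as np

# Parseval check for f(x) = x on (-pi, pi): sum |c_k|^2 should approach ||f||^2.
x = np.linspace(-np.pi, np.pi, 20001)
h = x[1] - x[0]
f = x

def trapz(v):                        # trapezoidal rule on the uniform grid
    return h * (v.sum() - 0.5 * (v[0] + v[-1]))

norm2 = trapz(f**2)                  # ||f||^2 = 2*pi^3/3
ks = np.arange(-200, 201)
cs = np.array([trapz(f * np.exp(-1j * k * x)) / np.sqrt(2 * np.pi) for k in ks])
print(norm2, np.sum(np.abs(cs)**2))  # agree to within the truncation tail
```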
Trigonometric Fourier expansions:
The most important, by far, are classical Fourier Series,
which decompose f(x) in terms of sines and cosines,
hence in terms of frequencies.
Complex notation is more convenient: orthogonal basis
{φ_k(x)} =
{ e^{i kπx/L} }, k = 0, ±1, ±2,...
Fourier Series of f ∈ L²(−L, L):
f(x) ∼ ∑_k c_k e^{i kπx/L}
with Fourier coefficients
c_k = (1/2L) ∫_{−L}^{L} f(x)
e^{−i kπx/L} dx = amplitude at frequency k ,
k = 0, ±1, ±2,...
The mapping f → {c_k}, k = 0, ±1, ±2,...,
is the Discrete Fourier Transform of
f ∈ L²(−L, L).
The Inverse DFT reconstructs f from its Fourier coefficients {c_k}: ∑_k c_k e^{i kπx/L} = f
(convergence in L²-sense).
The series will also converge pointwise (at each x) if the
coefficients {ck} decay fast enough.
Any truncation of the Fourier series (any partial sum) provides
an approximation to f.
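For example (a small numerical sketch, with the arbitrary choice f(x) = x² on (−π, π), i.e. L = π): the L² error of the partial sums decreases as more terms are kept:

```python
import numpy as np

# Partial sums of the Fourier series of f(x) = x^2 on (-pi, pi).
x = np.linspace(-np.pi, np.pi, 4001)
h = x[1] - x[0]
f = x**2

def trapz(v):
    return h * (v.sum() - 0.5 * (v[0] + v[-1]))

def partial_sum(K):
    # sum of c_k e^{ikx} over |k| <= K, with c_k computed by quadrature
    s = np.zeros_like(x, dtype=complex)
    for k in range(-K, K + 1):
        ck = trapz(f * np.exp(-1j * k * x)) / (2 * np.pi)
        s += ck * np.exp(1j * k * x)
    return s.real

errs = []
for K in (2, 4, 8):
    errs.append(np.sqrt(trapz((f - partial_sum(K))**2)))   # L^2 truncation error
    print(K, errs[-1])
```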
Methods for LS fitting
For small m (and small n≤m) problems,
normal equations: ∇E(c)=0 can be solved "by hand".
For (very) big m and big n, it can be a serious computational problem,
generally solved by numerical optimization methods.
A general principle is to choose orthogonal basis functions for linear LS fitting.
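E.g., for a straight-line model Φ(x; c_1, c_2) = c_1 + c_2 x, ∇E(c) = 0 gives a 2×2 system that can indeed be solved by hand; a sketch with made-up data:

```python
import numpy as np

# Normal equations "by hand" for a straight-line fit Phi(x) = c1 + c2*x:
# grad E = 0 gives a 2x2 linear system in (c1, c2).
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 1.9, 3.2, 3.8])

m = len(x)
M = np.array([[m,       x.sum()],
              [x.sum(), (x**2).sum()]])
rhs = np.array([y.sum(), (x * y).sum()])
c1, c2 = np.linalg.solve(M, rhs)
print(c1, c2)          # intercept and slope of the LS line
```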
Linear Least Squares fitting:
Setting
y = [y_1 . . . y_m]^T ,
c = [c_1 . . . c_n]^T ,
and A = [a_ik] = [ φ_k(x_i) ] ( m×n matrix ),
the linear LS data fitting problem can be written as:
min_c ∥y − Ac∥_2²
( square of the 2-norm in ℝ^m ).
Some neat linear algebra leads to the
normal equations: A^T A c = A^T y (n×n system).
This linear system is often ill-conditioned;
in practice the LS problem is solved via QR factorization of A (which orthogonalizes the columns of A).
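A minimal sketch of both routes, on random (well-conditioned) data, so the two answers agree closely:

```python
import numpy as np

# min ||y - A c||_2^2 two ways: normal equations A^T A c = A^T y,
# and QR factorization A = QR, which gives R c = Q^T y.
rng = np.random.default_rng(1)
A = rng.standard_normal((100, 5))
y = rng.standard_normal(100)

c_normal = np.linalg.solve(A.T @ A, A.T @ y)

Q, R = np.linalg.qr(A)               # reduced QR: Q is 100x5, R is 5x5
c_qr = np.linalg.solve(R, Q.T @ y)   # triangular system (generic solve here)

print(np.max(np.abs(c_normal - c_qr)))   # the two solutions agree
```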
Overdetermined linear systems:
The linear LS problem, min ∥y − Ac∥22
can be viewed as:
find a vector c to minimize the "residual"
y−Ac of the linear system
Ac "=" y.
However, since m > n, this is an overdetermined system and in general has no exact solution!
We interpret it as Ac ≈ y
in Least Squares sense;
thus the LS solution gives meaning to "solving"
an overdetermined system, and defines a concept of "generalized inverse" for non-square matrices.
This is what Matlab's "backslash" operator does:
x=A\b produces a Least Squares solution of Ax=b,
even for non-square A ! It uses QR factorization of A.
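NumPy's analogue of backslash is np.linalg.lstsq; a sketch with a made-up overdetermined 3×2 system:

```python
import numpy as np

# np.linalg.lstsq returns the Least Squares solution of A c ~ y.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])           # 3 equations, 2 unknowns: overdetermined
y = np.array([0.1, 1.1, 1.9])

c, residuals, rank, sv = np.linalg.lstsq(A, y, rcond=None)
print(c)                             # minimizer of ||y - A c||_2
# At the LS solution, the residual is orthogonal to range(A): A^T (y - A c) = 0.
print(np.allclose(A.T @ (y - A @ c), 0.0))
```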
Best way to treat LS problems is to choose orthonormal basis functions,
then there is no system to solve!
The coefficients are
the Fourier components of f: c_k = < f , φ_k > , k = 1,...,n.
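In the discrete setting the same principle reads: if the basis vectors (the columns of a matrix Q with Q^T Q = I) are orthonormal, the coefficients are plain inner products and no system needs solving; a sketch with random data:

```python
import numpy as np

# With orthonormal columns, LS coefficients are inner products c_k = <y, q_k>.
rng = np.random.default_rng(2)
A = rng.standard_normal((50, 4))
Q, _ = np.linalg.qr(A)               # orthonormalize the columns of A
y = rng.standard_normal(50)

c = Q.T @ y                          # Fourier coefficients: no solve needed
c_check, *_ = np.linalg.lstsq(Q, y, rcond=None)
print(np.max(np.abs(c - c_check)))   # same answer as the generic LS solver
```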
[The rest is for your information only... don't need to learn the formulas...]
Fourier Transforms
represent a function in terms of frequencies (in the frequency domain). Various versions are in use:
Fourier Integral of f(x) defined on (−∞, ∞):
F(ν) = ∫_{−∞}^{∞} f(x) e^{i2πνx} dx , −∞ < ν < ∞
Inverse transform: f(x) = ∫_{−∞}^{∞} F(ν) e^{−i2πνx} dν , −∞ < x < ∞ (reconstructs f(x) from its Fourier components F(ν) ).
Discrete Fourier Transform of f(x) defined on (−L, L)
is a sequence of numbers {f_k}, k = 0, ±1, ±2,...,
f_k = Fourier component of f(x) at frequency ν_k = k/2L:
f_k = (1/2L) ∫_{−L}^{L} f(x) e^{i kπx/L} dx , k = 0, ±1, ±2,...
Note that f_0 = mean value of f(x) on (−L, L)
(known as the DC component of the signal).
Inverse transform: f(x) = ∑_{k=−∞}^{∞} f_k e^{−i kπx/L} , −L < x < L
(= 2L-periodic extension of f on −∞ < x < ∞),
also known as the Fourier Series expansion of f;
it reconstructs f(x) from its Fourier components {f_k}.
Finite Fourier Transform of a finite sequence {y_j}, j = 0, 1,..., N−1, is the finite sequence {Y_k}:
Y_k = ∑_{j=0}^{N−1} y_j e^{i2πjk/N} , k = 0, 1,..., N−1
Inverse transform:
y_j = (1/N) ∑_{k=0}^{N−1} Y_k e^{−i2πjk/N} , j = 0, 1,..., N−1 ; it reconstructs {y_j} from its Fourier components {Y_k} (the 1/N factor is needed for exact reconstruction).
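A direct O(N²) implementation of the Finite FT pair (a minimal sketch; the inverse carries a 1/N factor so the round trip is exact; note NumPy's np.fft.ifft uses the same e^{+i2πjk/N} kernel with a 1/N factor, so it matches this forward transform up to that factor):

```python
import numpy as np

def fft_naive(y):
    # forward Finite FT: Y_k = sum_j y_j e^{+i 2 pi j k / N}
    N = len(y)
    js = np.arange(N)
    return np.array([np.sum(y * np.exp(2j * np.pi * js * k / N)) for k in range(N)])

def ifft_naive(Y):
    # inverse: y_j = (1/N) sum_k Y_k e^{-i 2 pi j k / N}
    N = len(Y)
    ks = np.arange(N)
    return np.array([np.sum(Y * np.exp(-2j * np.pi * ks * jj / N))
                     for jj in range(N)]) / N

y = np.array([1.0, 2.0, 0.0, -1.0])
print(np.allclose(ifft_naive(fft_naive(y)), y))            # round trip recovers y
print(np.allclose(fft_naive(y), len(y) * np.fft.ifft(y)))  # matches NumPy's convention
```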
Fast Finite Fourier Transform (FFT) implements the computation of a Finite FT of length N = 2^p
in only O(Np) = O(N log₂N) operations instead of O(N²), a tremendous saving for large N
( e.g. for N = 2^10 = 1024: N² = 1,048,576 ≈ 10^6 but Np = 10,240 ≈ 10^4 ! )
( e.g. for N = 2^20 = 1,048,576: N² = 1,099,511,627,776 ≈ 10^12 but Np = 20,971,520 ≈ 2×10^7 ! ).
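These operation counts are easy to reproduce:

```python
# Operation-count comparison, N^2 versus N*log2(N) = N*p for N = 2^p:
for p in (10, 20):
    N = 2**p
    print(N, N**2, N * p)
# p = 10: 1024, 1048576, 10240
# p = 20: 1048576, 1099511627776, 20971520
```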
The FFT makes digital signal processing practical in real time, widely used in today's technology:
1. signal is sampled (digitized) into {yj},
2. transformed to frequency domain via FFT: {Yk},
3. manipulated (filtered, denoised, compressed, etc),
as illustrated in Lab8,
4. {Yk} is transmitted, and
5. {yj} is reconstructed via Inverse FFT by the receiver!
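The pipeline above can be sketched in a few lines (a toy denoising example with a made-up signal, sampling rate, and threshold; real filters are more careful than this crude cutoff):

```python
import numpy as np

# 1. sample a noisy signal, 2. FFT to frequency domain,
# 3. zero out weak (noise-dominated) components, 5. inverse FFT to reconstruct.
rng = np.random.default_rng(3)
N = 1024
t = np.arange(N) / N
y = np.sin(2 * np.pi * 5 * t) + 0.2 * rng.standard_normal(N)   # signal + noise

Y = np.fft.fft(y)                              # to frequency domain
Y[np.abs(Y) < 0.25 * np.abs(Y).max()] = 0.0    # crude threshold filter
y_denoised = np.fft.ifft(Y).real               # back to time domain

print(np.max(np.abs(y_denoised - np.sin(2 * np.pi * 5 * t))))  # small residual
```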
JPEG standard for digital images:
uses the Discrete Cosine Transform (DCT) and its inverse,
on 8x8 blocks of pixels.
The JPEG codec specifies how an image is compressed into a stream of bytes and decompressed back into an image.
Main steps:
Color space transformation,
Downsampling,
Block splitting,
Discrete Cosine Transform,
Quantization,
Entropy coding.
It provides lossy compression (typically 10:1).
The JPEG standard was established in 1992.
see the Wikipedia article on JPEG.
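A minimal sketch of the DCT step alone, on one 8×8 block (orthonormal DCT-II built by hand; quantization and entropy coding omitted, and the pixel block is made up):

```python
import numpy as np

def dct_matrix(N=8):
    # Orthonormal DCT-II matrix: C[k, j] = alpha_k cos(pi (2j+1) k / (2N)).
    k = np.arange(N)[:, None]
    j = np.arange(N)[None, :]
    C = np.cos(np.pi * (2 * j + 1) * k / (2 * N)) * np.sqrt(2.0 / N)
    C[0, :] = np.sqrt(1.0 / N)       # special scaling for the k = 0 (DC) row
    return C

C = dct_matrix()
block = np.arange(64, dtype=float).reshape(8, 8)   # hypothetical pixel block
coeffs = C @ block @ C.T             # 2-D DCT of the block
restored = C.T @ coeffs @ C          # inverse: C is orthogonal, so C^-1 = C^T
print(np.allclose(restored, block))  # the transform itself is lossless
```

The lossy part of JPEG comes from quantizing `coeffs`, not from the DCT.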