M371 - Alexiades
Review4b: for FINAL - Least Squares
Material
  • Least Squares approximation / fitting
  • Fourier expansions, FFT, signal processing
    Concepts for LS
  • Least Squares fitting of m data points (xi,yi), i=1:m to an n-parameter model Φ(x; c1,...,cn):
        find c=(c1,...,cn) to minimize the LS error:   E(c) = ∑_{i=1}^{m} [ y_i − Φ(x_i, c) ]²  = sum of squared deviations   ( ≡ ∥y − Φ∥₂² ).
  • Least Squares approximation of a function f(x) on [a,b] by an n-parameter model Φ(x,c):
        find c=(c1,...,cn) to minimize the LS error   E(c) = ∫_a^b | f(x) − Φ(x,c) |² dx   ( ≡ ∥f − Φ∥² ).
  • It is a (generally nonlinear) optimization problem, and typically m>>n.
  • It can be viewed as a (generally nonlinear) system for c:   ∇E(c) = 0   (normal equations) , typically ill-conditioned.
  • Linear Least Squares problem: fit to a "linear model"  Φ(x,c) = ∑_{k=1}^{n} c_k φ_k(x),
      i.e. to a linear combination of n basis functions   {φ_k(x)}, k=1,...,n .
      e.g. choosing the powers x^k as basis functions φ_k(x) = x^k, k=0,1,...,n−1, amounts to fitting a
          polynomial of degree n−1 to the data (and leads to an ill-conditioned system!).
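A minimal sketch of linear LS fitting with the monomial basis, in Python (assuming NumPy is available; not part of the original notes). The data here are exact samples of 1 + x + x², so the fitted coefficients recover (1, 1, 1):

```python
import numpy as np

# Linear LS fit of a 3-parameter model (basis 1, x, x^2) to m = 6 data points.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.0, 3.0, 7.0, 13.0, 21.0, 31.0])   # exactly y = 1 + x + x^2

A = np.vander(x, 3, increasing=True)   # m x n design matrix, columns 1, x, x^2
c, residuals, rank, sv = np.linalg.lstsq(A, y, rcond=None)

print(c)               # fitted coefficients c0, c1, c2  (close to 1, 1, 1)
print(sv[0] / sv[-1])  # condition number of A: grows rapidly with the degree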

  • L²(a,b) space = space of all square-integrable functions defined on (a,b), with norm  ∥f∥ = ( ∫_a^b |f(x)|² dx )^{1/2}
  • L²(a,b) inner product:   <f , g> = ∫_a^b f(x) g(x) dx ,   norm:   ∥f∥ = <f , f>^{1/2}
  • Orthonormal set {φ_k}, k=1,...,n, in L²(a,b):   <φ_k , φ_j> = δ_kj   ( = 0 if k≠j ,  = 1 if k=j ).
  • Complete orthonormal set (orthonormal basis) {φ_k}, k=1,2,..., in L²:   <f , φ_k> = 0 ∀k  implies  f = 0.

  • Most important orthogonal bases:
      * the standard orthogonal polynomials (Legendre, Chebyshev, Laguerre, Hermite) in L²_w(a,b)
         on appropriate intervals (a,b) with appropriate weights w.
      * In L²(0,L):   { cos(kπx/L) }, k=0,1,2,... ,  and  { sin(kπx/L) }, k=1,2,... .
      * In L²(−L,L):   { 1, cos(kπx/L), sin(kπx/L) }, k=1,2,... ,  or  { exp(i kπx/L) }, k=0,±1,±2,...
      * eigenfunctions of self-adjoint Sturm-Liouville problems
      * orthogonal wavelets (of various types, e.g. Daubechies wavelets)
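Orthogonality of the sine basis on (0, L) can be checked numerically; a sketch (assuming NumPy), using trapezoidal quadrature for the L² inner products:

```python
import numpy as np

# Verify <phi_j, phi_k> = int_0^L sin(j pi x/L) sin(k pi x/L) dx
#   = 0 for j != k, and = L/2 for j == k.
L = 2.0
x = np.linspace(0.0, L, 4001)
h = x[1] - x[0]

def inner(j, k):
    f = np.sin(j * np.pi * x / L) * np.sin(k * np.pi * x / L)
    return np.sum((f[:-1] + f[1:]) / 2) * h   # trapezoidal rule

print(inner(1, 2))   # ~ 0       (orthogonal)
print(inner(3, 3))   # ~ L/2 = 1 (not yet normalized)
```

Dividing each sin(kπx/L) by (L/2)^{1/2} would make the set orthonormal.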

  • Fundamental Theorem on (generalized) Fourier expansions:
      For an orthonormal set {φk} in a Hilbert space H, the following are equivalent:
      1. The orthonormal set {φk} is complete (constitutes an orthonormal basis).
      2. Each f ∈ H has a unique Fourier expansion w.r.t. {φ_k}:   f = ∑ c_k φ_k , with Fourier coefficients  c_k = < f , φ_k > .
      3. For each f ∈ H,   ∑ |c_k|² = ∥f∥²   (Parseval equality, Pythagorean Thm).

  • Trigonometric Fourier expansions: The most important, by far, are classical Fourier Series,
      which decompose f(x) in terms of sines and cosines, hence in terms of frequencies.
      Complex notation is more convenient: orthogonal basis {φ_k(x)} = { e^{i kπx/L} }, k=0,±1,±2,...
    Fourier Series of f ∈ L²(−L, L):   f(x) ∼ ∑ c_k e^{i kπx/L}
      with Fourier coefficients   c_k = (1/2L) ∫_{−L}^{L} f(x) e^{−i kπx/L} dx  = amplitude at frequency k ,  k=0,±1,±2,...
      The mapping f → {c_k}, k=0,±1,±2,..., is the Discrete Fourier Transform of f ∈ L²(−L, L).
      The Inverse DFT reconstructs f from its Fourier coefficients {c_k}:   ∑ c_k e^{i kπx/L} = f   (convergence in the L²-sense).
      The series will also converge pointwise (at each x) if the coefficients {ck} decay fast enough.
      Any truncation of the Fourier series (any partial sum) provides an approximation to f.
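A sketch of these coefficients and the Parseval equality in Python (assuming NumPy), for f(x) = x² on (−1, 1), where (1/2L)∫|f|² dx = 1/5:

```python
import numpy as np

# Complex Fourier coefficients c_k = (1/2L) * int_{-L}^{L} f(x) e^{-i k pi x/L} dx
# by trapezoidal quadrature, then Parseval: sum |c_k|^2 -> (1/2L) * int |f|^2 dx.
L = 1.0
x = np.linspace(-L, L, 8001)
f = x**2
h = x[1] - x[0]

def ck(k):
    g = f * np.exp(-1j * k * np.pi * x / L)
    return (np.sum((g[:-1] + g[1:]) / 2) * h) / (2 * L)

K = 50
parseval = sum(abs(ck(k))**2 for k in range(-K, K + 1))
print(parseval)   # approaches 1/5 = 0.2 as K grows
```

The truncated sum ∑_{|k|≤K} |c_k|² already agrees with 1/5 to about 6 digits at K = 50, reflecting the fast decay of the {c_k}.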
    Methods for LS fitting
  •  For small problems (small m, and n ≤ m), the normal equations ∇E(c) = 0 can be solved "by hand".
      For (very) big m and big n, this becomes a serious computational problem, generally solved by numerical optimization methods.
      A general principle is to choose orthogonal basis functions for linear LS fitting.
  • Linear Least Squares fitting:
      Setting   y = [y1 . . . ym]^T ,  c = [c1 . . . cn]^T ,  and  A = [a_ik] = [ φ_k(x_i) ]   ( m×n matrix ),
      the linear LS data fitting problem can be written as:   min ∥y − Ac∥₂²   ( square of the 2-norm in ℝ^m ).
      Some neat linear algebra leads to the normal equations:   AᵀA c = Aᵀy   ( n×n system ).
      This linear system is often ill-conditioned; it is better solved via QR factorization of A (which orthogonalizes the columns of A).
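Both routes can be sketched side by side in Python (assuming NumPy; a random well-conditioned A, so the two solutions agree):

```python
import numpy as np

# Solve min ||y - Ac||_2^2 two ways:
# (1) normal equations  A^T A c = A^T y   (cond(A^T A) = cond(A)^2, risky)
# (2) QR factorization  A = QR  =>  R c = Q^T y   (numerically preferred)
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 3))      # m = 20 >> n = 3
y = rng.standard_normal(20)

c_normal = np.linalg.solve(A.T @ A, A.T @ y)

Q, R = np.linalg.qr(A)                # reduced QR: Q is 20x3, R is 3x3
c_qr = np.linalg.solve(R, Q.T @ y)

print(np.allclose(c_normal, c_qr))    # same LS solution; QR avoids squaring cond(A)
```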
  • Overdetermined linear systems:
      The linear LS problem  min ∥y − Ac∥₂²  can be viewed as:
        find a vector c to minimize the "residual"   y − Ac   of the linear system   Ac "=" y.
      However, since m > n, this is an overdetermined system and thus (generically) has no solution!
      We interpret it as   Ac ≈ y  in the Least Squares sense; thus the LS solution gives meaning to "solving"
      an overdetermined system, and defines a concept of "generalized inverse" for non-square matrices.
      This is what Matlab's "backslash" operator does: x=A\b produces a Least Squares solution of Ax=b,
      even for non-square A !   It uses QR factorization of A.
  • The best way to treat LS problems is to choose orthonormal basis functions: then there is no system to solve!
      The coefficients are just the Fourier components of f:   c_k = < f , φ_k > ,  k=1,...,n.
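In matrix terms: if the design matrix Q has orthonormal columns, QᵀQ = I and the normal equations collapse to c = Qᵀy, i.e. each coefficient is a single inner product. A sketch (assuming NumPy):

```python
import numpy as np

# With orthonormal columns (Q^T Q = I), the normal equations Q^T Q c = Q^T y
# reduce to c = Q^T y: coefficients by inner products alone, no system to solve.
rng = np.random.default_rng(1)
A = rng.standard_normal((30, 4))
Q, _ = np.linalg.qr(A)             # orthonormalize the columns of A
y = rng.standard_normal(30)

c = Q.T @ y                        # "Fourier components" of y w.r.t. columns of Q
c_ls = np.linalg.lstsq(Q, y, rcond=None)[0]
print(np.allclose(c, c_ls))        # both give the same LS solution
```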
    [The rest is for your information only... don't need to learn the formulas...]
    Fourier Transforms represent a function in terms of frequencies (in the frequency domain). Various versions are in use:
  • Fourier Integral of f(x) defined on (−∞, ∞):   F(ν) = ∫_{−∞}^{∞} f(x) e^{−i 2πνx} dx ,  −∞ < ν < ∞
      Inverse transform:  f(x) = ∫_{−∞}^{∞} F(ν) e^{i 2πνx} dν ,  −∞ < x < ∞   (reconstructs f(x) from its Fourier components F(ν) ).
  • Discrete Fourier Transform of f(x) defined on (−L, L) is a sequence of numbers {f_k}, k=0,±1,±2,... ,
      f_k = Fourier component of f(x) at frequency ν_k = k/2L:   f_k = (1/2L) ∫_{−L}^{L} f(x) e^{−i kπx/L} dx ,  k=0,±1,±2,...
    Note that f_0 = mean value of f(x) on (−L, L) (known as the DC component of the signal).
      Inverse transform:  f(x) = ∑_{k=−∞}^{∞} f_k e^{i kπx/L} ,  −L < x < L   ( = 2L-periodic extension of f on −∞ < x < ∞ ),
        also known as the Fourier Series expansion of f; it reconstructs f(x) from its Fourier components {f_k}.
  • Finite Fourier Transform of a finite sequence {y_j}, j=0,1,...,N−1, is the finite sequence {Y_k}:
                   Y_k = ∑_{j=0}^{N−1} y_j e^{−i 2πjk/N} ,   k=0,1,...,N−1
      Inverse transform:  y_j = (1/N) ∑_{k=0}^{N−1} Y_k e^{i 2πjk/N} ,  j=0,1,...,N−1 ;  reconstructs {y_j} from its Fourier components {Y_k}.
      (Note the 1/N factor in the inverse, needed so that the inverse transform of {Y_k} returns exactly {y_j}.)
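These two formulas can be sketched directly in Python (standard library only), confirming that the inverse with the 1/N factor recovers the original sequence:

```python
import cmath

# Direct O(N^2) finite Fourier transform of {y_j} and its inverse.
def dft(y):
    N = len(y)
    return [sum(y[j] * cmath.exp(-2j * cmath.pi * j * k / N) for j in range(N))
            for k in range(N)]

def idft(Y):
    N = len(Y)
    return [sum(Y[k] * cmath.exp(2j * cmath.pi * j * k / N) for k in range(N)) / N
            for j in range(N)]

y = [1.0, 2.0, 0.0, -1.0]
Y = dft(y)
print(Y[0])                                    # = sum of samples: (2+0j)
print([round(v.real, 10) for v in idft(Y)])    # recovers [1.0, 2.0, 0.0, -1.0]
```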
  • Fast Finite Fourier Transform (FFT) implements the computation of a Finite FT of length N = 2^p
      in only O(N p) = O(N log₂N) operations instead of O(N²), a tremendous saving for large N
          ( e.g. for N = 2^10 = 1024:   N² = 1,048,576 ≈ 10^6  but  Np = 10,240 ≈ 10^4 ! )
            ( e.g. for N = 2^20 = 1,048,576:   N² = 1,099,511,627,776 ≈ 10^12  but  Np = 20,971,520 ≈ 20×10^6 ! ).
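The savings come from splitting the length-N transform into two length-N/2 transforms of the even- and odd-indexed samples. A minimal recursive radix-2 (Cooley-Tukey) sketch in Python (standard library only), checked against the direct O(N²) transform:

```python
import cmath

# Recursive radix-2 FFT (N must be a power of 2): O(N log N) operations.
def fft(y):
    N = len(y)
    if N == 1:
        return list(y)
    even = fft(y[0::2])
    odd = fft(y[1::2])
    out = [0j] * N
    for k in range(N // 2):
        t = cmath.exp(-2j * cmath.pi * k / N) * odd[k]   # twiddle factor
        out[k] = even[k] + t
        out[k + N // 2] = even[k] - t
    return out

def dft(y):   # direct O(N^2) transform, for comparison
    N = len(y)
    return [sum(y[j] * cmath.exp(-2j * cmath.pi * j * k / N) for j in range(N))
            for k in range(N)]

y = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
print(max(abs(a - b) for a, b in zip(fft(y), dft(y))))   # agree to rounding error
```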

    The FFT makes digital signal processing practical in real time, widely used in today's technology:
      1. signal is sampled (digitized) into {yj},
      2. transformed to frequency domain via FFT: {Yk},
      3. manipulated (filtered, denoised, compressed, etc), as illustrated in Lab8,
      4. {Yk} is transmitted, and
      5. {yj} is reconstructed via Inverse FFT by the receiver!
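The steps above can be sketched end-to-end in Python (assuming NumPy; the simple amplitude-threshold filter is an illustrative choice, not from the notes):

```python
import numpy as np

# Steps 1-5: sample a signal, FFT to frequency domain, filter (zero out small
# "noise" components), inverse FFT to reconstruct a cleaned signal.
N = 256
t = np.arange(N) / N
clean = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 12 * t)
rng = np.random.default_rng(2)
y = clean + 0.3 * rng.standard_normal(N)       # 1. sampled, noisy signal

Y = np.fft.fft(y)                              # 2. to frequency domain
Y[np.abs(Y) < 0.2 * np.abs(Y).max()] = 0       # 3. crude threshold filter
denoised = np.real(np.fft.ifft(Y))             # 5. reconstructed by inverse FFT

print(np.mean((denoised - clean)**2) < np.mean((y - clean)**2))   # error reduced
```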
    JPEG standard for digital images: uses the Discrete Cosine Transform (DCT) and its inverse, on 8x8 blocks of pixels.
      JPEG codec specifies how an image is compressed into a stream of bytes and decompressed back into an image,
      Main steps: Color space transformation, Downsampling, Block splitting, Discrete Cosine Transform, Quantization, Entropy coding.
      It provides lossy compression (typically about 10:1). The JPEG standard was established in 1992; see the Wikipedia article.
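The DCT step on one 8×8 block can be sketched in Python (assuming NumPy): build the orthonormal DCT-II matrix C, transform the block as C B Cᵀ, and invert with Cᵀ(...)C. Quantization and entropy coding, where the actual compression happens, are omitted here.

```python
import numpy as np

# Orthonormal 8x8 DCT-II matrix: row j, column i is
#   sqrt(2/n) * cos(pi*(2i+1)*j / (2n)),  with the j = 0 row scaled by sqrt(1/n).
n = 8
k = np.arange(n)
C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
C[0, :] = np.sqrt(1.0 / n)

B = np.arange(64, dtype=float).reshape(8, 8)   # a toy 8x8 pixel block
D = C @ B @ C.T                                # 2-D DCT of the block
B_back = C.T @ D @ C                           # inverse DCT (C is orthogonal)

print(np.allclose(B, B_back))                  # lossless until quantization
print(np.allclose(C @ C.T, np.eye(8)))         # rows of C are orthonormal
```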