M371 - Alexiades
Review4b: for FINAL - Least Squares
Material
  • Least Squares approximation / fitting
  • Fourier expansions, FFT, signal processing
    Concepts for LS
  • Least Squares fitting of m data points (xi,yi), i=1:m to an n-parameter model Φ(x; c1,...,cn):
        find c=(c1,...,cn) to minimize the LS error:   E(c) = ∑_{i=1}^{m} [ y_i − Φ(x_i, c) ]²  = sum of squared deviations   ( ≡ ∥y − Φ∥₂² ).
  • Least Squares approximation of a function f(x) on [a,b] by an n-parameter model Φ(x,c):
        find c=(c1,...,cn) to minimize the LS error   E(c) = ∫_a^b | f(x) − Φ(x,c) |² dx   ( ≡ ∥f − Φ∥² ).
  • It is a (generally nonlinear) optimization problem, and typically m>>n.
  • It can be viewed as a (generally nonlinear) system for c:   ∇E(c) = 0   (normal equations) , typically ill-conditioned.
  • Linear Least Squares problem: fit to a "linear model"  Φ(x,c) = ∑_{k=1}^{n} c_k φ_k(x),
      i.e. to a linear combination of n basis functions   {φ_k(x)}, k=1,...,n .
      e.g. choosing the powers x^k as basis functions φ_k(x) = x^k, k=0,1,...,n−1, amounts to fitting a
          polynomial of degree n−1 to the data (and leads to an ill-conditioned system!).
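A minimal sketch of linear LS fitting with the monomial basis, in Python (assuming NumPy is available; not part of the original notes). The data here are exact samples of 1 + x + x², so the fitted coefficients recover (1, 1, 1):

```python
import numpy as np

# Linear LS fit of a 3-parameter model (basis 1, x, x^2) to m = 6 data points.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.0, 3.0, 7.0, 13.0, 21.0, 31.0])   # exactly y = 1 + x + x^2

A = np.vander(x, 3, increasing=True)   # m x n design matrix, columns 1, x, x^2
c, residuals, rank, sv = np.linalg.lstsq(A, y, rcond=None)

print(c)               # fitted coefficients c0, c1, c2  (close to 1, 1, 1)
print(sv[0] / sv[-1])  # condition number of A: grows rapidly with the degree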

  • L²(a,b) space = space of all square-integrable functions defined on (a,b), with norm  ∥f∥ = ( ∫_a^b |f(x)|² dx )^{1/2}
  • L²(a,b) inner product:   <f , g> = ∫_a^b f(x) g(x) dx ,   norm:   ∥f∥ = <f , f>^{1/2}
  • Orthonormal set {φ_k}, k=1,...,n, in L²(a,b):   <φ_k , φ_j> = δ_kj   ( = 0 if k≠j ,  = 1 if k=j ).
  • Complete orthonormal set (orthonormal basis) {φ_k}, k=1,2,..., in L²:   <f , φ_k> = 0 ∀k  implies  f = 0.

  • Most important orthogonal bases:
      * the standard orthogonal polynomials (Legendre, Chebyshev, Laguerre, Hermite) in L²_w(a,b)
         on appropriate intervals (a,b) with appropriate weights w.
      * In L²(0,L):   { cos(kπx/L) }, k=0,1,2,... ,  and  { sin(kπx/L) }, k=1,2,... .
      * In L²(−L,L):   { 1, cos(kπx/L), sin(kπx/L) }, k=1,2,... ,  or  { exp(i kπx/L) }, k=0,±1,±2,...
      * eigenfunctions of self-adjoint Sturm-Liouville problems
      * orthogonal wavelets (of various types, e.g. Daubechies wavelets)
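Orthogonality of the sine basis on (0, L) can be checked numerically; a sketch (assuming NumPy), using trapezoidal quadrature for the L² inner products:

```python
import numpy as np

# Verify <phi_j, phi_k> = int_0^L sin(j pi x/L) sin(k pi x/L) dx
#   = 0 for j != k, and = L/2 for j == k.
L = 2.0
x = np.linspace(0.0, L, 4001)
h = x[1] - x[0]

def inner(j, k):
    f = np.sin(j * np.pi * x / L) * np.sin(k * np.pi * x / L)
    return np.sum((f[:-1] + f[1:]) / 2) * h   # trapezoidal rule

print(inner(1, 2))   # ~ 0       (orthogonal)
print(inner(3, 3))   # ~ L/2 = 1 (not yet normalized)
```

Dividing each sin(kπx/L) by (L/2)^{1/2} would make the set orthonormal.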

  • Fundamental Theorem on (generalized) Fourier expansions:
      For an orthonormal set {φk} in a Hilbert space H, the following are equivalent:
      1. The orthonormal set {φk} is complete (constitutes an orthonormal basis).
      2. Each f ∈ H has a unique Fourier expansion w.r.t. {φ_k}:   f = ∑ c_k φ_k , with Fourier coefficients  c_k = < f , φ_k > .
      3. For each f ∈ H,   ∑ |c_k|² = ∥f∥²   (Parseval equality, Pythagorean Thm).

  • Trigonometric Fourier expansions: The most important, by far, are classical Fourier Series,
      which decompose f(x) in terms of sines and cosines, hence in terms of frequencies.
      Complex notation is more convenient: orthogonal basis {φ_k(x)} = { e^{i kπx/L} }, k=0,±1,±2,...
    Fourier Series of f ∈ L²(−L, L):   f(x) ∼ ∑ c_k e^{i kπx/L}
      with Fourier coefficients   c_k = (1/2L) ∫_{−L}^{L} f(x) e^{−i kπx/L} dx  = amplitude at frequency k ,  k=0,±1,±2,...
      The mapping f → {c_k}, k=0,±1,±2,..., is the Discrete Fourier Transform of f ∈ L²(−L, L).
      The Inverse DFT reconstructs f from its Fourier coefficients {c_k}:   ∑ c_k e^{i kπx/L} = f   (convergence in the L²-sense).
      The series will also converge pointwise (at each x) if the coefficients {ck} decay fast enough.
      Any truncation of the Fourier series (any partial sum) provides an approximation to f.
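A sketch of these coefficients and the Parseval equality in Python (assuming NumPy), for f(x) = x² on (−1, 1), where (1/2L)∫|f|² dx = 1/5:

```python
import numpy as np

# Complex Fourier coefficients c_k = (1/2L) * int_{-L}^{L} f(x) e^{-i k pi x/L} dx
# by trapezoidal quadrature, then Parseval: sum |c_k|^2 -> (1/2L) * int |f|^2 dx.
L = 1.0
x = np.linspace(-L, L, 8001)
f = x**2
h = x[1] - x[0]

def ck(k):
    g = f * np.exp(-1j * k * np.pi * x / L)
    return (np.sum((g[:-1] + g[1:]) / 2) * h) / (2 * L)

K = 50
parseval = sum(abs(ck(k))**2 for k in range(-K, K + 1))
print(parseval)   # approaches 1/5 = 0.2 as K grows
```

The truncated sum ∑_{|k|≤K} |c_k|² already agrees with 1/5 to about 6 digits at K = 50, reflecting the fast decay of the {c_k}.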
    Methods for LS fitting
  •  For small problems (small m, and n ≤ m), the normal equations ∇E(c) = 0 can be solved "by hand".
      For (very) big m and big n, this becomes a serious computational problem, generally solved by numerical optimization methods.
      A general principle is to choose orthogonal basis functions for linear LS fitting.
  • Linear Least Squares fitting:
      Setting   y = [y1 . . . ym]^T ,  c = [c1 . . . cn]^T ,  and  A = [a_ik] = [ φ_k(x_i) ]   ( m×n matrix ),
      the linear LS data fitting problem can be written as:   min ∥y − Ac∥₂²   ( square of the 2-norm in ℝ^m ).
      Some neat linear algebra leads to the normal equations:   AᵀA c = Aᵀy   ( n×n system ).
      This linear system is often ill-conditioned; it is better solved via QR factorization of A (which orthogonalizes the columns of A).
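Both routes can be sketched side by side in Python (assuming NumPy; a random well-conditioned A, so the two solutions agree):

```python
import numpy as np

# Solve min ||y - Ac||_2^2 two ways:
# (1) normal equations  A^T A c = A^T y   (cond(A^T A) = cond(A)^2, risky)
# (2) QR factorization  A = QR  =>  R c = Q^T y   (numerically preferred)
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 3))      # m = 20 >> n = 3
y = rng.standard_normal(20)

c_normal = np.linalg.solve(A.T @ A, A.T @ y)

Q, R = np.linalg.qr(A)                # reduced QR: Q is 20x3, R is 3x3
c_qr = np.linalg.solve(R, Q.T @ y)

print(np.allclose(c_normal, c_qr))    # same LS solution; QR avoids squaring cond(A)
```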
  • Overdetermined linear systems:
      The linear LS problem  min ∥y − Ac∥₂²  can be viewed as:
        find a vector c to minimize the "residual"   y − Ac   of the linear system   Ac "=" y.
      However, since m > n, this is an overdetermined system and thus (generically) has no solution!
      We interpret it as   Ac ≈ y  in the Least Squares sense; thus the LS solution gives meaning to "solving"
      an overdetermined system, and defines a concept of "generalized inverse" for non-square matrices.
      This is what Matlab's "backslash" operator does: x=A\b produces a Least Squares solution of Ax=b,
      even for non-square A !   It uses QR factorization of A.
  • The best way to treat LS problems is to choose orthonormal basis functions: then there is no system to solve!
      The coefficients are just the Fourier components of f:   c_k = < f , φ_k > ,  k=1,...,n.
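In matrix terms: if the design matrix Q has orthonormal columns, QᵀQ = I and the normal equations collapse to c = Qᵀy, i.e. each coefficient is a single inner product. A sketch (assuming NumPy):

```python
import numpy as np

# With orthonormal columns (Q^T Q = I), the normal equations Q^T Q c = Q^T y
# reduce to c = Q^T y: coefficients by inner products alone, no system to solve.
rng = np.random.default_rng(1)
A = rng.standard_normal((30, 4))
Q, _ = np.linalg.qr(A)             # orthonormalize the columns of A
y = rng.standard_normal(30)

c = Q.T @ y                        # "Fourier components" of y w.r.t. columns of Q
c_ls = np.linalg.lstsq(Q, y, rcond=None)[0]
print(np.allclose(c, c_ls))        # both give the same LS solution
```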
    [The rest is for your information only... don't need to learn the formulas...]
    Fourier Transforms represent a function in terms of frequencies (in the frequency domain). Various versions are in use:
  • Fourier Integral of f(x) defined on (−∞, ∞):   F(ν) = ∫_{−∞}^{∞} f(x) e^{−i 2πνx} dx ,  −∞ < ν < ∞
      Inverse transform:  f(x) = ∫_{−∞}^{∞} F(ν) e^{i 2πνx} dν ,  −∞ < x < ∞   (reconstructs f(x) from its Fourier components F(ν) ).
  • Discrete Fourier Transform of f(x) defined on (−L, L) is a sequence of numbers {f_k}, k=0,±1,±2,... ,
      f_k = Fourier component of f(x) at frequency ν_k = k/2L:   f_k = (1/2L) ∫_{−L}^{L} f(x) e^{−i kπx/L} dx ,  k=0,±1,±2,...
    Note that f_0 = mean value of f(x) on (−L, L) (known as the DC component of the signal).
      Inverse transform:  f(x) = ∑_{k=−∞}^{∞} f_k e^{i kπx/L} ,  −L < x < L   ( = 2L-periodic extension of f on −∞ < x < ∞ ),
        also known as the Fourier Series expansion of f; it reconstructs f(x) from its Fourier components {f_k}.
  • Finite Fourier Transform of a finite sequence {y_j}, j=0,1,...,N−1, is the finite sequence {Y_k}:
                   Y_k = ∑_{j=0}^{N−1} y_j e^{−i 2πjk/N} ,   k=0,1,...,N−1
      Inverse transform:  y_j = (1/N) ∑_{k=0}^{N−1} Y_k e^{i 2πjk/N} ,  j=0,1,...,N−1 ;  reconstructs {y_j} from its Fourier components {Y_k}.
      (Note the 1/N factor in the inverse, needed so that the inverse transform of {Y_k} returns exactly {y_j}.)
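These two formulas can be sketched directly in Python (standard library only), confirming that the inverse with the 1/N factor recovers the original sequence:

```python
import cmath

# Direct O(N^2) finite Fourier transform of {y_j} and its inverse.
def dft(y):
    N = len(y)
    return [sum(y[j] * cmath.exp(-2j * cmath.pi * j * k / N) for j in range(N))
            for k in range(N)]

def idft(Y):
    N = len(Y)
    return [sum(Y[k] * cmath.exp(2j * cmath.pi * j * k / N) for k in range(N)) / N
            for j in range(N)]

y = [1.0, 2.0, 0.0, -1.0]
Y = dft(y)
print(Y[0])                                    # = sum of samples: (2+0j)
print([round(v.real, 10) for v in idft(Y)])    # recovers [1.0, 2.0, 0.0, -1.0]
```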
  • Fast Finite Fourier Transform (FFT) implements the computation of a Finite FT of length N = 2^p
      in only O(N p) = O(N log₂N) operations instead of O(N²), a tremendous saving for large N
          ( e.g. for N = 2^10 = 1024:   N² = 1,048,576 ≈ 10^6  but  Np = 10,240 ≈ 10^4 ! )
            ( e.g. for N = 2^20 = 1,048,576:   N² = 1,099,511,627,776 ≈ 10^12  but  Np = 20,971,520 ≈ 20×10^6 ! ).
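The savings come from splitting the length-N transform into two length-N/2 transforms of the even- and odd-indexed samples. A minimal recursive radix-2 (Cooley-Tukey) sketch in Python (standard library only), checked against the direct O(N²) transform:

```python
import cmath

# Recursive radix-2 FFT (N must be a power of 2): O(N log N) operations.
def fft(y):
    N = len(y)
    if N == 1:
        return list(y)
    even = fft(y[0::2])
    odd = fft(y[1::2])
    out = [0j] * N
    for k in range(N // 2):
        t = cmath.exp(-2j * cmath.pi * k / N) * odd[k]   # twiddle factor
        out[k] = even[k] + t
        out[k + N // 2] = even[k] - t
    return out

def dft(y):   # direct O(N^2) transform, for comparison
    N = len(y)
    return [sum(y[j] * cmath.exp(-2j * cmath.pi * j * k / N) for j in range(N))
            for k in range(N)]

y = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
print(max(abs(a - b) for a, b in zip(fft(y), dft(y))))   # agree to rounding error
```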

    The FFT makes digital signal processing practical in real time, widely used in today's technology:
      1. signal is sampled (digitized) into {yj},
      2. transformed to frequency domain via FFT: {Yk},
      3. manipulated (filtered, denoised, compressed, etc), as illustrated in Lab8,
      4. {Yk} is transmitted, and
      5. {yj} is reconstructed via Inverse FFT by the receiver!
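The steps above can be sketched end-to-end in Python (assuming NumPy; the simple amplitude-threshold filter is an illustrative choice, not from the notes):

```python
import numpy as np

# Steps 1-5: sample a signal, FFT to frequency domain, filter (zero out small
# "noise" components), inverse FFT to reconstruct a cleaned signal.
N = 256
t = np.arange(N) / N
clean = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 12 * t)
rng = np.random.default_rng(2)
y = clean + 0.3 * rng.standard_normal(N)       # 1. sampled, noisy signal

Y = np.fft.fft(y)                              # 2. to frequency domain
Y[np.abs(Y) < 0.2 * np.abs(Y).max()] = 0       # 3. crude threshold filter
denoised = np.real(np.fft.ifft(Y))             # 5. reconstructed by inverse FFT

print(np.mean((denoised - clean)**2) < np.mean((y - clean)**2))   # error reduced
```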
    JPEG standard for digital images: uses the Discrete Cosine Transform (DCT) and its inverse, on 8x8 blocks of pixels.
      JPEG codec specifies how an image is compressed into a stream of bytes and decompressed back into an image,
      Main steps: Color space transformation, Downsampling, Block splitting, Discrete Cosine Transform, Quantization, Entropy coding.
      It provides lossy compression (typically about 10:1). The JPEG standard was established in 1992; see the Wikipedia article.
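The DCT step on one 8×8 block can be sketched in Python (assuming NumPy): build the orthonormal DCT-II matrix C, transform the block as C B Cᵀ, and invert with Cᵀ(...)C. Quantization and entropy coding, where the actual compression happens, are omitted here.

```python
import numpy as np

# Orthonormal 8x8 DCT-II matrix: row j, column i is
#   sqrt(2/n) * cos(pi*(2i+1)*j / (2n)),  with the j = 0 row scaled by sqrt(1/n).
n = 8
k = np.arange(n)
C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
C[0, :] = np.sqrt(1.0 / n)

B = np.arange(64, dtype=float).reshape(8, 8)   # a toy 8x8 pixel block
D = C @ B @ C.T                                # 2-D DCT of the block
B_back = C.T @ D @ C                           # inverse DCT (C is orthogonal)

print(np.allclose(B, B_back))                  # lossless until quantization
print(np.allclose(C @ C.T, np.eye(8)))         # rows of C are orthonormal
```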