Errors - Finite Precision - Roundoff
Machine numbers (floating point numbers)
the number of bits for the Mantissa determines the precision,
the number of bits for the (biased) Exponent determines the range
of representable numbers.
e.g. For x = 8.3456: fl(x) = .83456×10^1.
On a 3-digit decimal machine: fl(x) = .835×10^1 = 8.35,
rel. error ≈ 0.05%
NOTE: x=0.1 is NOT a machine number! so fl(0.1) ≠ 0.1 !!!
Computations leading to an undefined value (like 0/0, Inf-Inf)
result in NaN (Not-a-Number)
Writing a computed value to a variable forces rounding
to machine precision.
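Both points can be verified in a few lines (a Python sketch; CPython floats are IEEE double precision):

```python
import math

# 0.1 has no finite binary expansion, so fl(0.1) != 0.1:
print(f"{0.1:.20f}")       # prints the stored value, which is not exactly 0.1
print(0.1 + 0.2 == 0.3)    # False: each operand and result is rounded separately

# Undefined operations produce NaN:
nan = float("inf") - float("inf")
print(math.isnan(nan))     # True
print(nan == nan)          # False: NaN compares unequal even to itself
```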
IEEE 754 Standard (1985)
Single Precision (SP): 32-bit word: 1 sign bit, 8 bits for biased Exponent, 23 stored Mantissa bits (24-bit significand with the implicit leading 1)
Double Precision (DP): 64-bit word: 1 sign bit, 11 bits for biased Exponent, 52 stored Mantissa bits (53-bit significand with the implicit leading 1)
precision (significand) | exponent range | smallest positive # | largest # | number of machine #s | εmach | reliable decimal digits
SP: 24 bits (23 stored) | −126 … +127 | 2^−126 ≈ 1.2×10^−38 | ≈ 3.4×10^+38 | ≈ 4×10^9 | 2^−24 ≈ 6×10^−8 | 7
DP: 53 bits (52 stored) | −1022 … +1023 | 2^−1022 ≈ 2.2×10^−308 | ≈ 1.8×10^+308 | ≈ 2×10^19 | 2^−53 ≈ 1.1×10^−16 | 15
Machine epsilon εmach : rounding to the nearest machine number gives
|fl(x) − x| ≤ εmach·|x| ,
so the larger the number the larger the roundoff error!
So, adding any (positive) number less than εmach to 1 leaves 1 unchanged: such numbers are effectively zero in computation!
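εmach can be found experimentally by halving until the term is absorbed (a standard sketch, assuming IEEE double precision; note that Python's sys.float_info.epsilon is the spacing 2^−52, twice the unit roundoff 2^−53):

```python
import sys

eps = 1.0
while 1.0 + eps > 1.0:   # halve until 1.0 + eps rounds back to 1.0
    eps /= 2.0

print(eps)                     # 2^-53 ≈ 1.1e-16, the unit roundoff
print(sys.float_info.epsilon)  # 2^-52, the gap between 1.0 and the next float
```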
Important consequences:
1. There is an inherent error in representing real numbers
(each is replaced by the nearest machine number),
called round-off error .
2. Loss of significant digits in arithmetic may occur.
Especially dangerous situations are:
• subtractive cancellation:
when subtracting nearly equal reals (try to avoid...)
• negligible addition:
when adding a very small to a large real number
• unbalanced multiplication:
when multiplying by a very small number
(or dividing by a very large number).
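Each of the three dangers can be seen directly (a Python sketch with IEEE doubles; the rewritten form of the square-root expression is one standard remedy):

```python
import math

# Subtractive cancellation: sqrt(x^2 + 1) - x for large x
x = 1.0e8
naive = math.sqrt(x * x + 1.0) - x           # the "+1" is absorbed: gives 0.0
stable = 1.0 / (math.sqrt(x * x + 1.0) + x)  # algebraically equal, ≈ 5e-9
print(naive, stable)

# Negligible addition: the small term is simply lost
print(1.0e16 + 1.0 == 1.0e16)                # True

# Unbalanced multiplication: very small products underflow to zero
print(1.0e-200 * 1.0e-200)                   # 0.0
```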
3. Relative error in addition or subtraction is ≤ 2εmach ;
in multiplication or division is ≤ 3εmach.
After some millions of these, precision may be lost...
but it can be much worse in the cases shown in 2. above.
4. Machine arithmetic is not necessarily associative,
i.e. grouping of operations may affect the result!
e.g. if a = 0.99·εmach then (1+a)+a = 1+a = 1 (each addition rounds back to 1),
but 1+(a+a) = 1+2a > 1 !!!
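This non-associativity can be checked directly (Python sketch; εmach = 2^−53 for IEEE doubles):

```python
eps_mach = 2.0 ** -53      # unit roundoff of IEEE double precision
a = 0.99 * eps_mach        # a < eps_mach, so 1.0 + a rounds back to 1.0

left = (1.0 + a) + a       # each addition rounds the small term away
right = 1.0 + (a + a)      # 2a > eps_mach survives the rounding
print(left == 1.0, right > 1.0)   # True True
```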
5. Should NEVER ask for equality of computed real numbers,
use if( abs(a − b) ≤ TOL ) , for some TOLerance
( with TOL no smaller than 1.e-15, for double precision ).
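In Python this might look like the following (TOL is just an illustrative name; the standard library's math.isclose does the same job):

```python
import math

a = 0.1 + 0.2              # 0.30000000000000004...
b = 0.3
TOL = 1.0e-12              # tolerance, safely above 1e-15

print(a == b)                               # False: exact comparison fails
print(abs(a - b) <= TOL)                    # True: tolerance comparison succeeds
print(math.isclose(a, b, rel_tol=1.0e-12))  # True: library equivalent
```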
6. Number of decimal digits that can be trusted:
up to 7 in single precision, up to 15 in double precision.
Do not trust what you see: printed values are affected by the output format used to print them!
Some strategies for effective computing:
Try to re-write the expression somehow...
and print big and small numbers with '%e' formatting.
e.g. ax^2 + bx + c = c + x*(b+a*x) ,
e.g. ax^3 + bx^2 + cx + d = d + x*(c+x*(b+a*x)) , etc.
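The nested (Horner) form above can be written as a tiny routine (a sketch; the name horner and the coefficient ordering are choices made here):

```python
def horner(coeffs, x):
    """Evaluate a polynomial by the nested (Horner) scheme.

    coeffs lists coefficients from the highest degree down,
    e.g. [a, b, c] for a*x^2 + b*x + c; uses n multiplications,
    n additions, and no explicit powers of x.
    """
    result = 0.0
    for c in coeffs:
        result = result * x + c
    return result

print(horner([2.0, -3.0, 1.0], 4.0))   # 2*16 - 3*4 + 1 = 21.0
```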
A safe way to find roots of a quadratic:
q = −( b + sign(b)·√(b^2 − 4ac) ) / 2 ,
then x1 = q/a , x2 = c/q .
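The recipe above as code (a sketch assuming real roots and a ≠ 0; solve_quadratic and taking sign(0) = +1 are choices made here):

```python
import math

def solve_quadratic(a, b, c):
    """Roots of a*x^2 + b*x + c = 0 via the q-trick, which avoids
    subtractive cancellation when b^2 >> 4ac."""
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        raise ValueError("complex roots not handled in this sketch")
    sign_b = 1.0 if b >= 0.0 else -1.0   # sign(0) taken as +1
    q = -(b + sign_b * math.sqrt(disc)) / 2.0
    return q / a, c / q

# b^2 >> 4ac: the naive formula would lose the small root to cancellation
x1, x2 = solve_quadratic(1.0, 1.0e8, 1.0)
print(x1, x2)   # roots near -1e8 and -1e-8
```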
Fact of life: Computation (with reals) is always approximate.
Errors may come from many sources and many are unavoidable.
Estimation of algorithmic errors (from discretization and roundoff)
is crucial, the subject of Numerical Analysis.
More generally, the field of Uncertainty Quantification (UQ) studies how errors affect solutions.