April, 2022 - François HU
Master of Science - EPITA
This lecture is available here: https://curiousml.github.io/
What happens with the following code?
a = 1 + 2
b = 3
print(a == b)
True
a = 0.1 + 0.2
b = 0.3
print(a == b)
False
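Neither 0.1, 0.2 nor 0.3 has an exact binary representation, so the computed sum 0.1 + 0.2 is a value slightly different from the rounding of 0.3. A robust comparison uses a tolerance rather than exact equality, e.g. with `math.isclose` (a quick sketch; `math.isclose` uses a relative tolerance of 1e-9 by default):

```python
import math

a = 0.1 + 0.2
b = 0.3
print(repr(a))             # 0.30000000000000004: the rounded sum
print(a == b)              # False: exact comparison fails
print(math.isclose(a, b))  # True: comparison up to a relative tolerance
```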
A (normalized) floating-point number $x\in \mathbb{F}$ is a number written in the form
$$ x = \pm x_0. x_1\cdots x_{p-1} \times b^e, \quad \quad 0 \leq x_i \leq b-1, \quad \quad x_0\neq 0 $$
with $b$ the base, $e$ the exponent and $p$ the precision (number of digits).
We call $x_0. x_1\cdots x_{p-1}$ the mantissa.
Machine precision: $\varepsilon = b^{1-p}$
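For IEEE double precision, $b = 2$ and $p = 53$ (52 stored mantissa bits plus an implicit leading 1), so $\varepsilon = 2^{-52} \approx 2.22\times 10^{-16}$. Python exposes this value directly, which lets us check the definition (a quick sketch):

```python
import sys

eps = 2.0**-52                        # b**(1-p) with b = 2, p = 53
print(eps)                            # 2.220446049250313e-16
print(eps == sys.float_info.epsilon)  # True: matches Python's reported epsilon
print(1.0 + eps == 1.0)               # False: 1 + eps is representable
print(1.0 + eps/4 == 1.0)             # True: eps/4 is absorbed when added to 1
```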
The IEEE Standard for Floating Point Arithmetic (IEEE 754) is a technical standard for floating point computing that was established in 1985 by the Institute of Electrical and Electronics Engineers (IEEE). The standard addressed many of the problems encountered in various floating point implementations that made them difficult to use reliably and reduced their portability. IEEE Standard 754 floating point is the most common representation for real numbers on computers today.
In the IEEE standard:
the base is binary, i.e. $b = 2$
normalization is used, i.e. leading digit $x_0$ is always nonzero unless the number is zero.
The IEEE standard stores the sign, exponent, and mantissa in separate fields each of which has a fixed number of bits. The two most commonly used levels of precision for floating-point numbers are single precision and double precision.
Precision | Sign (bits) | Exponent (bits) | Mantissa (bits) |
---|---|---|---|
Single (32 bits) | 1 | 8 | 23 |
Double (64 bits) | 1 | 11 | 52 |
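These three fields can be inspected directly by reinterpreting the 64 bits of a double as an integer, using the standard `struct` module (a sketch; the bit layout follows the table above):

```python
import struct

def fields(x):
    """Return (sign, biased exponent, mantissa field) of a double."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]  # 64 raw bits
    sign = bits >> 63                   # 1 sign bit
    exponent = (bits >> 52) & 0x7FF     # 11-bit biased exponent (bias 1023)
    mantissa = bits & ((1 << 52) - 1)   # 52-bit mantissa field
    return sign, exponent, mantissa

print(fields(1.0))   # (0, 1023, 0): 1.0 = +1.0 x 2^(1023-1023)
print(fields(-2.0))  # (1, 1024, 0): -2.0 = -1.0 x 2^1
```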
Approximation of $\mathbb{R}$ by $\mathbb{F}$, rounding $r : \mathbb{R}\to \mathbb{F}$
Let $x \in \mathbb{R}$ then
$$ r(x) = x\times (1+\delta ), \quad\quad |\delta|\leq \varepsilon/2 $$
In the IEEE standard:
Let $x, y \in\mathbb{F}$,
$$ r(x \odot y) = (x\odot y)\times (1+\delta ), \quad\quad |\delta|\leq \varepsilon/2, \quad\quad \odot\in\left\{ +, -, \times, / , \sqrt{\cdot} \right\} $$
At each rounding we lose a little accuracy: this is the rounding error. Even if an isolated operation returns the best possible result (the rounding of the exact result), a series of calculations can lead to large errors due to the accumulation of rounding errors.
((((((4**0.5)**0.5)**0.5)**2)**2))**2
3.9999999999999982
import numpy as np
def f(x):
    return (1 - np.cos(x)) / (x**2)
print(f(1.2e-5))
print(f(1.2e-6))
print(f(1.2e-7))
print(f(1.2e-8)) # in double precision, np.cos(1.2e-8) rounds to 0.9999999999999999
print(f(1.2e-9))
print(f(1.2e-10))
0.4999997329749008
0.4999858551870931
0.5011423375044111
0.7709882115452477
0.0
0.0
An absurd convergence! Since $(1-\cos x)/x^2 \to 1/2$ as $x\to 0$, the computed values should approach $0.5$, not $0$. The culprit is the subtraction $1 - \cos(x)$, which cancels almost all significant digits:
print(np.cos(1.2e-8))
print(np.cos(1.2e-9))
0.9999999999999999
1.0
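The cancellation in $1-\cos(x)$ can be avoided with the identity $1-\cos(x) = 2\sin^2(x/2)$, which yields a numerically stable version of the same function (a sketch, not part of the original lecture code):

```python
import numpy as np

def f_stable(x):
    # 1 - cos(x) = 2*sin(x/2)**2: no subtraction of nearly equal numbers
    return 2 * np.sin(x / 2)**2 / x**2

for x in [1.2e-5, 1.2e-8, 1.2e-10]:
    print(f_stable(x))  # stays close to the true limit 0.5
```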
Let us compute $\sum_{i=1}^{N}\frac{1}{i}$
N = 1.e+6
# in ascending order - operation "big number -> small number"
asc = 0
for i in np.arange(1, N):
asc += 1/i
asc
14.39272572286499
# in descending order - operation "small number -> big number"
des = 0
for i in np.arange(N-1, 0, -1):
des += 1/i
des
14.392725722865773
des - asc # absorption effect with "asc"
7.833733661755105e-13
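The accumulation error can be reduced further with compensated (Kahan) summation, which carries the low-order bits lost at each addition in a correction term (a sketch; `math.fsum` gives the correctly rounded sum as a reference):

```python
import math

def kahan_sum(values):
    total = 0.0
    c = 0.0                  # running compensation for lost low-order bits
    for v in values:
        y = v - c
        t = total + y
        c = (t - total) - y  # recover what was lost in total + y
        total = t
    return total

terms = [1.0 / i for i in range(1, 10**6)]
print(kahan_sum(terms))  # compensated sum
print(math.fsum(terms))  # exactly rounded sum, for reference
```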
There are two ways to generate random numbers: true random number generators, based on a physical source of entropy, and pseudo-random number generators (PRNGs).
Most computer-generated random numbers use PRNGs (Pseudo Random Number Generator), which are algorithms that can automatically create long series of numbers with good random properties, but the sequence eventually repeats itself. These random numbers are suitable for many situations, but we have to keep in mind that they are pseudo-random and require a seed. The most used PRNG is the linear congruential generator, which uses the recurrence
$$ x_{i+1} = (a x_i + b)\quad \textrm{mod}\ m $$
to generate numbers, where $a$, $b$ and $m$ are large integers, and $x_{i+1}$ is the next pseudo-random number after $x_i$.
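A minimal LCG takes only a few lines of Python. The constants below ($a = 1664525$, $b = 1013904223$, $m = 2^{32}$, from Numerical Recipes) are one classical choice, used here purely for illustration:

```python
def lcg(seed, a=1664525, b=1013904223, m=2**32):
    """Yield pseudo-random numbers in [0, 1) via x_{i+1} = (a*x_i + b) mod m."""
    x = seed
    while True:
        x = (a * x + b) % m
        yield x / m

gen = lcg(seed=1)
print([next(gen) for _ in range(3)])  # the same seed always gives the same sequence
```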
Probably the most widely known tool for generating random data in Python is its random module, which uses the Mersenne Twister (1997) PRNG algorithm as its core generator.
RANDU is a linear congruential pseudo-random number generator (LCG) which was used primarily in the 1960s and 1970s. The congruential generator RANDU is defined by the recursion (on the set of integers $\{1, \cdots, 2^{31}-1\}$)
$$ x_i = 65539\times x_{i-1} \quad \text{mod}\ 2^{31} $$
with $u_i = x_i/2^{31} \in (0, 1)$. It has gained a reputation as the worst generator ever used in practice. The aim of this sequence of exercises is to check that this reputation is well deserved.
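RANDU itself is easy to reproduce, which makes the exercises below straightforward to set up (a sketch; the seed is chosen arbitrarily):

```python
def randu(seed, n):
    """Return the first n values u_i = x_i / 2**31 of RANDU."""
    x = seed
    us = []
    for _ in range(n):
        x = (65539 * x) % 2**31
        us.append(x / 2**31)
    return us

print(randu(seed=1, n=5))
```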
1. Show that the triplets $(u_i, u_{i-1}, u_{i-2})$ necessarily lie on 15 hyperplanes in $[0, 1]^3$. Hint: $65539 = 2^{16}+3$.
2. Plot the triplets $(u_i, u_{i-1}, u_{i-2})$ in 3 dimensions to illustrate exercise 1.
3. Another shortcoming of RANDU is the low period of the least significant bits of $x_i$. Highlight this phenomenon.
4. Numerically verify that the default generator in Python has better properties than RANDU (think of different ways to test a uniform generator).
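As a starting point for the last exercise, one very simple check is to bin a large sample from Python's `random.random` and verify that the bins are roughly equally filled (a sketch; a chi-square test would make this quantitative):

```python
import random

random.seed(0)    # fixed seed for reproducibility
n, k = 100_000, 10
counts = [0] * k
for _ in range(n):
    counts[int(random.random() * k)] += 1
print(counts)     # each bin should be close to n/k = 10000
```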