December, 2021 - François HU
Master of Science - EPITA
This lecture is available here: https://curiousml.github.io/
These exercises are considered "normal-hard" exercises.
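Create a function insert(l, v, i) that returns a new list in which the value v is inserted at index i of l, without modifying l. It should produce the following results: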
>>> l = [0, 9, 3, 10]
>>> print(insert(l, -1, 1))
[0, -1, 9, 3, 10]
>>> print(insert(l, -1, 2))
[0, 9, -1, 3, 10]
>>> print(insert(l, "epita", 4))
[0, 9, 3, 10, 'epita']
>>> print(insert(l, "epita", 10))
[0, 9, 3, 10, 'epita']
>>> print(l)
[0, 9, 3, 10]
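def insert(l, v, i):
    # Slicing creates new lists, so the caller's list l is not modified;
    # a slice past the end is simply empty, so a large i appends v
    return l[:i] + [v] + l[i:]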
l = [0, 9, 3, 10]
print(insert(l, -1, 1))
print(insert(l, -1, 2))
print(insert(l, "epita", 4))
print(insert(l, "epita", 10))
print(l)
[0, -1, 9, 3, 10]
[0, 9, -1, 3, 10]
[0, 9, 3, 10, 'epita']
[0, 9, 3, 10, 'epita']
[0, 9, 3, 10]
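A variant relying on the built-in list.insert method gives the same results when applied to a copy, because list.insert appends the value when the index is past the end of the list. A minimal sketch (insert_copy is a hypothetical name):

```python
def insert_copy(l, v, i):
    """Return a new list with v inserted at index i; l is not modified."""
    result = l.copy()    # shallow copy so the caller's list stays unchanged
    result.insert(i, v)  # list.insert appends if i >= len(result)
    return result

l = [0, 9, 3, 10]
print(insert_copy(l, -1, 1))        # [0, -1, 9, 3, 10]
print(insert_copy(l, "epita", 10))  # [0, 9, 3, 10, 'epita']
print(l)                            # [0, 9, 3, 10]
```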
We have the following list of dictionaries, giving the population (Pop) of several Paris districts in 2007:
paris = [{"District": 3, "Date":2007, "Pop":34576},
{"District": 5, "Date":2007, "Pop":62664},
{"District": 6, "Date":2007, "Pop":45332},
{"District": 7, "Date":2007, "Pop":57410},
{"District": 8, "Date":2007, "Pop":39165},
{"District": 9, "Date":2007, "Pop":58632},
{"District": 10, "Date":2007, "Pop":93373},
{"District": 11, "Date":2007, "Pop":151421},
{"District": 12, "Date":2007, "Pop":142425},
{"District": 13, "Date":2007, "Pop":179213},
{"District": 14, "Date":2007, "Pop":134382},
{"District": 15, "Date":2007, "Pop":232247},
{"District": 16, "Date":2007, "Pop":159706},
{"District": 17, "Date":2007, "Pop":164673},
{"District": 18, "Date":2007, "Pop":191523},
{"District": 19, "Date":2007, "Pop":184038},
{"District": 20, "Date":2007, "Pop":194018}]
Add to each dictionary of paris a new key Rounded_pop, whose value discards the last 3 digits of Pop. For example, the value 34576 is transformed to 34.
for p in paris:
    # Integer division by 1000 drops the last 3 digits
    p["Rounded_pop"] = p["Pop"] // 1000
paris
[{'District': 3, 'Date': 2007, 'Pop': 34576, 'Rounded_pop': 34}, {'District': 5, 'Date': 2007, 'Pop': 62664, 'Rounded_pop': 62}, {'District': 6, 'Date': 2007, 'Pop': 45332, 'Rounded_pop': 45}, {'District': 7, 'Date': 2007, 'Pop': 57410, 'Rounded_pop': 57}, {'District': 8, 'Date': 2007, 'Pop': 39165, 'Rounded_pop': 39}, {'District': 9, 'Date': 2007, 'Pop': 58632, 'Rounded_pop': 58}, {'District': 10, 'Date': 2007, 'Pop': 93373, 'Rounded_pop': 93}, {'District': 11, 'Date': 2007, 'Pop': 151421, 'Rounded_pop': 151}, {'District': 12, 'Date': 2007, 'Pop': 142425, 'Rounded_pop': 142}, {'District': 13, 'Date': 2007, 'Pop': 179213, 'Rounded_pop': 179}, {'District': 14, 'Date': 2007, 'Pop': 134382, 'Rounded_pop': 134}, {'District': 15, 'Date': 2007, 'Pop': 232247, 'Rounded_pop': 232}, {'District': 16, 'Date': 2007, 'Pop': 159706, 'Rounded_pop': 159}, {'District': 17, 'Date': 2007, 'Pop': 164673, 'Rounded_pop': 164}, {'District': 18, 'Date': 2007, 'Pop': 191523, 'Rounded_pop': 191}, {'District': 19, 'Date': 2007, 'Pop': 184038, 'Rounded_pop': 184}, {'District': 20, 'Date': 2007, 'Pop': 194018, 'Rounded_pop': 194}]
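The loop above modifies the dictionaries of paris in place. If the original data should stay untouched, a list comprehension can build new dictionaries instead. A sketch on a two-district excerpt of the data (paris_rounded is a hypothetical name):

```python
paris = [{"District": 3, "Date": 2007, "Pop": 34576},
         {"District": 5, "Date": 2007, "Pop": 62664}]

# {**p, ...} copies each dictionary and adds the new key to the copy only
paris_rounded = [{**p, "Rounded_pop": p["Pop"] // 1000} for p in paris]

print(paris_rounded[0])           # {'District': 3, 'Date': 2007, 'Pop': 34576, 'Rounded_pop': 34}
print("Rounded_pop" in paris[0])  # False: the original dictionaries are unchanged
```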
Store the values of Rounded_pop into a list named populations.
populations = []
for p in paris:
    populations.append(p["Rounded_pop"])
populations
[34, 62, 45, 57, 39, 58, 93, 151, 142, 179, 134, 232, 159, 164, 191, 184, 194]
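The same list can be built with a list comprehension, the usual Python idiom for an "empty list + for loop + append" pattern. A sketch on an excerpt of the data:

```python
paris = [{"District": 3, "Pop": 34576, "Rounded_pop": 34},
         {"District": 5, "Pop": 62664, "Rounded_pop": 62},
         {"District": 6, "Pop": 45332, "Rounded_pop": 45}]

# One expression replaces the loop and the repeated append calls
populations = [p["Rounded_pop"] for p in paris]
print(populations)  # [34, 62, 45]
```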
Sort populations in ascending order.
populations.sort()
populations
[34, 39, 45, 57, 58, 62, 93, 134, 142, 151, 159, 164, 179, 184, 191, 194, 232]
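Note that list.sort() sorts in place and returns None, whereas the built-in sorted() returns a new sorted list and leaves the original unchanged. A short illustration on a few values:

```python
populations = [34, 62, 45, 57, 39]

ascending = sorted(populations)                 # new list, original untouched
descending = sorted(populations, reverse=True)  # reverse=True for decreasing order
print(ascending)   # [34, 39, 45, 57, 62]
print(descending)  # [62, 57, 45, 39, 34]

result = populations.sort()  # sorts the list itself and returns None
print(result)       # None
print(populations)  # [34, 39, 45, 57, 62]
```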
We have the following list of strings:
list_of_strings = [
'EPITA', 'propose', 'trois', 'programmes',
'spécialisés', 'Master', 'of', 'Science',
'dans', 'un', 'environnement', 'international.',
'Cette', 'formation', 'professionnalisante', 'de',
'18', 'mois', 'intégralement', 'en', 'anglais',
'pour','les','étudiants','français', 'et', 'internationaux,',
'permet', 'de', 'se', 'spécialiser', 'dans', 'un',
'domaine', 'après', '3', 'ou', '4', 'ans', 'd’études.']
Write a script that creates a variable named text which concatenates the items of list_of_strings with exactly one space between words.
text = ' '.join(list_of_strings)
text
'EPITA propose trois programmes spécialisés Master of Science dans un environnement international. Cette formation professionnalisante de 18 mois intégralement en anglais pour les étudiants français et internationaux, permet de se spécialiser dans un domaine après 3 ou 4 ans d’études.'
# Alternative: build the string with an explicit loop
text = list_of_strings[0]
for s in list_of_strings[1:]:
    # Add the separator before every word except the first one
    text += ' '
    text += s
text
'EPITA propose trois programmes spécialisés Master of Science dans un environnement international. Cette formation professionnalisante de 18 mois intégralement en anglais pour les étudiants français et internationaux, permet de se spécialiser dans un domaine après 3 ou 4 ans d’études.'
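One practical difference between the two versions: ' '.join handles an empty list (it returns an empty string), whereas the loop version fails with an IndexError on list_of_strings[0]. Since strings are immutable, repeated += may also copy the growing string at each step, which is one reason join is usually preferred. A sketch comparing both (join_builtin and join_loop are hypothetical names):

```python
def join_builtin(words):
    # str.join inserts the separator between items only
    return " ".join(words)

def join_loop(words):
    # Same logic as the loop version above; words[0] fails on an empty list
    text = words[0]
    for s in words[1:]:
        text += " " + s
    return text

words = ["EPITA", "propose", "trois", "programmes"]
print(join_builtin(words) == join_loop(words))  # True
print(repr(join_builtin([])))                   # ''
try:
    join_loop([])
except IndexError:
    print("join_loop([]) raises IndexError")
```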
Create the following array and name it arr1:
[[  0  25 100 225 400]
 [  1  36 121 256 441]
 [  4  49 144 289 484]
 [  9  64 169 324 529]
 [ 16  81 196 361 576]]
import numpy as np
arr1 = np.arange(25).reshape(5,-1).T**2
arr1
array([[  0,  25, 100, 225, 400],
       [  1,  36, 121, 256, 441],
       [  4,  49, 144, 289, 484],
       [  9,  64, 169, 324, 529],
       [ 16,  81, 196, 361, 576]])
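Since np.arange(25).reshape(5, -1) holds 5*i + j at position (i, j), its transpose holds i + 5*j, so arr1[i, j] = (i + 5*j)**2. A sketch building the same array directly with broadcasting (arr1_bis is a hypothetical name):

```python
import numpy as np

i = np.arange(5)[:, None]    # row indices as a column vector, shape (5, 1)
j = np.arange(5)[None, :]    # column indices as a row vector, shape (1, 5)
arr1_bis = (i + 5 * j) ** 2  # broadcasting produces a (5, 5) array

arr1 = np.arange(25).reshape(5, -1).T ** 2
print(np.array_equal(arr1, arr1_bis))  # True
```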
Extract the diagonal of arr1 (do not generate it from scratch) and call it diag1. You should have:
[ 0, 36, 144, 324, 576]
diag1 = arr1[np.arange(5), np.arange(5)]
diag1
array([ 0, 36, 144, 324, 576])
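NumPy also provides the main diagonal directly through np.diag and the array method diagonal(). A sketch:

```python
import numpy as np

arr1 = np.arange(25).reshape(5, -1).T ** 2

diag_a = np.diag(arr1)    # main diagonal of a 2-D array
diag_b = arr1.diagonal()  # same values, returned as a read-only view
print(diag_a)  # [  0  36 144 324 576]
```

The fancy indexing arr1[np.arange(5), np.arange(5)] used above returns a copy instead, which can be modified freely.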
From arr1, extract the submatrix:
[[225 100  25]
 [289 144  49]
 [256 121  36]]
and name it subarr1.
subarr1 = arr1[[0, 2, 1], :][:, [3, 2, 1]]
print(subarr1)
[[225 100  25]
 [289 144  49]
 [256 121  36]]
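The chained indexing above makes two intermediate copies. np.ix_ turns the row and column index lists into broadcastable index arrays, so a single indexing operation selects every (row, column) pair. A sketch:

```python
import numpy as np

arr1 = np.arange(25).reshape(5, -1).T ** 2

rows = [0, 2, 1]
cols = [3, 2, 1]
# np.ix_ builds an open mesh: the result has shape (len(rows), len(cols))
subarr1 = arr1[np.ix_(rows, cols)]
print(subarr1)
```

Writing arr1[rows, cols] instead would pair the indices element-wise and return only the three values arr1[0, 3], arr1[2, 2] and arr1[1, 1].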
Create the following array and name it arr2:
[[10  9  8  7  6]
 [ 9  8  7  6  5]
 [ 8  7  6  5  4]
 [ 7  6  5  4  3]
 [ 6  5  4  3  2]]
arr = np.arange(5, 0, -1)
arr2 = np.tile(arr, (5, 1)).T + arr
print(arr2)
[[10  9  8  7  6]
 [ 9  8  7  6  5]
 [ 8  7  6  5  4]
 [ 7  6  5  4  3]
 [ 6  5  4  3  2]]
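Because arr2[i, j] = arr[i] + arr[j], the tile and transpose can be replaced by broadcasting a column vector against a row vector, or by the outer sum np.add.outer. A sketch checking that the three constructions agree:

```python
import numpy as np

arr = np.arange(5, 0, -1)            # [5 4 3 2 1]
arr2_a = arr[:, None] + arr          # (5, 1) + (5,) broadcasts to (5, 5)
arr2_b = np.add.outer(arr, arr)      # outer sum: arr[i] + arr[j]
arr2 = np.tile(arr, (5, 1)).T + arr  # construction used in the solution above
print(np.array_equal(arr2, arr2_a) and np.array_equal(arr2, arr2_b))  # True
```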
Replace the even numbers of arr2 by 0 without explicit loops. You should have:
[[0 9 0 7 0]
 [9 0 7 0 5]
 [0 7 0 5 0]
 [7 0 5 0 3]
 [0 5 0 3 0]]
arr2[arr2%2==0] = 0
print(arr2)
[[0 9 0 7 0]
 [9 0 7 0 5]
 [0 7 0 5 0]
 [7 0 5 0 3]
 [0 5 0 3 0]]
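The boolean-mask assignment above modifies arr2 in place. np.where returns a new array instead and keeps the original intact; a sketch (arr2_odd is a hypothetical name):

```python
import numpy as np

arr = np.arange(5, 0, -1)
arr2 = arr[:, None] + arr

# Keep odd values, replace even ones by 0; arr2 itself is not modified
arr2_odd = np.where(arr2 % 2 == 0, 0, arr2)
print(arr2_odd)
```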
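Color in green the points of the following orange scatter plot that lie in the rectangles -4 ≤ x ≤ -2, 1.2 ≤ y ≤ 1.8 and -8 ≤ x ≤ -6, 1.2 ≤ y ≤ 1.8. The orange scatter plot is generated by:
import numpy as np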
import matplotlib.pyplot as plt
n = 2000
x = -10 * np.random.rand(n)
y = np.random.rand(n) + 1
plt.scatter(x, y, color="orange");
import numpy as np
import matplotlib.pyplot as plt
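# Boolean masks for the two rectangles (& is the element-wise AND)
ind1 = (x<=-2) & (x>=-4) & (y<=1.8) & (y>=1.2)  # -4 <= x <= -2, 1.2 <= y <= 1.8
ind2 = (x<=-6) & (x>=-8) & (y<=1.8) & (y>=1.2)  # -8 <= x <= -6, 1.2 <= y <= 1.8
# Keep the points lying in either rectangle (| is the element-wise OR)
x_green = x[ind1 | ind2]
y_green = y[ind1 | ind2]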
plt.scatter(x, y, color="orange");
plt.scatter(x_green, y_green, color="green");
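The two masks can also be wrapped in a small helper and turned into one array of colors, so a single plotting call is enough. A NumPy-only sketch, assuming the same rectangle bounds as above; in_rect, green and colors are hypothetical names, and the fixed seed only makes the example reproducible:

```python
import numpy as np

def in_rect(x, y, xmin, xmax, ymin, ymax):
    """Boolean mask of the points (x, y) lying inside the closed rectangle."""
    return (x >= xmin) & (x <= xmax) & (y >= ymin) & (y <= ymax)

rng = np.random.default_rng(0)
n = 2000
x = -10 * rng.random(n)  # uniform on (-10, 0]
y = rng.random(n) + 1    # uniform on [1, 2)

green = in_rect(x, y, -4, -2, 1.2, 1.8) | in_rect(x, y, -8, -6, 1.2, 1.8)
colors = np.where(green, "green", "orange")  # one color name per point
print(green.sum(), "green points out of", n)
```

With matplotlib imported as plt, plt.scatter(x, y, c=colors.tolist()) would then draw every point in its own color.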
Load the Iris and defra_consumption datasets as dataframes and name them respectively iris and cons. You should have the following first 5 rows for each dataframe:
import pandas as pd
cons = pd.read_csv('data/defra_consumption.csv', sep=';', index_col=0)
iris = pd.read_csv('data/Iris.csv', sep=',', index_col="Id")
cons.head()
| | England | Wales | Scotland | N Ireland |
|---|---|---|---|---|
| Cheese | 105 | 103 | 103 | 66 |
| Carcass meat | 245 | 227 | 242 | 267 |
| Other meat | 685 | 803 | 750 | 586 |
| Fish | 147 | 160 | 122 | 93 |
| Fats and oils | 193 | 235 | 184 | 209 |
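To see what sep=';' and index_col=0 do without the data files, here is a sketch that parses an inline excerpt through io.StringIO. The excerpt copies the first two rows shown above; the exact layout of data/defra_consumption.csv (semicolon separators, empty first header cell) is an assumption:

```python
import io
import pandas as pd

# Assumed file layout: semicolon-separated, first column holds the food names
csv_text = """;England;Wales;Scotland;N Ireland
Cheese;105;103;103;66
Carcass meat;245;227;242;267
"""

# index_col=0 uses the first column (the food names) as the row index
cons_excerpt = pd.read_csv(io.StringIO(csv_text), sep=";", index_col=0)
print(cons_excerpt)
print(cons_excerpt.loc["Cheese", "Wales"])  # 103
```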
iris.head()
| Id | SepalLengthCm | SepalWidthCm | PetalLengthCm | PetalWidthCm | Species |
|---|---|---|---|---|---|
| 1 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
| 2 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 3 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 4 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 5 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |
Modify iris (aside from the column Species): remove Cm at the end of each column name, and convert the measurements from centimeters to millimeters.
iris.columns = ["SepalLength", "SepalWidth", "PetalLength", "PetalWidth", "Species"]
iris.iloc[:, :-1] *= 10
iris.head()
| Id | SepalLength | SepalWidth | PetalLength | PetalWidth | Species |
|---|---|---|---|---|---|
| 1 | 51.0 | 35.0 | 14.0 | 2.0 | Iris-setosa |
| 2 | 49.0 | 30.0 | 14.0 | 2.0 | Iris-setosa |
| 3 | 47.0 | 32.0 | 13.0 | 2.0 | Iris-setosa |
| 4 | 46.0 | 31.0 | 15.0 | 2.0 | Iris-setosa |
| 5 | 50.0 | 36.0 | 14.0 | 2.0 | Iris-setosa |
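Hard-coding the new column names works, but the suffix can also be removed programmatically, and selecting the numeric columns by dtype avoids relying on Species being the last column. A sketch on the first two rows shown earlier (iris_excerpt and num_cols are hypothetical names):

```python
import pandas as pd

iris_excerpt = pd.DataFrame(
    {"SepalLengthCm": [5.1, 4.9], "SepalWidthCm": [3.5, 3.0],
     "PetalLengthCm": [1.4, 1.4], "PetalWidthCm": [0.2, 0.2],
     "Species": ["Iris-setosa", "Iris-setosa"]},
    index=pd.Index([1, 2], name="Id"),
)

# Remove a trailing "Cm" from each column name (regex anchored at the end)
iris_excerpt.columns = iris_excerpt.columns.str.replace("Cm$", "", regex=True)

# Convert only the numeric columns from centimeters to millimeters
num_cols = iris_excerpt.select_dtypes("number").columns
iris_excerpt[num_cols] = iris_excerpt[num_cols] * 10
print(iris_excerpt)
```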
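For cons, convert each column to percentages of the column total. You should have the following first rows:
# Divide each column by its own total (the Series of column sums aligns on column labels)
cons = (cons/cons.sum(axis=0))*100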
cons.head()
| | England | Wales | Scotland | N Ireland |
|---|---|---|---|---|
| Cheese | 1.315130 | 1.202288 | 1.316462 | 0.902996 |
| Carcass meat | 3.068637 | 2.649702 | 3.093047 | 3.653031 |
| Other meat | 8.579659 | 9.373176 | 9.585890 | 8.017513 |
| Fish | 1.841182 | 1.867632 | 1.559305 | 1.272404 |
| Fats and oils | 2.417335 | 2.743084 | 2.351738 | 2.859488 |
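Dividing a DataFrame by the Series cons.sum(axis=0) aligns the Series on the column labels, so each column is divided by its own total and every column of the result sums to 100. A sketch with the explicit DataFrame.div method on a two-row excerpt (cons_excerpt and pct are hypothetical names; with only two rows the percentages differ from the full dataset shown above):

```python
import pandas as pd

cons_excerpt = pd.DataFrame(
    {"England": [105, 245], "Wales": [103, 227]},
    index=["Cheese", "Carcass meat"],
)

# axis="columns" matches the Series of column totals against the column labels
pct = cons_excerpt.div(cons_excerpt.sum(axis=0), axis="columns") * 100
print(pct)
print(pct.sum(axis=0))  # each column sums to 100, up to floating-point rounding
```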