December, 2021 - François HU
Master of Science - EPITA
This lecture is available here: https://curiousml.github.io/
These exercises are considered "normal-hard" exercises.
Create a function insert, that produces the following results:
>>> l = [0, 9, 3, 10]
>>> print(insert(l, -1, 1))
[0, -1, 9, 3, 10]
>>> print(insert(l, -1, 2))
[0, 9, -1, 3, 10]
>>> print(insert(l, "epita", 4))
[0, 9, 3, 10, 'epita']
>>> print(insert(l, "epita", 10))
[0, 9, 3, 10, 'epita']
>>> print(l)
[0, 9, 3, 10]
def insert(l, v, i):
    # slicing builds a new list, so the caller's list is not modified
    l = l[:i] + [v] + l[i:]
    return l
l = [0, 9, 3, 10]
print(insert(l, -1, 1))
print(insert(l, -1, 2))
print(insert(l, "epita", 4))
print(insert(l, "epita", 10))
print(l)
[0, -1, 9, 3, 10]
[0, 9, -1, 3, 10]
[0, 9, 3, 10, 'epita']
[0, 9, 3, 10, 'epita']
[0, 9, 3, 10]
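The slicing solution above returns a new list, which is why l is unchanged at the end. For comparison, here is a minimal sketch based on the built-in list.insert, which mutates in place and therefore needs a copy first. The name insert_copy is hypothetical and not part of the exercise.

```python
def insert_copy(l, v, i):
    """Return a new list with v inserted at position i; l is left untouched."""
    out = list(l)     # shallow copy, so the caller's list is not modified
    out.insert(i, v)  # list.insert appends when i is past the end of the list
    return out

l = [0, 9, 3, 10]
print(insert_copy(l, -1, 1))
print(insert_copy(l, "epita", 10))
print(l)
```

Both versions accept an index larger than the list length, in which case the value ends up at the end of the list.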
We have the following list of dictionaries:
paris = [{"District": 3, "Date":2007, "Pop":34576},
{"District": 5, "Date":2007, "Pop":62664},
{"District": 6, "Date":2007, "Pop":45332},
{"District": 7, "Date":2007, "Pop":57410},
{"District": 8, "Date":2007, "Pop":39165},
{"District": 9, "Date":2007, "Pop":58632},
{"District": 10, "Date":2007, "Pop":93373},
{"District": 11, "Date":2007, "Pop":151421},
{"District": 12, "Date":2007, "Pop":142425},
{"District": 13, "Date":2007, "Pop":179213},
{"District": 14, "Date":2007, "Pop":134382},
{"District": 15, "Date":2007, "Pop":232247},
{"District": 16, "Date":2007, "Pop":159706},
{"District": 17, "Date":2007, "Pop":164673},
{"District": 18, "Date":2007, "Pop":191523},
{"District": 19, "Date":2007, "Pop":184038},
{"District": 20, "Date":2007, "Pop":194018}]
Add to each dictionary a key Rounded_pop which discards the last 3 digits of Pop. For example, the value 34576 becomes 34.
for p in paris:
    p["Rounded_pop"] = p["Pop"] // 1000
paris
[{'District': 3, 'Date': 2007, 'Pop': 34576, 'Rounded_pop': 34},
{'District': 5, 'Date': 2007, 'Pop': 62664, 'Rounded_pop': 62},
{'District': 6, 'Date': 2007, 'Pop': 45332, 'Rounded_pop': 45},
{'District': 7, 'Date': 2007, 'Pop': 57410, 'Rounded_pop': 57},
{'District': 8, 'Date': 2007, 'Pop': 39165, 'Rounded_pop': 39},
{'District': 9, 'Date': 2007, 'Pop': 58632, 'Rounded_pop': 58},
{'District': 10, 'Date': 2007, 'Pop': 93373, 'Rounded_pop': 93},
{'District': 11, 'Date': 2007, 'Pop': 151421, 'Rounded_pop': 151},
{'District': 12, 'Date': 2007, 'Pop': 142425, 'Rounded_pop': 142},
{'District': 13, 'Date': 2007, 'Pop': 179213, 'Rounded_pop': 179},
{'District': 14, 'Date': 2007, 'Pop': 134382, 'Rounded_pop': 134},
{'District': 15, 'Date': 2007, 'Pop': 232247, 'Rounded_pop': 232},
{'District': 16, 'Date': 2007, 'Pop': 159706, 'Rounded_pop': 159},
{'District': 17, 'Date': 2007, 'Pop': 164673, 'Rounded_pop': 164},
{'District': 18, 'Date': 2007, 'Pop': 191523, 'Rounded_pop': 191},
{'District': 19, 'Date': 2007, 'Pop': 184038, 'Rounded_pop': 184},
{'District': 20, 'Date': 2007, 'Pop': 194018, 'Rounded_pop': 194}]
Store the values of Rounded_pop into a list named populations.
populations = []
for p in paris:
    populations.append(p["Rounded_pop"])
populations
[34, 62, 45, 57, 39, 58, 93, 151, 142, 179, 134, 232, 159, 164, 191, 184, 194]
Sort the list populations.
populations.sort()
populations
[34, 39, 45, 57, 58, 62, 93, 134, 142, 151, 159, 164, 179, 184, 191, 194, 232]
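The three steps above (rounding, collecting, sorting) can also be written more compactly. The sketch below uses a list comprehension and sorted, which returns a new list, whereas list.sort sorts in place and returns None. The variable paris_subset is a hypothetical three-district extract of paris, used only to keep the example short.

```python
# Hypothetical three-district subset of paris, to keep the example short
paris_subset = [
    {"District": 3, "Date": 2007, "Pop": 34576},
    {"District": 5, "Date": 2007, "Pop": 62664},
    {"District": 8, "Date": 2007, "Pop": 39165},
]

# Integer division by 1000 discards the last 3 digits
populations_subset = [p["Pop"] // 1000 for p in paris_subset]

# sorted builds a new list; populations_subset keeps its original order
sorted_subset = sorted(populations_subset)

print(populations_subset)
print(sorted_subset)
```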
We have the following list of strings:
list_of_strings = [
'EPITA', 'propose', 'trois', 'programmes',
'spécialisés', 'Master', 'of', 'Science',
'dans', 'un', 'environnement', 'international.',
'Cette', 'formation', 'professionnalisante', 'de',
'18', 'mois', 'intégralement', 'en', 'anglais',
'pour','les','étudiants','français', 'et', 'internationaux,',
'permet', 'de', 'se', 'spécialiser', 'dans', 'un',
'domaine', 'après', '3', 'ou', '4', 'ans', 'd’études.']
Write a script that creates a variable named text which concatenates the items of list_of_strings with exactly one space between words.
text = ' '.join(list_of_strings)
text
'EPITA propose trois programmes spécialisés Master of Science dans un environnement international. Cette formation professionnalisante de 18 mois intégralement en anglais pour les étudiants français et internationaux, permet de se spécialiser dans un domaine après 3 ou 4 ans d’études.'
# alternatively
text = list_of_strings[0]
for s in list_of_strings[1:]:
    text += ' '
    text += s
text
'EPITA propose trois programmes spécialisés Master of Science dans un environnement international. Cette formation professionnalisante de 18 mois intégralement en anglais pour les étudiants français et internationaux, permet de se spécialiser dans un domaine après 3 ou 4 ans d’études.'
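One difference between the two solutions: the loop version starts from list_of_strings[0], so it raises an IndexError on an empty list, while ' '.join returns an empty string. A minimal sketch of a loop that handles both cases follows; the names join_with_loop and words are hypothetical.

```python
def join_with_loop(strings):
    """Loop-based concatenation that also accepts an empty list."""
    text = ""
    for k, s in enumerate(strings):
        if k > 0:
            text += " "  # separator only between items, never before the first
        text += s
    return text

words = ["Master", "of", "Science"]
print(join_with_loop(words) == " ".join(words))
print(repr(join_with_loop([])))
```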
Create the following array and name it arr1:
[[  0  25 100 225 400]
 [  1  36 121 256 441]
 [  4  49 144 289 484]
 [  9  64 169 324 529]
 [ 16  81 196 361 576]]
import numpy as np
arr1 = np.arange(25).reshape(5,-1).T**2
arr1
array([[ 0, 25, 100, 225, 400],
[ 1, 36, 121, 256, 441],
[ 4, 49, 144, 289, 484],
[ 9, 64, 169, 324, 529],
[ 16, 81, 196, 361, 576]])
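The transpose in the solution is what makes the values run down the columns. As a sketch of an equivalent formulation, reshape accepts order="F" (column-major filling), which gives the same array without the explicit .T. The names arr1_transposed and arr1_fortran are illustrative only.

```python
import numpy as np

# Solution from the exercise: fill row by row, then transpose
arr1_transposed = np.arange(25).reshape(5, -1).T ** 2

# Alternative: fill column by column directly
arr1_fortran = np.arange(25).reshape(5, 5, order="F") ** 2

print(np.array_equal(arr1_transposed, arr1_fortran))
```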
Extract the diagonal of arr1 (do not generate it) and call it diag1:
[ 0, 36, 144, 324, 576]
diag1 = arr1[np.arange(5), np.arange(5)]
diag1
array([ 0, 36, 144, 324, 576])
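The solution uses paired integer indexing, which picks the elements (0, 0), (1, 1), and so on. As a short sketch, NumPy also provides np.diag, which returns the main diagonal when given a 2-D array. The names diag_indexed and diag_builtin are illustrative.

```python
import numpy as np

arr1 = np.arange(25).reshape(5, -1).T ** 2

# Paired indices: row k is matched with column k
diag_indexed = arr1[np.arange(5), np.arange(5)]

# For a 2-D input, np.diag extracts the main diagonal
diag_builtin = np.diag(arr1)

print(np.array_equal(diag_indexed, diag_builtin))
```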
From arr1, extract the submatrix:
[[225 100  25]
 [289 144  49]
 [256 121  36]]
and name it subarr1.
subarr1 = arr1[[0, 2, 1], :][:, [3, 2, 1]]
print(subarr1)
[[225 100  25]
 [289 144  49]
 [256 121  36]]
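The solution indexes the rows first and the columns second, in two steps. A sketch of a one-step alternative uses np.ix_, which builds the cross product of the row and column index lists. Passing both lists directly, as in arr1[[0, 2, 1], [3, 2, 1]], would instead pair the indices and return only three elements. The names subarr1_ix and paired are illustrative.

```python
import numpy as np

arr1 = np.arange(25).reshape(5, -1).T ** 2

# Two-step indexing, as in the solution
subarr1 = arr1[[0, 2, 1], :][:, [3, 2, 1]]

# np.ix_ selects every combination of the given rows and columns
subarr1_ix = arr1[np.ix_([0, 2, 1], [3, 2, 1])]

# Paired indexing picks (0, 3), (2, 2), (1, 1) only
paired = arr1[[0, 2, 1], [3, 2, 1]]

print(np.array_equal(subarr1, subarr1_ix))
print(paired.shape)
```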
Create the following array and name it arr2:
[[10  9  8  7  6]
 [ 9  8  7  6  5]
 [ 8  7  6  5  4]
 [ 7  6  5  4  3]
 [ 6  5  4  3  2]]
arr = np.arange(5, 0, -1)
arr2 = np.tile(arr, (5, 1)).T + arr
print(arr2)
[[10  9  8  7  6]
 [ 9  8  7  6  5]
 [ 8  7  6  5  4]
 [ 7  6  5  4  3]
 [ 6  5  4  3  2]]
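Each entry of arr2 is arr[i] + arr[j]. The solution materialises the repeated columns with np.tile; as a sketch of an alternative, broadcasting a column vector against a row vector gives the same result without the intermediate copy. The names arr2_tile and arr2_broadcast are illustrative.

```python
import numpy as np

arr = np.arange(5, 0, -1)  # [5, 4, 3, 2, 1]

# Solution from the exercise
arr2_tile = np.tile(arr, (5, 1)).T + arr

# Broadcasting: shape (5, 1) + shape (5,) gives shape (5, 5)
arr2_broadcast = arr[:, np.newaxis] + arr

print(np.array_equal(arr2_tile, arr2_broadcast))
```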
Replace the even values of arr2 by 0 without explicit loops. You should have:
[[0 9 0 7 0]
[9 0 7 0 5]
[0 7 0 5 0]
[7 0 5 0 3]
[0 5 0 3 0]]
arr2[arr2%2==0] = 0
print(arr2)
[[0 9 0 7 0]
 [9 0 7 0 5]
 [0 7 0 5 0]
 [7 0 5 0 3]
 [0 5 0 3 0]]
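The boolean-mask assignment above modifies arr2 in place. If the original array is still needed, a sketch of a non-mutating variant uses np.where, which returns a new array. The name arr2_masked is illustrative.

```python
import numpy as np

arr = np.arange(5, 0, -1)
arr2 = arr[:, np.newaxis] + arr

# np.where(condition, a, b) takes a where the condition holds and b elsewhere
arr2_masked = np.where(arr2 % 2 == 0, 0, arr2)

print(arr2_masked)
print(arr2[0, 0])  # arr2 itself is unchanged
```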
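Given the scatter plot generated below, color in green the points with 1.2 <= y <= 1.8 and either -4 <= x <= -2 or -8 <= x <= -6, and keep the other points in orange. You should have:
import numpy as np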
import matplotlib.pyplot as plt
n = 2000
x = -10 * np.random.rand(n)
y = np.random.rand(n) + 1
plt.scatter(x, y, color="orange");
import numpy as np
import matplotlib.pyplot as plt
ind1 = (x<=-2) & (x>=-4) & (y<=1.8) & (y>=1.2)
ind2 = (x<=-6) & (x>=-8) & (y<=1.8) & (y>=1.2)
x_green = x[ind1 | ind2]
y_green = y[ind1 | ind2]
plt.scatter(x, y, color="orange");
plt.scatter(x_green, y_green, color="green");
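The two masks ind1 and ind2 follow the same pattern, so they can be factored into a helper. The sketch below is only an illustration: in_box is a hypothetical name, and np.random.default_rng with an arbitrary seed is swapped in for np.random.rand to make the run reproducible. Plotting is left out to focus on the selection logic.

```python
import numpy as np

def in_box(x, y, x_min, x_max, y_min, y_max):
    """Boolean mask of the points lying inside an axis-aligned rectangle."""
    return (x >= x_min) & (x <= x_max) & (y >= y_min) & (y <= y_max)

rng = np.random.default_rng(0)  # seed chosen arbitrarily
n = 2000
x = -10 * rng.random(n)  # x in (-10, 0]
y = rng.random(n) + 1    # y in [1, 2)

# Union of the two rectangles used in the solution
green = in_box(x, y, -4, -2, 1.2, 1.8) | in_box(x, y, -8, -6, 1.2, 1.8)

print(green.sum(), "green points out of", n)
```

The mask green can then be passed to plt.scatter as x[green], y[green], exactly as x_green and y_green are in the solution.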
Import the Iris and defra_consumption datasets as dataframes and name them iris and cons respectively. You should have the following first 5 rows for each dataframe:
import pandas as pd
cons = pd.read_csv('data/defra_consumption.csv', sep=';', index_col=0)
iris = pd.read_csv('data/Iris.csv', sep=',', index_col="Id")
cons.head()
| | England | Wales | Scotland | N Ireland |
|---|---|---|---|---|
| Cheese | 105 | 103 | 103 | 66 |
| Carcass meat | 245 | 227 | 242 | 267 |
| Other meat | 685 | 803 | 750 | 586 |
| Fish | 147 | 160 | 122 | 93 |
| Fats and oils | 193 | 235 | 184 | 209 |
iris.head()
| Id | SepalLengthCm | SepalWidthCm | PetalLengthCm | PetalWidthCm | Species |
|---|---|---|---|---|---|
| 1 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
| 2 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 3 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 4 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 5 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |
For the dataframe iris (aside from the column Species), remove Cm at the end of each column name and multiply the values by 10.
iris.columns = ["SepalLength", "SepalWidth", "PetalLength", "PetalWidth", "Species"]
iris.iloc[:, :-1] *= 10
iris.head()
| Id | SepalLength | SepalWidth | PetalLength | PetalWidth | Species |
|---|---|---|---|---|---|
| 1 | 51.0 | 35.0 | 14.0 | 2.0 | Iris-setosa |
| 2 | 49.0 | 30.0 | 14.0 | 2.0 | Iris-setosa |
| 3 | 47.0 | 32.0 | 13.0 | 2.0 | Iris-setosa |
| 4 | 46.0 | 31.0 | 15.0 | 2.0 | Iris-setosa |
| 5 | 50.0 | 36.0 | 14.0 | 2.0 | Iris-setosa |
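The solution hard-codes the new column names and relies on Species being the last column. A sketch of a more generic variant strips the suffix with a regular expression and scales only the numeric columns. The dataframe iris_head is rebuilt by hand from the five rows shown above so that the example runs without the CSV file; the names iris_head and numeric_cols are hypothetical.

```python
import pandas as pd

# First five rows of the Iris dataset, copied from the table above
iris_head = pd.DataFrame(
    {
        "SepalLengthCm": [5.1, 4.9, 4.7, 4.6, 5.0],
        "SepalWidthCm": [3.5, 3.0, 3.2, 3.1, 3.6],
        "PetalLengthCm": [1.4, 1.4, 1.3, 1.5, 1.4],
        "PetalWidthCm": [0.2, 0.2, 0.2, 0.2, 0.2],
        "Species": ["Iris-setosa"] * 5,
    },
    index=pd.Index([1, 2, 3, 4, 5], name="Id"),
)

# Remove a trailing "Cm" from every column name ("Species" is unaffected)
iris_head.columns = iris_head.columns.str.replace("Cm$", "", regex=True)

# Scale the numeric columns only, wherever they are located
numeric_cols = iris_head.select_dtypes("number").columns
iris_head[numeric_cols] = iris_head[numeric_cols] * 10

print(iris_head)
```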
For the dataframe cons, convert columns to percentages of the totals. Therefore the first rows should be:
cons = (cons/cons.sum(axis=0))*100
cons.head()
| | England | Wales | Scotland | N Ireland |
|---|---|---|---|---|
| Cheese | 1.315130 | 1.202288 | 1.316462 | 0.902996 |
| Carcass meat | 3.068637 | 2.649702 | 3.093047 | 3.653031 |
| Other meat | 8.579659 | 9.373176 | 9.585890 | 8.017513 |
| Fish | 1.841182 | 1.867632 | 1.559305 | 1.272404 |
| Fats and oils | 2.417335 | 2.743084 | 2.351738 | 2.859488 |
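A quick sanity check for this kind of normalisation is that every column sums to 100 afterwards. The sketch below applies the same formula to cons_head, a hypothetical dataframe built from only the five rows shown earlier, so its percentages differ from the ones in the table above, which are computed over the full dataset.

```python
import numpy as np
import pandas as pd

# Five rows copied from cons.head(); the full dataset has more rows
cons_head = pd.DataFrame(
    {
        "England": [105, 245, 685, 147, 193],
        "Wales": [103, 227, 803, 160, 235],
        "Scotland": [103, 242, 750, 122, 184],
        "N Ireland": [66, 267, 586, 93, 209],
    },
    index=["Cheese", "Carcass meat", "Other meat", "Fish", "Fats and oils"],
)

# sum(axis=0) gives one total per column; the division broadcasts over rows
cons_pct = cons_head / cons_head.sum(axis=0) * 100

# Every column should now sum to 100, up to floating-point error
print(np.allclose(cons_pct.sum(axis=0), 100))
```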