About me
I am a Data Scientist (PhD) and lecturer in machine learning and computational statistics at ENSAE, EPITA and Institut des Actuaires. In a nutshell…
- Since 2022: Postdoctoral Researcher in the Department of Mathematics and Statistics at the Université de Montréal (UdeM) under the supervision of Manuel Morales and Arthur Charpentier.
I am working on joint research on sustainable finance with Algora Lab (an interdisciplinary laboratory of UdeM and Mila). I also work on the project "Design and implementation of an early warning system for infectious diseases", which is part of the Mathematics for Public Health initiative.
- Apr. 2019 - 2022: Award for the best thesis in actuarial science in France.
My PhD thesis in Machine Learning and Insurance at Institut Polytechnique de Paris (CREST-ENSAE), entitled "Semi-supervised learning in insurance: Fairness and Labeling", was supervised by Caroline Hillairet and Romuald Elie.
- Apr. 2018 - 2022: 4 years of industry experience in data science at the Datalab of Société Générale Insurance, supervised by Marc Juillard.
Research
My research interests are the following :
- ML in Insurance & Finance
- ML fairness & transparency
- NLP in ESG Reporting for Sustainable Finance
- Semi-supervised learning & sampling methods
Recent papers
- 2022 PhD Thesis - Fairness and Labeling for multi-class problems -> Actuarial Award 2022
- 2021 arXiv - Fairness guarantee in multi-class classification. Christophe Denis, Romuald Elie, Mohamed Hebiri and François Hu.
- 2021 arXiv - An overview of active learning methods for insurance with fairness appreciation. Romuald Elie, Caroline Hillairet, François Hu and Marc Juillard.
Recent talks
- 15/06/22 - PhD Defense [slides] -> Actuarial Award 2022
- 28/04/20 - OICA - Efficient labeling with active learning [slides]
- 29/11/19 - 100% Data Science - Active learning for the detection of categories in text fields [slides]
Teaching
EPITA - École pour l’informatique et les techniques avancées (2020 - …)
Master of Science :
- Python & Algorithm Workshop (and introduction to programming) by François HU
- Workshop 1 : Integer arithmetic [Lecture] [Exercises ipynb] [Exercises html]
- Workshop 2 : Floating-point arithmetic & pseudo-random numbers [Lecture] [Notebook]
- Workshop 3 : Matrix representation and arithmetic [Lecture] [Notebook]
- Workshop 4 : Solving linear systems [Lecture] [Notebook]
- Workshop 5 : Solving nonlinear systems [Lecture] [Notebook]
- Workshop 6 : Evaluation and interpolation [Lecture] [Notebook]
- Workshop 7 : [Oral presentations]
- Python by François HU
- Why Python?
- Installing Python
- Practical work 1 : Basics of Python [Lecture] [Notebook]
- Practical work 2 : Application [Exercises html] [Exercises ipynb] [Usual mistakes]
- Practical work 3 : Scientific Computing [Lecture] [Notebook] [Correction]
- Practical work 4 : Data Visualization [Lecture] [Notebook] [Correction]
- Practical work 5 : Data Manipulation [Lecture] [Notebook] [Correction]
- Practical work 6 : Engineering tools [Lecture]
- Additional exercises (basics of Python) : [Exercises] [Notebook] [Correction]
- Additional exercises (exam-like) : [Exercises] [Notebook] [Correction]
- Datasets : [Iris] [Defra consumption]
- Exam : 2022.05.25 Exam
Old lectures (still available) :
Master of Science in Artificial Intelligence Systems :
- Numerical Algorithms (and optimization for Machine Learning) by François HU
- Lecture 1 : Calculus refresher [Lecture] [Notebook]
- Lecture 2 : Unconstrained optimization [Lecture] [Notebook]
- Lecture 3 : Constrained optimization [Lecture]
- Lecture 4 : Numerical methods in linear algebra [Lecture]
- Lecture 5 : Machine learning applications
- Lecture 6 : [Oral presentations] -> Titanic challenge
- Practical work : Linear/Logistic regression, PCA (and SVM) [Notebook]
- Bayesian Machine Learning by François HU
- Lecture 1 : Bayesian statistics [Lecture]
- Lecture 2 : Latent Variable Models and EM-algorithm [Lecture]
- Lecture 3 : Variational Inference and intro to NLP [Lecture]
- Lecture 4 : Markov Chain Monte Carlo (& Gaussian Process) [Lecture]
- Lecture 5 : [Oral presentations] -> Topic models, Bayesian optimization, uncertainty and t-SNE
- Practical work 1 : Naive Bayes Classifier [Notebook]
- Practical work 2 : GMM, Probabilistic K-means and PCA [Notebook]
- Practical work 3 : Topic Modeling with LDA [Notebook]
- Practical work 4 : Sampling posteriors with MCMC [Notebook]
- Practical work 5 : Bayesian Linear Regression [Notebook]
- Bonus points : p85-ex1 Lec1 (0.5pt); PW1 (1pt); PW2 (2pt); PW3 (0.5pt); PW4 (1pt); PW5 (1pt)
Institut des Actuaires - Formation Data Science pour l’Actuaire (2019 - …)
- November 2019 : Text Mining
- October 2020 : Text Mining
- July 2021 : Text Mining and Introduction to Active Learning
- Introduction : Preprocessing in NLP [Lecture]
- Lecture 1 : Vector representations and topic models [Lecture]
- Lecture 2 : Deep Learning for NLP [Lecture]
- Lecture 3 : Introduction to Active Learning [Lecture]
- Animations :
- Practical work : Topic modeling [Notebook] [Correction]
Teaching assistant
Institut Polytechnique de Paris (ENSAE, Polytechnique) (2019 - …)
- 1A - semester 1 (2019 - 2020) : Algorithme et programmation by Xavier Dupré
- 2A - semester 2 (2019 - 2021) : Simulation et Monte Carlo by Nicolas Chopin
- TD1 : Uniform distribution and Monte Carlo (ex1) [Corr in R] [Written corr]
- TD2 : Rejection method and improved Box-Muller (ex2 and ex3) [Corr in R]
- TD3 : Geometric distribution, control variates and antithetic variables (ex4 and ex5) [Corr in R]
- TD4 : MCMC and importance sampling (ex6 and ex7) [Corr in R]
- TD5 : Cross-entropy method and quasi-Monte Carlo (ex9 and ex5-falc) [1-Corr in R] and [2-Corr in R]
- TD6 : Oral presentations
- 2A - semester 2 (2019 - 2020) : Theoretical foundations of Machine Learning by Vianney Perchet
- Directed work 1 : Plug-in methods and over/under-fitting [Written corr]
- Practical work 1 : Linear/polynomial regression and k-NN [Corr in Python]
- Directed work 2 : Model selection and penalization [Written corr]
- Practical work 2 : k-NN, Perceptron and Cross-Validation [Corr in Python]
- Practical work 3 : SVM, Decision Tree and Random Forest [Corr in Python]
- 3A - semester 1 (2020 - 2021) : Advanced Machine Learning by Vianney Perchet
- This course covers ERM, SVM, boosting, neural networks and optimization
- Directed work : VC-dimension and ERM (correction available soon)
- Practical work : Python, Linear Regression and SVM [Corr in Python]
- Practical work : RKHS, optimization and neural networks [Written corr] [Neural Nets in Python]
- 3A - semester 2 (2019 - 2020) : Machine Learning for finance by Romuald Elie
- Speaker on NLP