About me

I am a lead AI researcher and a lecturer in machine learning and computational statistics at Cnam, ENSAE, EPITA and the Institut des Actuaires. In a nutshell:

Since 2024: Head of the R&D AI Lab at Milliman France. I lead the AI Lab within Alexandre Boumezoued's R&D division, focusing on R&D projects in Generative AI (#GenAI) and Trustworthy AI (fairness, interpretability, privacy) for insurance and finance applications.

2022 - 2024: Postdoctoral Researcher in the Department of Mathematics and Statistics at Université de Montréal (UdeM), affiliated with Mila through Algora Lab, working with Arthur Charpentier (UQAM) and Manuel Morales (UdeM). TL;DR: my expertise lies in statistical learning and mathematics, primarily in the insurance domain, but it also extends to biostatistics and finance.

Short description: I am engaged in collaborative research across several domains, with particular expertise in the following areas:

Insurance, Statistics & Statistical learning: I specialize in applying statistical learning (machine and deep learning) and mathematical modeling to the insurance industry, with a particular focus on algorithmic fairness. I have published several papers on this topic in leading journals and conferences.
Biostatistics & Pandemic Modeling: In biostatistics, I focus on applying spatiotemporal and NLP techniques. Specifically, I am involved in developing and implementing an 'Early Warning System for Infectious Diseases', a project within the Mathematics for Public Health (MfPH) initiative.
Finance: My collaborative research also extends to finance, where we focus on algorithmic fairness and NLP techniques to identify Environmental, Social, and Governance (ESG) concepts in Canadian companies. This joint initiative with Algora Lab (affiliated with UdeM and Mila) aims to advance sustainable finance and to bring ethical considerations into AI and ML.

2019 - 2022: Award for the best thesis in actuarial science in France.

Three-year PhD thesis in Machine Learning and Insurance at Institut Polytechnique de Paris (CREST-ENSAE), titled "Semi-supervised learning in insurance: Fairness and Labeling", supervised by Caroline Hillairet and Romuald Elie.

Short description: Insurance and financial institutions amass substantial volumes of unstructured data every day. Managing these data effectively raises several challenges for machine learning and model transparency: (1) manual tagging by experts, the current approach, does not scale to large volumes or near-real-time information; and (2) the data may contain biased information, which raises ethical concerns and can make it unsuitable for use. Addressing these issues requires a learning system for the insurance and finance sector that is accurate (in terms of prediction), cost-effective (in terms of labeling), and ethical (in terms of transparency and fairness). This thesis tackles these challenges, with the goal of improving data management and ensuring compliance.

2018 - 2022: Over four years of industry experience in data science in the Datalab of Société Générale Insurance.

Short description: I specialize in textual data analysis, with an emphasis on fairness and transparency in machine learning models. My responsibilities include overseeing various statistical projects, such as online learning, semi-supervised learning, transparency in deep learning models, and computer vision. I played a key role in deploying machine learning models for insurance scoring in home (MRH) and motor (Auto) insurance. I also contributed to implementing ETL tools and remain involved in building a streamlined end-to-end ML orchestration pipeline using MLOps tools such as Git (GitHub/GitLab), MLflow, Kedro, CI/CD and automated tests, …

Research

My research interests are the following:

ML, primarily in Insurance, but also in Finance
ML Fairness, Interpretability and Privacy
NLP/LLM/GenAI-related topics, e.g. insights from ESG reporting for sustainable finance, studying news and posts about infectious diseases, …
Semi-supervised learning & sampling methods

Some notable papers are listed below. For the latest papers, please refer to my Google Scholar profile.

2024 Fairness guarantee in multi-class classification
- Authors: C. Denis, R. Elie, M. Hebiri and F. Hu
- Publication: Journal of Machine Learning Research (JMLR)
2023 Parametric Fairness with Statistical Guarantees
- Authors: F. Hu, P. Ratz and A. Charpentier
- (Pre-)publication: arXiv
2023 A Sequentially Fair Mechanism for Multiple Sensitive Attributes
- Authors: F. Hu, P. Ratz and A. Charpentier
- Publication: AAAI-2024
- Check out EquiPy, our Python package for fair calibration: https://equilibration.github.io/equipy/equipy.fairness.html
2023 Fairness in Multi-Task Learning via Wasserstein Barycenters
- Authors: F. Hu, P. Ratz and A. Charpentier
- Publication: ECML-PKDD 2023 Research Track
2023 Fairness Explainability using Optimal Transport with Applications in Image Classification
- Authors: P. Ratz, F. Hu and A. Charpentier
- (Pre-)publication: arXiv
2023 Mitigating Discrimination in Insurance with Wasserstein Barycenters
- Authors: A. Charpentier, F. Hu and P. Ratz
- Publication: BIAS 2023
2022 Fairness and Labeling for multi-class problems
- SCOR 2022: awarded best actuarial thesis in France
- PhD Thesis
2021 An overview of active learning methods for insurance with fairness appreciation
- Authors: R. Elie, C. Hillairet, F. Hu and M. Juillard
- (Pre-)publication: arXiv

Recent talks (not updated since March 2024; see CV)

March 2024 [Risk Forum] - Algorithmic Fairness for multiple sensitive attributes, with applications in insurance
February 2024 [AAAI-24] - A sequentially fair mechanism for multiple sensitive attributes
22-09-23 BIAS 2023 - Mitigating Discrimination in Insurance [slides]
21-09-23 ECML-PKDD 2023 - Fairness in Multi-Task Learning via Wasserstein Barycenters [slides]
07-02-23 Institut des Actuaires - Fairness and Labeling for multi-class problems: application [slides]
15-06-22 - PhD Defense [slides] -> Actuarial Award 2022
28-04-20 OICA - Efficient labeling with active learning [slides]
29-11-19 100% Data Science - Active learning for the detection of categories in text fields [slides]

Teaching

EPITA (2020 - …)

Master of Science :

Python & Algorithm Workshop (and introduction to programming) by François HU
- Workshop 1 : Integer arithmetic [Lecture] [Exercises ipynb] [Exercises html]
- Workshop 2 : Floating-point arithmetic & pseudo-random numbers [Lecture] [Notebook]
- Workshop 3 : Matrix representation and arithmetic [Lecture] [Notebook]
- Workshop 4 : Solving linear systems [Lecture] [Notebook]
- Workshop 5 : Solving nonlinear systems [Lecture] [Notebook]
- Workshop 6 : Evaluation and interpolation [Lecture] [Notebook]
- Workshop 7 : Oral presentations
Python by François HU
- Why Python?
- Installing Python
- Practical work 1 : Basics of Python [Lecture] [Notebook]
- Practical work 2 : Application [Exercises html] [Exercises ipynb] [Usual mistakes]
- Practical work 3 : Scientific Computing [Lecture] [Notebook] [Correction]
- Practical work 4 : Data Visualization [Lecture] [Notebook]
- Practical work 5 : Data Manipulation [Lecture] [Notebook]
- Practical work 6 : Engineering tools [Lecture]
- Additional exercises (Python basics) : [Exercises] [Notebook]
- Additional exercises (exam-like) : [Exercises] [Notebook]
- Datasets : [Iris] [Defra consumption]
- Exam : 2024.05.07 Exam (Available)
- Old lectures (still available):
- Case study with freMPL Dataset [Notebook], [Dataset]

Master of Science in Artificial Intelligence Systems :

Numerical Algorithms (and optimization for Machine Learning) by François HU
- Lecture 1 : Calculus refresher [Lecture] [Notebook]
- Lecture 2 : Unconstrained optimization [Lecture] [Notebook]
- Lecture 3 : Constrained optimization [Lecture]
- Lecture 4 : Numerical methods in linear algebra [Lecture]
- Lecture 5 : Machine learning applications
- Lecture 6 : [Oral presentations] -> Titanic challenge
- Practical work : Linear/Logistic regression, PCA (and SVM) [Notebook]
Bayesian Machine Learning by François HU
- Lecture 1 : Bayesian statistics [Lecture]
- Lecture 2 : Latent Variable Models and EM-algorithm [Lecture]
- Lecture 3 : Variational Inference and application to NLP [Lecture]
- Lecture 4 : Causal Inference [Lecture]
- Lecture 5 : [Oral presentations] -> Topic models, Bayesian optimization, uncertainty and t-SNE
- Practical work 1 : Naive Bayes Classifier [Notebook]
- Practical work 2 : GMM, Probabilistic K-means and PCA [Notebook]
- Practical work 3 : Topic Modeling with LDA [Notebook]
- Practical work 4 : Bayesian Linear Regression
- Bonus points : p85-ex1 Lec1 (0.5pt); PW1 (1pt); PW2 (2pt); PW3 (0.5pt)

Institut des Actuaires (2019 - …)

Data Science pour l'Actuaire training program:

2025 : Machine learning methods for individual mortality modeling
- datasets : [labeled.zip] [unlabeled.zip]
- Mortality scores : [Notebook] [Introduction to algorithmic bias]
November 2019 : Text Mining
October 2020 : Text Mining
July 2021 : Text Mining and Introduction to Active Learning
- Introduction : Preprocessing in NLP [Lecture]
- Lecture 1 : vector representations and topic models [Lecture]
- Lecture 2 : Deep Learning for NLP [Lecture]
- Lecture 3 : Introduction to Active Learning [Lecture]
- Animations :
- Practical work : Topic modeling [Notebook] [Correction]

Cnam (Conservatoire national des arts et métiers) (2024 - …)

Master of Science in Actuarial Science :

Lecturer on Fairness in Insurance (6 hours), integrated into the Contemporary Actuarial Science course.

Teaching assistant

Institut Polytechnique de Paris (ENSAE, Polytechnique) (2019 - …)

1A - semester 1 (2019 - 2020) : Algorithme et programmation (Algorithms and programming) by Xavier Dupré
- 2019 roadmap
2A - semester 2 (2019 - 2021) : Simulation and Monte Carlo by Nicolas Chopin
- TD1 : Uniform distribution and Monte Carlo (ex1) [Corr in R] [Written corr]
- TD2 : Rejection method and improved Box-Muller (ex2 and ex3) [Corr in R]
- TD3 : Geometric distribution, control variates and antithetic variables (ex4 and ex5) [Corr in R]
- TD4 : MCMC and importance sampling (ex6 and ex7) [Corr in R]
- TD5 : Cross-entropy method and quasi-Monte Carlo (ex9 and ex5-falc) [1-Corr-R] and [2-Corr-R]
- TD6 : Oral presentation
2A - semester 2 (2019 - 2020) : Theoretical foundations of Machine Learning by Vianney Perchet
- Directed work 1 : Plug-in methods and over/under-fitting [Written corr]
- Practical work 1 : Linear/polynomial regression and k-NN [Corr in Python]
- Directed work 2 : Selection and penalization of models [Written corr]
- Practical work 2 : k-NN, Perceptron and Cross-Validation [Corr in Python]
- Practical work 3 : SVM, Decision Tree and Random Forest [Corr in Python]
3A - semester 1 (2020 - 2021) : Advanced Machine Learning by Vianney Perchet
- This course covers ERM, SVM, boosting, neural networks and optimization
- Directed work : VC-dimension and ERM (correction soon available)
- Practical work : Python, Linear Regression and SVM [Corr in Python]
- Practical work : RKHS, optimization and neural networks [Written corr] [Neural Nets in python]
3A - semester 2 (2019 - 2020) : Machine Learning for finance by Romuald Elie
- Speaker on NLP