Machine Learning for Epidemiologic and Health Policy Analysis

Adapting machine learning to Norwegian health data to enhance causal analyses of patient pathways and patient outcomes.

Time and place: June 24, 2024 10:15 AM – June 28, 2024 3:00 PM, University of Oslo, Department of Health Management and Economics, Harald Schjelderups hus, Forskningsveien 3a, Oslo. Seminar rooms 1 and 3.


Instructors

Alan Hubbard (UC Berkeley), Rachael Phillips (UC Berkeley), Terje P. Hagen (UiO), Jon Helgheim Holte (UiO)

Course Description

The course focuses on recent advances in machine learning and their application to causal inference and prediction via Targeted Learning, a framework that allows machine learning algorithms to be used both for prediction and for estimating so-called causal parameters, such as average treatment effects and optimal treatment regimes. We focus on applications that estimate the causal impacts of hypothetical health care interventions, inspired by the Norwegian health data registries.

This course is aimed at researchers and students. PhD students will be given priority if the number of applicants exceeds the maximum number of places on the course (25). The course is free of charge.

Course Materials

M. van der Laan and S. Rose. Targeted Learning: Causal Inference for Observational and Experimental Data. Springer, 2011.

Prerequisites

An introductory course in statistics, as well as coursework in or working knowledge of basic regression models (linear, logistic, etc.). Some background in the programming language R is preferred but not required.

Course Goals

A basic understanding of causal inference, including structural causal models, the definition of causal parameters via counterfactual distributions, and ways to establish identifiability from observed data.
Familiarity with, and the ability to implement, machine learning, specifically SuperLearning and the use of cross-validation in data-adaptive estimation.
Ability to apply machine learning algorithms to prediction problems and to estimate, and derive inference for, the resulting fit.
Ability to use the fits of machine learning algorithms to estimate causal effects using simple substitution estimators.
Ability to apply Targeted Learning approaches (e.g., targeted maximum likelihood estimation) to estimate, using machine learning, a priori specified treatment effects as well as general variable importance measures.
A basic understanding of how to define and estimate the impacts of time-dependent interventions.

Schedule (subject to change):

Monday June 24: Intro and Causal Inference

1015-1400: Lectures

Motivation for the course: machines for targeted inference
History of parametric models
Introduction to machine learning - removing the art from data analysis
Why standard techniques are incompatible with rigorous estimation free of arbitrary assumptions
Toward statistical machines - given the data, the causal model, and the parameter of interest, they automatically derive an optimal estimate and robust inference (the future is now).
Roadmap for Targeted Machine Learning
- To build such machines, we need a rigorous roadmap so that real knowledge is used optimally and arbitrary assumptions are kept to a minimum.
- Introduce a roadmap of estimation/inference that can be applied very generally.

1415-1600: Exercises

Tuesday June 25: Intro and Causal Inference, continued

0915-1300: Lectures

Causal model / structural equation model (SEM)
Interventions on the SEM - counterfactuals
Identifiability
Examples of parameters

1315-1500: Exercises

Wednesday June 26: SuperLearning/Machine Learning

0915-1300: Lectures

Loss/Risk
Consistent estimation of risk - Cross-validation
Oracle Inequality
Ensemble Learning
SuperLearning
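As a rough illustration of loss-based cross-validation, the sketch below implements only the discrete SuperLearner (the cross-validation selector): it estimates each candidate learner's V-fold cross-validated risk under squared-error loss and refits the candidate with the lowest risk on all the data. The full SuperLearner instead combines the candidates into a weighted ensemble. This Python sketch is not course material; the polynomial candidate learners, the function names, and the simulated data are hypothetical choices for illustration.

```python
import numpy as np


def cv_risk(fit, predict, x, y, n_folds=5, seed=0):
    """V-fold cross-validated risk of one learner under squared-error loss."""
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(y)) % n_folds  # balanced random fold labels
    losses = np.empty(len(y))
    for v in range(n_folds):
        train, valid = folds != v, folds == v
        model = fit(x[train], y[train])  # fit on the training folds only
        losses[valid] = (y[valid] - predict(model, x[valid])) ** 2
    return losses.mean()


def poly_learner(degree):
    """Hypothetical candidate learner: polynomial regression of a given degree."""
    def fit(x, y):
        return np.polyfit(x, y, degree)

    def predict(coefs, x):
        return np.polyval(coefs, x)

    return fit, predict


def discrete_super_learner(learners, x, y, n_folds=5):
    """Pick the learner with the smallest CV risk, then refit it on all data."""
    risks = {name: cv_risk(*learner, x, y, n_folds) for name, learner in learners.items()}
    best = min(risks, key=risks.get)
    fit, predict = learners[best]
    return best, risks, fit(x, y), predict


# Simulated example: a nonlinear truth plus noise.
rng = np.random.default_rng(1)
x = rng.uniform(-2, 2, 300)
y = np.sin(2 * x) + rng.normal(0, 0.3, 300)
learners = {f"poly{d}": poly_learner(d) for d in (1, 3, 7)}
best, risks, model, predict = discrete_super_learner(learners, x, y)
print(best, risks)
```

Because each candidate is judged on data it was not fit to, the selector favors flexible learners only when they actually lower out-of-sample risk, which is the idea the oracle inequality formalizes.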

1315-1500: Exercises

Thursday June 27: Substitution Estimators

0915-1300: Lectures

General concept of the substitution estimator (SubEst)
Correspondence between the regression coefficient and the SubEst in the special case of a linear model
Saturated models
Using machine learning to estimate the model for SubEsts data-adaptively
Example: the average treatment effect
Variable importance ideas
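The substitution (plug-in, or G-computation) estimator of the average treatment effect can be sketched as follows: fit a regression of the outcome Y on treatment A and covariate W, predict every subject's outcome with A set to 1 and with A set to 0, and average the difference. This is a minimal sketch assuming a single binary treatment, one measured confounder W, no unmeasured confounding, and positivity; the simulated data and function names are hypothetical, and a parametric linear model stands in for the machine learning fit used in the course. Refitting without the A*W interaction illustrates the special linear-model case in which the SubEst equals the coefficient on A.

```python
import numpy as np


def expit(z):
    return 1.0 / (1.0 + np.exp(-z))


def design(a, w, interaction=True):
    """Design matrix [1, A, W] (plus A*W if interaction=True)."""
    cols = [np.ones_like(w), a, w]
    if interaction:
        cols.append(a * w)
    return np.column_stack(cols)


def substitution_ate(a, w, y, interaction=True):
    """Plug-in estimate of E[Y_1] - E[Y_0] from a linear outcome regression."""
    beta, *_ = np.linalg.lstsq(design(a, w, interaction), y, rcond=None)
    q1 = design(np.ones_like(a), w, interaction) @ beta   # everyone treated
    q0 = design(np.zeros_like(a), w, interaction) @ beta  # no one treated
    return np.mean(q1 - q0), beta


# Simulated data: W confounds A and Y; the true ATE is 2 + 0.5 * E[W] = 2.
rng = np.random.default_rng(2024)
n = 2000
w = rng.normal(size=n)
a = rng.binomial(1, expit(0.5 * w)).astype(float)
y = 1 + 2 * a + w + 0.5 * a * w + rng.normal(size=n)

ate, _ = substitution_ate(a, w, y)
ate_main, beta_main = substitution_ate(a, w, y, interaction=False)
print(ate, ate_main, beta_main[1])
```

In the main-terms model, q1 - q0 equals the coefficient on A for every subject, so the SubEst and the coefficient coincide; once the model includes effect modification (or is fit data-adaptively), the substitution estimator remains well defined while no single coefficient represents the ATE.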

Augmented SubEst (TMLE) and introduction to estimation of impacts of time-dependent interventions

Average Treatment Effect (ATE)
Handling missing data
Other parameters, such as interventions with more than two levels
Brief discussion of longitudinal causal inference
Longitudinal TMLE (LTMLE) for estimating the impacts of interventions defined over several time periods

1315-1500: Exercises

Friday June 28: Implementation of machine learning models

0915-1500: Exercises

Contact:

Jon Helgheim Holte (UiO)

Organizer

Department of Health Management and Economics

Published Feb. 21, 2024 1:00 PM - Last modified Feb. 22, 2024 10:45 AM