Machine Learning for Epidemiologic and Health Policy Analysis

Adapting machine learning to Norwegian health data to enhance causal analyses of patient pathways and patient outcomes.

Instructors

Alan Hubbard (UC Berkeley), Rachael Phillips (UC Berkeley), Terje P. Hagen (UiO), Jon Helgheim Holte (UiO) 

Course Description

The course focuses on advances in machine learning and their application to causal inference and prediction via Targeted Learning, a framework that allows machine learning algorithms to be used both for prediction and for estimating so-called causal parameters, such as average treatment effects and optimal treatment regimes. We focus on applications in estimating the causal impacts of hypothetical health care interventions, inspired by the Norwegian health data registries.

This course is aimed at researchers and students. PhD students will be given priority if the number of applicants exceeds the maximum number of places on the course (25 people). The course is free of charge.

Registration deadline: April 15

Course Materials

M. van der Laan and S. Rose. Targeted learning: causal inference for observational and experimental data. Springer, 2011.

Pre-requisites

An introductory course in statistics, as well as coursework in or working knowledge of basic regression (linear, logistic, etc.). Some background in the programming language R is preferred but not required.

Course Goals

  • A basic understanding of causal inference, including structural causal models, definition of causal parameters via counterfactual distributions, and ways to establish identifiability from observed data.
  • Familiarity with, and ability to implement, machine learning, specifically the concepts of SuperLearning and the power of cross-validation in data-adaptive estimation.
  • Ability to apply machine learning algorithms to prediction problems and to derive estimates and inference for the resulting fit.
  • Ability to use the fits of machine learning algorithms to estimate causal effects using simple substitution estimators.
  • Ability to apply Targeted Learning approaches (e.g., targeted maximum likelihood estimation) to estimate, using machine learning, a priori specified treatment effects as well as general variable importance measures.
  • A basic understanding of how to define and estimate the impacts of time-dependent interventions.

Schedule (subject to adjustments):

Monday June 24: Intro and Causal Inference

1015-1400: Lectures

  • Motivation for Course: Machines for targeted inferences
  • History of parametric models
  • Introduction to machine learning: removing the art from data analysis
  • The incompatibility of standard techniques with rigorous estimation free of arbitrary assumptions
  • Going for statistical machines: given the data, the causal model, and the parameter of interest, they automatically derive an optimal estimate and robust inference (the future is now).
  • Roadmap for Targeted Machine Learning
    • To build such machines, we need a rigorous roadmap so that real knowledge is used optimally and arbitrary assumptions are kept to a minimum.
    • Introduce a roadmap of estimation/inference that can be applied very generally.

1415-1600: Exercises

Tuesday June 25: Intro and Causal Inference, continued

0915-1300: Lectures

  • Causal model / structural equation model (SEM)
  • Intervention on SEM - counterfactuals
  • Identifiability
  • Examples of parameters

1315-1500: Exercises

Wednesday June 26: SuperLearning/Machine Learning

0915-1300: Lectures

  • Loss/Risk
  • Consistent estimation of risk - Cross-validation
  • Oracle Inequality
  • Ensemble Learning
  • SuperLearning
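As a rough illustration of how the topics above fit together, the sketch below estimates the risk (mean squared-error loss) of two candidate learners by k-fold cross-validation and picks the one with the lowest cross-validated risk. This selection step is often called the discrete super learner; the full SuperLearner instead combines the candidates with weights. The sketch is written in Python purely for illustration and is not course material: the simulated data, the two candidate learners, and all function names are assumptions made up for this example.

```python
import random

def fit_mean(xs, ys):
    """Candidate 1: ignore x and always predict the training mean of y."""
    m = sum(ys) / len(ys)
    return lambda x: m

def fit_line(xs, ys):
    """Candidate 2: simple least-squares line y = a + b * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    return lambda x: a + b * x

def cv_risk(fit, xs, ys, k=5):
    """Estimate the risk (mean squared-error loss) of a learner by k-fold cross-validation."""
    n = len(xs)
    total = 0.0
    for fold in range(k):
        val = [i for i in range(n) if i % k == fold]   # validation indices
        trn = [i for i in range(n) if i % k != fold]   # training indices
        predict = fit([xs[i] for i in trn], [ys[i] for i in trn])
        total += sum((ys[i] - predict(xs[i])) ** 2 for i in val)
    return total / n

# Hypothetical simulated data: y depends linearly on x plus noise.
random.seed(0)
xs = [random.uniform(0, 1) for _ in range(200)]
ys = [1.0 + 2.0 * x + random.gauss(0, 0.1) for x in xs]

# The "library" of candidate learners and their cross-validated risks.
library = {"mean": fit_mean, "line": fit_line}
risks = {name: cv_risk(fit, xs, ys) for name, fit in library.items()}

# Cross-validation selector: pick the candidate with the smallest CV risk.
best = min(risks, key=risks.get)
print(risks, best)
```

Because each risk is computed only on observations held out from fitting, a more flexible learner is not rewarded for merely fitting the training data more closely.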

1315-1500: Exercises

Thursday June 27: Substitution Estimators

0915-1300: Lectures

  • General concept of substitution estimator (SubEst).
  • Correspondence of coefficient and SubEst in the special case of a linear model
  • Saturated models
  • Use of machine learning to data-adaptively estimate the model for SubEsts.
  • Example of average treatment effect
  • Variable importance ideas
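A minimal sketch of a simple substitution estimator for the average treatment effect may help fix ideas. It assumes a single binary confounder W, a binary treatment A, and an outcome Y. It fits a saturated outcome model (one mean per (A, W) cell), plugs in A = 1 and A = 0 for every subject, and averages the difference over the observed W. The Python code, the simulated data, and all variable names are hypothetical and chosen only for illustration; the true effect is 1.0 by construction.

```python
import random

random.seed(1)
n = 5000

# Hypothetical simulated data: W confounds the effect of A on Y.
data = []
for _ in range(n):
    w = 1 if random.random() < 0.5 else 0
    p_treat = 0.7 if w == 1 else 0.3          # treatment is more likely when W = 1
    a = 1 if random.random() < p_treat else 0
    y = 1.0 * a + 2.0 * w + random.gauss(0, 1)  # true treatment effect is 1.0
    data.append((w, a, y))

def mean(values):
    return sum(values) / len(values)

# Saturated outcome model: estimate E[Y | A = a, W = w] by the mean in each cell.
q_bar = {
    (a, w): mean([y for (wi, ai, y) in data if ai == a and wi == w])
    for a in (0, 1)
    for w in (0, 1)
}

# Substitution estimator: average the predicted difference under A = 1 versus
# A = 0 over the empirical distribution of W.
ate_sub = mean([q_bar[(1, w)] - q_bar[(0, w)] for (w, _, _) in data])

# Unadjusted comparison of treated and untreated, for contrast (confounded by W).
naive = mean([y for (_, a, y) in data if a == 1]) - mean([y for (_, a, y) in data if a == 0])

print(ate_sub, naive)
```

In this simulation the unadjusted difference is biased upward because treated subjects more often have W = 1, while the substitution estimator adjusts for W. With many or continuous covariates a saturated model is no longer feasible, which is where data-adaptive (machine learning) fits of the outcome model come in.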

Augmented SubEst (TMLE) and introduction to estimation of impacts of time-dependent interventions

  • Average Treatment Effect (ATE)
  • Handling missing data
  • Other parameters, such as the effects of interventions with more than two levels
  • Brief discussion of longitudinal causal inference
  • LTMLE for estimating the impacts of interventions defined over several time periods

1315-1500: Exercises

Friday June 28: Implementation of machine learning models

0915-1500: Exercises

Contact:

Jon Helgheim Holte (UiO)

Published Feb. 21, 2024 1:00 PM - Last modified Feb. 22, 2024 10:45 AM