Stata is a statistical software package for data analysis and an alternative to packages like SPSS, R or SAS. The buzzwords: "Obtain and manage data. Explore. Visualize. Model. Make inferences."
The course is open to everyone and participants can attend those parts of the course that are of most interest. The upside of this is that there is no fee, no attendance sheet and no exam! The downside is that the course will not give any credits in the Ph.D. program.
Time and place
Every Tuesday from 12:30 to 15:30, from 25 January to 22 March (except 8 March). All sessions are held on Zoom. No registration needed. Zoom link for all sessions.
You can find course material (presentations, syntax and data) at the end of this page.
Teachers: Hein Stigum, Jonathan Wörn.
The course will have lectures at three levels:
- Beginner: No previous experience in Stata.
- Elementary: General knowledge of using Stata (as given by the two beginners' courses).
- Advanced: Experience in Stata use (as given by the elementary courses).
In addition, some experience with data handling and statistical analysis will make the material easier to follow. We are targeting Ph.D. candidates, postdocs and researchers in medical statistics and epidemiology in general.
Description of topics
Introduction to Stata
Stata can be used from the menus or from syntax. Menus are good for beginners but somewhat slow to use. The Stata syntax is systematic and short, and an experienced user works faster by writing syntax. A syntax file is a precise description of an analysis and is crucial when you need to repeat the analysis. In this introductory class, you will become acquainted with the Stata software and learn how to use syntax.
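As an illustration of what a short syntax file can look like, here is a minimal sketch. It is not taken from the course material; it assumes the `auto` example dataset that ships with Stata, and the variable name `lp100km` is made up for the example.

```stata
* Minimal do-file sketch (assumed example, not the course's own file)
sysuse auto, clear                  // load an example dataset shipped with Stata
describe                            // list variables and storage types
summarize price mpg weight          // means, SDs, minimum and maximum
tabulate foreign                    // frequency table of a categorical variable

* Create and label a new variable (lp100km is a hypothetical name)
generate lp100km = 235.215 / mpg    // miles per US gallon -> litres per 100 km
label variable lp100km "Fuel use (l/100 km)"
list make mpg lp100km in 1/5        // inspect the first five rows
```

Saving these lines in a `.do` file and running them again reproduces every step exactly, which is the main argument for syntax over menus.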
Graphics
Stata has great graphics (plots). You can visualize a large range of data and results. Plots look good "out of the box", but every aspect can be altered and fine-tuned to publication-ready standards. This class will teach you about different plot types and how to adjust their look to your preferences.
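A minimal sketch of a plot with a few adjusted elements, assuming the `auto` example dataset shipped with Stata. The titles and the output file name `mileage.png` are illustrative choices, not part of the course material.

```stata
* Scatter plot with an overlaid linear fit (assumed example data)
sysuse auto, clear
twoway (scatter mpg weight) (lfit mpg weight), ///
    title("Mileage by weight") ///
    xtitle("Weight (lbs)") ytitle("Miles per gallon") ///
    legend(order(1 "Observed" 2 "Linear fit"))

* Save the current graph to a file (hypothetical file name)
graph export "mileage.png", replace
```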
Linear Regression
We use regressions for predictions or for estimating effects adjusted for confounders (or selection variables). Linear regression is used for continuous outcome data (weight, blood pressure, …). It is the easiest regression method to understand, and the techniques learned here can be used for other types of regressions. We handle non-linear dose response, interactions, non-constant error variance, the influence of outliers and predictions.
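The topics listed above can be sketched in a few lines. This is an assumed example on the `auto` dataset shipped with Stata, not the course's own syntax: a quadratic term stands in for a non-linear dose response, Cook's distance flags influential observations, and robust standard errors are one common way to handle non-constant error variance.

```stata
sysuse auto, clear

* Quadratic weight term allows a non-linear effect; foreign is a covariate
regress price c.weight##c.weight i.foreign

* Influence of single observations: Cook's distance (cook is a made-up name)
predict cook, cooksd
list make price cook if cook > 4/e(N) & !missing(cook)

* Same model with robust standard errors for non-constant error variance
regress price c.weight##c.weight i.foreign, vce(robust)

* Adjusted predictions over a range of weights, by origin
margins foreign, at(weight=(2000(500)4500))
```

The cutoff `4/e(N)` is a common rule of thumb rather than a fixed standard. Cook's distance is computed before the robust refit because `predict, cooksd` is only available after a model fitted with the default variance estimator.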
Logistic regression
Logistic regression is used for binary outcome data (disease yes/no, …). We handle non-linear dose response, interactions, the influence of outliers and predictions.
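A minimal sketch of a logistic model with an interaction and predicted probabilities. It assumes the `lbw` (low birth weight) example dataset, which `webuse` downloads from the Stata website, so an internet connection is needed. The choice of covariates is illustrative only.

```stata
webuse lbw, clear

* Odds ratios; smoking is allowed to interact with maternal age
logistic low i.smoke##c.age lwt i.race

* Predicted probability of low birth weight by smoking status and age
margins smoke, at(age=(15(5)40))
```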
Survival analysis
Survival analysis is used for time-to-event outcomes (time to disease, time to death, …). The standard method is the Cox model. We focus on the alternative, Flexible Parametric Survival Models. Under standard conditions these models estimate the same hazard ratios as the Cox model, but they allow easier handling of non-proportional hazards (time-dependent hazard ratios). The models also have a wider range of prediction types, including hazard differences and restricted mean survival times. The flexible models are part of a whole ecosystem of programs for competing risks, multi-state models and much more.
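A sketch comparing the two approaches, under stated assumptions: the page does not name a command, so the user-written `stpm2` (installable from SSC, together with its helper `rcsgen`) is assumed here. The `brcancer` dataset is downloaded by `webuse`, and the degrees of freedom are arbitrary illustrative choices.

```stata
* One-time installation of the assumed user-written commands:
*   ssc install stpm2
*   ssc install rcsgen

webuse brcancer, clear
stset rectime, failure(censrec)     // declare survival time and event indicator

* Cox model: hazard ratio for hormonal therapy
stcox hormon

* Flexible parametric model on the log cumulative hazard scale,
* proportional hazards, 4 df for the baseline spline
stpm2 hormon, scale(hazard) df(4) eform

* Relax proportional hazards: time-dependent effect of hormon
stpm2 hormon, scale(hazard) df(4) tvc(hormon) dftvc(2)
```

With proportional hazards, the hazard ratio from the first `stpm2` call should be close to the one from `stcox`.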
Automating analysis
When we prepare analyses for a publication, we often redo the same analysis many times over with only small variations in data or methods. Writing syntax that automatically prepares finished tables or figures both saves time and reduces errors. We will look at methods to achieve such automated analyses.
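A minimal sketch of such automation, combining a macro, a loop, returned results and a matrix. It assumes the `auto` example dataset; the macro name `outcomes` and matrix name `results` are made up for the example.

```stata
sysuse auto, clear

* Returned results: summarize leaves its output in r()
summarize price
display "Mean price: " r(mean)

* Collect one coefficient and standard error per outcome in a matrix
local outcomes price mpg weight             // local macro holding a variable list
matrix results = J(3, 2, .)                 // 3 x 2 matrix filled with missing
matrix rownames results = `outcomes'
matrix colnames results = coef se

local i = 0
foreach y of local outcomes {
    local ++i
    quietly regress `y' i.foreign
    matrix results[`i', 1] = _b[1.foreign]  // difference foreign vs domestic
    matrix results[`i', 2] = _se[1.foreign] // its standard error
}
matrix list results, format(%9.2f)
```

Adding an outcome then means changing one line (the macro and the matrix size) instead of copying a block of commands.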
Programming
Simulated data is useful for learning new methods and for examining the effect of violating assumptions. We will look at simulating data for linear, logistic and survival models. By writing syntax into a program, we can also make use of Stata tools for simulation, bootstrapping and power calculations.
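A sketch of a simulation-based power calculation for a linear model. All numbers (sample size 100, true slope 0.5, 500 replications, the seed) and the program name `simreg` are assumptions chosen for illustration.

```stata
capture program drop simreg
program define simreg, rclass
    drop _all
    set obs 100
    generate x = rnormal()
    generate y = 1 + 0.5*x + rnormal()      // assumed true slope: 0.5
    quietly regress y x
    return scalar b = _b[x]                 // estimated slope
    return scalar p = 2*ttail(e(df_r), abs(_b[x]/_se[x]))  // two-sided p-value
end

set seed 12345
simulate b=r(b) p=r(p), reps(500) nodots: simreg

summarize b                                 // should centre near the true slope
count if p < 0.05
display "Estimated power: " r(N)/500
```

The same pattern, a program that returns results wrapped by a prefix command, also works with `bootstrap` on real data.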
Individual Fixed Effects Regression
Causal interpretation of observational data is often challenged by confounding and reverse causality issues. By following individuals over time and comparing their outcomes before and after they experienced a change in the potential predictor, a more credible causal interpretation of results is often possible. Another advantage of this within-person comparison is that time-constant characteristics of the person are ruled out as confounders. Individual fixed effects models are an elegant way of implementing the analytical approach described above. The model can be applied to different sorts of clustered data, including longitudinal data of individuals. In other contexts, the model can be used to account for both observed and unobserved confounders at the family or school level, for example by comparing different children from the same family (sibling fixed effects) or from the same school. This session will provide an introduction to the individual fixed effects model and includes practical examples of how to implement the model using Stata.
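A minimal sketch of an individual fixed effects model, assuming the `nlswork` panel dataset that `webuse` downloads from the Stata website. The covariates are illustrative and not the course's own example.

```stata
webuse nlswork, clear
xtset idcode year                   // declare person identifier and time variable

* Within-person estimate of the union wage effect; time-constant
* characteristics of each person drop out of the comparison
xtreg ln_wage i.union c.age##c.age tenure, fe vce(cluster idcode)
```

Only people whose union status changes during follow-up contribute to the `union` coefficient, which is the before-and-after comparison described above.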
Teaching/Sessions
The course will have 3 hours of lectures (12:30-15:30) for each theme. We will provide syntax files at the end of each lesson. Participants are encouraged to try these on the example data or, better, on their own data.
Date | Level | Theme / session link | Teacher | Venue |
---|---|---|---|---|
25 Jan | Beginner | Introduction to Stata: Interface, file types, data handling, basic commands | Jonathan Wörn | Zoom |
1 Feb | Beginner | Graphics: Making plots for data and results | Jonathan Wörn | Zoom |
8 Feb | Elementary | Linear regression: Standard model, non-linear effects, interactions, effects of outliers, predictions | Hein Stigum | Zoom |
15 Feb | Elementary | Logistic regression: Standard model, non-linear effects, interactions, effects of outliers, predictions | Hein Stigum | Zoom |
22 Feb | Advanced | Survival analysis: Flexible Parametric Survival Models | Hein Stigum | Zoom |
1 Mar | Advanced | Automating analysis: Returned results, macros, matrices, loops | Hein Stigum | Zoom |
15 Mar | Advanced | Programming: Simulating data for linear, logistic and survival models; writing programs for simulation, bootstrapping and power calculations | Hein Stigum | |
22 Mar | Advanced | Individual fixed effects regression: Examining within-unit changes, controlling for unit-specific characteristics. Setting up data; model specification and interpretation; graphing results. | Jonathan Wörn | Zoom |
Organizers
Department of Community Medicine and Global Health.
Contact: Hein Stigum
Presentations
- 1 Stata Introduction (PowerPoint)
- 1 Stata Introduction (syntax)
- 1-demo-file-web.do
- Birth1 (data file)
- 2 Graphics (PowerPoint)
- 2 Graphics (Syntax)
- 2-demo-file.do
- 2 Graphics JW-talk.mp4
- Gestational Age and Birth Weight (data file)
- 3 Linear Regression (PowerPoint)
- 3 Linear Regression (syntax)
- 3 Linear Regression HS-talk
- 4 Logistic Regression (syntax)
- 4 Logistic Regression (PowerPoint)
- 4 Logistic Regression HS talk
- 5 Flexible Parametric Survival Models (PowerPoint)
- 5.0 Program Downloads (syntax)
- 5.0 Cox Model (syntax)
- 5.1 Flexible Parametric Survival Models (syntax)
- 5 Flexible Parametric Survival, HS talk
- Melanoma (data), Kidney (data), Breast Cancer (data)
- 6 Macro, Matrix and Loop, Agenda (PowerPoint)
- 6 Macro, Matrix and Loop, Examples (syntax)
- 6 Macro, Matrix and Loop, Commands (syntax)
- 6 Macro, Matrix and Loop, HS talk
- 6 Gestational Age and Birth Weight
- Tables (Excel)
- 7 Simulating Data and Writing Programs, (PowerPoint)
- 7 Simulating Data and Writing Programs, Examples (syntax)
- 7 Simulating Data and Writing Programs, HS talk
- 7 Flexible Parametric Survival, Simulation
- 7 Sparse Data Bias (Excel)
- 7 Sparse Data Bias (PowerPoint)
- 8-fixed-effects-regression (PowerPoint)
- 8 Fixed-Effects-Regression_demo (syntax)
- 8 Fixed-Effects-Regression_example (log)
- 8 Fixed-Effects-Regression_example (syntax)
- 8 Fixed-Effects-Regression-JW-talk
- fe1.dta
- fe2.dta