Session 9: Repeated Measures and Longitudinal Analysis I

Learning objectives and outline

Learning objectives

Identify and define hierarchical and longitudinal data
Analyze correlated data using Analysis of Variance
Define and calculate Intraclass Correlation
Identify and define random and fixed effects

Textbook sections:

Vittinghoff section 7.1 (sections 7.2-7.3 next class)

Outline

Introduction to hierarchical and longitudinal data
Fecal Fat example
Correlations within subjects (ICC)
Random and fixed effects

Intro: hierarchical and longitudinal data

What are hierarchical and longitudinal data?

Knee radiographs are taken yearly in order to understand the onset of osteoarthritis
An indicator of heart damage is measured at 1, 3, and 6 days following a brain hemorrhage.
Groups of patients in a urinary incontinence trial are assembled from different treatment centers
Susceptibility to tuberculosis is measured in family members
The choice of surgery type for treating a brain aneurysm (clipping the base of the aneurysm or implanting a small coil) is studied by recording the type of surgery each patient receives from a number of surgeons at a number of different institutions.

What is the distinction between hierarchical and longitudinal data?

Longitudinal data are repeated measures over time
Longitudinal data are a type of hierarchical data
- repeated measures are correlated, and nested within the observational unit (individual)
Other non-longitudinal data can also be hierarchical

Definition: Hierarchical data are data (responses or predictors) collected from or specific to different levels within a study.

Important features of this type of data

The outcomes are correlated across observations
The predictor variables can be associated with different levels of a hierarchy. e.g. in the brain aneurysm surgery study we might be interested in:
- the volume of operations at the hospital,
- whether it is a for-profit or not-for-profit hospital,
- years of experience of the surgeon or where surgeons were trained,
- how the choice of surgery type depends on the age and gender of the patient.

Fecal Fat example

A Repeated Measures Example

Lack of digestive enzymes in the intestine can cause bowel absorption problems.
- This will be indicated by excess fat in the feces.
- Pancreatic enzyme supplements can alleviate the problem.
- fecfat.csv: a study of fecal fat quantity (g/day) for individuals given each of a placebo and 3 types of pills

Fecal Fat dataset

Option 1: non-hierarchical analysis (wrong)

fit1way <- lm(fecfat ~ pilltype, data=dat)

One-way analysis of variance table for fecal fat dataset
	Df	Sum Sq	Mean Sq	F value	Pr(>F)
pilltype	3	2008.60	669.53	1.86	0.1687
Residuals	20	7193.36	359.67

This analysis does not account for the similarity of measurements within an individual
It would be correct only if each treatment were given to a different individual

Option 2: 2-way AOV

Accounts for individual differences in mean fecal fat
Fits a coefficient for mean fecal fat per individual
Getting closer

Option 2: 2-way AOV

fit1way <- lm(fecfat ~ pilltype, data=dat)

One-way analysis of variance table for fecal fat dataset
	Df	Sum Sq	Mean Sq	F value	Pr(>F)
pilltype	3	2008.60	669.53	1.86	0.1687
Residuals	20	7193.36	359.67

fit2way <- lm(fecfat ~ subject + pilltype, data=dat)

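Two-way analysis of variance table. Note that the pilltype Df, Sum Sq, and Mean Sq are identical to the one-way table; only the residual row, and therefore F and the p-value, changes.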
	Df	Sum Sq	Mean Sq	F value	Pr(>F)
subject	5	5588.38	1117.68	10.45	0.0002
pilltype	3	2008.60	669.53	6.26	0.0057
Residuals	15	1604.98	107.00

What happened??

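1-way ANOVA correctly estimates the effect of pill type: the pilltype sum of squares (2008.60) is the same in both tables
However, 1-way ANOVA fails to accommodate the correlation within subjects
1-way ANOVA over-estimates the residual variance (mean square 359.67 versus 107.00), because between-subject variability is left in the residuals
- this under-estimates the significance of pill type (F = 1.86, p = 0.1687 versus F = 6.26, p = 0.0057)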
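
As a rough check on this comparison, the following R sketch recomputes both F tests from the sums of squares printed in the two ANOVA tables, so it does not need the raw data. The variable names and the helper `f_stat()` are illustrative and are not part of the original analysis.

```r
# Sums of squares and degrees of freedom copied from the two ANOVA tables
ss_pill     <- 2008.60; df_pill     <- 3
ss_res_1way <- 7193.36; df_res_1way <- 20
ss_subject  <- 5588.38; df_subject  <- 5
ss_res_2way <- 1604.98; df_res_2way <- 15

# The one-way residual SS is the subject SS plus the two-way residual SS:
# the two-way model moves the between-subject variability out of the residuals.
ss_check <- ss_subject + ss_res_2way

# F statistic = (effect mean square) / (residual mean square)
f_stat <- function(ss_effect, df_effect, ss_res, df_res) {
  (ss_effect / df_effect) / (ss_res / df_res)
}

f_1way <- f_stat(ss_pill, df_pill, ss_res_1way, df_res_1way)
f_2way <- f_stat(ss_pill, df_pill, ss_res_2way, df_res_2way)

# Upper-tail probabilities of the F distribution give the p-values
p_1way <- pf(f_1way, df_pill, df_res_1way, lower.tail = FALSE)
p_2way <- pf(f_2way, df_pill, df_res_2way, lower.tail = FALSE)

round(c(F_1way = f_1way, p_1way = p_1way, F_2way = f_2way, p_2way = p_2way), 4)
```

The numerator of the F statistic is the same in both fits; only the denominator (the residual mean square) changes.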

Regression models for 1 and 2-way ANOVA

Recall for ordinary multiple linear regression: $\begin{equation*} E[y|x] = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + ... + \beta_p x_p \end{equation*}$
- $x_1, ..., x_p$ are the predictors or independent variables
- $y$ is the outcome, response, or dependent variable
- $E[y|x]$ is the expected value of $y$ given $x$
- $\beta_0, ..., \beta_p$ are the regression coefficients

Regression models for 1 and 2-way ANOVA

One-way ANOVA (person $i$ with pill type $j$ ): $\begin{equation*} \begin{aligned} FECFAT_{ij} &= \textrm{fecal fat measurement for person i with pill type j} \\ &= \mu + PILLTYPE_j + \epsilon_{ij} \end{aligned} \end{equation*}$
Two-way ANOVA: $\begin{equation*} FECFAT_{ij} = \mu + SUBJECT_i + PILLTYPE_j + \epsilon_{ij} \end{equation*}$

Assumption: $\epsilon_{ij} \stackrel{iid}{\sim} N(0, \sigma_\epsilon^2)$
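
To make the two model equations concrete, here is a minimal simulation sketch in R. It does not use the fecal fat data: all effect sizes, the seed, and the pill type labels are made-up assumptions chosen only for illustration. It generates data from the two-way model and fits both models to the same balanced layout (6 subjects by 4 pill types).

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

# Balanced layout: every hypothetical subject receives every pill type
pill_levels <- c("placebo", "pill1", "pill2", "pill3")  # hypothetical labels
sim <- expand.grid(subject  = factor(1:6),
                   pilltype = factor(pill_levels, levels = pill_levels))

mu          <- 40                          # made-up grand mean
subject_eff <- rnorm(6, mean = 0, sd = 15) # made-up SUBJECT_i terms
pill_eff    <- c(0, -5, -15, -20)          # made-up PILLTYPE_j terms

# FECFAT_ij = mu + SUBJECT_i + PILLTYPE_j + epsilon_ij
sim$fecfat <- mu +
  subject_eff[as.integer(sim$subject)] +
  pill_eff[as.integer(sim$pilltype)] +
  rnorm(nrow(sim), mean = 0, sd = 10)

aov_1way <- anova(lm(fecfat ~ pilltype, data = sim))
aov_2way <- anova(lm(fecfat ~ subject + pilltype, data = sim))

# In a balanced design the pilltype sum of squares is the same in both fits
ss_pill_1way <- aov_1way["pilltype", "Sum Sq"]
ss_pill_2way <- aov_2way["pilltype", "Sum Sq"]

# The one-way residual SS absorbs the subject SS
ss_res_1way <- aov_1way["Residuals", "Sum Sq"]
ss_res_sum  <- aov_2way["subject", "Sum Sq"] + aov_2way["Residuals", "Sum Sq"]
```

When the subject effects are large relative to the residual noise, as assumed here, the one-way residual mean square will typically be much larger than the two-way one, which is the pattern seen in the fecal fat tables.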

Correlations within subjects (ICC)

Correlations within subjects

One-way ANOVA fails because it does not account for the correlation of measurements within-person
How highly correlated are measurements on the same person? Consider subject $i$ , pill types $j$ and $k$ :

$\begin{equation*} corr(FECFAT_{ij}, FECFAT_{ik}) = \frac{cov(FECFAT_{ij}, FECFAT_{ik})} {sd(FECFAT_{ij}) sd(FECFAT_{ik})} \end{equation*}$

This is a measure of how large the subject effect is, in relation to the error term

Correlation within subjects

$\begin{equation*} \begin{aligned} cov(FECFAT_{ij}, FECFAT_{ik}) &= cov(SUBJECT_i, SUBJECT_i) \\ &= var(SUBJECT_i) \\ &= \sigma_{subj}^2 \quad \textrm{(definition)} \end{aligned} \end{equation*}$

Equality 1:
- $\mu$ and $PILLTYPE_j$ terms are assumed to be constant, so do not enter into the covariance calculation
- residuals $\epsilon_{ij}$ are assumed to be independent of each other and of $SUBJECT_i$, so they contribute no covariance
Equality 2:
- covariance with self is variance

Recall that $SUBJECT_i$ is the term for the individual in the 2-way AOV. There it is a fixed coefficient ( $\beta_i$ times an indicator for subject $i$ ); it will later be treated as a random variable

Correlation within subjects

The previous slide calculated the covariance for the numerator of the correlation. Now calculate the variance for the denominator (both measurements have the same variance, so ${sd(FECFAT_{ij}) * sd(FECFAT_{ik})} = var(FECFAT_{ij})$ )

$\begin{equation*} \begin{aligned} var(FECFAT_{ij}) &= var(SUBJECT_i) + var(\epsilon_{ij}) \\ &= \sigma_{subj}^2 + \sigma_{\epsilon}^2 \quad \textrm{(definition)} \end{aligned} \end{equation*}$

The difference is that the independent residuals do contribute to $var(FECFAT_{ij})$
Variance is broken into components due to subject and residual variance

Intraclass Correlation

The correlation between measurements under two treatments $j$ and $k$ on the same subject $i$ is: $\begin{equation*} \begin{aligned} corr(FECFAT_{ij}, FECFAT_{ik}) & = \frac{cov(FECFAT_{ij}, FECFAT_{ik})} {sd(FECFAT_{ij}) sd(FECFAT_{ik})} \\ & = \frac{\sigma_{subj}^2}{\sigma_{subj}^2 + \sigma_{\epsilon}^2} \\ ICC & = \frac{\tau_{00}^2}{\tau_{00}^2 + \sigma_\epsilon^2} \end{aligned} \end{equation*}$
This is the Intraclass Correlation (ICC); $\tau_{00}^2$ is an alternative notation for the between-subject variance $\sigma_{subj}^2$.
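
The formula can be checked by simulation. The sketch below is an illustration under stated assumptions, not part of the original analysis: it draws a large number of hypothetical subjects, gives each one a shared subject effect plus two independent residuals, and compares the empirical correlation of the two measurements with the ICC formula. The variance values 253 and 107 are chosen to resemble the rounded estimates this session derives for the fecal fat data.

```r
set.seed(42)  # arbitrary seed for reproducibility

n_subjects <- 100000   # many hypothetical subjects, so sampling error is small
var_subj   <- 253      # assumed between-subject variance
var_eps    <- 107      # assumed residual variance

# One shared subject effect, two independent residuals per subject
subject_eff <- rnorm(n_subjects, mean = 0, sd = sqrt(var_subj))
y_j <- subject_eff + rnorm(n_subjects, mean = 0, sd = sqrt(var_eps))
y_k <- subject_eff + rnorm(n_subjects, mean = 0, sd = sqrt(var_eps))

icc_theory <- var_subj / (var_subj + var_eps)  # formula from the derivation
icc_sim    <- cor(y_j, y_k)                    # empirical within-subject correlation

# The covariance should be close to var_subj, the variance to var_subj + var_eps
cov_sim <- cov(y_j, y_k)
var_sim <- var(y_j)

c(icc_theory = icc_theory, icc_sim = icc_sim)
```

The constant terms ( $\mu$ and the pill type effects) are left out because they do not affect variances or covariances.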

Intuition behind correlations within subjects

Fecal Fat dataset

The variance of the subject averages (279.4) is inflated by the correlation of measurements within an individual: averaging 4 correlated measurements does not cancel the shared subject effect.

Calculation of correlations within subjects (ICC)

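What is your estimate of the variability due to subjects, from the 2-way ANOVA?

Each subject average is taken over 4 pill types, so the variance of the subject averages (279.419) estimates $\sigma_{subj}^2 + \sigma_{\epsilon}^2 / 4$. First estimate $\sigma_{\epsilon}^2 / 4$ from the 2-way ANOVA residuals, then subtract it: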

sum(residuals(fit2way)^2) / 15 / 4 #df=15, divided by 4 pilltypes

## [1] 26.74972

279.419 - 26.75 #var(SUBJECT_i)

## [1] 252.669

Residual variance is:

sum(residuals(fit2way)^2) / 15 #df=15

## [1] 106.9989

Calculation of correlations within subjects (ICC)

Finally calculate ICC:

$\begin{equation*} \begin{aligned} ICC &= \frac{\sigma_{subj}^2}{\sigma_{subj}^2 + \sigma_{\epsilon}^2} \\ &= \frac{253}{253 + 107} \\ &= 0.70 \end{aligned} \end{equation*}$

This calculation will become easier when we learn to estimate random coefficients directly in the regression model.
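
The same arithmetic can be reproduced from the two-way ANOVA table alone, as in the following R sketch. It relies on the fact that, in this balanced design, the variance of the subject averages equals the subject mean square divided by the number of pill types (1117.68 / 4 = 279.42). The variable names are illustrative.

```r
# Mean squares copied from the two-way ANOVA table
ms_subject  <- 1117.68
ms_residual <- 107.00
n_pill      <- 4        # measurements per subject (placebo + 3 pill types)

# Variance of the subject averages: estimates sigma_subj^2 + sigma_eps^2 / 4
var_subject_means <- ms_subject / n_pill

# Remove the residual contribution to isolate the between-subject variance
var_subj <- var_subject_means - ms_residual / n_pill

# Intraclass correlation
icc <- var_subj / (var_subj + ms_residual)

round(c(var_subject_means = var_subject_means, var_subj = var_subj, icc = icc), 3)
```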

Random and fixed effects

The next step: a mixed effects model

Two-way ANOVA is a fixed effects model: $FECFAT_{ij} = \beta_0 + \beta_{subject i} SUBJECT_i + \beta_{pilltype j} PILLTYPE_j + \epsilon_{ij}$
- Assumption: $\epsilon_{ij} \stackrel{iid}{\sim} N(0, \sigma_\epsilon^2)$
Instead of fitting a $\beta_{subject i}$ to each individual, assume that subject effects are selected from a distribution of possible subject effects: $FECFAT_{ij} = \mu + SUBJECT_i + \beta_{pilltype j} PILLTYPE_j + \epsilon_{ij}$ where $SUBJECT_i \stackrel{iid}{\sim} N(0, \sigma_{subj}^2)$
Here subject is a random effect, and pill type is a fixed effect.
This is also called a random intercept model, because each subject has its own randomly drawn shift of the intercept
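
A random intercept model of this form can be sketched in R as follows. This is an illustration under several assumptions: the lecture does not say which package it uses, so `nlme::lme()` is one possible choice; the data are simulated with made-up effect sizes rather than taken from fecfat.csv; and 30 hypothetical subjects are used so that the variance estimates are reasonably stable.

```r
library(nlme)  # assumption: one of several packages that fit mixed models

set.seed(2)    # arbitrary seed for reproducibility
n_subj      <- 30
pill_levels <- c("placebo", "pill1", "pill2", "pill3")  # hypothetical labels
sim <- expand.grid(subject  = factor(seq_len(n_subj)),
                   pilltype = factor(pill_levels, levels = pill_levels))

subject_eff <- rnorm(n_subj, mean = 0, sd = 15)  # made-up random subject effects
pill_eff    <- c(0, -5, -15, -20)                # made-up fixed pill type effects
sim$fecfat  <- 40 +
  subject_eff[as.integer(sim$subject)] +
  pill_eff[as.integer(sim$pilltype)] +
  rnorm(nrow(sim), mean = 0, sd = 10)

# Pill type is a fixed effect; subject is a random intercept
fit_mixed <- lme(fecfat ~ pilltype, random = ~ 1 | subject, data = sim)

# VarCorr() returns a character matrix, so convert the entries to numbers
vc        <- VarCorr(fit_mixed)
var_subj  <- as.numeric(vc["(Intercept)", "Variance"])
var_eps   <- as.numeric(vc["Residual", "Variance"])
icc_mixed <- var_subj / (var_subj + var_eps)

# For comparison: the ANOVA-based estimate of the between-subject variance
aov_2way       <- anova(lm(fecfat ~ subject + pilltype, data = sim))
var_subj_anova <- (aov_2way["subject", "Mean Sq"] -
                   aov_2way["Residuals", "Mean Sq"]) / length(pill_levels)
```

In a balanced design such as this one, the default REML estimate of the subject variance is expected to agree closely with the ANOVA-based estimate, so the mixed model gives the ICC without the manual subtraction step.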

Random and fixed effects

Random and Fixed Effects

Summary: correlations within subjects

Subject-to-subject variability simultaneously raises or lowers all the observations on a subject
- induces correlation of within-subject measurements
Variability of individual measurements can be separated into that due to subjects and that left to residual variance.
- $var(FECFAT_{ij}) = \sigma_{subj}^2 + \sigma_{\epsilon}^2$
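2-way ANOVA does not directly estimate variability due to subjects
- the variance of the coefficients for individuals is not too far off: in this example the variance of the subject averages is 279.4, compared with 252.7 after subtracting the residual contribution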

Summary: hierarchical data

Estimates of coefficients (or “effect sizes”) are unchanged by hierarchical modeling
Ignoring within-subject correlations results in incorrect estimates of variance, F statistics, p-values
- not always “conservative”
Intraclass Correlation (ICC) provides a measure of correlation induced by grouping
Should be able to recognize fixed and random effects

Levi Waldron
