vignettes/session_lecture.Rmd
Learning objectives:
Textbook sections:
Definition: Hierarchical data are data (responses or predictors) collected from or specific to different levels within a study.
fit1way <- lm(fecfat ~ pilltype, data=dat)
| | Df | Sum Sq | Mean Sq | F value | Pr(>F) |
|---|---|---|---|---|---|
| pilltype | 3 | 2008.60 | 669.53 | 1.86 | 0.1687 |
| Residuals | 20 | 7193.36 | 359.67 | | |
fit2way <- lm(fecfat ~ subject + pilltype, data=dat)
One-way ANOVA (person \(i\) with pill type \(j\)): \[\begin{equation*} \begin{aligned} FECFAT_{ij} &= \textrm{fecal fat measurement for person } i \textrm{ with pill type } j \\ &= \mu + PILLTYPE_j + \epsilon_{ij} \end{aligned} \end{equation*}\]
Two-way ANOVA: \[\begin{equation*} FECFAT_{ij} = \mu + SUBJECT_i + PILLTYPE_j + \epsilon_{ij} \end{equation*}\]
Assumption: \(\epsilon_{ij} \stackrel{iid}{\sim} N(0, \sigma_\epsilon^2)\)
\[\begin{equation*} corr(FECFAT_{ij}, FECFAT_{ik}) = \frac{cov(FECFAT_{ij}, FECFAT_{ik})} {sd(FECFAT_{ij}) sd(FECFAT_{ik})} \end{equation*}\]
* This is a measure of how large the subject effect is relative to the error term.
\[\begin{equation*} \begin{aligned} cov(FECFAT_{ij}, FECFAT_{ik}) &= cov(SUBJECT_i, SUBJECT_i) \\ &= var(SUBJECT_i) \\ &= \sigma_{subj}^2 \quad \textrm{(definition)} \end{aligned} \end{equation*}\]
Recall that \(SUBJECT_i\) is the term for the individual in the two-way ANOVA. For now it is fitted as a fixed coefficient (\(\beta_i\) multiplying a subject indicator); later it will be treated as a random variable.
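The first equality in the covariance calculation skips a step. As a sketch, assuming \(j \neq k\), that \(\mu\) and the pill-type effects are fixed constants, and that \(SUBJECT_i\) is independent of the errors (which are independent of each other), the covariance expands as:

\[\begin{equation*} \begin{aligned} cov(FECFAT_{ij}, FECFAT_{ik}) &= cov(\mu + SUBJECT_i + PILLTYPE_j + \epsilon_{ij},\; \mu + SUBJECT_i + PILLTYPE_k + \epsilon_{ik}) \\ &= cov(SUBJECT_i, SUBJECT_i) + cov(SUBJECT_i, \epsilon_{ik}) + cov(\epsilon_{ij}, SUBJECT_i) + cov(\epsilon_{ij}, \epsilon_{ik}) \\ &= var(SUBJECT_i) + 0 + 0 + 0 \end{aligned} \end{equation*}\]

Constants drop out of a covariance, so only the subject term shared by both measurements contributes.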
The previous slide calculated the covariance for the numerator of the correlation. Now calculate the variance for the denominator. Because both measurements have the same variance, \(sd(FECFAT_{ij}) \, sd(FECFAT_{ik}) = var(FECFAT_{ij})\).
\[\begin{equation*} \begin{aligned} var(FECFAT_{ij}) &= var(SUBJECT_i) + var(\epsilon_{ij}) \\ &= \sigma_{subj}^2 + \sigma_{\epsilon}^2 \quad \textrm{(definition)} \end{aligned} \end{equation*}\]
The correlation between measurements under two treatments \(j\) and \(k\) on the same subject \(i\) is: \[\begin{equation*} \begin{aligned} corr(FECFAT_{ij}, FECFAT_{ik}) & = \frac{cov(FECFAT_{ij}, FECFAT_{ik})} {sd(FECFAT_{ij}) sd(FECFAT_{ik})} \\ & = \frac{\sigma_{subj}^2}{\sigma_{subj}^2 + \sigma_{\epsilon}^2} \\ ICC & = \frac{\tau_{00}^2}{\tau_{00}^2 + \sigma_\epsilon^2} \end{aligned} \end{equation*}\]
The last line restates the same ratio with \(\tau_{00}^2\) in place of \(\sigma_{subj}^2\); this ratio is the intraclass correlation coefficient (ICC).
The variance of the subject averages (279.4) is inflated by the correlation of measurements within an individual: it contains the between-subject variance in addition to the averaged residual error.
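To make this concrete, here is a sketch assuming each subject is measured once under each of 4 pill types (4 is inferred from the 3 degrees of freedom for pilltype in the ANOVA table). The notation \(\overline{FECFAT}_{i\cdot}\) for subject \(i\)'s average is introduced here for illustration. Dropping the constant terms (\(\mu\) and the average pill-type effect), which do not affect the variance:

\[\begin{equation*} \begin{aligned} var(\overline{FECFAT}_{i\cdot}) &= var\left(SUBJECT_i + \frac{1}{4}\sum_{j=1}^{4}\epsilon_{ij}\right) \\ &= \sigma_{subj}^2 + \frac{\sigma_{\epsilon}^2}{4} \end{aligned} \end{equation*}\]

Subtracting \(\sigma_{\epsilon}^2 / 4\) from the variance of the subject averages therefore leaves an estimate of \(\sigma_{subj}^2\).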
What is your estimate of the variability due to subjects, from the 2-way ANOVA? First, the residual variance divided by the 4 measurements per subject (106.9989 / 4) is:
## [1] 26.74972
279.419 - 26.75 # var(SUBJECT_i)
## [1] 252.669
Residual variance is:
## [1] 106.9989
Finally calculate ICC:
\[\begin{equation*} \begin{aligned} ICC &= \frac{\sigma_{subj}^2}{\sigma_{subj}^2 + \sigma_{\epsilon}^2} \\ &= \frac{253}{253 + 107} \\ &= 0.70 \end{aligned} \end{equation*}\]
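The arithmetic on these slides can be reproduced from the reported numbers alone. This is a minimal sketch, assuming 4 measurements per subject (inferred from the 3 degrees of freedom for pilltype). The variable names are illustrative and do not come from the original code.

```r
# Values reported in the text
var_subject_means <- 279.419    # variance of the per-subject averages
resid_var         <- 106.9989   # residual variance from the two-way ANOVA

# Assumption: each subject is measured under 4 pill types (Df for pilltype = 3)
n_per_subject <- 4

# Remove the averaged residual error to isolate the between-subject variance
var_subj <- var_subject_means - resid_var / n_per_subject

# Intraclass correlation: share of total variance attributable to subjects
icc <- var_subj / (var_subj + resid_var)

print(round(c(var_subj = var_subj, resid_var = resid_var, icc = icc), 2))
```

The ICC comes out at roughly 0.70, matching the hand calculation.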
This calculation will become easier when we learn to estimate random coefficients directly in the regression model.
Two-way ANOVA is a fixed effects model: \[ FECFAT_{ij} = \beta_0 + \beta_{subject i} SUBJECT_i + \beta_{pilltype j} PILLTYPE_j + \epsilon_{ij} \]
Instead of fitting a \(\beta_{subject i}\) to each individual, assume that subject effects are selected from a distribution of possible subject effects: \[ FECFAT_{ij} = \mu + SUBJECT_i + \beta_{pilltype j} PILLTYPE_j + \epsilon_{ij} \] where \(SUBJECT_i \stackrel{iid}{\sim} N(0, \sigma_{subj}^2)\)
Here subject is a random effect, and pill type is a fixed effect.
This is also called a random intercept model, because each subject's intercept \(\mu + SUBJECT_i\) varies randomly around \(\mu\).
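A small simulation can illustrate what the random intercept assumption implies. The sketch below uses simulated data, not the study data. The variance components are rounded from the estimates above, and the number of simulated subjects is a hypothetical choice made only to get a stable estimate. Drawing one \(SUBJECT_i\) per person and sharing it across that person's measurements induces a within-subject correlation close to the ICC.

```r
set.seed(1)  # fixed seed so the sketch is reproducible

# Assumed (illustrative) variance components, rounded from the estimates above
sigma2_subj <- 253    # between-subject variance
sigma2_eps  <- 107    # residual variance
n_subjects  <- 5000   # hypothetical: many simulated subjects, not the study's sample size

# One random intercept per subject, shared by both of that subject's measurements
subject_effect <- rnorm(n_subjects, mean = 0, sd = sqrt(sigma2_subj))

# Two measurements per subject (pill types j and k). Fixed effects are constants
# and do not change the correlation, so they are set to zero here.
y_j <- subject_effect + rnorm(n_subjects, mean = 0, sd = sqrt(sigma2_eps))
y_k <- subject_effect + rnorm(n_subjects, mean = 0, sd = sqrt(sigma2_eps))

empirical_corr  <- cor(y_j, y_k)
theoretical_icc <- sigma2_subj / (sigma2_subj + sigma2_eps)

print(c(empirical = empirical_corr, theoretical = theoretical_icc))
```

The empirical correlation should land near the theoretical value of about 0.70, up to simulation noise.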