I've been putting together a study that defines pairs of observations from a long-running cohort study, such that someone who participated across three phases would contribute two pairs of observations (e.g. 1-2, 2-3). Data are in long format, with variables pertaining to each pair of observations represented by a single row.
I have defined a categorical exposure variable with categories representing shifts in exposure across each pair of observations (e.g. stable no smoking; no smoking to moderate smoking; no smoking to heavy smoking, etc.).
The outcome is a continuous variable defined at the second of each pair of observations (e.g. forced expiratory volume (fev2)).
I have set about quantifying differences in FEV according to different shifts in exposure, relative to an exposure category of interest (e.g. stable no smoking). The models are adjusted for various covariates, including FEV reported at the first of each pair of observations (fev1).
In quantifying these differences, I have adopted a fixed effects approach to look specifically at changes within individuals, thereby avoiding the potential problem of differences in FEV between individuals being a consequence of time-invariant factors for which I'm unable to adjust, such as environmental factors. These models take the following form:
Code:
xtreg fev2 fev1 i.exposure i.covariates, vce(robust) fe
It has since been suggested that I should be using linear regression with clustered errors:
Code:
reg fev2 fev1 i.exposure i.covariates, vce(cluster n_eid)
Is anyone with a statistical or mathematical background able to explain (i) the difference between these two approaches and (ii) which approach they would personally apply?
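For reference, this is how I understand the two specifications written out, in case it makes the comparison easier to discuss. The notation is my own shorthand rather than anything taken from Stata output: i indexes participants (n_eid), t indexes pairs of observations, k runs over the exposure categories other than the reference category, and x stands for the remaining covariates.

```latex
% Fixed effects (xtreg ..., fe): each participant has their own intercept \alpha_i,
% which absorbs anything that is constant within a participant across pairs.
\mathrm{fev2}_{it} = \alpha_i + \beta\,\mathrm{fev1}_{it}
    + \sum_{k} \gamma_k\,\mathbf{1}[\mathrm{exposure}_{it} = k]
    + \mathbf{x}_{it}^{\top}\boldsymbol{\delta} + \varepsilon_{it}

% Pooled linear regression (reg ..., vce(cluster n_eid)): one common intercept \alpha;
% the errors u_{it} are allowed to be correlated within a participant.
\mathrm{fev2}_{it} = \alpha + \beta\,\mathrm{fev1}_{it}
    + \sum_{k} \gamma_k\,\mathbf{1}[\mathrm{exposure}_{it} = k]
    + \mathbf{x}_{it}^{\top}\boldsymbol{\delta} + u_{it}
```

If I have this right, clustering in the second model changes only the standard errors and leaves the coefficients as ordinary pooled estimates, whereas the first model estimates the coefficients from within-participant variation only. Please correct me if that reading is wrong.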
Many thanks in advance.
