  • Fixed effects regression versus linear regression with clustered errors.

    I've been putting together a study that defines pairs of observations from a long-running cohort study, such that someone who participated across three phases would contribute two pairs of observations (i.e. 1-2 and 2-3). Data are in long format, with each pair of observations represented by a single row.

    I have defined a categorical exposure variable with categories representing shifts in exposure across each pair of observations (e.g. stable no smoking; no smoking to moderate smoking; no smoking to heavy smoking, etc.).
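A minimal Stata sketch of how such paired rows and transition categories might be built, assuming a hypothetical long file with one row per participant per phase (n_eid, phase, fev and a smoking variable smoke coded 0 = none, 1 = moderate, 2 = heavy). The toy values are invented purely for illustration; smoke1, smoke2 and the coding are hypothetical, not taken from the cohort.

```stata
* Hypothetical toy data: one row per participant (n_eid) per phase
clear
input n_eid phase fev smoke
1 1 3.2 0
1 2 3.1 0
1 3 2.9 1
2 1 4.0 0
2 2 3.8 2
3 1 3.5 1
end

* Declare the panel so lag operators never cross participant boundaries
xtset n_eid phase

* Values at the first phase of each pair (missing on each participant's first row)
generate fev1   = L.fev
generate smoke1 = L.smoke
rename fev   fev2
rename smoke smoke2

* Keep only rows that close a pair (1-2, 2-3, ...)
drop if missing(fev1)

* One category per observed (smoke1, smoke2) transition, e.g. "0 0" = stable no smoking
egen exposure = group(smoke1 smoke2), label
list n_eid phase fev1 fev2 exposure, sepby(n_eid)
```

Participant 1 contributes two pairs, participant 2 one pair and participant 3 (a single phase) none, mirroring the 1-2, 2-3 scheme described above.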

    The outcome is a continuous variable defined at the second of each pair of observations (e.g. forced expiratory volume, fev2).

    I have set about quantifying differences in FEV according to different shifts in exposure, relative to a reference exposure category (e.g. stable no smoking). The models are adjusted for various covariates, including FEV measured at the first of each pair of observations (fev1).

    In quantifying these differences, I have adopted a fixed-effects approach to look specifically at changes within individuals, thereby avoiding the potential problem of differences in FEV between individuals being a consequence of time-invariant factors for which I'm unable to adjust, such as environmental factors. These models take the following form:

    Code:
    xtset n_eid
    xtreg fev2 fev1 i.exposure i.covariates, vce(robust) fe
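To make the "changes within individuals" idea concrete, here is a sketch on Stata's example nlswork panel (fetched by -webuse-, so it is not the cohort data) showing that the -xtreg, fe- slope equals an OLS slope on person-demeaned variables. The helper names mean_* and dm_* are hypothetical.

```stata
* Stata's example panel, used here only for illustration
webuse nlswork, clear
xtset idcode year
keep if !missing(ln_wage, tenure)

* Subtract each person's own mean: the "within" transformation
foreach v of varlist ln_wage tenure {
    egen double mean_`v' = mean(`v'), by(idcode)
    generate double dm_`v' = `v' - mean_`v'
}

* OLS on the demeaned data reproduces the fixed-effects slope
regress dm_ln_wage dm_tenure
scalar b_manual = _b[dm_tenure]

* The same slope from the built-in fixed-effects estimator
xtreg ln_wage tenure, fe
scalar b_xtreg = _b[tenure]

display "demeaned OLS: " b_manual "   xtreg, fe: " b_xtreg
```

Only the point estimates match; the standard errors from the manual -regress- are not correct because it ignores the degrees of freedom used up by the person means.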
    It has since been suggested that I should be using linear regression with clustered standard errors instead:

    Code:
    reg fev2 fev1 i.exposure i.covariates, vce(cluster n_eid)
    Is anyone with a statistical or mathematical background able to explain (i) the difference in these two approaches and (ii) which approach they would personally apply?

    Many thanks in advance.
    Last edited by Craig Knott; 12 Jan 2018, 01:58.

  • #2
    Craig:
    some comments about your code:
    - under -xtreg- (warning: what follows does not hold for -regress-), clustered and robust standard errors do the very same job (that is, they deal with autocorrelation and/or heteroskedasticity of the idiosyncratic error). Is this the case with your data?
    - -regress- does not support the -fe- option. Hence, your second code, in its current form, will not work the way it is supposed to.
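A quick check of the first point, again on Stata's nlswork example panel rather than the cohort data: under -xtreg, fe-, vce(robust) and vce(cluster panelvar) should return the same variance matrix.

```stata
* Stata's example panel, used here only for illustration
webuse nlswork, clear
xtset idcode year

* Fixed-effects model with "robust" standard errors
xtreg ln_wage tenure, fe vce(robust)
matrix V_robust = e(V)

* Same model, standard errors explicitly clustered on the panel identifier
xtreg ln_wage tenure, fe vce(cluster idcode)
matrix V_cluster = e(V)

* Relative difference between the two variance matrices
display mreldif(V_robust, V_cluster)
```

With -regress-, by contrast, vce(robust) and vce(cluster idcode) generally give different standard errors, which is why the warning above matters.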
    Personally, I fail to see why you should go with -regress- with clustered standard errors (which are correct, as you have repeated observations for the same -panelid-) when you can go with -xtreg- (which would be my choice).
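A sketch of the practical difference between the two suggested commands, using the nlswork example panel as a stand-in: clustering in -regress- changes only the standard errors, while the person effects u_i stay in the error term, so the coefficient is still the pooled one. Absorbing the panel identifier with -areg- recovers the fixed-effects slope.

```stata
* Stata's example panel, used here only for illustration
webuse nlswork, clear
xtset idcode year

* (a) Pooled OLS: clustering corrects the SEs only; u_i remains in the error
regress ln_wage tenure, vce(cluster idcode)
scalar b_pooled = _b[tenure]

* (b) Fixed effects: u_i is swept out by the within transformation
xtreg ln_wage tenure, fe vce(cluster idcode)
scalar b_fe = _b[tenure]

* (c) The regress family reproduces (b) only by absorbing the panel dummies
areg ln_wage tenure, absorb(idcode) vce(cluster idcode)
scalar b_areg = _b[tenure]

display "pooled: " b_pooled "   fe: " b_fe "   areg: " b_areg
```

The -areg- and -xtreg, fe- clustered standard errors can differ slightly, because the two commands apply different degrees-of-freedom adjustments for the absorbed panel effects.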
    Moreover, it is perfectly reasonable that you want to go -fe-; but, should you need to compare the -fe- vs -re- specifications, you cannot do it after -regress-, whereas this postestimation test can easily be performed via -hausman- (if you have default standard errors) or the user-written programme -xtoverid- (if you have robust/clustered standard errors).
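A minimal -hausman- sketch with default standard errors, shown on the nlswork example panel; the stored-estimate names fixed and random are arbitrary labels chosen for this illustration.

```stata
* Stata's example panel, used here only for illustration
webuse nlswork, clear
xtset idcode year

* -hausman- expects the conventional (non-robust) variance estimates
quietly xtreg ln_wage tenure, fe
estimates store fixed
quietly xtreg ln_wage tenure, re
estimates store random

* Consistent estimator first, efficient one second
* H0: the difference between fe and re coefficients is not systematic
hausman fixed random
```

With clustered standard errors this test is not valid; that is the case where the community-contributed -xtoverid- (installed via -ssc install xtoverid-) would be run after -xtreg ..., re vce(cluster idcode)- instead.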
    As a closing remark, I would also investigate whether your model suffers from endogeneity: is there any time-variant omitted predictor, lurking in the residuals, which might be correlated with both -fev1- and -fev2-?
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      Morning, Carlo. Thanks for your insight on this.

      - A number of participants are providing multiple pairs of observations, so autocorrelation would indeed be a concern.
      - Are you able to elaborate any further on the ways in which the -regress- approach would be inappropriate for the type of analysis I've been working on?
      - Endogeneity is definitely an important consideration. I've identified a number of likely time-variant confounders, but have begun with a pared-down series of models in the first instance; I'll hopefully be adding to their complexity in time.



      • #4
        Craig:
        thanks for providing further details.
        - I assume that you're dealing with a large-N, short-T panel dataset. In this case, autocorrelation seldom bites that hard. However, if you suspect otherwise, it's wise to robustify/cluster your standard errors.
        - Instead of elaborating on my previous reply, I take the liberty of pointing you to Example 2 in the -xtreg- entry of the Stata .pdf manual. In brief, pooled OLS outperforms -xtreg- only when the F-test appearing at the foot of the -xtreg- output table fails to reach statistical significance (meaning that the -u_i- are not jointly different from zero).
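A sketch of both checks on the nlswork example panel: -xtdescribe- summarises how many panels and time points there are (the large-N, short-T question), and the F test that all u_i = 0 can be recomputed from -xtreg, fe- stored results. The scalar names F_ui, df_ui and p_ui are hypothetical.

```stata
* Stata's example panel, used here only for illustration
webuse nlswork, clear
xtset idcode year

* Number of panels (N) and time points (T), plus participation patterns
xtdescribe

* Default standard errors: the "F test that all u_i=0" appears at the foot of
* the output; to my knowledge it is not shown with vce(robust) or vce(cluster)
xtreg ln_wage tenure, fe

* The same test rebuilt from stored results
scalar F_ui  = e(F_f)
scalar df_ui = e(df_a)
scalar p_ui  = Ftail(e(df_a), e(df_r), e(F_f))
display "F(" df_ui ", " e(df_r) ") = " F_ui ",  p = " p_ui
```

A small p-value indicates that the person effects are jointly non-zero, i.e. the situation in which pooled OLS would be the wrong choice.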
        Kind regards,
        Carlo
        (Stata 19.0)

