  • inference for repeated measures unbalanced panel

    Hi,

    I have a panel dataset with two time points (0 and 1). It is unbalanced: some students surveyed at time 0 were not surveyed at time 1, and some students surveyed at time 1 were not surveyed at time 0.

    I just want to check whether the time 0 sample and the time 1 sample have similar characteristics. I therefore use a variable Y0 collected at time 0 and check whether Y0 differs between the time 0 and time 1 samples. Since some observations are repeated, I thought of simply running an OLS regression with standard errors clustered on student id. A minimal example using an example dataset would be:

    Code:
    sysuse bplong.dta, clear
    reg bp when, cluster(patient)
    Yet when I do that, my SEs are actually smaller than when I simply do
    Code:
    sysuse bplong.dta, clear
    reg bp when
    which is not what I expected. My intuition was that the SEs should be larger when clustering, since clustering accounts for the within-id correlation. Am I wrong about this?

    I have another issue: my data are originally clustered at the school level, so in addition to clustering on student id I would like to add a school cluster. Is there a way to do that in Stata?

    Thanks

  • #2
    Adrien:
    it may be that the autocorrelation is not that relevant and, as such, the clustered standard errors (which need at least 30 clusters to work properly) are misleading:
    Code:
    . sysuse bplong.dta
    (Fictional blood-pressure data)
    
    . reg bp when, cluster(patient)
    
    Linear regression                               Number of obs     =        240
                                                    F(1, 119)         =      11.09
                                                    Prob > F          =     0.0012
                                                    R-squared         =     0.0380
                                                    Root MSE          =      12.86
    
                                  (Std. err. adjusted for 120 clusters in patient)
    ------------------------------------------------------------------------------
                 |               Robust
              bp | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
            when |  -5.091667   1.528938    -3.33   0.001    -8.119117   -2.064217
           _cons |   161.5417   2.272543    71.08   0.000     157.0418    166.0415
    ------------------------------------------------------------------------------
    
    . predict uhat, res
    
    . forva j = 1/2  {
      2. quietly corr uhat L`j'.uhat
      3. display "Autocorrelation at lag `j' = " %6.3f r(rho)
      4.  }
    Autocorrelation at lag 1 =  0.159
    Kind regards,
    Carlo
    (Stata 19.0)


    • #3
      I am not sure I understand your answer. The autocorrelation is positive, so I would expect clustering to increase the SE, not reduce it. I also calculated the intra-class correlation and find 11.5% in this example. Here again I would have expected the SE to go up, not down.
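
      One way to reason about the direction of the change is the textbook Moulton approximation, sketched here for equal cluster sizes. It is not from this thread; the symbols m (observations per cluster), \rho_x (intra-cluster correlation of the regressor) and \rho_u (intra-cluster correlation of the errors) are introduced only for illustration. In bplong every patient has one "before" and one "after" reading, so m = 2 and when is perfectly negatively correlated within patient, which flips the sign of the correction:

      ```latex
      \frac{\operatorname{Var}_{\text{cluster}}(\hat\beta)}{\operatorname{Var}_{\text{OLS}}(\hat\beta)}
        \approx 1 + (m - 1)\,\rho_x\,\rho_u ,
      \qquad
      m = 2,\ \rho_x = -1
      \;\Longrightarrow\;
      \frac{\operatorname{Var}_{\text{cluster}}(\hat\beta)}{\operatorname{Var}_{\text{OLS}}(\hat\beta)}
        \approx 1 - \rho_u .
      ```

      Under this approximation a positive \rho_u makes the clustered SE smaller than the conventional one whenever the regressor varies within clusters in this way. Taking the lag-1 residual correlation of 0.159 reported in #2 as a rough stand-in for \rho_u, the variance ratio would be about 0.84, that is, a clustered SE roughly 8% below the OLS one. SEs go up with positive \rho_u when the regressor is constant (or positively correlated) within clusters.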


      • #4
        Adrien:
        I was unclear and partially wrong.
        Sorry for the confusion.
        I meant that, given the low autocorrelation, clustered standard errors are possibly misleading (120 panels are clearly enough).
        Another issue may rest on the fact that each panel has a maximum of two waves of data.
        As far as your last question is concerned, you may want to take a look at -mixed- or the community-contributed command -reghdfe-.
        Kind regards,
        Carlo
        (Stata 19.0)
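
        A minimal sketch of how the school-level question might be handled, assuming students do not change schools (so student clusters are nested within school clusters). The variable names y0, time, student_id and school_id are hypothetical placeholders, not names from the original data:

        ```stata
        * Hypothetical variable names: y0, time, student_id, school_id

        * With students nested in schools, clustering on the higher level (school)
        * already allows arbitrary correlation among all observations in a school,
        * including repeated observations of the same student
        regress y0 i.time, vce(cluster school_id)

        * Multilevel alternative with -mixed-: random intercepts for schools
        * and for students nested within schools
        mixed y0 i.time || school_id: || student_id:

        * Community-contributed -reghdfe- (ssc install reghdfe) accepts several
        * cluster variables; with nested clusters this adds little over school alone
        reghdfe y0 i.time, noabsorb vce(cluster student_id school_id)
        ```

        Which of these is appropriate depends on the number of schools: cluster-robust standard errors rely on having a reasonably large number of clusters at the level being clustered on.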
