inference for repeated measures unbalanced panel

Adrien Bouguen

23 Oct 2023, 19:58

Hi,

I have a panel dataset with two waves (time 0 and time 1). It is unbalanced: some students surveyed at time 0 were not surveyed at time 1, and some students surveyed at time 1 were not surveyed at time 0.

I just wanted to check whether the time 0 and time 1 samples have similar characteristics. I therefore use a variable Y0 collected at time 0 and check whether Y0 differs between time 0 and time 1. Since some observations are repeated, I thought of simply running an OLS regression with standard errors clustered on student id. A minimal example using an example dataset would be:

Code:

sysuse bplong.dta, clear
reg bp when, cluster(patient)

Yet when I do that, my standard errors are actually smaller than when I simply run

Code:

sysuse bplong.dta, clear
reg bp when

which is not what I expected. My intuition was that clustering should make my standard errors larger, since clusters account for the within-id correlation. Am I wrong about this?
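A minimal Stata sketch that puts the fits side by side on the same bplong example. The -vce(robust)- fit and the paired t-test are added for comparison only; the reshape step assumes, as holds in bplong, that every patient has exactly one Before (when = 1) and one After (when = 2) observation, so bp1 is the Before reading and bp2 the After reading.

```stata
* Compare default, robust and patient-clustered standard errors on bplong
sysuse bplong, clear

quietly regress bp when                       // default (iid) OLS SEs
estimates store ols
quietly regress bp when, vce(robust)          // heteroskedasticity-robust SEs
estimates store robust
quietly regress bp when, vce(cluster patient) // SEs clustered on patient
estimates store clustered

* Coefficients with standard errors below them, one column per model
estimates table ols robust clustered, b se

* Paired t-test on the same data: one row per patient
preserve
reshape wide bp, i(patient) j(when)           // creates bp1 (Before), bp2 (After)
ttest bp2 == bp1
restore
```

When each patient is observed in both waves, the clustered standard error for -when- should match the standard error of the paired difference reported by -ttest- up to a small finite-sample factor, which suggests the smaller clustered SE reflects the paired design rather than an error in the code.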

I have another issue: my data are originally clustered at the school level, so in addition to clustering on student id I would like to add a school cluster. Is there a way to do that in Stata?

Thanks

Carlo Lazzaro

24 Oct 2023, 00:26

Adrien:
it may be that the autocorrelation is not that relevant and, as such, the clustered standard errors (which need at least 30 clusters to work properly) are misleading:

Code:

. sysuse bplong.dta
(Fictional blood-pressure data)

. reg bp when, cluster(patient)

Linear regression                               Number of obs     =        240
                                                F(1, 119)         =      11.09
                                                Prob > F          =     0.0012
                                                R-squared         =     0.0380
                                                Root MSE          =      12.86

                              (Std. err. adjusted for 120 clusters in patient)
------------------------------------------------------------------------------
             |               Robust
          bp | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
        when |  -5.091667   1.528938    -3.33   0.001    -8.119117   -2.064217
       _cons |   161.5417   2.272543    71.08   0.000     157.0418    166.0415
------------------------------------------------------------------------------

. predict uhat, res

. forva j = 1/2  {
  2. quietly corr uhat L`j'.uhat
  3. display "Autocorrelation at lag `j' 0 " %6.3f r(rho)
  4.  }
Autocorrelation at lag 1 0  0.159

Kind regards,
Carlo
(Stata 19.0)
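A runnable sketch of the residual-autocorrelation check above. The lag operator L. only works once the data are declared as a panel, and the log does not show that step, so the -xtset patient when- line is an assumption about what was run beforehand. With only two waves, L2.uhat is missing for every observation, so lag 1 is the only lag that can be computed.

```stata
* Lag-1 residual autocorrelation within patient on bplong
sysuse bplong, clear
xtset patient when            // panel = patient, time = when (1 Before, 2 After)

quietly regress bp when, vce(cluster patient)
predict uhat, residuals

* Only lag 1 is defined with two waves: L2.uhat is always missing
quietly correlate uhat L1.uhat
display "Autocorrelation at lag 1: " %6.3f r(rho)
display "Pairs used: " r(N)
```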

Adrien Bouguen

24 Oct 2023, 09:28

I am not sure I understand your answer. The autocorrelation is positive, so I would expect clustering to increase the SE, not reduce it. I also calculated the intra-class correlation and find 11.5% in this example. Here again I would have expected the SE to go up, not down.
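A short derivation may help reconcile a positive autocorrelation with a smaller clustered SE. It assumes a balanced two-wave panel like bplong: n units, each observed once at t = 0 and once at t = 1, errors with common variance σ² and within-unit correlation ρ, and no other regressors.

```latex
\begin{align*}
y_{gt} &= \alpha + \beta t + u_{gt}, \qquad \operatorname{Var}(u_{gt}) = \sigma^2, \qquad \operatorname{Corr}(u_{g0}, u_{g1}) = \rho \\
\hat\beta &= \bar y_{1} - \bar y_{0} = \beta + \frac{1}{n}\sum_{g=1}^{n} \left(u_{g1} - u_{g0}\right) \\
\operatorname{Var}(\hat\beta) &= \frac{1}{n}\operatorname{Var}\left(u_{g1} - u_{g0}\right) = \frac{2\sigma^2(1-\rho)}{n}
\end{align*}
```

The default OLS formula effectively sets ρ = 0 and reports roughly 2σ²/n. Because the regressor t takes both values inside every cluster, a positive ρ makes the unit-level shocks cancel in the difference, so clustering lowers the SE here; this is the same logic as a paired versus an independent-samples t-test. The usual intuition that clustering inflates SEs applies when the regressor is constant, or positively correlated, within clusters. In an unbalanced panel, students observed in only one wave get no such cancellation, so the reduction should be smaller.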
Carlo Lazzaro

24 Oct 2023, 11:29

Adrien:
I was unclear and partially wrong.
Sorry for the confusion.
I meant that, given the low autocorrelation, clustered standard errors are possibly misleading (120 panels are clearly enough clusters).
Another issue may be that each panel has at most two waves of data.
As far as your last question is concerned, you may want to take a look at -mixed- or the community-contributed command -reghdfe-.

Kind regards,
Carlo
(Stata 19.0)
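A sketch of the two suggestions applied to a students-within-schools design. Everything here is hypothetical: the simulated data and the variable names y0, wave, id and school are placeholders for the real dataset. -reghdfe- is community-contributed (install with -ssc install ftools- and then -ssc install reghdfe-) and accepts several variables in vce(cluster ...) for multiway clustering; the code skips it if it is not installed.

```stata
* Hypothetical data: 20 schools x 30 students, two waves, some waves missing
clear
set seed 12345
set obs 20
generate school = _n
generate u_s = rnormal()            // school-level shock
expand 30
generate id = _n                    // unique student id
generate u_i = rnormal()            // student-level shock
expand 2
bysort id: generate wave = _n - 1   // wave 0 and wave 1
drop if runiform() < 0.3            // make the panel unbalanced
generate y0 = 10 + u_s + u_i + rnormal()

* Students are nested in schools, so clustering on school alone
* already allows for correlation within students
regress y0 wave, vce(cluster school)

* Two-way clustering on school and student with reghdfe, if installed
capture which reghdfe
if _rc == 0 {
    reghdfe y0 wave, noabsorb vce(cluster school id)
}
else {
    display "reghdfe not found: ssc install ftools, then ssc install reghdfe"
}

* Random-effects alternative: student intercepts nested within schools
mixed y0 wave || school: || id:
```

If every student belongs to a single school, the two-way and school-only clustered SEs should be essentially the same, because the student clusters sit entirely inside the school clusters.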