
  • Unusual number of clusters

    Dear Statalist,

    I have a cross-sectional dataset with 1,809 observations and 375 variables. I am using Stata 14.1.

    The data were collected at the school level, and there are 42 schools in total. I want to use standard errors clustered at the school level, and since there are only 42 schools, the number of clusters should be 42, correct?

    However, when I run:
    Code:
    ivregress 2sls y (x1 = z), first vce(cluster School_name)
    the number of clusters reported in the first stage is unusually high (anywhere from 350 to 850, depending on how many variables I add to the model). School_name is a numeric variable coded from 1 to 42 (its variable label, "Nome da escola", is Portuguese for "school name"), and it tabulates as follows:

    Code:
        Nome da |
        escola: |      Freq.     Percent        Cum.
    ------------+-----------------------------------
           1.00 |         63        3.48        3.48
           2.00 |         72        3.98        7.46
           4.00 |         51        2.82       10.28
           5.00 |         45        2.49       12.77
           7.00 |         38        2.10       14.87
           8.00 |         33        1.82       16.69
           9.00 |         14        0.77       17.47
          10.00 |         73        4.04       21.50
          11.00 |         10        0.55       22.06
          12.00 |         44        2.43       24.49
          13.00 |         68        3.76       28.25
          14.00 |         30        1.66       29.91
          15.00 |         42        2.32       32.23
          17.00 |         32        1.77       34.00
          18.00 |         51        2.82       36.82
          19.00 |         75        4.15       40.96
          20.00 |         45        2.49       43.45
          21.00 |         82        4.53       47.98
          22.00 |         45        2.49       50.47
          24.00 |         26        1.44       51.91
          25.00 |         61        3.37       55.28
          26.00 |         74        4.09       59.37
          27.00 |         40        2.21       61.58
          28.00 |         77        4.26       65.84
          29.00 |         71        3.92       69.76
          30.00 |         90        4.98       74.74
          32.00 |         58        3.21       77.94
          33.00 |         83        4.59       82.53
          34.00 |          8        0.44       82.97
          35.00 |         25        1.38       84.36
          36.00 |         54        2.99       87.34
          38.00 |         65        3.59       90.93
          39.00 |         66        3.65       94.58
          40.00 |         25        1.38       95.96
          42.00 |         73        4.04      100.00
    ------------+-----------------------------------
          Total |      1,809      100.00

    Does anyone know what could be causing this jump in the number of clusters?

    Thank you in advance.

    Best regards,

    Sara Martins

  • #2
    Your data are not at the school level. They are at the individual level (the lowest level), nested within schools (the upper level). We do not know who the individuals nested within each school are; they could be pupils, teachers or something else. Your output shows that school 1 has 63 observations belonging to individuals (pupils, teachers, etc.), school 2 has 72, and so on, which is how you get 1,809 observations from 42 clusters (schools). Note that your number of clusters did not increase; it remained 42, as you mentioned. The frequencies are the numbers of observations within each school used in the estimation.
    Roman
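One way to check the cluster count directly is to read the e(N_clust) result that ivregress stores when vce(cluster ...) is specified, and compare it with the number of distinct school codes among the observations actually used in estimation. This is a minimal sketch, assuming the variable names from the question (y, x1, z, School_name); it is not the only way to do this check. Note also that the tabulation in the question lists only 35 distinct codes (3, 6, 16, 23, 31, 37 and 41 do not appear), so the count may come out as 35 rather than 42, and lower still if missing values in y, x1 or z drop entire schools from the estimation sample.

```stata
* Sketch: variable names (y, x1, z, School_name) are assumed from the question.
ivregress 2sls y (x1 = z), first vce(cluster School_name)

* Number of clusters ivregress used for the cluster-robust VCE
display "Clusters used by ivregress: " e(N_clust)

* Independent check: distinct school codes among observations in e(sample)
* (tabulate is r-class, so the e() results from ivregress are left intact)
quietly tabulate School_name if e(sample)
display "Distinct schools in e(sample): " r(r)

* The two counts should agree
assert e(N_clust) == r(r)
```

If the two numbers agree and are at most 42, the larger figure seen in the first-stage output is not a cluster count.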



    • #3
      Roman,

      Thank you for clarifying.

      Best regards,

      Sara
