  • Low number of clusters: should I use vce(cluster) in my xtivreg?

    Hey,

    I have panel data with instrumental variables, so I am using xtivreg with random effects. My data cover 17 regions across 6 countries over a span of 10 years. I have read that it may be better not to use vce(cluster) when the number of clusters is very low. I am wondering whether my 17 region clusters are too few for this regression. Does anyone know whether I should include it?

    Kind regards,

    Emy

  • #2
    Emy:
    welcome to this forum.
    Unfortunately, there's no hard and fast rule for how many clusters are enough for cluster-robust standard errors (SEs) to be reliable.
    The usual rule of thumb is to run the same regression model with and without clustered SEs and see whether they really differ; if they do not differ much, you can use the default (i.e., non-clustered) SEs.
    For the future (and as per FAQ), please share what you typed and what Stata gave you back via CODE delimiters and/or provide an excerpt/example of your data via -dataex- (see the FAQ again for more details). Thanks.
    Kind regards,
    Carlo
    (Stata 19.0)
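
    The with/without comparison described above could be sketched roughly as follows. This is only an illustration: it assumes Stata's nlswork example dataset as a stand-in for the poster's data, and the stored-estimate names default_se and clustered_se are arbitrary.

    ```stata
    * Sketch only: nlswork stands in for the poster's own panel
    webuse nlswork, clear
    xtset idcode year

    * Random-effects IV model with default (conventional) SEs
    xtivreg ln_wage c.age##c.age not_smsa (tenure = union south), re
    estimates store default_se

    * Same model with cluster-robust SEs, clustering on the panel variable
    xtivreg ln_wage c.age##c.age not_smsa (tenure = union south), re vce(cluster idcode)
    estimates store clustered_se

    * Coefficients and SEs side by side; only the SEs are expected to differ
    estimates table default_se clustered_se, b(%9.4f) se(%9.4f)
    ```

    Putting both sets of results in one table makes it easy to see at a glance whether the SEs move enough to change any conclusions.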



    • #3
      Dear Carlo,

      Thank you for your answer. I'm sorry, I forgot to include the Stata input. Below are the commands and output with and without clustered SEs. Without them the main variable is significant, but with clustered SEs it is not. Do you think the difference is large enough that I should include them?



      xtivreg Unrest ( mp1 cof1 gdp1 = eth1 cor1 int1 rai1 ) oil1 lpop i.Month i.Year, re vce(cluster Region)

      G2SLS random-effects IV regression              Number of obs     =        1,648
      Group variable: regionnum                       Number of groups  =           17

      R-sq:                                           Obs per group:
           within  = 0.0018                                         min =           44
           between = 0.5665                                         avg =         96.9
           overall = 0.0537                                         max =          125

                                                      Wald chi2(9)      =       488.93
      corr(u_i, X)       = 0 (assumed)                Prob > chi2       =       0.0000

                                        (Std. Err. adjusted for 18 clusters in Region)
      --------------------------------------------------------------------------------
                     |               Robust
              Unrest |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      ---------------+----------------------------------------------------------------
                 mp1 |   .7599231   .5292532     1.44   0.151    -.2773942     1.79724
                cof1 |   .0153757   .1259675     0.12   0.903     -.231516    .2622674
                gdp1 |   -.000117   .0001587    -0.74   0.461    -.0004281     .000194
                oil1 |  -.0007948    .001841    -0.43   0.666    -.0044031    .0028136
                lpop |  -.1431133   .1620527    -0.88   0.377    -.4607308    .1745042

               _cons |    .202604   .2207474     0.92   0.359    -.2300531     .635261
      ---------------+----------------------------------------------------------------
             sigma_u |  .13321358
             sigma_e |  .50374521
                 rho |  .06536105   (fraction of variance due to u_i)
      --------------------------------------------------------------------------------



      . xtivreg Unrest ( mp1 cof1 gdp1 = eth1 cor1 int1 rai1 ) oil1 lpop i.Month i.Year, re

      G2SLS random-effects IV regression              Number of obs     =        1,648
      Group variable: regionnum                       Number of groups  =           17

      R-sq:                                           Obs per group:
           within  = 0.0018                                         min =           44
           between = 0.5641                                         avg =         96.9
           overall = 0.0534                                         max =          125

                                                      Wald chi2(10)     =        35.96
      corr(u_i, X)       = 0 (assumed)                Prob > chi2       =       0.0001

      --------------------------------------------------------------------------------
              Unrest |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      ---------------+----------------------------------------------------------------
                 mp1 |   .7601641   .3200423     2.38   0.018     .1328929    1.387435
                cof1 |    .014977   .0836494     0.18   0.858    -.1489728    .1789267
                gdp1 |  -.0001175   .0001201    -0.98   0.328     -.000353    .0001179
                oil1 |   -.000791   .0011777    -0.67   0.502    -.0030992    .0015173
                lpop |   -.142885   .1256682    -1.14   0.256    -.3891901    .1034201

               _cons |   .2039516   .2280955     0.89   0.371    -.2431074    .6510106
      ---------------+----------------------------------------------------------------
             sigma_u |  .13834633
             sigma_e |  .50374521
                 rho |  .07013477   (fraction of variance due to u_i)
      --------------------------------------------------------------------------------



      Kind regards,
      Emy
      Last edited by Emy Meurs; 04 Feb 2019, 11:37.



      • #4
        Emy:
        the -cluster()- option should not affect the point estimates, yet they differ in your example.
        Take a look at the following toy example:
        Code:
        use http://www.stata-press.com/data/r15/nlswork
        . xtivreg ln_w c.age##c.age not_smsa (tenure = union south), re
        
        G2SLS random-effects IV regression              Number of obs     =     19,007
        Group variable: idcode                          Number of groups  =      4,134
        
        R-sq:                                           Obs per group:
             within  = 0.0620                                         min =          1
             between = 0.1745                                         avg =        4.6
             overall = 0.1206                                         max =         12
        
                                                        Wald chi2(4)      =     941.52
        corr(u_i, X)       = 0 (assumed)                Prob > chi2       =     0.0000
        
        ------------------------------------------------------------------------------
             ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
              tenure |   .1772948   .0111724    15.87   0.000     .1553972    .1991924
                 age |   .0191674   .0066388     2.89   0.004     .0061555    .0321792
                     |
         c.age#c.age |  -.0008496   .0001057    -8.04   0.000    -.0010567   -.0006425
                     |
            not_smsa |  -.2119932   .0130456   -16.25   0.000    -.2375622   -.1864243
               _cons |    1.42761   .1037797    13.76   0.000     1.224205    1.631014
        -------------+----------------------------------------------------------------
             sigma_u |  .33156584
             sigma_e |  .63029359
                 rho |  .21674808   (fraction of variance due to u_i)
        ------------------------------------------------------------------------------
        Instrumented:   tenure
        Instruments:    age c.age#c.age not_smsa union south
        ------------------------------------------------------------------------------
        
        . xtivreg ln_w c.age##c.age not_smsa (tenure = union south), re vce(cluster idcode )
        
        G2SLS random-effects IV regression              Number of obs     =     19,007
        Group variable: idcode                          Number of groups  =      4,134
        
        R-sq:                                           Obs per group:
             within  = 0.0620                                         min =          1
             between = 0.1745                                         avg =        4.6
             overall = 0.1206                                         max =         12
        
        
                                                        Wald chi2(4)      =     403.60
        corr(u_i, X)       = 0 (assumed)                Prob > chi2       =     0.0000
        
                                     (Std. Err. adjusted for 4,711 clusters in idcode)
        ------------------------------------------------------------------------------
                     |               Robust
             ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
              tenure |   .1772948   .0152425    11.63   0.000       .14742    .2071696
                 age |   .0191674   .0102686     1.87   0.062    -.0009588    .0392935
                     |
         c.age#c.age |  -.0008496   .0001707    -4.98   0.000    -.0011842   -.0005149
                     |
            not_smsa |  -.2119932   .0179604   -11.80   0.000    -.2471951   -.1767914
               _cons |    1.42761   .1509676     9.46   0.000     1.131719    1.723501
        -------------+----------------------------------------------------------------
             sigma_u |  .33156584
             sigma_e |  .63029359
                 rho |  .21674808   (fraction of variance due to u_i)
        ------------------------------------------------------------------------------
        Instrumented:   tenure
        Instruments:    age c.age#c.age not_smsa union south
        ------------------------------------------------------------------------------
        
        .
        As an aside, please use CODE delimiters to share what you typed and what Stata gave you back (# toggle of the Advanced editor). Thanks.
        Kind regards,
        Carlo
        (Stata 19.0)



        • #5
          Here's one approach for patching the SEs in the small cluster case. There's Stata code on GitHub.
          Last edited by Dimitriy V. Masterov; 04 Feb 2019, 22:15.



          • #6
            Dear Carlo,

            Thank you for your comment. I really don't know why the coefficients change. If I leave the dummies out of the regression, the coefficients are stable, but once I add them the coefficients change when vce(robust) is included.

            Kind regards,
            Emy



            • #7
              Emy:
              this is weird indeed.
              Your first command differs from the second only in the -cluster()- option, which should not affect the coefficient point estimates.
              Kind regards,
              Carlo
              (Stata 19.0)
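
              One detail in the output posted in #3 may be worth checking: the panel variable is regionnum (17 groups), while the clustered run reports SEs adjusted for 18 clusters in Region, so the two variables do not appear to define the same grouping. Whether this explains the changing coefficients is not established in this thread. The sketch below is only a possible check, assuming the variable names from #3; pair and n_region are hypothetical helper variables.

              ```stata
              * Run immediately after the clustered xtivreg so that e(sample) is set.
              * Tag one observation per distinct (regionnum, Region) pair in the sample
              egen pair = tag(regionnum Region) if e(sample)

              * Count how many distinct Region values occur within each panel
              bysort regionnum: egen n_region = total(pair)

              * List panels that contain more than one cluster value, if any
              tabulate regionnum if n_region > 1 & e(sample)
              ```

              If a panel turns out to contain more than one Region value, reclustering on the panel variable itself, vce(cluster regionnum), would be one way to see whether the point estimates then match the unclustered run.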
