  • xi:regress problem with generated variable names

    Hello there,

    I'm working with panel data (29 countries over a period of 14 years, 2007-2020).
    When I use the xi: regress command (Stata 17.0) and add i.country to generate the dummy variables,
    I get this output:
    Code:
    xi:regress educ population gini broadband incomeMean i.country [aw=1/educ]
    i.country         _Icountry_1-30      (naturally coded; _Icountry_1 omitted)
    (sum of wgt is 11.34479442825246)
    
          Source |       SS           df       MS      Number of obs   =       398
    -------------+----------------------------------   F(32, 365)      =    167.45
           Model |  36681.0851        32  1146.28391   Prob > F        =    0.0000
        Residual |  2498.56033       365  6.84537078   R-squared       =    0.9362
    -------------+----------------------------------   Adj R-squared   =    0.9306
           Total |  39179.6454       397  98.6892832   Root MSE        =    2.6164
    
    ------------------------------------------------------------------------------
            educ | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
      population |   6.39e-08   2.97e-07     0.22   0.830    -5.20e-07    6.48e-07
            gini |  -.2359149   .1102763    -2.14   0.033    -.4527716   -.0190583
       broadband |   .2066379   .0094244    21.93   0.000      .188105    .2251708
      incomeMean |   .0005482   .0000826     6.64   0.000     .0003857    .0007106
     _Icountry_2 |   15.91462   1.304034    12.20   0.000     13.35025    18.47898
     _Icountry_3 |   20.26676    2.10907     9.61   0.000     16.11931    24.41421
     _Icountry_4 |   13.60651   1.983355     6.86   0.000     9.706272    17.50675
     _Icountry_5 |   31.61921   2.493635    12.68   0.000     26.71551     36.5229
     _Icountry_6 |   8.080169   1.782312     4.53   0.000     4.575281    11.58506
     _Icountry_7 |   8.013918   1.427681     5.61   0.000     5.206405    10.82143
     _Icountry_8 |   20.80923   2.508531     8.30   0.000     15.87624    25.74222
     _Icountry_9 |   9.350569   1.348472     6.93   0.000     6.698819    12.00232
    _Icountry_10 |   13.16516   16.97854     0.78   0.439    -20.22287    46.55319
    _Icountry_11 |  -3.987427   21.79948    -0.18   0.855    -46.85576    38.88091
    _Icountry_13 |   20.78209   1.833307    11.34   0.000     17.17692    24.38726
    _Icountry_14 |   12.50674   1.919541     6.52   0.000     8.731996    16.28149
    _Icountry_15 |   24.76745   1.614025    15.35   0.000      21.5935    27.94141
    _Icountry_16 |  -.6516506   15.32713    -0.04   0.966    -30.79222    29.48892
    _Icountry_17 |   23.01193   2.534999     9.08   0.000     18.02689    27.99697
    _Icountry_18 |   36.06938   2.395142    15.06   0.000     31.35937    40.77939
    _Icountry_19 |   12.17241   3.040315     4.00   0.000     6.193677    18.15114
    _Icountry_20 |    8.21989   2.503285     3.28   0.001     3.297219    13.14256
    _Icountry_21 |   11.56301   2.682991     4.31   0.000     6.286946    16.83907
    _Icountry_22 |   6.888599   2.106742     3.27   0.001     2.745724    11.03147
    _Icountry_23 |   22.10704   9.308974     2.37   0.018     3.801081    40.41299
    _Icountry_24 |   13.60983   1.814641     7.50   0.000     10.04137    17.17829
    _Icountry_25 |   12.77866   4.358017     2.93   0.004     4.208688    21.34863
    _Icountry_26 |   11.15901   1.703078     6.55   0.000      7.80993    14.50808
    _Icountry_27 |   15.18265   2.122231     7.15   0.000     11.00932    19.35599
    _Icountry_28 |   18.24387   11.47995     1.59   0.113    -4.331271      40.819
    _Icountry_29 |   13.98224   1.097104    12.74   0.000      11.8248    16.13968
    _Icountry_30 |   12.61869     16.425     0.77   0.443    -19.68082    44.91821
           _cons |   5.389878   3.814118     1.41   0.158    -2.110527    12.89028
    ------------------------------------------------------------------------------
    The problem is that there is no "_Icountry_12" between "_Icountry_11" and "_Icountry_13"; it always skips number 12 for some reason I can't figure out. "_Icountry_1" is omitted (for obvious reasons), so the list should run from "_Icountry_2" to "_Icountry_29", but instead it runs from "_Icountry_2" to "_Icountry_30" and skips 12. When I check the variables, everything seems correct, and every country has a "1" for its assigned dummy variable.
    But it's not very convenient to present the results like this; I would get questions about why "_Icountry_12" is "missing".

    Is there any way I can fix this? Thank you.

  • #2
    Ancient method. Use
    Code:
    xtreg y x, fe
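A minimal sketch of that workflow on simulated data, assuming the panel identifier is `country` and that the time variable is called `year` (a hypothetical name; the thread never shows it). The data, seed, and coefficients are made up. The sketch also checks the standard result that the within estimator from `xtreg, fe` gives the same slope as a dummy-variable regression written with factor-variable notation (`i.country`), which replaces the `xi:` prefix in current Stata. It leaves out the `[aw=1/educ]` weights from the original post, so it says nothing about that weighted specification.

```stata
* Hypothetical simulated panel: 29 countries coded 1-30 (code 12 unused), 2007-2020
clear
set seed 2022
set obs 30
generate country = _n
drop if country == 12
expand 14
bysort country: generate year = 2006 + _n

* Made-up outcome with country-specific intercepts
generate x = rnormal()
generate y = 1 + 0.5*x + country/10 + rnormal()

* Declare the panel structure, then fit the within (fixed-effects) estimator
xtset country year
xtreg y x, fe
scalar b_fe = _b[x]

* Same slope from a dummy-variable regression using factor variables (no xi: prefix)
regress y x i.country
scalar b_lsdv = _b[x]
display "FE slope: " b_fe "   LSDV slope: " b_lsdv
```

In the factor-variable output each dummy is labelled by its country code, so a code that never occurs in the data (such as 12 here) is still absent from the table.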

    • #3
      Abdelkarim:
      as an aside to Jared's excellent advice, you did not code up a panel data regression, as you did not cluster your standard errors on your -panelid-. In fact, your code considers each observation as independent, despite the panel structure of your dataset.
      Kind regards,
      Carlo
      (Stata 19.0)

      • #4
        In reply to Jared Greathouse (#2):
        Thank you!

        • #5
          In reply to Carlo Lazzaro (#3):
          I tried Jared's advice, and these are the results I got:

          Code:
           xtreg educ population gini broadband incomeMean, fe
          
          Fixed-effects (within) regression               Number of obs     =        398
          Group variable: country                         Number of groups  =         29
          
          R-squared:                                      Obs per group:
               Within  = 0.7214                                         min =         11
               Between = 0.3053                                         avg =       13.7
               Overall = 0.3832                                         max =         14
          
                                                          F(4,365)          =     236.33
          corr(u_i, Xb) = -0.3124                         Prob > F          =     0.0000
          
          ------------------------------------------------------------------------------
                  educ | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
          -------------+----------------------------------------------------------------
            population |  -9.70e-08   2.92e-07    -0.33   0.740    -6.71e-07    4.77e-07
                  gini |   -.150809   .1070503    -1.41   0.160    -.3613219    .0597038
             broadband |   .2142734   .0098261    21.81   0.000     .1949504    .2335963
            incomeMean |   .0004968   .0000762     6.52   0.000      .000347    .0006467
                 _cons |   20.37698   5.580489     3.65   0.000     9.403037    31.35093
          -------------+----------------------------------------------------------------
               sigma_u |  7.6449458
               sigma_e |  2.5569006
                   rho |  .89939297   (fraction of variance due to u_i)
          ------------------------------------------------------------------------------
          F test that all u_i=0: F(28, 365) = 91.84                    Prob > F = 0.0000
          Do I still need to cluster my standard errors with his method? Sorry, I'm fairly new to Stata.

          • #6
            There's no law of nature saying you should, but you likely should.

            • #7
              In reply to Jared Greathouse (#6):
              How does one cluster the standard errors? Is it done with the robust option?
              Code:
              . xtreg educ population gini broadband incomeMean, fe robust
              
              Fixed-effects (within) regression               Number of obs     =        398
              Group variable: country                         Number of groups  =         29
              
              R-squared:                                      Obs per group:
                   Within  = 0.7214                                         min =         11
                   Between = 0.3053                                         avg =       13.7
                   Overall = 0.3832                                         max =         14
              
                                                              F(4,28)           =      35.53
              corr(u_i, Xb) = -0.3124                         Prob > F          =     0.0000
              
                                             (Std. err. adjusted for 29 clusters in country)
              ------------------------------------------------------------------------------
                           |               Robust
                      educ | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
              -------------+----------------------------------------------------------------
                population |  -9.70e-08   4.97e-07    -0.20   0.847    -1.12e-06    9.21e-07
                      gini |   -.150809   .1814625    -0.83   0.413    -.5225182    .2209001
                 broadband |   .2142734   .0252991     8.47   0.000     .1624506    .2660962
                incomeMean |   .0004968   .0001977     2.51   0.018      .000092    .0009017
                     _cons |   20.37698   9.725685     2.10   0.045     .4548208    40.29915
              -------------+----------------------------------------------------------------
                   sigma_u |  7.6449458
                   sigma_e |  2.5569006
                       rho |  .89939297   (fraction of variance due to u_i)
              ------------------------------------------------------------------------------
              
              .

              • #8
                I think it's vce(cl panelid), but I haven't used regular reg in a long time, so check the help via h reg.

                • #9
                  In reply to Jared Greathouse (#8):
                  I checked and it's correct, it's vce(cl), but what do I put as panelid? I put in country, and it gave me the same result as using robust instead of vce(cl country):
                  Code:
                  . xtreg educ population gini broadband incomeMean, fe vce(cl country)
                  
                  Fixed-effects (within) regression               Number of obs     =        398
                  Group variable: country                         Number of groups  =         29
                  
                  R-squared:                                      Obs per group:
                       Within  = 0.7214                                         min =         11
                       Between = 0.3053                                         avg =       13.7
                       Overall = 0.3832                                         max =         14
                  
                                                                  F(4,28)           =      35.53
                  corr(u_i, Xb) = -0.3124                         Prob > F          =     0.0000
                  
                                                 (Std. err. adjusted for 29 clusters in country)
                  ------------------------------------------------------------------------------
                               |               Robust
                          educ | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
                  -------------+----------------------------------------------------------------
                    population |  -9.70e-08   4.97e-07    -0.20   0.847    -1.12e-06    9.21e-07
                          gini |   -.150809   .1814625    -0.83   0.413    -.5225182    .2209001
                     broadband |   .2142734   .0252991     8.47   0.000     .1624506    .2660962
                    incomeMean |   .0004968   .0001977     2.51   0.018      .000092    .0009017
                         _cons |   20.37698   9.725685     2.10   0.045     .4548208    40.29915
                  -------------+----------------------------------------------------------------
                       sigma_u |  7.6449458
                       sigma_e |  2.5569006
                           rho |  .89939297   (fraction of variance due to u_i)
                  ------------------------------------------------------------------------------

                  • #10
                    I think xtreg knows which panel variable to cluster on, but I could be mistaken. If your data are xtset, I think robust clusters on the panel ID by default.

                    • #11
                      What Jared Greathouse says in #10 is correct. Since version 13, that has been the case.
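A small check of the point in #10 and #11 on made-up data (29 hypothetical countries observed from 2007 to 2020; the variable names `year`, `x`, and `y` are illustrative, not from the thread). After `xtset`, `xtreg, fe` treats `vce(robust)` as clustering on the panel variable, so the two variance matrices should agree, just as the outputs in #7 and #9 did.

```stata
* Hypothetical simulated panel: 29 countries (code 12 unused) x 14 years
clear
set seed 2022
set obs 30
generate country = _n
drop if country == 12
expand 14
bysort country: generate year = 2006 + _n
generate x = rnormal()
generate y = 1 + 0.5*x + country/10 + rnormal()
xtset country year

* With xtreg, fe, the robust option clusters on the xtset panel variable
xtreg y x, fe vce(robust)
matrix V_robust = e(V)

* Explicitly clustering on the panel variable
xtreg y x, fe vce(cluster country)
matrix V_cluster = e(V)
scalar nclust = e(N_clust)

display "clusters: " nclust "   max relative difference: " mreldif(V_robust, V_cluster)
```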

                      • #12
                        In reply to Clyde Schechter (#11):
                        Thanks to both of you.

                        • #13
                          I note that the output shown in posts 5, 7, and 9 tells us that with the data grouped by country, there are 29 groups. That confirms what was stated in post #1, that the panel data has 29 countries. With that said, if the country numbers range between 1 and 30, as shown in the output in post #1, then one would expect one of the numbers 1..30 to be unused by any of the 29 countries and thus missing from the data: either country 1 or country 12, apparently, with the other omitted to avoid collinearity.

                          Just thought it was worth answering the original question, despite the discussion having moved on to improved methodology by leaving xi: behind, since this question might arise again as the analysis continues.
                          Last edited by William Lisowski; 26 Jun 2022, 18:09.

                          • #14
                            I was so caught up in the technical details I forgot the first question! But yes, as William Lisowski says, you need one reference country with FE, so you'll have 29 (N-1) FE.

                            • #15
                              Perhaps my post #13 was unclear. The point was that the largest country code is 30, but we are told there are only N=29 countries, so one of the indicators between 1 and 30 must not actually appear in the data and will not appear in the output. As always, one more of the indicators will be omitted as the reference country, so 28 = N-1 indicators (fixed effects) appear in the output, and 12 and 1 do not.

                              The output of
                              Code:
                              tab country
                              should confirm this.
                              Last edited by William Lisowski; 26 Jun 2022, 18:54.
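A self-contained illustration of #13 and #15 on toy data in which the 29 country codes run from 1 to 30 and 12 is never used. Because `xi` names each indicator after the code it stands for and omits the lowest code as the reference, a gap in the codes becomes a gap in the names. The last two commands are an optional extra, not something suggested in the thread: `egen ... group()` builds consecutive codes 1 to 29 (the name `cid` is hypothetical).

```stata
* Hypothetical toy data: 29 country codes spread over 1-30, with code 12 never used
clear
set obs 30
generate country = _n
drop if country == 12

* tabulate stores the number of distinct values in r(r)
tabulate country
display "distinct codes: " r(r)

* xi names each indicator after the code it represents and omits the lowest
* code (1) as the reference, so no _Icountry_12 is ever created
xi i.country
describe _Icountry_*, simple

* Optional: consecutive codes 1-29 (hypothetical variable name cid)
egen cid = group(country)
summarize cid
```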
