xi:regress problem with generated variable names

Abdelkarim VUB

Join Date: Jun 2022
Posts: 10
#1

26 Jun 2022, 12:21

Hello there,

I'm working with panel data (29 countries over a period of 14 years, 2007-2020).
When I use the xi: regress command (Stata 17.0) and add i.country to generate the dummy variables,
I get this output:

Code:

xi:regress educ population gini broadband incomeMean i.country [aw=1/educ]
i.country         _Icountry_1-30      (naturally coded; _Icountry_1 omitted)
(sum of wgt is 11.34479442825246)

      Source |       SS           df       MS      Number of obs   =       398
-------------+----------------------------------   F(32, 365)      =    167.45
       Model |  36681.0851        32  1146.28391   Prob > F        =    0.0000
    Residual |  2498.56033       365  6.84537078   R-squared       =    0.9362
-------------+----------------------------------   Adj R-squared   =    0.9306
       Total |  39179.6454       397  98.6892832   Root MSE        =    2.6164

------------------------------------------------------------------------------
        educ | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
  population |   6.39e-08   2.97e-07     0.22   0.830    -5.20e-07    6.48e-07
        gini |  -.2359149   .1102763    -2.14   0.033    -.4527716   -.0190583
   broadband |   .2066379   .0094244    21.93   0.000      .188105    .2251708
  incomeMean |   .0005482   .0000826     6.64   0.000     .0003857    .0007106
 _Icountry_2 |   15.91462   1.304034    12.20   0.000     13.35025    18.47898
 _Icountry_3 |   20.26676    2.10907     9.61   0.000     16.11931    24.41421
 _Icountry_4 |   13.60651   1.983355     6.86   0.000     9.706272    17.50675
 _Icountry_5 |   31.61921   2.493635    12.68   0.000     26.71551     36.5229
 _Icountry_6 |   8.080169   1.782312     4.53   0.000     4.575281    11.58506
 _Icountry_7 |   8.013918   1.427681     5.61   0.000     5.206405    10.82143
 _Icountry_8 |   20.80923   2.508531     8.30   0.000     15.87624    25.74222
 _Icountry_9 |   9.350569   1.348472     6.93   0.000     6.698819    12.00232
_Icountry_10 |   13.16516   16.97854     0.78   0.439    -20.22287    46.55319
_Icountry_11 |  -3.987427   21.79948    -0.18   0.855    -46.85576    38.88091
_Icountry_13 |   20.78209   1.833307    11.34   0.000     17.17692    24.38726
_Icountry_14 |   12.50674   1.919541     6.52   0.000     8.731996    16.28149
_Icountry_15 |   24.76745   1.614025    15.35   0.000      21.5935    27.94141
_Icountry_16 |  -.6516506   15.32713    -0.04   0.966    -30.79222    29.48892
_Icountry_17 |   23.01193   2.534999     9.08   0.000     18.02689    27.99697
_Icountry_18 |   36.06938   2.395142    15.06   0.000     31.35937    40.77939
_Icountry_19 |   12.17241   3.040315     4.00   0.000     6.193677    18.15114
_Icountry_20 |    8.21989   2.503285     3.28   0.001     3.297219    13.14256
_Icountry_21 |   11.56301   2.682991     4.31   0.000     6.286946    16.83907
_Icountry_22 |   6.888599   2.106742     3.27   0.001     2.745724    11.03147
_Icountry_23 |   22.10704   9.308974     2.37   0.018     3.801081    40.41299
_Icountry_24 |   13.60983   1.814641     7.50   0.000     10.04137    17.17829
_Icountry_25 |   12.77866   4.358017     2.93   0.004     4.208688    21.34863
_Icountry_26 |   11.15901   1.703078     6.55   0.000      7.80993    14.50808
_Icountry_27 |   15.18265   2.122231     7.15   0.000     11.00932    19.35599
_Icountry_28 |   18.24387   11.47995     1.59   0.113    -4.331271      40.819
_Icountry_29 |   13.98224   1.097104    12.74   0.000      11.8248    16.13968
_Icountry_30 |   12.61869     16.425     0.77   0.443    -19.68082    44.91821
       _cons |   5.389878   3.814118     1.41   0.158    -2.110527    12.89028
------------------------------------------------------------------------------

The problem is that there is no "_Icountry_12" between "_Icountry_11" and "_Icountry_13". It always skips number 12, and I can't figure out why. "_Icountry_1" is omitted (as the reference category), so I would expect the dummies to run from "_Icountry_2" to "_Icountry_29", but here they run from "_Icountry_2" to "_Icountry_30" and skip 12. I checked the generated variables and everything seems correct: every country has a "1" in its assigned dummy variable.
But it's not very convenient to present the results like this, as I would get questions about why "_Icountry_12" is "missing".

Is there any way I can fix this? Thank you


Jared Greathouse

Join Date: Sep 2021

Posts: 2172
#2

26 Jun 2022, 13:26

Ancient method. Use

Code:

xtreg y x, fe
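
As a minimal sketch of what that could look like with the variables from post #1: this assumes the time variable is called year (a name that does not appear in the thread) and leaves out the [aw=1/educ] weights from the original command, since xtreg, fe expects weights to be constant within panel. The first form absorbs the country effects; the second uses factor-variable notation, so the country coefficients stay in the table without xi: creating any _I* variables.

```stata
* Declare the panel structure: country is the panel id,
* year is an ASSUMED name for the time variable
xtset country year

* Fixed-effects (within) regression; country effects are absorbed, not listed
xtreg educ population gini broadband incomeMean, fe

* Factor-variable alternative to xi: that keeps the country coefficients visible
regress educ population gini broadband incomeMean i.country
```

With i.country the coefficients are labelled by the country codes themselves, so a code that is absent from the data will still be absent from the table.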
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17853
#3

26 Jun 2022, 13:42

Abdelkarim:
as an aside to Jared's excellent advice, you did not code up a panel data regression, as you did not cluster your standard errors on your -panelid-. In fact, your code considers each observation as independent, despite the panel structure of your dataset.

Kind regards,
Carlo
(Stata 19.0)
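
For the command in post #1, clustering on the panel identifier might look like the sketch below. It assumes country is the panel identifier, uses factor-variable notation in place of xi:, and leaves out the original weights for brevity.

```stata
* Country dummies via factor-variable notation,
* with standard errors clustered on the panel identifier (country)
regress educ population gini broadband incomeMean i.country, vce(cluster country)
```

The point estimates are the same as without the vce() option; only the standard errors, and therefore the t statistics and confidence intervals, change.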
Abdelkarim VUB

Join Date: Jun 2022

Posts: 10
#4

26 Jun 2022, 13:53

Originally posted by Jared Greathouse

Ancient method. Use

Code:

xtreg y x, fe

Thank you!

Abdelkarim VUB

Join Date: Jun 2022
Posts: 10
#5

26 Jun 2022, 14:01

Originally posted by Carlo Lazzaro

Abdelkarim:
as an aside to Jared's excellent advice, you did not code up a panel data regression, as you did not cluster your standard errors on your -panelid-. In fact, your code considers each observation as independent, despite the panel structure of your dataset.

I tried Jared's advice, and these are the results I got:

Code:

 xtreg educ population gini broadband incomeMean, fe

Fixed-effects (within) regression               Number of obs     =        398
Group variable: country                         Number of groups  =         29

R-squared:                                      Obs per group:
     Within  = 0.7214                                         min =         11
     Between = 0.3053                                         avg =       13.7
     Overall = 0.3832                                         max =         14

                                                F(4,365)          =     236.33
corr(u_i, Xb) = -0.3124                         Prob > F          =     0.0000

------------------------------------------------------------------------------
        educ | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
  population |  -9.70e-08   2.92e-07    -0.33   0.740    -6.71e-07    4.77e-07
        gini |   -.150809   .1070503    -1.41   0.160    -.3613219    .0597038
   broadband |   .2142734   .0098261    21.81   0.000     .1949504    .2335963
  incomeMean |   .0004968   .0000762     6.52   0.000      .000347    .0006467
       _cons |   20.37698   5.580489     3.65   0.000     9.403037    31.35093
-------------+----------------------------------------------------------------
     sigma_u |  7.6449458
     sigma_e |  2.5569006
         rho |  .89939297   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(28, 365) = 91.84                    Prob > F = 0.0000

Do I still need to cluster my standard errors with his method? Sorry, I'm fairly new to Stata.


Jared Greathouse

Join Date: Sep 2021

Posts: 2172
#6

26 Jun 2022, 14:04

There's no law of nature saying you should, but you likely should

Abdelkarim VUB

Join Date: Jun 2022
Posts: 10
#7

26 Jun 2022, 14:06

Originally posted by Jared Greathouse

There's no law of nature saying you should, but you likely should

How does one cluster the standard errors? Is it using Robust?

Code:

. xtreg educ population gini broadband incomeMean, fe robust

Fixed-effects (within) regression               Number of obs     =        398
Group variable: country                         Number of groups  =         29

R-squared:                                      Obs per group:
     Within  = 0.7214                                         min =         11
     Between = 0.3053                                         avg =       13.7
     Overall = 0.3832                                         max =         14

                                                F(4,28)           =      35.53
corr(u_i, Xb) = -0.3124                         Prob > F          =     0.0000

                               (Std. err. adjusted for 29 clusters in country)
------------------------------------------------------------------------------
             |               Robust
        educ | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
  population |  -9.70e-08   4.97e-07    -0.20   0.847    -1.12e-06    9.21e-07
        gini |   -.150809   .1814625    -0.83   0.413    -.5225182    .2209001
   broadband |   .2142734   .0252991     8.47   0.000     .1624506    .2660962
  incomeMean |   .0004968   .0001977     2.51   0.018      .000092    .0009017
       _cons |   20.37698   9.725685     2.10   0.045     .4548208    40.29915
-------------+----------------------------------------------------------------
     sigma_u |  7.6449458
     sigma_e |  2.5569006
         rho |  .89939297   (fraction of variance due to u_i)
------------------------------------------------------------------------------

.


Jared Greathouse

Join Date: Sep 2021

Posts: 2172
#8

26 Jun 2022, 15:33

I think it's vce(cl panelid), but I've not used regular reg in a long time, so look at the help via h reg

Abdelkarim VUB

Join Date: Jun 2022
Posts: 10
#9

26 Jun 2022, 15:41

Originally posted by Jared Greathouse

I think it's vce(cl panelid), but I've not used regular reg in a long time, so look at the help via h reg

I checked and it's correct, it's vce(cl), but what do I put as panelid? I put in country, and vce(cl country) gave me the same result as when I just used robust.

Code:

. xtreg educ population gini broadband incomeMean, fe vce(cl country)

Fixed-effects (within) regression               Number of obs     =        398
Group variable: country                         Number of groups  =         29

R-squared:                                      Obs per group:
     Within  = 0.7214                                         min =         11
     Between = 0.3053                                         avg =       13.7
     Overall = 0.3832                                         max =         14

                                                F(4,28)           =      35.53
corr(u_i, Xb) = -0.3124                         Prob > F          =     0.0000

                               (Std. err. adjusted for 29 clusters in country)
------------------------------------------------------------------------------
             |               Robust
        educ | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
  population |  -9.70e-08   4.97e-07    -0.20   0.847    -1.12e-06    9.21e-07
        gini |   -.150809   .1814625    -0.83   0.413    -.5225182    .2209001
   broadband |   .2142734   .0252991     8.47   0.000     .1624506    .2660962
  incomeMean |   .0004968   .0001977     2.51   0.018      .000092    .0009017
       _cons |   20.37698   9.725685     2.10   0.045     .4548208    40.29915
-------------+----------------------------------------------------------------
     sigma_u |  7.6449458
     sigma_e |  2.5569006
         rho |  .89939297   (fraction of variance due to u_i)
------------------------------------------------------------------------------


Jared Greathouse

Join Date: Sep 2021

Posts: 2172
#10

26 Jun 2022, 15:47

I think xtreg knows what the panel variable is to cluster on, but I could be mistaken. I think it clusters on the panelid normally, if your data are xtset
Clyde Schechter

Join Date: Apr 2014

Posts: 30356
#11

26 Jun 2022, 15:59

What Jared Greathouse says in #10 is correct. Since version 13, that has been the case.
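
One way to check this on your own data is to store the variance matrix from each specification and compare them. A rough sketch, assuming the data are already xtset with country as the panel variable (V_robust and V_cluster are just illustrative matrix names):

```stata
* Fixed effects with vce(robust); keep the estimated variance matrix
xtreg educ population gini broadband incomeMean, fe vce(robust)
matrix V_robust = e(V)

* Fixed effects with explicit clustering on the panel variable
xtreg educ population gini broadband incomeMean, fe vce(cluster country)
matrix V_cluster = e(V)

* mreldif() returns the relative difference between two matrices;
* a value at or near zero means the two variance estimates coincide
display mreldif(V_robust, V_cluster)
```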
Abdelkarim VUB

Join Date: Jun 2022

Posts: 10
#12

26 Jun 2022, 16:09

Originally posted by Clyde Schechter

What Jared Greathouse says in #10 is correct. Since version 13, that has been the case.

thanks to both of you
William Lisowski

Join Date: Dec 2014

Posts: 10150
#13

26 Jun 2022, 18:06

I note that the output shown in posts 5, 7, and 9 tells us that with the data grouped by country, there are 29 groups. That confirms what was stated in post #1, that the panel data has 29 countries. With that said, if the country numbers range between 1 and 30, as shown in the output there, then one would expect one of the numbers 1..30 to not be used for any of the 29 countries and thus be missing in the data: either country 1 or country 12, apparently, with the other omitted to avoid collinearity.

Just thought it was worth answering the original question, despite the discussion having moved on to improved methodology by leaving xi: behind, since this question might arise again as the analysis continues.

Last edited by William Lisowski; 26 Jun 2022, 18:09.
Jared Greathouse

Join Date: Sep 2021

Posts: 2172
#14

26 Jun 2022, 18:26

I was so caught up in the technical details I forgot the first question! But yes, as William Lisowski says, you need one reference country with FE, so you'll have 29 (N-1) FE.
William Lisowski

Join Date: Dec 2014

Posts: 10150
#15

26 Jun 2022, 18:46

Perhaps my post #13 was unclear. The point was that the largest country code is 30, but in fact we are told there are only N=29 countries, so one of the codes between 1 and 30 must not actually appear in the data and will not appear in the output. As always, one more of the indicators will be the reference country, so 28 = N-1 indicators (fixed effects) will appear in the output, and 12 and 1 do not appear.

The output of

Code:

tab country

should confirm this.
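
A slightly fuller sketch of that check, plus one possible way to avoid the gap in later output. It assumes country is numeric, as the xi: output suggests; first_obs and country_id are hypothetical new variable names.

```stata
* List the country codes actually present; an unused code shows up as a gap
tabulate country

* Count the distinct codes and show their range
* (egen's tag() marks exactly one observation per distinct value)
egen byte first_obs = tag(country)
count if first_obs
summarize country

* Optional: build consecutive codes 1, 2, ... in sorted order of country,
* so indicator names generated from country_id have no gap
egen country_id = group(country)
```

If the count is 29 while the maximum code is 30, one code is unused. Using country_id in place of country would then give indicators numbered 2 through 29, with 1 as the reference.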

Last edited by William Lisowski; 26 Jun 2022, 18:54.