Problem with dummy variable regression

Alexia Zupancic

Join Date: Oct 2018
Posts: 4


30 Oct 2018, 10:57

Hello everyone,

I am currently working on my bachelor thesis and I am having a problem interpreting the output. I am also new to Stata and am learning how to use it while writing the thesis. I am running this regression:

xi: reg rca_v rca_c i.sector, r

where rca_v is the revealed comparative advantage (RCA) of Vietnam over a 15-year time frame in 20 groups of products, rca_c is the same index for China, and i.sector creates the dummies for these 20 product groups. I know that the coefficients of the dummy variables are measured relative to the sector that Stata omitted in order to run the regression, so that, for example, sector 6 is 6.56 times better at exporting its products, but I cannot fit this interpretation into the comparison between the two countries.

Thank you all in advance for taking the time to read my question; I would be really grateful to anyone who could help me understand this problem!

Alexia

Code:

. xi: reg rca_v rca_c i.sector, r
i.sector          _Isector_1-20       (_Isector_1 for sector==animal omitted)

Linear regression                               Number of obs     =        320
                                                F(20, 299)        =     157.00
                                                Prob > F          =     0.0000
                                                R-squared         =     0.9607
                                                Root MSE          =      .7143

------------------------------------------------------------------------------
             |               Robust
       rca_v |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       rca_c |   1.522164   .3141637     4.85   0.000     .9039122    2.140416
  _Isector_2 |  -3.916549   .3693484   -10.60   0.000      -4.6434   -3.189697
  _Isector_3 |  -2.875985   .2018616   -14.25   0.000    -3.273235   -2.478736
  _Isector_4 |  -2.857842   .3585056    -7.97   0.000    -3.563356   -2.152328
  _Isector_5 |  -2.182294   .2021623   -10.79   0.000    -2.580135   -1.784453
  _Isector_6 |   6.560848   1.641889     4.00   0.000     3.329726     9.79197
  _Isector_7 |  -1.282726   .2865632    -4.48   0.000    -1.846662   -.7187895
  _Isector_8 |  -3.006791    .216144   -13.91   0.000    -3.432147   -2.581435
  _Isector_9 |  -4.306507   .4645494    -9.27   0.000    -5.220708   -3.392307
 _Isector_10 |  -3.447117   .2559917   -13.47   0.000     -3.95089   -2.943343
 _Isector_11 |  -2.244465   .2230231   -10.06   0.000    -2.683358   -1.805571
 _Isector_12 |  -3.815585   .3967278    -9.62   0.000    -4.596317   -3.034852
 _Isector_13 |  -2.842299   .2330736   -12.19   0.000    -3.300971   -2.383626
 _Isector_14 |   -.619659   .2875194    -2.16   0.032    -1.185477   -.0538411
 _Isector_15 |   -4.45899   .9704257    -4.59   0.000    -6.368719    -2.54926
 _Isector_16 |  -2.858558   .2369532   -12.06   0.000    -3.324865   -2.392251
 _Isector_17 |  -2.627071   .7526744    -3.49   0.001    -4.108281   -1.145861
 _Isector_18 |  -2.614191    .210671   -12.41   0.000    -3.028777   -2.199605
 _Isector_19 |   .0892393   .2573679     0.35   0.729    -.4172426    .5957213
 _Isector_20 |  -2.670938   .2141246   -12.47   0.000     -3.09232   -2.249556
       _cons |    2.38135   .2390579     9.96   0.000     1.910901    2.851799
------------------------------------------------------------------------------
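A minimal sketch of the same specification in factor-variable notation, run on simulated data because the thesis data are not available (the seed, the 20 x 16 layout and every coefficient below are hypothetical). It shows how the sector coefficients read in this linear model: holding rca_c fixed, the coefficient on a sector is the difference between the expected rca_v in that sector and in the omitted base sector, measured in RCA points, not a ratio. Read that way, the 6.560848 on _Isector_6 above means that, at the same rca_c, the model predicts an rca_v about 6.56 points higher in sector 6 than in the base sector (animal). With the real data, if sector is stored as a string (the "sector==animal" note suggests so), a numeric copy is needed first, e.g. -encode sector, generate(sector_n)-, and then i.sector_n in place of i.sector.

```stata
* Simulated stand-in for the thesis data (all values hypothetical):
* 20 sectors observed over 16 years
clear
set seed 2018
set obs 20
generate sector = _n
generate sector_effect = rnormal(0, 2)
expand 16
bysort sector: generate year = 1999 + _n
generate rca_c = 3*runiform()
generate rca_v = 2 + sector_effect + 1.5*rca_c + rnormal(0, 0.7)

* Same model as -xi: reg rca_v rca_c i.sector, r- without the -xi:- prefix;
* sector 1 is the base (omitted) category
regress rca_v rca_c i.sector, vce(robust)

* Predicted rca_v for each sector at the same value of rca_c (here rca_c = 1):
* the gap between sector 6 and sector 1 equals _b[6.sector]
margins sector, at(rca_c = 1)

* The sector 6 vs base-sector gap with its standard error
lincom 6.sector
```

Because rca_c enters with a single slope, the gap between any two sectors is the same at every value of rca_c; -margins- just makes the levels visible.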

Tags: dummy variables, panel data, regression, Suggestion, Time Series

Carlo Lazzaro

Join Date: Apr 2014

Posts: 17714
#2

30 Oct 2018, 11:16

Alexia:
welcome to this forum.
Some comments about your query:
- if you're using a pretty recent Stata release, please note that the -xi:- prefix is definitely redundant if you use -fvvarlist- (factor-variable) notation;
- more substantively: if your dataset is actually composed of a cross-sectional dimension (20 groups of products) and a time-series dimension (15 years), you should consider a panel-data regression via -xtreg-.
As an aside, in the future please use CODE delimiters (# symbol) instead of <> (HTML) to share what you typed and what Stata gave you back. Thanks.
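The panel-data suggestion can be sketched on simulated data (hypothetical values, since the thesis data are not available). The sketch assumes a panel identifier id (one panel per product group) and a yearly time variable year, the latter a hypothetical name; -xtset- declares the panel, -xtdescribe- reports N and T, -xtreg- without options fits random effects, and the -fe- option requests fixed effects.

```stata
* Simulated panel (hypothetical values): 20 product groups x 16 years
clear
set seed 2018
set obs 20
generate id = _n
generate group_effect = rnormal(0, 2)
expand 16
bysort id: generate year = 1999 + _n
generate rca_c = 3*runiform()
generate rca_v = 2 + group_effect + 1.5*rca_c + rnormal(0, 0.7)

* Declare the panel: cross-sectional unit id, time variable year
xtset id year
xtdescribe

* -xtreg- fits random effects by default; -fe- requests fixed effects
xtreg rca_v rca_c
xtreg rca_v rca_c, fe
```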

Kind regards,
Carlo
(Stata 19.0)
Alexia Zupancic

Join Date: Oct 2018

Posts: 4
#3

30 Oct 2018, 11:58


Thank you very much for your kind answer!
I just used xtreg instead of the simple command "reg" (that is, xtreg rca_v rca_c i.sector), and the output I got uses random effects by default. I tried running the same regression with "fe" after the comma, i.e. xtreg rca_v rca_c i.sector, fe, but the output omits all the sector dummies because of collinearity, so I suppose that the right model uses random effects. I ran a Hausman test to check, comparing xtreg rca_v rca_c i.sector, fe with xtreg rca_v rca_c i.sector, re, but the chi2 is less than 0, so I suppose I am doing something wrong...

Thank you,

Kind regards,

Alexia

Code:

. hausman fe re

                 ---- Coefficients ----
             |      (b)          (B)            (b-B)     sqrt(diag(V_b-V_B))
             |       fe           re         Difference          S.E.
-------------+----------------------------------------------------------------
       rca_c |    1.522164     1.522164        8.88e-16               .
------------------------------------------------------------------------------
                           b = consistent under Ho and Ha; obtained from xtreg
            B = inconsistent under Ha, efficient under Ho; obtained from xtreg

    Test:  Ho:  difference in coefficients not systematic

                  chi2(1) = (b-B)'[(V_b-V_B)^(-1)](b-B)
                          =       -0.00    chi2<0 ==> model fitted on these
                                           data fails to meet the asymptotic
                                           assumptions of the Hausman test;
                                           see suest for a generalized test
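A sketch of why this happens, on simulated data with hypothetical values. In the outputs in this thread the group variable id has 20 groups and -fe- drops every sector dummy, which suggests that id and sector identify the same units. If so, each sector dummy is constant within a panel, and the within transformation (subtracting each panel's mean) turns it into a column of zeros, so -fe- has nothing left to estimate for it. A full set of sector dummies also absorbs the panel effect in -re-, whose estimated sigma_u drops to 0 (as in the -re- output posted later in this thread), so pooled OLS with dummies, -re- and -fe- return the same rca_c coefficient (1.522164 in all three outputs here). -hausman- is then comparing two identical estimates, hence the 8.88e-16 difference and chi2 = -0.00.

```stata
* Simulated panel in which the panel identifier is the sector (hypothetical values)
clear
set seed 2018
set obs 20
generate id = _n
generate sector = id
generate sector_effect = rnormal(0, 2)
expand 16
bysort id: generate year = 1999 + _n
generate rca_c = 3*runiform() + 0.3*sector_effect
generate rca_v = 2 + sector_effect + 1.5*rca_c + rnormal(0, 0.7)

* Within transformation of one sector dummy: subtract its panel mean
generate d6 = (sector == 6)
bysort id: egen d6_mean = mean(d6)
generate d6_within = d6 - d6_mean
summarize d6_within              // min and max are both 0

xtset id year

* Pooled OLS with dummies, RE and FE: same rca_c coefficient
regress rca_v rca_c i.sector
scalar b_ols = _b[rca_c]
xtreg rca_v rca_c i.sector, re
scalar b_re = _b[rca_c]
display "RE sigma_u = " e(sigma_u)
xtreg rca_v rca_c i.sector, fe   // every sector dummy omitted (collinearity)
scalar b_fe = _b[rca_c]
display "OLS: " b_ols "   RE: " b_re "   FE: " b_fe
```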
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17714
#4

30 Oct 2018, 12:22

Alexia:
some comments about your query:
- as you surmise, -xtreg- considers -re- specification by default;
- as you know, -fe- machinery wipes out all time-invariant predictors (and those collinear with the fixed effect);
- you do not report anything about the F-test appearing as a footnote of the -xtreg,fe- outcome table: if it lacks statistical significance, you should switch to a pooled OLS;
- the way you compare -fe- vs -re- specification via -hausman- is correct. The nasty -hausman- output is not unusual; try to re-run -hausman- with the -sigmamore- option and see what happens.
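A sketch of the usual -hausman- sequence with -sigmamore-, on simulated data with hypothetical values in which rca_c is deliberately correlated with the panel effect. The sector dummies are left out, as in the xtreg rca_v rca_c models the poster describes later in the thread, because the comparison needs coefficients that -fe- can actually estimate; -sigmamore- bases both covariance matrices on the disturbance variance from the efficient (-re-) estimator. The F statistic behind the "F test that all u_i=0" footnote is printed from the stored results.

```stata
* Simulated panel (hypothetical values); rca_c is correlated with the panel effect
clear
set seed 2018
set obs 20
generate id = _n
generate panel_effect = rnormal(0, 2)
expand 16
bysort id: generate year = 1999 + _n
generate rca_c = 3*runiform() + 0.5*panel_effect
generate rca_v = 2 + panel_effect + 1.5*rca_c + rnormal(0, 0.7)
xtset id year

* Consistent estimator (fixed effects), stored first
xtreg rca_v rca_c, fe
display "F test that all u_i=0: F = " e(F_f)
estimates store fe

* Efficient estimator under H0 (random effects), stored second
xtreg rca_v rca_c, re
estimates store re

* Hausman test; sigmamore uses the RE disturbance variance for both matrices
hausman fe re, sigmamore
```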

Kind regards,
Carlo
(Stata 19.0)

Alexia Zupancic

Join Date: Oct 2018
Posts: 4

30 Oct 2018, 12:59


Thank you very much again! I tried to re-run the Hausman test with the sigmamore option, but unfortunately the result is the same...

Both of the outcome tables seem to be statistically significant. Before running a model with "i.sector", and therefore with all the dummies in the output, I ran the simple regression xtreg rca_v rca_c, fe and compared it with the same model with random effects (xtreg rca_v rca_c, re); the Hausman test then rejected the null hypothesis, suggesting the fixed-effects model. (The variable rca_j, where j is the country, v or c, is stacked: it contains the revealed comparative advantage of the 20 sectors in 2000, then the same for 2001, and so on.)

Sorry for the long answer, thank you very much for your time and for your precious help.

Kind regards,

Alexia

Code:

. xi: xtreg rca_v rca_c i.sector, re
i.sector          _Isector_1-20       (_Isector_1 for sector==animal omitted)

Random-effects GLS regression                   Number of obs     =        320
Group variable: id                              Number of groups  =         20

R-sq:                                           Obs per group:
     within  = 0.3229                                         min =         16
     between = 1.0000                                         avg =       16.0
     overall = 0.9607                                         max =         16

                                                Wald chi2(20)     =    7309.23
corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000

------------------------------------------------------------------------------
       rca_v |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       rca_c |   1.522164   .1274647    11.94   0.000     1.272338     1.77199
  _Isector_2 |  -3.916549    .280524   -13.96   0.000    -4.466366   -3.366732
  _Isector_3 |  -2.875985   .2525529   -11.39   0.000     -3.37098   -2.380991
  _Isector_4 |  -2.857842   .2790836   -10.24   0.000    -3.404836   -2.310848
  _Isector_5 |  -2.182294   .2525897    -8.64   0.000     -2.67736   -1.687227
  _Isector_6 |   6.560848   .5925776    11.07   0.000     5.399417    7.722279
  _Isector_7 |  -1.282726   .2554231    -5.02   0.000    -1.783346   -.7821057
  _Isector_8 |  -3.006791   .2545469   -11.81   0.000    -3.505694   -2.507888
  _Isector_9 |  -4.306507   .3026671   -14.23   0.000    -4.899724   -3.713291
 _Isector_10 |  -3.447117   .2605991   -13.23   0.000    -3.957881   -2.936352
 _Isector_11 |  -2.244465   .2529246    -8.87   0.000    -2.740188   -1.748742
 _Isector_12 |  -3.815585   .2861533   -13.33   0.000    -4.376435   -3.254734
 _Isector_13 |  -2.842299   .2566227   -11.08   0.000     -3.34527   -2.339328
 _Isector_14 |   -.619659   .2541948    -2.44   0.015    -1.117872   -.1214464
 _Isector_15 |   -4.45899   .4507773    -9.89   0.000    -5.342497   -3.575483
 _Isector_16 |  -2.858558   .2557288   -11.18   0.000    -3.359777   -2.357339
 _Isector_17 |  -2.627071    .382199    -6.87   0.000    -3.376167   -1.877975
 _Isector_18 |  -2.614191   .2535559   -10.31   0.000    -3.111151    -2.11723
 _Isector_19 |   .0892393   .2526777     0.35   0.724    -.4059999    .5844785
 _Isector_20 |  -2.670938   .2541767   -10.51   0.000    -3.169115   -2.172761
       _cons |    2.38135   .1858658    12.81   0.000      2.01706     2.74564
-------------+----------------------------------------------------------------
     sigma_u |          0
     sigma_e |  .71430351
         rho |          0   (fraction of variance due to u_i)
------------------------------------------------------------------------------

Code:

. xi: xtreg rca_v rca_c i.sector, fe
i.sector          _Isector_1-20       (_Isector_1 for sector==animal omitted)
note: _Isector_2 omitted because of collinearity
note: _Isector_3 omitted because of collinearity
note: _Isector_4 omitted because of collinearity
note: _Isector_5 omitted because of collinearity
note: _Isector_6 omitted because of collinearity
note: _Isector_7 omitted because of collinearity
note: _Isector_8 omitted because of collinearity
note: _Isector_9 omitted because of collinearity
note: _Isector_10 omitted because of collinearity
note: _Isector_11 omitted because of collinearity
note: _Isector_12 omitted because of collinearity
note: _Isector_13 omitted because of collinearity
note: _Isector_14 omitted because of collinearity
note: _Isector_15 omitted because of collinearity
note: _Isector_16 omitted because of collinearity
note: _Isector_17 omitted because of collinearity
note: _Isector_18 omitted because of collinearity
note: _Isector_19 omitted because of collinearity
note: _Isector_20 omitted because of collinearity

Fixed-effects (within) regression               Number of obs     =        320
Group variable: id                              Number of groups  =         20

R-sq:                                           Obs per group:
     within  = 0.3229                                         min =         16
     between = 0.5804                                         avg =       16.0
     overall = 0.5615                                         max =         16

                                                F(1,299)          =     142.61
corr(u_i, Xb)  = 0.3410                         Prob > F          =     0.0000

------------------------------------------------------------------------------
       rca_v |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       rca_c |   1.522164   .1274647    11.94   0.000     1.271323    1.773006
  _Isector_2 |          0  (omitted)
  _Isector_3 |          0  (omitted)
  _Isector_4 |          0  (omitted)
  _Isector_5 |          0  (omitted)
  _Isector_6 |          0  (omitted)
  _Isector_7 |          0  (omitted)
  _Isector_8 |          0  (omitted)
  _Isector_9 |          0  (omitted)
 _Isector_10 |          0  (omitted)
 _Isector_11 |          0  (omitted)
 _Isector_12 |          0  (omitted)
 _Isector_13 |          0  (omitted)
 _Isector_14 |          0  (omitted)
 _Isector_15 |          0  (omitted)
 _Isector_16 |          0  (omitted)
 _Isector_17 |          0  (omitted)
 _Isector_18 |          0  (omitted)
 _Isector_19 |          0  (omitted)
 _Isector_20 |          0  (omitted)
       _cons |   .2824759   .1487423     1.90   0.059    -.0102384    .5751903
-------------+----------------------------------------------------------------
     sigma_u |  2.4015857
     sigma_e |  .71430351
         rho |  .91872535   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(19, 299) = 159.83                   Prob > F = 0.0000


Carlo Lazzaro

Join Date: Apr 2014

Posts: 17714
#6

30 Oct 2018, 13:19

Alexia:
thanks for providing Stata stuff.
My educated guess is that your model is probably overfitted: a between R-sq = 1 for -re- is particularly stunning in this respect.
I would start all over again, trying to collect more predictors.
That said, the main issue is that you seemingly have a T>N panel dataset (i.e., the cross-sectional dimension < the time-series dimension): if my diagnosis is correct, you should switch to Stata commands suited for long panel datasets, such as -xtgls-.
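One mechanical source of the between R-sq of 1 can be sketched on simulated data (hypothetical values). When the panel identifier and sector coincide, i.sector gives every panel its own intercept; with sigma_u estimated at 0, -re- reduces to OLS with those dummies, whose residuals average to zero within each panel, so the fitted panel means reproduce the observed ones exactly regardless of how much rca_c explains. Dropping the dummies brings the between R-sq back to a value that reflects rca_c alone.

```stata
* Simulated panel (hypothetical values); id and sector are the same units
clear
set seed 2018
set obs 20
generate id = _n
generate sector = id
generate sector_effect = rnormal(0, 2)
expand 16
bysort id: generate year = 1999 + _n
generate rca_c = 3*runiform()
generate rca_v = 2 + sector_effect + 1.5*rca_c + rnormal(0, 0.7)
xtset id year

* With a full set of sector dummies the between R-sq is 1 by construction
xtreg rca_v rca_c i.sector, re
scalar r2b_dummies = e(r2_b)

* Without them it reflects how well rca_c alone explains the sector means
xtreg rca_v rca_c, re
scalar r2b_plain = e(r2_b)

display "between R-sq with dummies: " r2b_dummies "   without: " r2b_plain
```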

Kind regards,
Carlo
(Stata 19.0)
Alexia Zupancic

Join Date: Oct 2018

Posts: 4
#7

30 Oct 2018, 14:06


Thank you Carlo, I am sure it is so. I used this model because it is particularly difficult to retrieve data for Vietnam (for example, total factor productivity per sector/group of products), but I realize that these data alone are not enough for a well-fitted model. I will try to collect more predictors.
My T has 15 observations and my N has 20... thank you very much for your help, it was really needed and appreciated. I hope you have a good day!

Kind regards,

Alexia
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17714
#8

30 Oct 2018, 16:37

Alexia:
with N still a bit > T, you can stick with -xtreg-.
Perhaps you should consider using clustered standard errors to take autocorrelation into account.
As you noticed, the issue of a scant number of predictors is the main problem here.
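A sketch of the clustered-standard-error suggestion on simulated data with hypothetical values, where both the disturbance and rca_c follow an AR(1) process within each sector (coefficient 0.7, an assumption). -vce(cluster id)- gives standard errors that allow arbitrary correlation across years within a panel as well as heteroskedasticity; with only 20 clusters they are themselves approximate.

```stata
* Simulated panel (hypothetical values) with serially correlated data
clear
set seed 2018
set obs 20
generate id = _n
generate sector_effect = rnormal(0, 2)
expand 16
bysort id: generate year = 1999 + _n
sort id year
xtset id year

* AR(1) regressor and disturbance within each panel: z_t = 0.7*z_(t-1) + noise
generate rca_c = 3*runiform()
by id: replace rca_c = 0.7*rca_c[_n-1] + runiform() if _n > 1
generate err = rnormal(0, 0.7)
by id: replace err = 0.7*err[_n-1] + rnormal(0, 0.7) if _n > 1
generate rca_v = 2 + sector_effect + 1.5*rca_c + err

* Conventional vs cluster-robust standard errors for the same FE model
xtreg rca_v rca_c, fe
scalar se_conv = _se[rca_c]
xtreg rca_v rca_c, fe vce(cluster id)
scalar se_clus = _se[rca_c]
display "conventional SE: " se_conv "   clustered SE: " se_clus
```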

Kind regards,
Carlo
(Stata 19.0)