Normality of dependent variable in panel data

Himani Srihan

Join Date: Apr 2020

Posts: 51
#1

25 Nov 2020, 23:51

Hi,

I want to run a panel data regression. My dataset covers 40 countries over 15 years. I plotted my dependent variable and found that it is not normally distributed. I know that one way to get closer to normality is to take the natural log of the dependent variable. However, does the variable actually need to be normally distributed? I have not been able to convince myself that normality of the dependent variable is strictly necessary in panel data. I would be thankful for any help and guidance on this!
Felix Bittmann

Join Date: Aug 2018

Posts: 668
#2

26 Nov 2020, 00:49

Assuming you want to use OLS regression / xtreg, normality of the dep. var is nice to have but not a must. The internet is full of discussion about that, just google it a bit (https://www.researchgate.net/post/Is...ly_distributed). That said, maybe OLS is not the best way for your var? Maybe tell us what you are measuring or show some plots so we get an impression of your data. Otherwise, if taking the log gives you a nice and normal var, I do not see a reason not to use it (given that I know nothing more about your var at the moment).

Best wishes

(Stata 16.1 MP)
Himani Srihan

Join Date: Apr 2020

Posts: 51
#3

26 Nov 2020, 01:49

Thanks for your response! I am measuring unemployment rates of different countries. I am using a fixed effects regression model and accounting for heteroskedasticity with heteroskedasticity-robust standard errors. I am not using pooled OLS.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17691
#4

26 Nov 2020, 02:23

Himani:
if you actually have panel data and use -regress-, the -robust- option does not take the within-panel correlation of your observations into account and treats them as independent (ignoring the panel structure of your data).
You should switch to -vce(cluster panelid)- instead.
That said, as recommended by the FAQ, things would be easier if you post what you typed and what Stata gave you back. Thanks.

Kind regards,
Carlo
(Stata 19.0)
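
A minimal sketch of the difference between the two options, using the nlswork example dataset that appears later in this thread rather than Himani's data. The regressors -age- and -tenure- are chosen purely for illustration. As far as I know, -xtreg, fe- already treats -vce(robust)- as -vce(cluster panelvar)-, so the distinction matters mainly for -regress-.

```stata
* Illustration only: nlswork stands in for the poster's data
use "https://www.stata-press.com/data/r16/nlswork.dta", clear

* Heteroskedasticity-robust SEs: observations are treated as independent
regress ln_wage age tenure, vce(robust)

* Cluster-robust SEs: allows arbitrary correlation within each idcode
regress ln_wage age tenure, vce(cluster idcode)
```

The coefficients are the same in both runs; only the standard errors (and hence the t statistics and confidence intervals) change.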
Himani Srihan

Join Date: Apr 2020

Posts: 51
#5

26 Nov 2020, 04:01

Hi Carlo,

Thanks for your response. Sorry for the lack of details. I am fitting a fixed effects regression model with 40 countries over 15 years. My dependent variable is the unemployment rate. My command is as follows:

xtreg unemployment (X regressors) i.Country i.Year i.Country1##c.Year, fe vce(cluster Country)

Hence, I control for heteroskedasticity and autocorrelation in my regression model. My main question is whether I have to ensure the normality of my dependent variable, the unemployment rate. The unemployment rate comes from different countries in different years and hence it is not normally distributed. In order to get correct results, is it in general necessary to ensure that the dependent variable is normal? I searched about it on the internet and it seems like the answer is "no", but I am still confused about it. Thank you.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17691
#6

26 Nov 2020, 04:11

Himani:
normality is a (weak) requirement for the distribution of the residuals only.
That said:
- if you have -xtset- your data with -Country- as -panelid-, why include -i.Country- on the right-hand side of your regression equation?
- what's the meaning of including -Year- as both a categorical and a continuous regressor?
Finally, the previous recommendation (as per the FAQ) to post what you typed and what Stata gave you back still applies. Thanks.

Kind regards,
Carlo
(Stata 19.0)
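
A minimal sketch of how the residuals, rather than the dependent variable, could be inspected after a fixed effects fit. It uses the nlswork example dataset from later in this thread as a stand-in, and -e_it- is a hypothetical variable name.

```stata
* Illustration only: nlswork stands in for the poster's data
use "https://www.stata-press.com/data/r16/nlswork.dta", clear

xtreg ln_wage c.age##c.age, fe vce(cluster idcode)

* Option e requests the idiosyncratic (within) residuals
predict double e_it, e

* Visual checks: histogram with a normal overlay, and a normal quantile plot
histogram e_it, normal
qnorm e_it
```

Both plots describe the residuals of the fitted model, which can look quite different from the raw distribution of the dependent variable once the regressors and fixed effects are accounted for.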
Himani Srihan

Join Date: Apr 2020

Posts: 51
#7

26 Nov 2020, 04:32

Hi Carlo,

Thanks for your response. On the first point: yes, I do not need to add i.Country1, since I am using fixed effects and that automatically gives me the country fixed effects.
On the second point: Year is included as time dummies (hence categorical), and the term i.Country1##c.Year is included as time trends (one for every country), hence Year is continuous when considering time trends.
I have also attached the output for your reference this time. Thanks a lot!
------------------------------------------------------------------------------------------------------
. egen Country1=group(Country)

.
. xtset Country1 Year,yearly
panel variable: Country1 (strongly balanced)
time variable: Year, 2000 to 2018
delta: 1 year

.
.
.
. gen BB1=BB/1000000
(27 missing values generated)

. xtreg UTA HC BB1 FD Pop i.Country1 i.Year i.Country1##c.Year,fe vce(cluster Country1)
note: 2.Country1 omitted because of collinearity
note: 3.Country1 omitted because of collinearity
note: 4.Country1 omitted because of collinearity
note: 5.Country1 omitted because of collinearity
note: 6.Country1 omitted because of collinearity
note: 7.Country1 omitted because of collinearity
note: 8.Country1 omitted because of collinearity
note: 9.Country1 omitted because of collinearity
note: 10.Country1 omitted because of collinearity
note: 11.Country1 omitted because of collinearity
note: 12.Country1 omitted because of collinearity
note: 13.Country1 omitted because of collinearity
note: 14.Country1 omitted because of collinearity
note: 15.Country1 omitted because of collinearity
note: 16.Country1 omitted because of collinearity
note: 17.Country1 omitted because of collinearity
note: 18.Country1 omitted because of collinearity
note: 19.Country1 omitted because of collinearity
note: 20.Country1 omitted because of collinearity
note: 21.Country1 omitted because of collinearity
note: 22.Country1 omitted because of collinearity
note: 23.Country1 omitted because of collinearity
note: 24.Country1 omitted because of collinearity
note: 25.Country1 omitted because of collinearity
note: 26.Country1 omitted because of collinearity
note: 27.Country1 omitted because of collinearity
note: 28.Country1 omitted because of collinearity
note: 29.Country1 omitted because of collinearity
note: 30.Country1 omitted because of collinearity
note: 31.Country1 omitted because of collinearity
note: 32.Country1 omitted because of collinearity
note: 33.Country1 omitted because of collinearity
note: 34.Country1 omitted because of collinearity
note: 35.Country1 omitted because of collinearity
note: 36.Country1 omitted because of collinearity
note: 37.Country1 omitted because of collinearity
note: 38.Country1 omitted because of collinearity
note: 39.Country1 omitted because of collinearity
note: 40.Country1 omitted because of collinearity
note: 41.Country1 omitted because of collinearity
note: 42.Country1 omitted because of collinearity
note: Year omitted because of collinearity

Fixed-effects (within) regression Number of obs = 696
Group variable: Country1 Number of groups = 42

R-sq: Obs per group:
within = 0.6703 min = 12
between = 0.2352 avg = 16.6
overall = 0.1419 max = 18

F(21,41) = .
corr(u_i, Xb) = -1.0000 Prob > F = .

(Std. Err. adjusted for 42 clusters in Country1)
---------------------------------------------------------------------------------
| Robust
UTA | Coef. Std. Err. t P>|t| [95% Conf. Interval]
----------------+----------------------------------------------------------------
HC | -4.676685 2.180334 -2.14 0.038 -9.079959 -.2734105
BB1 | .0437888 .0317947 1.38 0.176 -.020422 .1079996
FD | .0007307 .0007847 0.93 0.357 -.0008541 .0023155
Pop | -.6092847 .4089752 -1.49 0.144 -1.435227 .2166576
|
Country1 |
2 | 0 (omitted)
3 | 0 (omitted)
4 | 0 (omitted)
5 | 0 (omitted)
6 | 0 (omitted)
7 | 0 (omitted)
8 | 0 (omitted)
9 | 0 (omitted)
10 | 0 (omitted)
11 | 0 (omitted)
12 | 0 (omitted)
13 | 0 (omitted)
14 | 0 (omitted)
15 | 0 (omitted)
16 | 0 (omitted)
17 | 0 (omitted)
18 | 0 (omitted)
19 | 0 (omitted)
20 | 0 (omitted)
21 | 0 (omitted)
22 | 0 (omitted)
23 | 0 (omitted)
24 | 0 (omitted)
25 | 0 (omitted)
26 | 0 (omitted)
27 | 0 (omitted)
28 | 0 (omitted)
29 | 0 (omitted)
30 | 0 (omitted)
31 | 0 (omitted)
32 | 0 (omitted)
33 | 0 (omitted)
34 | 0 (omitted)
35 | 0 (omitted)
36 | 0 (omitted)
37 | 0 (omitted)
38 | 0 (omitted)
39 | 0 (omitted)
40 | 0 (omitted)
41 | 0 (omitted)
42 | 0 (omitted)
|
Year |
2001 | -.1952207 .2280089 -0.86 0.397 -.655694 .2652526
2002 | -.1966589 .2317793 -0.85 0.401 -.6647467 .271429
2003 | .164012 .2776475 0.59 0.558 -.3967085 .7247326
2004 | .0180518 .3075936 0.06 0.953 -.6031461 .6392497
2005 | -.2736581 .3532516 -0.77 0.443 -.9870643 .439748
2006 | -.8547991 .3835659 -2.23 0.031 -1.629426 -.0801721
2007 | -1.283025 .4222257 -3.04 0.004 -2.135727 -.4303226
2008 | -1.634142 .3938418 -4.15 0.000 -2.429521 -.8387618
2009 | -.6105343 .3959356 -1.54 0.131 -1.410142 .1890739
2010 | -.5005376 .4033419 -1.24 0.222 -1.315103 .3140278
2011 | -.6435924 .3680321 -1.75 0.088 -1.386848 .0996636
2012 | -.3768544 .3704019 -1.02 0.315 -1.124896 .3711873
2013 | -.3017373 .3932904 -0.77 0.447 -1.096003 .4925289
2014 | -.5574258 .4027407 -1.38 0.174 -1.370777 .2559256
2015 | -1.013585 .4516883 -2.24 0.030 -1.925788 -.1013822
2016 | -1.526365 .5184349 -2.94 0.005 -2.573365 -.4793641
2017 | -2.127781 .5755326 -3.70 0.001 -3.290093 -.9654697
|
Year | 0 (omitted)
|
Country1#c.Year |
2 | .2385705 .0469372 5.08 0.000 .1437789 .3333621
3 | .1908602 .0326708 5.84 0.000 .1248802 .2568401
4 | .1162035 .0390429 2.98 0.005 .0373548 .1950523
5 | .2623986 .0533175 4.92 0.000 .1547218 .3700754
6 | .4052331 .0805052 5.03 0.000 .2426496 .5678167
7 | .7810907 .0494137 15.81 0.000 .6812976 .8808837
8 | .100915 .0280542 3.60 0.001 .0442584 .1575717
9 | .2521241 .0476898 5.29 0.000 .1558126 .3484357
10 | .1183214 .0586837 2.02 0.050 -.0001927 .2368354
11 | .2251814 .0549353 4.10 0.000 .1142373 .3361256
12 | .0751454 .062816 1.20 0.238 -.0517141 .2020049
13 | -.1629682 .0614669 -2.65 0.011 -.287103 -.0388334
14 | 1.225554 .0519606 23.59 0.000 1.120618 1.330491
15 | .0591233 .0516546 1.14 0.259 -.0451953 .1634419
16 | .270799 .0645895 4.19 0.000 .1403579 .4012401
17 | .3094958 .0594255 5.21 0.000 .1894836 .429508
18 | .3938101 .0539832 7.30 0.000 .2847889 .5028313
19 | -.0022528 .0756756 -0.03 0.976 -.1550829 .1505772
20 | .2295271 .0562484 4.08 0.000 .1159311 .3431231
21 | .1027779 .0481674 2.13 0.039 .0055018 .200054
22 | -.0065416 .0493304 -0.13 0.895 -.1061662 .0930831
23 | .3952148 .1137134 3.48 0.001 .1655659 .6248637
24 | .1020943 .0915495 1.12 0.271 -.0827937 .2869824
25 | .2058541 .0766968 2.68 0.010 .0509618 .3607464
26 | .4610483 .0490938 9.39 0.000 .3619015 .5601952
27 | .2286578 .039286 5.82 0.000 .149318 .3079976
28 | .1745403 .0295912 5.90 0.000 .1147796 .234301
29 | .1576823 .0561821 2.81 0.008 .0442201 .2711444
30 | -.7348902 .0418542 -17.56 0.000 -.8194164 -.650364
31 | -.0457858 .0571643 -0.80 0.428 -.1612314 .0696599
32 | .5076547 .0479479 10.59 0.000 .4108221 .6044874
33 | .2181217 .0577357 3.78 0.001 .1015221 .3347212
34 | .5318886 .1693027 3.14 0.003 .1899748 .8738024
35 | .3337262 .068911 4.84 0.000 .1945577 .4728947
36 | .4056563 .0473992 8.56 0.000 .3099318 .5013809
37 | .4238595 .0557037 7.61 0.000 .3113636 .5363554
38 | .2547221 .0531261 4.79 0.000 .1474318 .3620124
39 | .2385692 .0378104 6.31 0.000 .1622095 .3149288
40 | .1551168 .0597813 2.59 0.013 .034386 .2758476
41 | -.0898687 .1842088 -0.49 0.628 -.461886 .2821486
42 | .0503888 .0454926 1.11 0.274 -.0414854 .142263
|
_cons | -417.1883 92.74888 -4.50 0.000 -604.4984 -229.8781
----------------+----------------------------------------------------------------
sigma_u | 568.34893
sigma_e | 1.0597447
rho | .99999652 (fraction of variance due to u_i)
---------------------------------------------------------------------------------

------------------------------------------------------------------------------------------------------
Himani Srihan

Join Date: Apr 2020

Posts: 51
#8

26 Nov 2020, 04:39

Also, isn't the distribution of the residuals affected by the distribution of the dependent variable, in which case the distribution of the dependent variable becomes important?
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17691
#9

26 Nov 2020, 04:52

Himani:
with 696 observations, residual normality is not an issue.
As far as your interaction is concerned, as you do not have the linear term for -Country1- (due to the collinearity with -panelid-), I do not think that it is informative.
That said, I would re-run the regression without -i.Country1- and with categorical -i.Year- and see what Stata gives you back.

Kind regards,
Carlo
(Stata 19.0)
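
A minimal sketch of one way the specification described in #7 (year dummies plus one linear trend per panel) might be written without the redundant terms. It uses the Grunfeld example dataset as a stand-in, so -company-, -year-, -invest-, -mvalue- and -kstock- are assumptions that replace the poster's variables.

```stata
* Illustration only: Grunfeld data stand in for the country panel
webuse grunfeld, clear
xtset company year

* a##b expands to a b a#b. With -fe-, i.company is absorbed by the panel
* effects and c.year is collinear with the year dummies, so the single #
* keeps only the panel-specific slopes on year.
xtreg invest mvalue kstock i.year i.company#c.year, fe vce(cluster company)
```

Because the year dummies already span a common linear trend, Stata should still omit one of the panel-specific slopes as collinear; the remaining slopes are then read relative to that omitted panel.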
Himani Srihan

Join Date: Apr 2020

Posts: 51
#10

26 Nov 2020, 05:14

Hi Carlo,

Thanks for your response. The results differ this time, implying that the omitted variables might make a difference in the time trends. Thank you!

. egen Country1=group(Country)

. xtset Country1 Year,yearly
panel variable: Country1 (strongly balanced)
time variable: Year, 2000 to 2018
delta: 1 year

. gen BB1=BB/1000000
(27 missing values generated)

. xtreg UTA HC BB1 FD Pop i.Year ,fe vce(cluster Country1)

Fixed-effects (within) regression               Number of obs     =        696
Group variable: Country1                        Number of groups  =         42

R-sq:                                           Obs per group:
     within  = 0.2171                                         min =         12
     between = 0.1464                                         avg =       16.6
     overall = 0.1700                                         max =         18

                                                F(21,41)          =       9.08
corr(u_i, Xb)  = -0.0918                        Prob > F          =     0.0000

                              (Std. Err. adjusted for 42 clusters in Country1)
------------------------------------------------------------------------------
             |               Robust
         UTA |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          HC |  -1.255373   1.451126    -0.87   0.392    -4.185982    1.675235
         BB1 |  -.0141203   .0167748    -0.84   0.405    -.0479978    .0197571
          FD |   .0040062   .0043229     0.93   0.359     -.004724    .0127365
         Pop |  -1.128107   .5515876    -2.05   0.047     -2.24206   -.0141529
             |
        Year |
       2001  |   .0418953   .2798289     0.15   0.882    -.5232306    .6070212
       2002  |   .1500829   .2882968     0.52   0.605    -.4321443    .7323102
       2003  |   .6327896    .377685     1.68   0.101    -.1299608     1.39554
       2004  |   .7775548   .3484674     2.23   0.031     .0738106    1.481299
       2005  |   .6872495   .3464599     1.98   0.054    -.0124405     1.38694
       2006  |   .3309535   .3631662     0.91   0.367    -.4024756    1.064383
       2007  |    .128128   .4167739     0.31   0.760    -.7135639    .9698199
       2008  |   .0024826   .4049075     0.01   0.995    -.8152447      .82021
       2009  |   1.113937   .3942821     2.83   0.007     .3176682    1.910206
       2010  |   1.334925   .4331057     3.08   0.004     .4602505      2.2096
       2011  |   1.329069    .520583     2.55   0.014     .2777306    2.380408
       2012  |   1.803349   .5965465     3.02   0.004     .5985985    3.008099
       2013  |   2.065362   .6834191     3.02   0.004     .6851691    3.445555
       2014  |   2.006568    .674044     2.98   0.005     .6453085    3.367827
       2015  |   1.722273   .6931554     2.48   0.017     .3224175    3.122129
       2016  |   1.393074   .6905086     2.02   0.050    -.0014361    2.787585
       2017  |   1.036102   .7078948     1.46   0.151    -.3935203    2.465725
             |
       _cons |   8.173861   4.492838     1.82   0.076    -.8996091    17.24733
-------------+----------------------------------------------------------------
     sigma_u |  1.9689641
     sigma_e |  1.5792198
         rho |  .60853379   (fraction of variance due to u_i)
------------------------------------------------------------------------------
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17691
#11

26 Nov 2020, 06:30

Himani:
I would try:

Code:

xtreg UTA HC BB1 FD Pop c.Year##c.Year,fe vce(cluster Country1)

This code explores whether a non-linear relationship exists between regressand and -Year- within each panel.

I would also explore in the postestimation session, whether your model is correctly specified.

Kind regards,
Carlo
(Stata 19.0)
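
A minimal sketch of how the squared term in such a model could be checked, again on the Grunfeld example dataset as a stand-in. The recentred variable -yr- is a hypothetical addition, included only to limit numerical collinearity between the year and its square.

```stata
* Illustration only: Grunfeld data stand in for the country panel
webuse grunfeld, clear
xtset company year

* Hypothetical recentred time variable (0 in the first sample year)
generate yr = year - 1935

xtreg invest mvalue kstock c.yr##c.yr, fe vce(cluster company)

* Wald test on the squared term; a small p-value points to curvature in time
test c.yr#c.yr
```

A squared term only captures one simple form of curvature, so a non-significant result does not rule out other non-linear patterns, which categorical -i.Year- dummies would pick up.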
Himani Srihan

Join Date: Apr 2020

Posts: 51
#12

26 Nov 2020, 07:36

Hi Carlo,

Thanks for your response. I just verified using your code that no non-linear relationship exists between the regressand and Year within each panel.
For the postestimation step, which tests need to be performed? Is there something I could refer to for the process? Thanks a lot for all your help.
Carlo Lazzaro

Join Date: Apr 2014
Posts: 17691

#13

26 Nov 2020, 08:06

Himani:
after estimating your regression model, you should calculate -fitted- and -generate- -sq_fitted-.
Then you can run:
- an augmented regression;
- an auxiliary regression (-fitted- and -sq_fitted- as the only regressors):

Code:

. use "https://www.stata-press.com/data/r16/nlswork.dta"
(National Longitudinal Survey.  Young Women 14-26 years of age in 1968)

. xtreg ln_wage c.age##c.age, fe vce(cluster idcode)

Fixed-effects (within) regression               Number of obs     =     28,510
Group variable: idcode                          Number of groups  =      4,710

R-sq:                                           Obs per group:
     within  = 0.1087                                         min =          1
     between = 0.1006                                         avg =        6.1
     overall = 0.0865                                         max =         15

                                                F(2,4709)         =     507.42
corr(u_i, Xb)  = 0.0440                         Prob > F          =     0.0000

                             (Std. Err. adjusted for 4,710 clusters in idcode)
------------------------------------------------------------------------------
             |               Robust
     ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0539076    .004307    12.52   0.000     .0454638    .0623515
             |
 c.age#c.age |  -.0005973    .000072    -8.30   0.000    -.0007384   -.0004562
             |
       _cons |    .639913   .0624195    10.25   0.000     .5175415    .7622845
-------------+----------------------------------------------------------------
     sigma_u |   .4039153
     sigma_e |  .30245467
         rho |  .64073314   (fraction of variance due to u_i)
------------------------------------------------------------------------------

. predict fitted, xb
(24 missing values generated)

. g sq_fitted=fitted^2
(24 missing values generated)

. xtreg ln_wage c.age##c.age fitted sq_fitted , fe vce(cluster idcode)
note: c.age#c.age omitted because of collinearity

Fixed-effects (within) regression               Number of obs     =     28,510
Group variable: idcode                          Number of groups  =      4,710

R-sq:                                           Obs per group:
     within  = 0.1105                                         min =          1
     between = 0.1029                                         avg =        6.1
     overall = 0.0882                                         max =         15

                                                F(3,4709)         =     355.44
corr(u_i, Xb)  = 0.0411                         Prob > F          =     0.0000

                             (Std. Err. adjusted for 4,710 clusters in idcode)
------------------------------------------------------------------------------
             |               Robust
     ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0184474    .004408     4.19   0.000     .0098057    .0270891
             |
 c.age#c.age |          0  (omitted)
             |
      fitted |   6.920927   1.152074     6.01   0.000     4.662324    9.179531
   sq_fitted |  -2.079755   .4060541    -5.12   0.000    -2.875811   -1.283699
       _cons |  -4.586115   .8935105    -5.13   0.000    -6.337813   -2.834416
-------------+----------------------------------------------------------------
     sigma_u |  .40319282
     sigma_e |  .30215883
         rho |  .64035936   (fraction of variance due to u_i)
------------------------------------------------------------------------------

. test sq_fitted

 ( 1)  sq_fitted = 0

       F(  1,  4709) =   26.23
            Prob > F =    0.0000

. xtreg ln_wage fitted sq_fitted , fe vce(cluster idcode)

Fixed-effects (within) regression               Number of obs     =     28,510
Group variable: idcode                          Number of groups  =      4,710

R-sq:                                           Obs per group:
     within  = 0.1092                                         min =          1
     between = 0.1033                                         avg =        6.1
     overall = 0.0881                                         max =         15

                                                F(2,4709)         =     523.09
corr(u_i, Xb)  = 0.0467                         Prob > F          =     0.0000

                             (Std. Err. adjusted for 4,710 clusters in idcode)
------------------------------------------------------------------------------
             |               Robust
     ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      fitted |   2.569185   .7085064     3.63   0.000     1.180181    3.958189
   sq_fitted |    -.47432   .2153021    -2.20   0.028    -.8964128   -.0522272
       _cons |  -1.290258    .580562    -2.22   0.026    -2.428431   -.1520844
-------------+----------------------------------------------------------------
     sigma_u |    .403403
     sigma_e |  .30238578
         rho |  .64025357   (fraction of variance due to u_i)
------------------------------------------------------------------------------

. test sq_fitted

 ( 1)  sq_fitted = 0

       F(  1,  4709) =    4.85
            Prob > F =    0.0276

.

Whichever approach is used, the -test- result reaches statistical significance, which tells us the model is misspecified (as expected, since with only one predictor it is hard to give a fair and true view of the data-generating process under investigation).

Kind regards,
Carlo
(Stata 19.0)

Himani Srihan

Join Date: Apr 2020
Posts: 51

#14

26 Nov 2020, 16:53

Hi Carlo,

Thanks for your detailed response! One question I have about the code: why can't we infer the statistical significance from the p-value in the regression output? Why do we need to use the -test- command after the regression? Is the p-value in the regression not enough? Thanks a lot for your help!

Carlo Lazzaro

Join Date: Apr 2014

Posts: 17691
#15

27 Nov 2020, 01:21

Himani:
the test is intended to explore the functional form of the regressand (see -linktest- entry in Stata .pdf manual).

Kind regards,
Carlo
(Stata 19.0)
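
On the question in #14: for a single coefficient, the F statistic from -test- is the square of the t statistic in the regression table, so both give the same p-value. In the auxiliary regression of #13, t = -2.20 and F(1, 4709) = 4.85, with p-values 0.028 and 0.0276. A minimal sketch that checks this on the same nlswork data follows; storing the variables as double is an assumption, so the last digits may differ slightly from the log in #13.

```stata
use "https://www.stata-press.com/data/r16/nlswork.dta", clear

xtreg ln_wage c.age##c.age, fe vce(cluster idcode)
predict double fitted, xb
generate double sq_fitted = fitted^2

* Auxiliary regression: fitted values and their square as the only regressors
xtreg ln_wage fitted sq_fitted, fe vce(cluster idcode)

* Squared t statistic of sq_fitted, computed from the stored results
display (_b[sq_fitted]/_se[sq_fitted])^2

* Wald test of the same single restriction; r(F) should match the line above
test sq_fitted
display r(F)
```

The -test- command becomes necessary when several coefficients are tested jointly, where no single row of the regression table gives the answer.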