hausman test - Statalist

Alex Mueller

Join Date: May 2018

Posts: 51
#16

12 Jun 2018, 06:21

Carlo,

thanks for providing the code.
I tried it on my data, but I am not sure about the interpretation:
I get a bar chart with the mode at 0 and some deviation (I will try to upload a screenshot as an attachment). Does that mean I suffer from heteroskedasticity?
In all visualisations of heteroskedasticity I have seen, there was a scatterplot of the residual variance against the explanatory variable, not a density curve.

How can I interpret the e on the x axis? Stata help says it is only "the value of stored result e(name)".
Attached Files

Last edited by Alex Mueller; 12 Jun 2018, 06:24.
Comment
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#17

12 Jun 2018, 06:56

Alex:
as per visual inspection, the distribution of the residual (e_it) looks a bit heteroskedastic.
Yes, a scatter plot vs. fitted values or predictors is another approach.
If you have a quite large number of panel units (ie, clusters) you can use -robust- option and contrast the obtained standard errors vs. default ones.

Kind regards,
Carlo
(Stata 19.0)
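
A minimal sketch of the two checks mentioned above (a residual-versus-fitted scatter plot, and default versus robust standard errors). It assumes the shipped -grunfeld- example dataset rather than Alex's data; the names xb_hat, e_it, fe_default and fe_robust are made up for illustration. With only 10 firms this dataset is far from "a quite large number of panel units", so it shows the mechanics only.

```stata
* Illustrative only: grunfeld stands in for the real panel
webuse grunfeld, clear
xtset company year

* Fixed-effects model with default (conventional) standard errors
xtreg invest mvalue kstock, fe
estimates store fe_default

* Fitted values (xb) and idiosyncratic residuals (e_it)
predict double xb_hat, xb
predict double e_it, e

* A fan or funnel shape around the zero line hints at heteroskedasticity
scatter e_it xb_hat, yline(0)

* Same model with robust standard errors
xtreg invest mvalue kstock, fe vce(robust)
estimates store fe_robust

* Side-by-side coefficients and standard errors
estimates table fe_default fe_robust, b(%9.4f) se(%9.4f)
```

The coefficients are identical across the two columns; only the standard errors change.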
Comment
Alex Mueller

Join Date: May 2018

Posts: 51
#18

12 Jun 2018, 07:37

Carlo,

thanks for your help with the interpretation.

Could you please briefly explain, how you come to this conclusion?
Is it supposed to be normally distributed in the homoskedastic case? To me it looks like a slightly positive skewness (right tailed). Additionally there are some stronger positive outliers around 0.2.

Is that where you see the heteroskedasticity?
Comment
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#19

12 Jun 2018, 08:05

Alex:
yes, the positive skewness makes me think about mild heteroskedasticity in your residual distribution.

Kind regards,
Carlo
(Stata 19.0)
Comment
Alex Mueller

Join Date: May 2018

Posts: 51
#20

12 Jun 2018, 08:20

Thank you, Carlo.

One last issue I am concerned about is multicollinearity. As it is not possible to run the VIF after xtreg, I found a post (https://www.stata.com/statalist/arch.../msg00018.html) stating that OLS with dummies gives identical estimates to "xtreg, fe".

Now I am wondering if this is applicable to my case as well.
My original model is :

Code:

xtreg performance promoted_in_observation_period c.Age##c.Age work_exp former_achievments job_quits number_former_employers job_rotation i.year graduate sex medium_size_enterprise , fe robust

So I tried to create dummies for the OLS:

Code:

reg c.performance i.promoted_in_observation_period c.Age##c.Age c.work_exp c.former_achievments c.job_quits c.number_former_employers i.job_rotation i.year c.graduate i.sex i.medium_size_enterprise, robust

This model delivers a completely different output, though (other p-values and coefficients).
Am I doing it wrong? Is it the right approach to create dummies for binary variables (sex, etc.) with i.variable and to use c.variable for non-binary variables?

Many thanks!
A.
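
For reference, the equivalence between OLS with dummies and "xtreg, fe" holds for the coefficients when the OLS model contains one dummy per panel unit, which the -reg- call above does not include. A minimal sketch, assuming the shipped -grunfeld- example dataset (company plays the role that the panel identifier would play here; the scalar names are made up for illustration):

```stata
* Illustrative only: grunfeld stands in for the real panel
webuse grunfeld, clear
xtset company year

* Within (fixed-effects) estimator
xtreg invest mvalue kstock i.year, fe
scalar b_fe = _b[mvalue]

* OLS with one dummy per panel unit (least-squares dummy variables)
regress invest mvalue kstock i.year i.company
scalar b_lsdv = _b[mvalue]

* The slope coefficients agree up to numerical precision
display b_fe, b_lsdv
assert reldif(b_fe, b_lsdv) < 1e-6

* Same slopes again, without displaying the unit dummies
areg invest mvalue kstock i.year, absorb(company)
```

Only the slope coefficients are expected to match. Time-invariant regressors are dropped in both versions, and with many panel units the -regress- output contains one row per unit dummy.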
Comment
Hassen Ali

Join Date: May 2018

Posts: 39
#21

12 Jun 2018, 08:39

Thank you very much, Carlo!! I have learned a lot from your daily posts.
With Best Wishes,Hassen
Comment
Alex Mueller

Join Date: May 2018

Posts: 51
#22

12 Jun 2018, 08:52

I can only endorse Hassen strongly! Thank you very much, Carlo!

Last edited by Alex Mueller; 12 Jun 2018, 08:55.
Comment
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#23

12 Jun 2018, 09:41

Alex:
don't be too worried about multicollinearity (see Clyde Schechter's posts on this topic).
As for using -regress- and then running -estat vif-: what your code omits is clustered standard errors, which you need because your observations are not independent due to the panel structure of your data. In fact, -robust- (which accounts for heteroskedasticity) does a different job than -cluster- (which accounts for autocorrelation) under -regress-. Conversely, both options do the same job under -xtreg- (and this can be puzzling the first few times you switch between them).

Kind regards,
Carlo
(Stata 19.0)
Comment

Alex Mueller

Join Date: May 2018
Posts: 51

#24

12 Jun 2018, 10:14

Carlo,

thanks for that hint.

I see that it is not too much to worry about, but I still want to capture it in my model to be on the "safe side".

Is the approach

Code:

 
 reg c.performance i.promoted_in_observation_period c.Age##c.Age c.work_exp c.former_achievments c.job_quits c.number_former_employers i.job_rotation i.year c.graduate i.sex i.medium_size_enterprise, robust

especially the usage of c.variable and i.binaryvariable, correct? And how can I incorporate the cluster option in the code?

Code:

 reg c.performance i.promoted_in_observation_period c.Age##c.Age c.work_exp c.former_achievments  c.job_quits c.number_former_employers i.job_rotation i.year c.graduate i.sex i.medium_size_enterprise, robust cluster generate

does not deliver any output.

Best, A.

Comment

Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#25

12 Jun 2018, 10:25

Alex:
- the use of factor variable notation looks OK;
- as far as your last code is concerned, Stata should have issued a warning message, such as:
Code:

option cluster incorrectly specified

The fix is to get rid of -robust- (it's redundant) and rewrite the -cluster- option as follows:

Code:

cluster(panelid)

Then you can run -estat vif-.

Kind regards,
Carlo
(Stata 19.0)
Comment
Alex Mueller

Join Date: May 2018

Posts: 51
#26

13 Jun 2018, 01:22

Carlo,

many thanks for that clarification!
You are right, that is exactly the issued warning message.

1) Am I right that -, robust- does the job of controlling for heteroskedasticity and autocorrelation as well as clustering the std. errors in -xtreg, fe-, whereas -, cluster(panelid)- does it under -reg-?

2) I am not completely sure about the (panelid) variable. My panel is specified as follows:

Code:

xtset worker_id year, yearly

where worker_id assigns a unique number to each worker (which of course can show up only once per period but several times in different periods).

- Is worker_id in this case my panelid? implying:

Code:

reg c.performance i.promoted_in_observation_period (…) i.year, cluster(worker_id)

?

- Or do I need to generate a panelid variable first? And what would this variable look like in this case?

Many thanks and best wishes!
A.
Comment
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#27

13 Jun 2018, 01:52

Alex:
1) under -regress-, -robust- accounts for heteroskedasticity, whereas -cluster- accounts for autocorrelation;
2) under -xtreg-, -robust- and -cluster- do the very same job, that is, they account for both heteroskedasticity and autocorrelation;
3) under both -regress- and -xtreg- clustered standard errors need a cluster identifier (-worker_id- in your case). Hence, you do not have to generate any new panelid.

Kind regards,
Carlo
(Stata 19.0)
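
Point 2) can be checked directly. A minimal sketch, assuming the shipped -nlswork- example dataset (idcode is its panel identifier and plays the role of worker_id; the scalar names are made up for illustration):

```stata
* Illustrative only: nlswork stands in for the real panel
webuse nlswork, clear
xtset idcode year

* Fixed effects with the robust option
xtreg ln_wage tenure age, fe vce(robust)
scalar se_robust = _se[tenure]

* Fixed effects with standard errors clustered on the panel identifier
xtreg ln_wage tenure age, fe vce(cluster idcode)
scalar se_cluster = _se[tenure]

* Under xtreg, fe both requests give the same standard errors
display se_robust, se_cluster
assert reldif(se_robust, se_cluster) < 1e-6
```

Both headers report that the standard errors are adjusted for clusters in idcode, which is why no separate panelid variable is needed.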
Comment
Alex Mueller

Join Date: May 2018

Posts: 51
#28

13 Jun 2018, 02:10

Carlo, thanks again for your helpful advice!

1+2) I see that point. However, I don't get why I don't need robust AND cluster(panelid) in the -reg- case, i.e. why you see it as redundant:

Originally posted by Carlo Lazzaro:

The fix is to get rid of -robust- (it's redundant) and rewrite the -cluster- option as follows:

Code:

cluster(panelid)

3) Thanks for the clarification!
Comment

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17712

#29

13 Jun 2018, 02:19

Alex:
can you detect any difference in the following toy-example?

Code:

sysuse auto.dta
regress price i.foreign mpg, robust cluster(rep78)

Linear regression                               Number of obs     =         69
                                                F(2, 4)           =      13.78
                                                Prob > F          =     0.0161
                                                R-squared         =     0.2531
                                                Root MSE          =     2554.8

                                  (Std. Err. adjusted for 5 clusters in rep78)
------------------------------------------------------------------------------
             |               Robust
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     foreign |
    Foreign  |   1500.085   307.6182     4.88   0.008     645.9996     2354.17
         mpg |  -280.1494   90.41268    -3.10   0.036    -531.1753    -29.1236
       _cons |   11653.84   1852.516     6.29   0.003     6510.428    16797.25
------------------------------------------------------------------------------

regress price i.foreign mpg, cluster(rep78)

Linear regression                               Number of obs     =         69
                                                F(2, 4)           =      13.78
                                                Prob > F          =     0.0161
                                                R-squared         =     0.2531
                                                Root MSE          =     2554.8

                                  (Std. Err. adjusted for 5 clusters in rep78)
------------------------------------------------------------------------------
             |               Robust
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     foreign |
    Foreign  |   1500.085   307.6182     4.88   0.008     645.9996     2354.17
         mpg |  -280.1494   90.41268    -3.10   0.036    -531.1753    -29.1236
       _cons |   11653.84   1852.516     6.29   0.003     6510.428    16797.25
------------------------------------------------------------------------------

Kind regards,
Carlo
(Stata 19.0)
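
The two outputs above coincide because, under -regress-, the clustered estimator is itself a robust (sandwich) estimator: -cluster()- implies -robust-, so adding -robust- on top changes nothing. The contrast that does matter is -robust- alone versus -cluster()-. A minimal sketch on the same toy data, restricted to cars with non-missing rep78 so that both runs use the same 69 observations:

```stata
sysuse auto, clear

* Heteroskedasticity-robust standard errors, observations treated as independent
regress price i.foreign mpg if !missing(rep78), vce(robust)

* Cluster-robust standard errors, allowing correlation within rep78 groups
regress price i.foreign mpg if !missing(rep78), vce(cluster rep78)
```

The coefficients are the same in both runs; the standard errors generally differ, and only the second header reports the adjustment for 5 clusters in rep78.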

Comment

Alex Mueller

Join Date: May 2018

Posts: 51
#30

13 Jun 2018, 02:40

Carlo,

thanks for providing the example.

i) I can spot no difference. This does not surprise me, as you already stated it is redundant :)
Nevertheless, I am not really getting WHY it is redundant, as I understood they do different things under reg.
But as long as it is working it is fine for me!

ii) I tried it again and compared

Code:

xtreg performance promoted_in_observation_period (…controlvariables…) i.year , fe robust

with

Code:

reg c.performance i.promoted_in_observation_period (i./c.controlvariables) i.year, cluster(worker_id)

That still does not deliver the same output.

Is this maybe because of the i./c. specification? For instance, I use c.Age although it is not really continuous. Would i.Age be more accurate? But this would create about 50 dummies.

iii) Despite the different output, I tried the -estat vif- command. The values all look okay (between 1 and 6) apart from Age and Age_squared: they are both above 100.
It somehow makes sense that there is high collinearity between Age and Age_sq, but is there a possibility to compare Age with the other regressors, e.g. by taking into account only one of them (only Age or only Age_squared)?

Best, A.
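
On iii), one common way to look at the remaining collinearity is to centre the variable before squaring it, which usually lowers the VIFs of the linear and squared terms without changing the fit of the model. A minimal sketch on the shipped -auto- dataset, with mpg standing in for Age (the names mpg_sq, mpg_c and mpg_c_sq are made up for illustration):

```stata
sysuse auto, clear

* Raw variable and its square: very high VIFs are expected by construction
generate mpg_sq = mpg^2
regress price mpg mpg_sq weight
estat vif

* Centre at the sample mean, then square the centred variable
summarize mpg, meanonly
generate mpg_c    = mpg - r(mean)
generate mpg_c_sq = mpg_c^2

* Same quadratic fit, reparameterised; VIFs are typically much lower
regress price mpg_c mpg_c_sq weight
estat vif
```

A high VIF between a variable and its own square is a by-product of the functional form rather than a sign of a data problem, which is in line with the earlier advice not to be too worried about it.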
Comment
