Difference between reg, reg with vce(robust) and reg with vce(cluster)

HeeSung Kim

Join Date: Nov 2016

Posts: 41
#1

13 Dec 2016, 13:08

Dear Statalisters,

While running regressions in Stata I've encountered a somewhat strange and counter-intuitive result. I ran the same regression three times: once without any options, once with vce(robust), and once with vce(cluster).

My intuition was that the regression without any options would have the smallest standard errors, followed by the robust option and then the cluster option. However, the regressions with the robust and cluster options had smaller standard errors than the regression without any option. In fact, the standard errors from the robust and cluster options were identical.

What is the possible explanation for this result? If it helps, the code that I ran was:

Code:

xtset industry year
xi: xtreg yvar xvar i.year i.industry
xi: xtreg yvar xvar i.year i.industry, vce(robust)
xi: xtreg yvar xvar i.year i.industry, vce(cluster industry)
Clyde Schechter

Join Date: Apr 2014

Posts: 30118
#2

13 Dec 2016, 13:28

The manual documentation for -xtreg- clarifies that for this command, -vce(robust)- is implemented as -vce(cluster panelvar)-. (Note to StataCorp: this is not clear in the help file.) So the fact that you got the same results with the second and third is not at all surprising. You cannot get an unclustered -xtreg, vce(robust)- in Stata: it is not implemented, because it is not a valid vce estimator.
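A quick way to see this equivalence is to fit the same random-effects model twice and compare the standard errors. This is only an illustrative sketch: the nlswork example dataset and the variables ln_wage, tenure, and age are assumptions chosen for convenience, not the original poster's data.

```stata
* Illustrative only: nlswork is a Stata example dataset, not the poster's data
webuse nlswork, clear
xtset idcode year

* Random-effects model with vce(robust)
xtreg ln_wage tenure age, re vce(robust)
estimates store robust_vce

* Same model, clustering explicitly on the panel variable
xtreg ln_wage tenure age, re vce(cluster idcode)
estimates store cluster_vce

* Coefficients and standard errors side by side; the two columns should match
estimates table robust_vce cluster_vce, b(%9.5f) se(%9.5f)
```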

As for the expectation that the standard errors will be lower with the non-robust vce estimator, that is often, perhaps usually, the case. But it is not invariably the case, as you have discovered. One question that is important here is how many industries you have in your data. If the number of clusters is small, vce(cluster) is not an improvement over the non-robust vce, and in fact it can be substantially worse. Experts disagree about just how small is small for these purposes, but it might be a consideration here. Certainly if you have, say, just a dozen or so industries, most would agree that the cluster-robust vce should not be used here.
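Counting the clusters takes only a couple of lines. The sketch below assumes the data are already in memory and that the cluster variable is named industry, as in the original post; first_obs is a hypothetical helper variable.

```stata
* Flag exactly one observation per industry, then count the flags
egen byte first_obs = tag(industry)   // 1 for one observation in each industry, 0 otherwise
count if first_obs                    // number of distinct industries, i.e. clusters
drop first_obs
```

After -xtset industry year-, -xtdescribe- reports the same number as n in its header.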

Finally, a couple of asides. You should seldom, if ever, use the xi: prefix any more. The automatic generation of indicator variables is taken care of by factor variable notation now. There are still a few situations where factor variable notation does not work, but they are either esoteric or associated with archaic commands whose function has been replaced by more modern commands that do support factor-variable notation. Certainly there is no reason to ever use -xi:- with -xtreg-. Does it matter? Yes. If you use -xi:-, you will be unable to make use of the -margins- command following the estimation. The -margins- command is, in my experience, one of the top ten most useful commands in all of Stata. So I recommend you abandon your use of -xi:-, and even try to forget you ever knew about it.
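As a rough illustration of the factor-variable route, here is a sketch on the nlswork example dataset. The dataset and the variables ln_wage, tenure, and race are assumptions chosen only to make the example runnable.

```stata
* Illustrative only: nlswork is a Stata example dataset
webuse nlswork, clear
xtset idcode year

* i.race and i.year create the indicators on the fly; no xi: prefix needed
xtreg ln_wage tenure i.race i.year, re

* Because factor-variable notation was used, -margins- knows race is categorical
margins race
```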

Another aside: after you have -xtset industry year-, it does not make sense to include industry indicators among the variables in your random effects model. You are, in effect, specifying both fixed and random effects in the same model. It makes no sense conceptually, and I am confident that the variance component estimates you get from -xtreg- with that are meaningless and uninterpretable.

Finally, the title of your post is inaccurate: you didn't use -reg-, you used -xtreg-, which is a different command.
Priyesh VP

Join Date: Jul 2016

Posts: 38
#3

05 Feb 2018, 03:03

Hi,

I ran the following regression commands in Stata. I hypothesize a relationship between my main independent variable (dr) and dta. However, I get a significant association only when I don't add the vce() option. It's a cross-sectional regression. I control for year since the firms' data are taken from different periods. Each firm has only one year of data, but the year may differ across firms.

Can anyone advise whether it is appropriate to use vce() in this scenario?

Code:

regress dta dr agel roa_w lev_w size_w mb_w salesgr_w i.nic2 i.year
regress dta dr agel roa_w lev_w size_w mb_w salesgr_w i.nic2 i.year, vce(robust)
regress dta dr agel roa_w lev_w size_w mb_w salesgr_w i.nic2 i.year, vce(cluster year)
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#4

05 Feb 2018, 04:05

Priyesh:
under -regress-, -vce(robust)- accounts for heteroskedasticity in the residual distribution, whereas -vce(cluster)- accounts for residual autocorrelation within clusters.
Hence, you should take a look at your dataset first and then decide which way to go with your standard errors (default; robust; cluster): there's no hard and fast rule (as is, unfortunately, often the case in life).
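To look at the three sets of standard errors side by side before deciding, something along these lines may help. It is a sketch on the nlswork example dataset; the variables and the choice of idcode as the cluster variable are assumptions for illustration only and say nothing about which option suits your data.

```stata
* Illustrative only: nlswork is a Stata example dataset
webuse nlswork, clear

regress ln_wage tenure age                        // default (conventional) standard errors
estimates store se_default

regress ln_wage tenure age, vce(robust)           // heteroskedasticity-robust
estimates store se_robust

regress ln_wage tenure age, vce(cluster idcode)   // clustered on person
estimates store se_cluster

* Same coefficients in every column; only the standard errors differ
estimates table se_default se_robust se_cluster, b(%9.4f) se(%9.4f)
```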

Kind regards,
Carlo
(Stata 19.0)
Aye Aye Khaine

Join Date: Jan 2019

Posts: 41
#5

03 Apr 2019, 15:57

Thank you, HeeSung and Priyesh, for your questions, and Clyde and Carlo for your responses. This is exactly what I was looking for.
Marian Vasile

Join Date: Dec 2015

Posts: 2
#6

26 Sep 2019, 04:44

Could you please point me to a paper that states how many clusters are needed for the vce(cluster) option to be used safely? Thank you!
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#7

26 Sep 2019, 07:29

Marian:
welcome to this forum.
You may want to take a look at http://cameron.econ.ucdavis.edu/rese...5_February.pdf.

Kind regards,
Carlo
(Stata 19.0)
Marian Vasile

Join Date: Dec 2015

Posts: 2
#8

07 Oct 2019, 06:42

Thank you Carlo!
Nadhira Kamila

Join Date: May 2021

Posts: 5
#9

05 May 2021, 11:13

Originally posted by Clyde Schechter

Another aside: after you have -xtset industry year-, it does not make sense to include industry indicators among the variables in your random effects model. You are, in effect, specifying both fixed and random effects in the same model. It makes no sense conceptually, and I am confident that the variance component estimates you get from -xtreg- with that are meaningless and uninterpretable.

---

Dear Mr Clyde,

Regarding your statement that it doesn't make sense to include i.industry in an RE model: how should I then compare the two models for the Hausman test? Should I include i.industry in the FE model only? I ran the Hausman test with i.industry in both estimates and the result is 0.9976. Is that a correct result?

Another thing: can I use -reg var i.industry i.year, vce(robust)- if RE and FE don't work?

Thank you so much. I look forward to your answer.

Regards, Nadhira
Clyde Schechter

Join Date: Apr 2014

Posts: 30118
#10

05 May 2021, 11:27

You do not describe your models, so it is not possible to give specific advice. If industry is your panel variable in the -xtset- command, then you should not include i.industry in either the -fe- or the -re- model: just specifying -fe- and -re- themselves will cause Stata to include industry in the way that is appropriate for those analyses. A model that has been -xtset- with industry as the panel variable should not include i.industry. If you do that with -fe- it does no harm (Stata will just omit it anyway), but it also does no good. But if you do it with -re- you get useless results. So just don't do it for either.
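For concreteness, the usual Hausman workflow with industry as the panel variable might look like the sketch below. The variable names yvar, xvar, industry, and year are placeholders borrowed from the first post in this thread, and fixed and random are arbitrary names for the stored results.

```stata
* Placeholder names: yvar, xvar, industry, year
xtset industry year

* Fixed effects: industry enters through -fe-, so no i.industry
xtreg yvar xvar i.year, fe
estimates store fixed

* Random effects: again no i.industry
xtreg yvar xvar i.year, re
estimates store random

* Consistent estimator first, efficient estimator second
hausman fixed random
```

Both models are fit with the default vce here because -hausman- does not accept estimates obtained with vce(robust) or vce(cluster).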

Nadhira Kamila

Join Date: May 2021
Posts: 5

#11

05 May 2021, 11:48

Dear Mr Clyde,

Thank you for your reply. Yes, you are correct, but my panel variable is Company, and I need to group the companies into their respective industries, so I created an industry dummy. If so, should I still exclude i.industry?

Another thing I would like to ask: I have removed i.industry from both estimates, and this is the result I got from the Hausman test. Is the value 0.999 normal and interpretable, or did I do something wrong in the process?

Code:

 
        b = consistent under Ho and Ha; obtained from xtreg
        B = inconsistent under Ha, efficient under Ho; obtained from xtreg

Test:  Ho:  difference in coefficients not systematic

             chi2(14) = (b-B)'[(V_b-V_B)^(-1)](b-B)
                      =        1.88
            Prob>chi2 =      0.9999

Thank you once again, sir!


Clyde Schechter

Join Date: Apr 2014

Posts: 30118
#12

05 May 2021, 13:07

Why do you want to include i.industry in your model? If you just want to adjust for its effects on your outcome variable, then with company as the panel variable, there is no need to mention i.industry in a fixed effects model. The adjustment for industry will be automatic, because industry effects are encompassed within the company fixed effects. In a random effects model, you would need to explicitly include i.industry to accomplish this. (Or, if you have a large number of industries, you might want to go to a 3-level model with industry as the top level.)
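In code, the three alternatives just described might be sketched as follows. All variable names (yvar, xvar, company, industry, year) are placeholders, since the actual model has not been shown, and the -mixed- specification is only one possible way to set up a 3-level model.

```stata
* Placeholder names throughout: yvar, xvar, company, industry, year
xtset company year

* Fixed effects: the company effects already absorb industry, so it is not listed
xtreg yvar xvar i.year, fe

* Random effects: industry must be included explicitly to adjust for it
xtreg yvar xvar i.industry i.year, re

* With many industries: companies nested within industries as random intercepts
mixed yvar xvar i.year || industry: || company:
```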

If on the other hand your goal is to actually estimate the industry effects, that simply cannot be done in a fixed effects model. Your options then would be to go to a random effects model, or to use a hybrid model (-xthybrid-, available from SSC) which will separately estimate the within- and between- effects of all the predictors in your model.