Influence statistics in Stata manual for xtreg postestimation

Alex Mai

Join Date: May 2016

Posts: 213
#1


18 Sep 2018, 12:10

Dear Statalists,

The Stata manual for xtreg postestimation provides a description for predict as follows:

predictions, residuals, influence statistics, and other diagnostic measures

It mentions "influence statistics". However, I cannot find anything about influence statistics in the manual for xtreg postestimation. predict has options for fitted values and residuals, but I do not think they can be used to build influence statistics, because nothing is provided for leverage or standardized residuals. So may I ask what "influence statistics" in the manual refers to?

After reading a lot of material, I have not found statistical methods for detecting influential points in panel regressions. Cook's distance and DFITS are designed for cross-sectional regressions, and the lack of an agreed definition of standardized residuals for panel data further complicates the problem.

So can I say that there are no good approaches to detect influential points in panel regressions?

Many thanks!
William Lisowski

Join Date: Dec 2014

Posts: 10150
#2

18 Sep 2018, 13:45

So may I ask what the "influence statistics" in the manual refers to?

It appears to refer to the author of xtreg postestimation documentation copying and pasting that material from reg postestimation and overlooking the fact that predict after xtreg does not in fact provide any influence statistics.

I can't address your substantive question, however, although I would expect that if there were appropriate influence measures for xtreg, Stata would provide them. I note that the output of search influence finds no hits within the [xt] help files.
Comment
Alex Mai

Join Date: May 2016

Posts: 213
#3

18 Sep 2018, 14:14

Originally posted by William Lisowski

It appears to refer to the author of xtreg postestimation documentation copying and pasting that material from reg postestimation and overlooking the fact that predict after xtreg does not in fact provide any influence statistics.
I can't address your substantive question, however, although I would expect that if there were appropriate influence measures for xtreg, Stata would provide them. I note that the output of search influence finds no hits within the [xt] help files.

Dear William,

Thank you very much! I have checked the manual for reg postestimation, and it indeed has the same description of predict as the manual for xtreg postestimation. It seems to be just a copy-and-paste oversight.
Comment
Alex Mai

Join Date: May 2016

Posts: 213
#4

19 Sep 2018, 13:26

Originally posted by William Lisowski

I can't address your substantive question, however, although I would expect that if there were appropriate influence measures for xtreg, Stata would provide them. I note that the output of search influence finds no hits within the [xt] help files.

Dear William,

Sorry to bother you again, but may I ask if there is any method to obtain influence statistics (e.g. Cook's D and DFITS) for reg after using robust estimation (vce(robust))?
The reg postestimation tools do not work after vce(robust).
I know that robust estimation does not change the coefficients or residuals, only the standard errors and confidence intervals. However, I do not know whether robust estimation changes the observations' leverage.

Thank you very much.
Comment
William Lisowski

Join Date: Dec 2014

Posts: 10150
#5

19 Sep 2018, 19:19

I'm no expert on this. You might improve your chances of getting a response by reposting the question in #4 as a new topic with a title that might attract those with more expertise, perhaps something like "why are influence statistics not available with vce(robust)?". That is, I'd assume there's no happy answer, but an explanation would at least let you know why in case you're challenged on it.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30355
#6

19 Sep 2018, 19:56

I'm not an expert in this area either, but I know a little about leverage. The concept behind it is to identify the extent to which any observation, if omitted from the analysis, would materially change the regression coefficients. Moreover, the PDF documentation makes clear that the vector of leverage statistics (one element for each observation) is calculated as x_j (X'X)^-1 x_j', an expression which does not at all involve the variance-covariance matrix estimator. So I'm at a loss to understand why Stata does not permit you to calculate leverage after -regress, vce(robust)-. Usually when Stata prohibits something, it's something that should never be done. But it appears to me that this is not the case here.

I might be missing something here, but if I wanted to do a leverage calculation after -regress, vce(robust)-, I would just go ahead and re-estimate the regression without -vce(robust)- and then use -predict, leverage-. And I'd feel comfortable relying on those results.
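
The workaround in the paragraph above could look something like the following minimal sketch. It assumes Stata's bundled auto dataset and an arbitrary model (price on mpg and weight), neither of which comes from this thread; the variable names lev_robust and lev are likewise only illustrative.

```stata
* Illustrative data and model (assumptions, not from the thread)
sysuse auto, clear

* The model of interest, with robust standard errors
regress price mpg weight, vce(robust)

* As reported in this thread, predict refuses this request after vce(robust);
* capture noisily shows the error without stopping a do-file
capture noisily predict lev_robust, leverage

* Refit with the default VCE: the design matrix X is unchanged,
* so the leverage values x_j (X'X)^-1 x_j' are unchanged too
regress price mpg weight
predict lev, leverage

* List the five highest-leverage observations
gsort -lev
list make price mpg weight lev in 1/5
```

Only leverage is computed here. Statistics that also depend on the residual standard deviation are a separate matter.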

If somebody out there can point out where my reasoning is wrong, please do!

This, of course, is a separate issue from calculating leverage after -xtreg-. Here I see a potential problem because it is unclear at what level of the data hierarchy the concept applies. Are there whole panels that are potentially unduly influential on the results? Or are we interested even in single observations within panels that might exert large influence? And if the latter, does it matter whether they are overly influential compared to others within the same panel, or is it only important to know if they are influential relative to the entire estimation sample? It seems to me that the whole concept becomes fuzzy when we move to panel data (and probably even more so if we go to models with 3 or more levels). Again, I'd be happy for somebody to contradict me on this and clarify how it is done.

Added: Of course, if you buy the concept of leverage in a panel regression, or aren't worried about the issues I'm raising and just want to calculate it, you can always emulate the -xtreg, fe- by just running -regress- and including i.panel_variable among the regressors. Leave out the -vce(robust)- and run -predict, leverage-. Interpret the results at your own risk, I suppose. But it can be done.
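
A rough sketch of that emulation, assuming the Grunfeld investment panel available through webuse (10 companies, so the set of dummies stays small); the dataset, model, and the variable name lev are illustrative choices and not from this thread.

```stata
* Illustrative panel data (an assumption, not from the thread)
webuse grunfeld, clear
xtset company year

* Fixed-effects model, for reference
xtreg invest mvalue kstock, fe

* Same slope estimates from least squares with panel dummies
regress invest mvalue kstock i.company

* Leverage of each observation in the dummy-variable regression
predict lev, leverage
summarize lev

* Mean and maximum leverage by panel, to see whether whole panels stand out
tabstat lev, by(company) statistics(mean max)
```

The caveats above still apply: these are leverages in the dummy-variable regression, and whether they answer the panel-level or the observation-level question is left to the analyst.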
Comment
William Lisowski

Join Date: Dec 2014

Posts: 10150
#7

20 Sep 2018, 06:53

I have a hypothesis that the link between (X'X)^-1 and the variance-covariance matrix estimator is that although regress does include (X'X)^-1 among the ereturn results, (X'X)^-1 can be reconstructed from e(V) for vce(ols), but not when e(V) was calculated using other vce() options. If that is indeed the case, the lack of a leverage calculation is a conseuquence of unavailability of the necessary input, not a prohibition on statistical grounds. And this would seem to support Clyde's recommendation in the second paragraph of post #6.

I agree with everything Clyde writes in the fourth paragraph of post #6 regarding leverage in panel data.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30355
#8

20 Sep 2018, 09:27

I have a hypothesis that the link between (X'X)^-1 and the variance-covariance matrix estimator is that although regress does include (X'X)^-1 among the ereturn results, (X'X)^-1 can be reconstructed from e(V) for vce(ols), but not when e(V) was calculated using other vce() options.

Good point.
Comment
William Lisowski

Join Date: Dec 2014

Posts: 10150
#9

20 Sep 2018, 09:38

Wow, the fact that I posted #7 before my first coffee of the day is really obvious in the typos. I gotta start waiting to post. What Clyde imputed from my typing was

I have a hypothesis that the link between (X'X)^-1 and the variance-covariance matrix estimator is that although regress does not include (X'X)^-1 among the ereturn results, (X'X)^-1 can be reconstructed from e(V) for vce(ols), but not when e(V) was calculated using other vce() options. If that is indeed the case, the lack of a leverage calculation is a consequence of unavailability of the necessary input, not a prohibition on statistical grounds.
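
The reconstruction described in this hypothesis can be illustrated with a small sketch. It relies on the standard OLS result that, with the default VCE, e(V) = s^2 (X'X)^-1 with s = e(rmse); the auto dataset, the model, and the matrix names are assumptions made for illustration, and the sketch says nothing about how predict is actually implemented.

```stata
* Illustrative data and model (assumptions, not from the thread)
sysuse auto, clear
regress price mpg weight

* With the default VCE, e(V) = s^2 * (X'X)^-1, so divide by s^2
local s2 = e(rmse)^2
matrix XtXinv = e(V) / `s2'

* Independent check: build X'X directly (matrix accum puts the constant last,
* matching the column order of e(V))
matrix accum XX = mpg weight
matrix check = invsym(XX)

* The relative difference between the two should be close to zero
display mreldif(XtXinv, check)
```

After vce(robust), e(V) holds a sandwich estimate instead, so the same division would not return (X'X)^-1.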
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30355
#10

20 Sep 2018, 11:50

Well, I don't have the excuse about first coffee of the day. A, I'm a tea drinker, and B, I'd had two cups by the time I posted #8. But maybe it's because I was up really late last night. Anyway, when I read William's post in #7, I "saw" the not that he now correctly points out wasn't actually there! I understood him as he meant it, contrary to how it was written. Weird!
Comment
Steve Samuels

Join Date: Mar 2014

Posts: 1786
#11

20 Sep 2018, 16:04

Note that the influence statistics for regress estimated by rstudent, cooksd, dfits, welsch, and dfbetas all require an estimate of the residual standard deviation, which is a constant in the theory of OLS. However, the point of using vce(robust) is that such a constant is not assumed.
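
For reference, the usual textbook forms of three of these statistics make the dependence explicit. The notation is assumed here for illustration: e_j is the residual, h_j the leverage, s the root mean squared error, s_(j) the same quantity with observation j omitted, and k the number of estimated coefficients including the constant.

```latex
\begin{aligned}
h_j &= x_j (X'X)^{-1} x_j' \\
r_j &= \frac{e_j}{s_{(j)}\sqrt{1 - h_j}} && \text{(studentized residual)} \\
\mathrm{DFITS}_j &= r_j \sqrt{\frac{h_j}{1 - h_j}} \\
D_j &= \frac{1}{k}\left(\frac{e_j}{s\sqrt{1 - h_j}}\right)^{2} \frac{h_j}{1 - h_j} && \text{(Cook's distance)}
\end{aligned}
```

Only h_j is free of s and s_(j); the other three scale the residual by a single residual standard deviation.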

Steve Samuels
Statistical Consulting

Stata 14.2
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30355
#12

20 Sep 2018, 16:39

Agree with #11. That's why my remarks and suggested workaround were limited to -leverage-, which is not contingent on that assumption.
Comment
Alex Mai

Join Date: May 2016

Posts: 213
#13

22 Sep 2018, 13:51

Originally posted by Clyde Schechter

Added: Of course, if you buy the concept of leverage in a panel regression, or aren't worried about the issues I'm raising and just want to calculate it, you can always emulate the -xtreg, fe- by just running -regress- and including i.panel_variable among the regressors. Leave out the -vce(robust)- and run -predict, leverage-. Interpret the results at your own risk, I suppose. But it can be done.

Dear Clyde,

Thank you so much once more for the suggestion! I followed your idea and computed the influence statistics using -regress- with the panel variable included among the regressors. The influential points identified this way are more or less those that look "peculiar" on the residual plots. So I think this method should be fine.
Comment
Alex Mai

Join Date: May 2016

Posts: 213
#14

22 Sep 2018, 13:57

Originally posted by William Lisowski

I'm no expert on this. You might improve your chances of getting a response by reposting the question in #4 as a new topic with a title that might attract those with more expertise, perhaps something like "why are influence statistics not available with vce(robust)?". That is, I'd assume there's no happy answer, but an explanation would at least let you know why in case you're challenged on it.

Dear William,

Thank you so much once more! I think Steve in #11 has given an explanation for the unavailability of influence statistics after vce(robust):

Note that the influence statistics for regress estimated by rstudent, cooksd, dfits, welsch, and dfbetas all require an estimate of the residual standard deviation, which is a constant in the theory of OLS. However, the point of using vce(robust) is that such a constant is not assumed.
Comment
Alex Mai

Join Date: May 2016

Posts: 213
#15

22 Sep 2018, 13:58

Originally posted by Steve Samuels

Note that the influence statistics for regress estimated by rstudent, cooksd, dfits, welsch, and dfbetas all require an estimate of the residual standard deviation, which is a constant in the theory of OLS. However, the point of using vce(robust) is that such a constant is not assumed.

Dear Steve,

Thank you very much!
Comment
