  • Influence statistics in Stata manual for xtreg postestimation

    Dear Statalists,

    The Stata manual for xtreg postestimation provides a description for predict as follows:

    predictions, residuals, influence statistics, and other diagnostic measures
    It mentions "influence statistics". However, I cannot find anything about influence statistics in the manual for xtreg postestimation. predict offers options for fitted values and residuals, but I do not think these can be used to build influence statistics, because leverage and standardized residuals are not available. So may I ask what the "influence statistics" in the manual refers to?

    After reading a lot of material, I have not found statistical methods for detecting influential points in panel regressions. Cook's distance and DFFITS are designed for cross-sectional regressions, and the lack of an agreed definition of standardized residuals for panel data further complicates the problem.

    So can I say that there are no good approaches to detect influential points in panel regressions?

    Many thanks!

  • #2
    So may I ask what the "influence statistics" in the manual refers to?
    It appears to refer to the author of xtreg postestimation documentation copying and pasting that material from reg postestimation and overlooking the fact that predict after xtreg does not in fact provide any influence statistics.

    I can't address your substantive question, however, although I would expect that if there were appropriate influence measures for xtreg, Stata would provide them. I note that the output of search influence finds no hits within the [xt] help files.

    • #3
      Originally posted by William Lisowski:

      It appears to refer to the author of xtreg postestimation documentation copying and pasting that material from reg postestimation and overlooking the fact that predict after xtreg does not in fact provide any influence statistics.
      I can't address your substantive question, however, although I would expect that if there were appropriate influence measures for xtreg, Stata would provide them. I note that the output of search influence finds no hits within the [xt] help files.
      Dear William,

      Thank you very much! I have checked the manual for reg postestimation, and it indeed has the same description of predict as the manual for xtreg postestimation. It seems to be just a copy-and-paste.

      • #4
        Originally posted by William Lisowski:
        I can't address your substantive question, however, although I would expect that if there were appropriate influence measures for xtreg, Stata would provide them. I note that the output of search influence finds no hits within the [xt] help files.
        Dear William,

        Sorry to bother you again, but may I ask whether there is any way to obtain influence statistics (e.g. Cook's D and DFITS) for regress after using robust estimation (vce(robust))?
        The regress postestimation tools do not work after vce(robust).
        I know that robust estimation does not change the coefficients or residuals, only the standard errors and confidence intervals. However, I do not know whether robust estimation changes the observations' leverage.

        Thank you very much.

        • #5
          I'm no expert on this. You might improve your chances of getting a response by reposting the question in #4 as a new topic with a title that might attract those with more expertise, perhaps something like "why are influence statistics not available with vce(robust)?". That is, I'd assume there's no happy answer, but an explanation would at least let you know why in case you're challenged on it.

          • #6
            I'm not an expert in this area either, but I know a little about leverage. The idea is to identify the extent to which any observation, if omitted from the analysis, would materially change the regression coefficients. Moreover, the PDF documentation makes clear that the vector of leverage statistics (one element per observation) is calculated as h_j = x_j (X'X)^{-1} x_j', an expression that does not involve the variance-covariance matrix estimator at all. So I'm at a loss to understand why Stata does not permit you to calculate leverage after -regress, vce(robust)-. Usually when Stata prohibits something, it's something that should never be done. But it appears to me that this is not the case here.

            I might be missing something here, but if I wanted to do a leverage calculation after -regress, vce(robust)-, I would just go ahead and re-estimate the regression without -vce(robust)- and then use -predict, leverage-. And I'd feel comfortable relying on those results.
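
            The workaround described above might look like the following minimal sketch. The auto dataset shipped with Stata and the variable name lev are stand-ins chosen for illustration; they are not from the original question.

            ```stata
            * Assumption: the auto dataset stands in for the poster's data
            sysuse auto, clear

            * Robust fit, used for inference on the coefficients
            regress price mpg weight, vce(robust)

            * Refit the same model with the default VCE solely to obtain leverage;
            * the coefficients and residuals are the same as in the robust fit
            quietly regress price mpg weight
            predict double lev, leverage

            * Show the five observations with the highest leverage
            gsort -lev
            list make price mpg weight lev in 1/5
            ```

            Because leverage depends only on the regressors, the values in lev apply equally to the robust fit.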

            If somebody out there can point out where my reasoning is wrong, please do!

            This, of course, is a separate issue from calculating leverage after -xtreg-. Here I see a potential problem because it is unclear at what level of the data hierarchy the concept applies. Are there whole panels that are potentially unduly influential on the results? Or are we interested even in single observations within panels that might exert large influence? And if the latter, does it matter whether they are overly influential compared to others within the same panel, or is it only important to know whether they are influential relative to the entire estimation sample? It seems to me that the whole concept becomes fuzzy when we move to panel data (and probably even more so if we go to models with 3 or more levels). Again, I'd be happy for somebody to contradict me on this and clarify how it is done.

            Added: Of course, if you buy the concept of leverage in a panel regression, or aren't worried about the issues I'm raising and just want to calculate it, you can always emulate the -xtreg, fe- by just running -regress- and including i.panel_variable among the regressors. Leave out the -vce(robust)- and run -predict, leverage-. Interpret the results at your own risk, I suppose. But it can be done.
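
            A minimal sketch of that emulation, assuming the Grunfeld panel available through webuse as example data (the dataset and the variable name lev are illustrative choices, not part of the original posts):

            ```stata
            * Assumption: Grunfeld investment panel as stand-in data
            webuse grunfeld, clear
            xtset company year

            * Fixed-effects fit, shown for reference
            xtreg invest mvalue kstock, fe

            * Same slope estimates from OLS with company indicators and the default VCE
            regress invest mvalue kstock i.company
            predict double lev, leverage

            * Show the five observations with the highest leverage
            gsort -lev
            list company year lev in 1/5
            ```

            The slopes on mvalue and kstock from -regress- should match those from -xtreg, fe-. The leverage values here are measured relative to the whole estimation sample, so the caveats about the level of the data hierarchy still apply.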

            • #7
              I have a hypothesis that the link between (X'X)^{-1} and the variance-covariance matrix estimator is that although regress does include (X'X)^{-1} among the ereturn results, (X'X)^{-1} can be reconstructed from e(V) for vce(ols), but not when e(V) was calculated using other vce() options. If that is indeed the case, the lack of a leverage calculation is a conseuquence of unavailability of the necessary input, not a prohibition on statistical grounds. And this would seem to support Clyde's recommendation in the second paragraph of post #6.

              I agree with everything Clyde writes in the fourth paragraph of post #6 regarding leverage in panel data.

              • #8
                I have a hypothesis that the link between (X'X)^{-1} and the variance-covariance matrix estimator is that although regress does include (X'X)^{-1} among the ereturn results, (X'X)^{-1} can be reconstructed from e(V) for vce(ols), but not when e(V) was calculated using other vce() options.
                Good point.

                • #9
                  Wow, the fact that I posted #7 before my first coffee of the day is really obvious in the typos. I gotta start waiting to post. What Clyde imputed from my typing was

                  I have a hypothesis that the link between (X'X)^{-1} and the variance-covariance matrix estimator is that although regress does not include (X'X)^{-1} among the ereturn results, (X'X)^{-1} can be reconstructed from e(V) for vce(ols), but not when e(V) was calculated using other vce() options. If that is indeed the case, the lack of a leverage calculation is a consequence of unavailability of the necessary input, not a prohibition on statistical grounds.
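
                  The algebra behind this hypothesis can be sketched with the standard textbook forms of the two estimators. Here x_j is the row vector of regressors for observation j, e_j its residual, n the number of observations, k the number of coefficients, and s^2 the residual mean square. The n/(n-k) factor is assumed to be the default small-sample adjustment. This illustrates the argument only; it does not confirm how predict is implemented.

                  ```latex
                  % Default (OLS) VCE: a scalar multiple of (X'X)^{-1}
                  \[
                    \widehat{V}_{\mathrm{ols}} = s^{2}\,(X'X)^{-1}
                    \quad\Longrightarrow\quad
                    (X'X)^{-1} = \widehat{V}_{\mathrm{ols}} \,/\, s^{2}
                  \]
                  % Robust (sandwich) VCE: (X'X)^{-1} appears on both sides of a data-dependent middle term
                  \[
                    \widehat{V}_{\mathrm{robust}}
                      = \frac{n}{n-k}\,(X'X)^{-1}
                        \Bigl(\sum_{j=1}^{n} e_j^{2}\, x_j' x_j\Bigr)
                        (X'X)^{-1}
                  \]
                  ```

                  In the first case dividing e(V) by s^2 recovers (X'X)^{-1}. In the second, the middle term is not stored in e(V), so (X'X)^{-1} cannot in general be separated out from e(V) alone.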

                  • #10
                    Well, I don't have the excuse about first coffee of the day. A, I'm a tea drinker, and B, I'd had two cups by the time I posted #8. But maybe it's because I was up really late last night. Anyway, when I read William's post in #7, I "saw" the not that he now correctly points out wasn't actually there! I understood him as he meant it, contrary to how it was written. Weird!

                    • #11
                      Note that the influence statistics for regress estimated by rstudent, cooksd, dfits, welsch, and dfbetas all require an estimate of the residual standard deviation, which is a constant in the theory of OLS. However, the point of using vce(robust) is that such a constant is not assumed.
                      Steve Samuels
                      Statistical Consulting

                      Stata 14.2
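
                      For reference, the usual textbook definitions of three of these statistics make the dependence explicit. The notation is assumed for illustration: h_j is the leverage, e_j the residual, k the number of coefficients, s the residual standard deviation, and s_{(j)} the same quantity estimated with observation j omitted. The welsch and dfbeta statistics are left out for brevity.

                      ```latex
                      % Leverage: depends on the regressors only
                      \[ h_j = x_j (X'X)^{-1} x_j' \]
                      % Studentized residual: needs s_{(j)}
                      \[ r_j = \frac{e_j}{s_{(j)}\sqrt{1-h_j}} \]
                      % Cook's distance: needs s
                      \[ D_j = \frac{e_j^{2}}{k\,s^{2}} \cdot \frac{h_j}{(1-h_j)^{2}} \]
                      % DFITS: built from the studentized residual
                      \[ \mathrm{DFITS}_j = r_j \sqrt{\frac{h_j}{1-h_j}} \]
                      ```

                      Only h_j is free of s, which is why a single residual standard deviation matters for the other statistics but not for leverage.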

                      • #12
                        Agree with #11. That's why my remarks and suggested workaround were limited to -leverage-, which is not contingent on that assumption.

                        • #13
                          Originally posted by Clyde Schechter:
                          Added: Of course, if you buy the concept of leverage in a panel regression, or aren't worried about the issues I'm raising and just want to calculate it, you can always emulate the -xtreg, fe- by just running -regress- and including i.panel_variable among the regressors. Leave out the -vce(robust)- and run -predict, leverage-. Interpret the results at your own risk, I suppose. But it can be done.
                          Dear Clyde,

                          Thank you so much once more for the suggestion! I followed your idea and computed the influence statistics using -regress- with panel indicator variables. The influential points identified this way are more or less those that look "peculiar" on the residual plots. So I think this method should be fine.

                          • #14
                            Originally posted by William Lisowski:
                            I'm no expert on this. You might improve your chances of getting a response by reposting the question in #4 as a new topic with a title that might attract those with more expertise, perhaps something like "why are influence statistics not available with vce(robust)?". That is, I'd assume there's no happy answer, but an explanation would at least let you know why in case you're challenged on it.
                            Dear William,

                            Thank you so much once more! I think Steve at #11 has given an explanation for the unavailability of influence statistics after vce(robust):
                            Note that the influence statistics for regress estimated by rstudent, cooksd, dfits, welsch, and dfbetas all require an estimate of the residual standard deviation, which is a constant in the theory of OLS. However, the point of using vce(robust) is that such a constant is not assumed.

                            • #15
                              Originally posted by Steve Samuels:
                              Note that the influence statistics for regress estimated by rstudent, cooksd, dfits, welsch, and dfbetas all require an estimate of the residual standard deviation, which is a constant in the theory of OLS. However, the point of using vce(robust) is that such a constant is not assumed.
                              Dear Steve,

                              Thank you very much!
