  • Question about -xb- and -xbu- after -predict- in panel regression

    Dear Statalist,

    I have a question about the differences among the options (i.e. xb, xbu, ue, u, and e) available with predict after panel regressions. I first run a random-effects regression with xtreg, re robust. May I ask whether my understanding below is correct?

    1. If I use xb, then I obtain fitted values that are exclusively explained by the independent variables (the model per se).
    2. If I use xbu, then I obtain fitted values that combine the part that is explained by the independent variables and the part that is explained by the time-invariant individual effects.
    3. If I use u, then I obtain the residuals that are explained by the time-invariant individual effects.
    4. If I use e, then I obtain residuals that are exclusively due to stochastic errors or disturbance.
    5. If I use ue, then I obtain the composite/overall residuals (individual effects+stochastic errors) that cannot be explained by the independent variables (the model per se).

    Are these interpretations correct?

    Thank you very much!

  • #2
    I would use different terminology to describe the different components of the model (because, for example, the residuals associated with the individual-level effects are also stochastic and are also errors). But within the terminological framework you are using, yes, your interpretations are correct.
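
For readers following along, here is a minimal sketch of how the five predict options relate to one another after xtreg, re. It assumes Stata's nlswork example dataset (loaded with webuse, which needs an internet connection) and an arbitrary illustrative model; the variable names ending in _hat are made up for this example and are not from the original question.

```stata
* Illustrative data and model only; substitute your own panel and regressors
webuse nlswork, clear
xtset idcode year

xtreg ln_wage age tenure ttl_exp, re vce(robust)

* Linear prediction from the regressors only
predict double xb_hat, xb
* Linear prediction plus the estimated panel-level effect u_i
predict double xbu_hat, xbu
* Estimated panel-level effect u_i
predict double u_hat, u
* Idiosyncratic residual e_it
predict double e_hat, e
* Combined residual u_i + e_it
predict double ue_hat, ue

* The pieces should add up, apart from rounding error
assert reldif(xbu_hat, xb_hat + u_hat) < 1e-6 if !missing(xbu_hat, u_hat)
assert reldif(ue_hat, u_hat + e_hat) < 1e-6 if !missing(ue_hat, u_hat, e_hat)
assert reldif(ln_wage, xbu_hat + e_hat) < 1e-6 if !missing(xbu_hat, e_hat)
```

The three assert lines are only a sanity check on the decomposition described in points 1 to 5: xbu = xb + u, ue = u + e, and observed outcome = xbu + e.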

    • #3
      Originally posted by Clyde Schechter
      I would use different terminology to describe the different components of the model (because, for example, the residuals associated with the individual-level effects are also stochastic and are also errors). But within the terminological framework you are using, yes, your interpretations are correct.
      Dear Clyde,

      Thank you very much for the correction! May I ask one more thing? For a residual plot after a panel regression (to check the validity of the model), should I plot ue (the overall stochastic portion of the model, i.e. both the individual effects u and the idiosyncratic errors e) against xb (the deterministic portion of the model), rather than plotting e against xbu?

      • #4
        If you are interested in the validity of the model, I would do neither of these. I would plot observed outcome against xbu (the predicted value for each observation). If you are interested in its predictive validity, then you need to plot observed outcome against xb. If you are interested in things like heteroscedasticity of errors, I would plot u and e, separately, against xb.
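
A rough sketch of the four plots just described, again assuming the nlswork example data and an illustrative model (neither is from the thread; the _hat variable names and graph names are made up here):

```stata
* Illustrative data and model only
webuse nlswork, clear
xtset idcode year
xtreg ln_wage age tenure ttl_exp, re vce(robust)

predict double xb_hat, xb
predict double xbu_hat, xbu
predict double u_hat, u
predict double e_hat, e

* Model fit: observed outcome against xbu
scatter ln_wage xbu_hat, msize(tiny) name(obs_xbu, replace)

* Predictive validity: observed outcome against xb
scatter ln_wage xb_hat, msize(tiny) name(obs_xb, replace)

* Heteroscedasticity and similar checks: u and e separately against xb
scatter u_hat xb_hat, msize(tiny) yline(0) name(u_xb, replace)
scatter e_hat xb_hat, msize(tiny) yline(0) name(e_xb, replace)
```

Naming each graph keeps all four in memory so they can be compared side by side, for example with graph combine.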

        • #5
          Originally posted by Clyde Schechter
          If you are interested in the validity of the model, I would do neither of these. I would plot observed outcome against xbu (the predicted value for each observation).
          Thank you very much once more! May I ask why plotting observed outcomes against xbu is better than the conventional practice of plotting residuals (e.g. ue) against fitted values (e.g. xb or xbu)? I have read a lot of material, but it mostly deals with the cross-sectional case, and I did not find much useful information on residual plots for panel regression.

          I have produced the following plots following your comment. I find a strong linear relationship between observed outcome and xbu (graph a), and a similar but much less clear relationship between observed outcomes and fitted values (xb) in graph b. May I ask if this pattern is fine or problematic? This is a simple fixed-effects regression with year dummies.
          [Image: Graph 5.png]
          a. Observed Outcomes against Fitted Values+Individual Effects (xbu)

          [Image: Graph 6.png]
          b. Observed Outcomes against Fitted Values (xb)


          I have also plotted u against xb, and find a weird graph (graph c) which I have no idea how to interpret. So could you please check if this pattern is problematic?

          [Image: Graph 7.png]
          c. Individual Effects (u) against Fitted Values (xb)

          I think this graph of e against xb looks roughly ok, except for a potential outlier at the bottom.
          [Image: Graph 8.png]
          d. Idiosyncratic Errors (e) against Fitted Values (xb)

          Many thanks again!
          Last edited by Alex Mai; 26 Aug 2018, 05:41.

          • #6
            Graphs a and b are what I would expect to see with a good model.

            My preference for not plotting ue against xb is that ue folds together u and e, and if there is a problem, it is more helpful to see whether the u's or the e's (or both) seem out of whack.

            Graph c suggests that the random intercept is highest when the fixed-effects prediction (xb) is lowest and vice versa. It's not an overwhelming tendency, but strong enough to be noticeable. This type of situation might be seen when there is an omitted variable, some effect not included in your model, defined at the group level, for which u is serving, in part, as a proxy. You would have to ponder the science you are working in to think what that variable (or those variables) might be and whether or not it is feasible to measure and include them in an expanded model.

            Graph d looks fine to me. I wouldn't worry about one mild outlier.

            Added: The reason you don't see much about graphing residuals in longitudinal analyses is that it isn't much done. I'm not sure why. Perhaps it is because longitudinal data analysis came into its own in an era when robust standard errors were already widely available, so people tend to just use the latter and not care much about the residuals.
            Last edited by Clyde Schechter; 26 Aug 2018, 11:06.

            • #7
              Originally posted by Clyde Schechter
              Graph c suggests that the random intercept is highest when the fixed-effects prediction is lowest and vice versa. It's not an overwhelming tendency, but strong enough to be noticeable. This type of situation might be seen when there is an omitted variable, some effect not included in your model, defined at the group level, for which u is serving, in part, as a proxy.
              Thank you very much! I get your point. So if the plot of u against xb is problematic but the plot of e against xb is ok (as in my case), this indicates that the potential omitted variable is a time-invariant variable (for which u is serving as a proxy). By contrast, if a similar problem is observed in the plot of e against xb rather than in the plot of u against xb, then the potential omitted variable (setting aside other problems such as heteroskedasticity or outliers) is more likely to be a time-varying variable. Is this interpretation correct?
              Last edited by Alex Mai; 26 Aug 2018, 12:45.

              • #8
                Correct.

                • #9
                  Originally posted by Clyde Schechter
                  Correct.
                  Many thanks again! I have just read your earlier reply in another thread saying that there is no agreed definition of "standardized residual" after random-effects xtlogit. May I ask how to obtain standardized residuals after fixed-effects xtreg? By standardized residual, I mean the raw residual divided by the estimate of its standard deviation (perhaps this is also called the internally studentized residual).

                  After reg, predict has an rstandard option, but it does not work after xtreg.

                  Thank you again!

                  • #10
                    There is also no agreed definition of standardized residual in this context, nor in any other model of longitudinal data, nor in any other multi-level model. The problem arises because it is not clear whether to standardize within groups or overall.
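
To make the ambiguity concrete, here is a sketch of two candidate ways one might scale the idiosyncratic residuals after xtreg, fe. Neither is an official or agreed definition, and neither is the leverage-adjusted quantity that rstandard produces after regress; the data (nlswork), the model, and the variable names are all assumptions made for illustration.

```stata
* Illustrative data and model only
webuse nlswork, clear
xtset idcode year
xtreg ln_wage age tenure ttl_exp, fe

predict double e_hat, e

* Candidate 1: scale by the overall sigma_e that xtreg stores in e(sigma_e)
generate double e_std_overall = e_hat / e(sigma_e)

* Candidate 2: scale by the standard deviation of e within each panel
bysort idcode: egen double sd_e_i = sd(e_hat)
generate double e_std_within = e_hat / sd_e_i if sd_e_i > 0 & !missing(sd_e_i)

* The two versions generally differ, which is the point of the caveat above
summarize e_std_overall e_std_within
```

Panels with a single observation have no within-panel standard deviation, so the second version is missing for them.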

                    This question is off the topic of this thread. In the future, when raising a different question, please either start a new topic, or where there is an existing thread on the topic, add it as a new post in that thread. While it is easy to think of these threads as dialogs between a questioner and a responder, in fact there are other people reading along. Some read regularly on topics that interest them. Some come here sporadically and search for threads on a question they have in mind. Either way, it is helpful to them if every thread stays on a single topic so that they can know/find easily what to read. So please be mindful of this going forward.

                    • #11
                      Originally posted by Clyde Schechter
                      There is also no agreed definition of standardized residual in this context, nor in any other model of longitudinal data, nor in any other multi-level model. The problem arises because it is not clear whether to standardize within groups or overall.

                      This question is off the topic of this thread. In the future, when raising a different question, please either start a new topic, or where there is an existing thread on the topic, add it as a new post in that thread. While it is easy to think of these threads as dialogs between a questioner and a responder, in fact there are other people reading along. Some read regularly on topics that interest them. Some come here sporadically and search for threads on a question they have in mind. Either way, it is helpful to them if every thread stays on a single topic so that they can know/find easily what to read. So please be mindful of this going forward.
                      Thank you very much! I am sorry for my carelessness. I will be mindful of this for future posting.

                      • #12
                        Originally posted by Clyde Schechter
                        Graphs a and b are what I would expect to see with a good model.

                        Graph c suggests that the random intercept is highest when the fixed-effects prediction is lowest and vice versa. It's not an overwhelming tendency, but strong enough to be noticeable. This type of situation might be seen when there is an omitted variable, some effect not included in your model, defined at the group level, for which u is serving, in part, as a proxy. You would have to ponder the science you are working in to think what that variable (or those variables) might be and whether or not it is feasible to measure and include them in an expanded model.
                        Dear Clyde,

                        Sorry for disturbing you again, but a question related to your post has just come to my mind. Your reply implies that if the scattergram of u (individual effects) against xb (the fitted values) shows a noticeable trend, as in the picture below, then there may be omitted time-invariant variables for which u is serving as a proxy. However, fixed-effects regression does not allow time-invariant variables as regressors. Also, since the fixed-effects model allows for correlation between the individual effects and the regressors, I think it is safe to just leave the "omitted time-invariant variable" in u.

                        I am not sure if I am right about this. So may I ask for some more suggestions for this problem of "omitted time-invariant variable"? Many thanks!


                        [Image: Graph 3.png]

                        • #13
                          Alex, you are correct. I was thinking about random effects models. Sorry for the confusion.

                          • #14
                            Originally posted by Clyde Schechter
                            Alex, you are correct. I was thinking about random effects models. Sorry for the confusion.
                            Thank you very much! So may I ask how I should deal with the problem displayed in the scattergram? Or can I say that, for fixed-effects regression, the noticeable pattern in the u-xb scattergram does not matter?

                            If it indeed does not matter, then I think this implies that, for fixed-effects regression, the e-xb scattergram is better than the u-xb scattergram for residual analysis.
                            Last edited by Alex Mai; 10 Sep 2018, 14:46.

                            • #15
                              Yes, for the fixed-effects regression this does not matter. And for fixed-effects regression, the e-xb scattergram is better.

                              I apologize again for the confusion. I really was thinking about a random effects model. I use random effects a lot in my work, and fixed effects only rarely, so my thoughts tend to drift into the -re- mode.
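
For anyone landing on this thread later, a small sketch of the fixed-effects case. It assumes the nlswork example data and an illustrative model with year dummies; none of this is Alex's actual data or specification, and the _hat variable names are made up. After xtreg, fe the header reports corr(u_i, Xb), which the estimator allows to be nonzero, so the plot of interest is e against xb.

```stata
* Illustrative data and model only
webuse nlswork, clear
xtset idcode year
xtreg ln_wage age tenure ttl_exp i.year, fe vce(cluster idcode)

* corr(u_i, Xb) is shown in the output header and stored in e(corr)
display "corr(u_i, Xb) = " %6.4f e(corr)

* Residual check for the fixed-effects model: e against xb
predict double xb_hat, xb
predict double e_hat, e
scatter e_hat xb_hat, msize(tiny) yline(0)
```

A sizeable value of e(corr) is not in itself a problem for the fixed-effects estimator, which is consistent with the conclusion above that the u-xb pattern does not matter here.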
