  • #16
    Hi, Joao. I just found this question and your answer, and I didn't understand the last part of your code:

    Code:
    replace yhat=yhat*exp_alpha if meanyhat>0

    Why do we have to multiply yhat and exp_alpha? I am sorry if it is a silly question; I am new to this kind of analysis. If the answer to this question can be found in a book or article, I would also appreciate the reference.

    Thank you very much,
    César Soares


    • #17
      Cesar,

      The yhat on the RHS is the fitted value without the fixed effect and exp_alpha is the exponential of the fixed effect (or the multiplicative fixed effect). So, you need to multiply the two to get the fitted value including the fixed effect.

      Joao
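
      In symbols, the multiplication follows from the exponential form of the conditional mean. This is only a sketch; the notation ($\alpha_i$ for the fixed effect of group $i$, $x_{it}$ for the regressors, $\beta$ for the coefficients) is assumed here and does not appear in the posts above:

      ```latex
      E[y_{it} \mid x_{it}, \alpha_i]
        = \exp(\alpha_i + x_{it}\beta)
        = \underbrace{\exp(\alpha_i)}_{\texttt{exp\_alpha}}
          \cdot
          \underbrace{\exp(x_{it}\beta)}_{\texttt{yhat}\ \text{before the replace}}
      ```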


      • #18
        Thank you very much, Joao!


        • #19
          Originally posted by Joao Santos Silva
          Dear Bob,

          So many people ask that question that I prepared an example illustrating how to do it. The example uses the auto dataset from Stata, but you can easily modify it to fit your case:

          Code:
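          * Load the example data and use the repair record as the panel identifier
          sysuse auto, clear
          g id=rep78
          drop if id==.
          xtset id
          * Fixed-effects Poisson regression
          xtpoisson price mpg, fe
          * Linear index, which excludes the fixed effect
          predict double fitted, xb
          gen double yhat=exp(fitted)
          * Group means of the observed and fitted values
          egen meany=mean(price) , by(id)
          egen meanyhat=mean(yhat), by(id)
          * Multiplicative fixed effect, then fitted values including it
          gen double exp_alpha=meany/meanyhat if meanyhat>0
          replace yhat=yhat*exp_alpha if meanyhat>0
          su price yhat

          All the best,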

          Joao

          Dear Joao,

          Could you explain why multiplicative individual effects can be calculated like this?
          Is there any theoretical proof?
          I feel a bit confused by the fact that predicted y is calculated with the help of the observed y...
          Last edited by Victoria Moskvina; 07 Mar 2019, 01:30.


          • #20
            Dear Victoria Moskvina,

            We are estimating the fixed effects using the fact that the residuals sum to zero within groups; that is why we need the dependent variable. The idea is similar to what we do in OLS with fixed effects.

            Best wishes,

            Joao
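
            The within-group condition can be written out as follows. This is a sketch in notation assumed here (hats denote estimates, and $\bar{y}_i$ is the mean of the dependent variable in group $i$), not a derivation taken from the posts above. Setting the sum of the residuals in group $i$ to zero and solving for the multiplicative fixed effect gives:

            ```latex
            \sum_{t} \Bigl( y_{it} - \exp(\hat\alpha_i)\,\exp(x_{it}\hat\beta) \Bigr) = 0
            \quad\Longrightarrow\quad
            \exp(\hat\alpha_i)
              = \frac{\sum_{t} y_{it}}{\sum_{t} \exp(x_{it}\hat\beta)}
              = \frac{\bar{y}_i}{\overline{\exp(x_{it}\hat\beta)}_i}
            ```

            The last ratio corresponds to meany/meanyhat in the quoted code, since both means are taken over the same observations within each group.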


            • #21
              @Joao Santos Silva, thank you very much!


              • #22
                Dear Joao Santos Silva,

                Thank you very much for providing that example and code - that was extremely helpful. From my understanding, the 'yhat' variable in your code gives the final fitted values from an FE Poisson regression, and we can calculate the residuals by subtracting that from the dependent variable. I have two questions about how to use this information:

                1) I would love to do either the Pearson deviance test or chi-squared deviance test. The Pearson residuals seem easier to calculate. My understanding of how to get Pearson residuals is simply to divide the residuals by the SD of the dependent variable. Is that correct? Moreover, I'm not sure what I'd do with this information to actually implement the test.

                2) I see you recommended the reset test and have code on your website that indicates that we're testing the significance of the square of the fitted values as an additional regressor. From the code you give here for getting fitted values, I get the sense that we simply need to square the final 'yhat' and include that as a regressor and do a t-test on that if we are using an FE Poisson model. I'm also a little unsure about how to interpret my results which come in the form of a chi2 value and 'Prob > chi2'. Is seeing a value of 0 as 'Prob > chi2' adequate indication that the model fits well?

                Thank you very much! I really appreciate your help.

                Regards,
                Mansi Jain


                • #23
                  Dear Mansi Jain,

                  1) Why would you want to do this?
                  2) I think that what you need are the squares of the log of yhat, right?

                  Best wishes,

                  Joao


                  • #24
                    Dear Joao Santos Silva,

                    Thank you for your response!

                    Re 1, apologies --- I misstated my intent. I was hoping to plot Pearson residuals vs fitted values to look for weird patterns/outliers. Your code gave me the normal residuals (dependent variable - yhat) easily; I was just trying to transform those into Pearson residuals so I could plot them against the fitted values. I'm confused about two aspects of that. Firstly, my understanding from Cameron and Trivedi's book is that I need to divide these residuals by the SD of y_i. I'm not sure what that means --- how can each observation y_i have an SD? I'm also seeing that for a Poisson model, people just divide by the square root of the fitted value (yhat) since the mean is supposed to equal the variance, but I don't see why one would do this considering that data are rarely equidispersed even within the framework of a Poisson model.

                    Re 2 -- Aah ok! Just so I make sure I understand, I can either directly include the square of 'fitted' from the code above in the regression or the square of the log of the final yhat, correct? I'm slightly confused because the value for Prob > chi2 I get from each of those is slightly different (0.2 in the first case and 0.1456 in the second). Also, neither of these values is particularly high - should I be concerned?

                    Thank you so much for your help!


                    • #25
                      Dear Mansi Jain,

                      1 - You are correct in saying that the Pearson residuals are interesting only under the assumption that the data are truly Poisson. That is why I asked why you would want to compute them because they are generally not useful. Indeed, the robustness of the Poisson regression means that it can be useful even when the data are not Poisson.

                      2 - They are different because one includes the fixed effects and the other doesn't. The reported p-values are quite comfortable.

                      Best wishes,

                      Joao
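
                      For reference, a sketch of the residual under discussion, in notation assumed here ($\hat\mu_{it}$ for the fitted value, yhat in the code quoted earlier in the thread). The "SD of y_i" is the standard deviation that the model implies for that observation given its regressors, not a sample statistic, and under the Poisson assumption the variance equals the mean:

                      ```latex
                      r^{P}_{it} = \frac{y_{it} - \hat\mu_{it}}{\sqrt{\widehat{\mathrm{Var}}(y_{it} \mid x_{it})}},
                      \qquad
                      \text{Poisson: } \widehat{\mathrm{Var}}(y_{it} \mid x_{it}) = \hat\mu_{it}
                      \;\Longrightarrow\;
                      r^{P}_{it} = \frac{y_{it} - \hat\mu_{it}}{\sqrt{\hat\mu_{it}}}
                      ```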


                      • #26
                        Dear Joao Santos Silva,

                        Thank you very much! That was very helpful. Would you primarily just recommend looking at output from the RESET test then to assess goodness of fit? For context, my dependent variable is a count variable in the form of number of trips to recreation sites, and is probably overdispersed. I have panel data (~3000 IDs x 10 years) which I'm analyzing using an FE Poisson model. The only other options I've read about are residual analysis, chi square test and goodness of link test, but it sounds from your comment that perhaps residual analysis isn't very helpful (unless the comment you made above isn't true for other residuals like deviance or Anscombe?).

                        Thank you very much!


                        • #27
                          Dear Mansi Jain,

                          Unless you want to use the model to compute probabilities of events, you do not need the data to have a Poisson distribution and therefore the standard "goodness-of-fit" measures are not helpful.

                          The RESET is a functional form test, not a "goodness-of-fit" test, and it is interesting because the validity of the Poisson regression depends on the correct specification of the functional form of the model (just like in the case of linear regression).

                          Best wishes,

                          Joao


                          • #28
                            Dear Joao Santos Silva,

                            Thank you so much for all your help. It seems then that a goodness of link test wouldn't be helpful, and no particular residual seems to be all that helpful. I spoke to my PI about this, but they're still keen on having some sort of pseudo R^2 or similar measure reported with every model. A lot of the R^2 measures I'm seeing, like the deviance-based R^2, seem to rely on the data actually being Poisson, as we discussed above, but I'm wondering why the square of the correlation between actual and predicted values isn't more commonly used. It seems simple and understandable, though one disadvantage of this approach is that it can decrease as regressors are added. Apart from that, are there other disadvantages? There is also a paper (https://www.sciencedirect.com/scienc...67947303000628) that suggests adjusting the deviance-based R^2 for under/overdispersion by calculating the dispersion parameter. Would you say doing this is common practice?

                            Thank you so much!
                            Mansi


                            • #29
                              Dear Mansi Jain,

                              I would stick to the square of the correlation coefficient; after all, that is literally R^2. Also, it is not based on distributional assumptions and is applicable in many situations.

                              Best wishes,

                              Joao
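
                              A minimal sketch of that measure, reusing the auto-dataset example quoted earlier in the thread. It assumes the fitted values should include the estimated fixed effects; the scalar name sq_corr is hypothetical:

                              ```stata
                              * Rebuild the fitted values from the earlier example
                              sysuse auto, clear
                              gen id = rep78
                              drop if missing(id)
                              xtset id
                              xtpoisson price mpg, fe
                              predict double fitted, xb
                              gen double yhat = exp(fitted)
                              egen double meany = mean(price), by(id)
                              egen double meanyhat = mean(yhat), by(id)
                              replace yhat = yhat*meany/meanyhat if meanyhat > 0

                              * Squared correlation between observed and fitted values
                              correlate price yhat
                              scalar sq_corr = r(rho)^2
                              display "Squared correlation: " sq_corr
                              ```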


                              • #30
                                Actually, the Pearson residuals are useful more generally: any time the variance is proportional to the mean, whether the constant of proportionality is less than one (underdispersion) or greater than one (overdispersion). So it's natural to use them in place of the usual residuals.

                                But one thing is problematical in this discussion of RESET, R-squared, and so on: how is one dealing with the estimated fixed effects? These are poor estimates for small T. The usual RESET cannot simply be applied ignoring the fact that those depend on essentially T observations. Using them in a RESET almost certainly creates an incidental parameters problem. Now, if the RESET is just applied to the x(i,t)*betahat terms, and then xtpoisson applied, that is fine. Maybe I missed that in the discussion.

                                JW
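
                                A minimal sketch of the variant described in the last sentences above, where the RESET term is built from the linear index x(i,t)*betahat only, without the estimated fixed effects. It reuses the auto-dataset example quoted earlier in the thread; the variable names xbhat and xbhat2 are hypothetical:

                                ```stata
                                sysuse auto, clear
                                gen id = rep78
                                drop if missing(id)
                                xtset id

                                * Original model; predict with xb returns the linear index without the fixed effect
                                xtpoisson price mpg, fe
                                predict double xbhat, xb

                                * Add the squared linear index as an extra regressor and re-estimate
                                gen double xbhat2 = xbhat^2
                                xtpoisson price mpg xbhat2, fe

                                * Wald test of the added term; a small p-value would point to misspecification
                                test xbhat2
                                ```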
