
  • Fixed-effects model and Fixed-effects Poisson model: should/can I take the log of the outcome variable in an FE Poisson model?

    Hi all,

    I am facing a model specification problem related to the Fixed-effects model and the Fixed-effects Poisson model. I am working with an unbalanced panel data set that contains 3,588 observations and 110 groups, spanning 45 years. The continuous outcome variable, which represents military aid from the U.S., contains 1,519 zeros. Upon examining both the outcome variable and the independent variables, I found that all of them are right-skewed. Therefore, as my first step, I took the natural logarithm of the dependent variable (DV) and all independent variables (IVs), except for the interaction term. I then tested the hypothesis using the Fixed-effects model. The relevant code is provided below:

    Code:
    xtreg log_US_militiary_aidit i.Dummy1##i.Dummy2 lag_log_X1t lag_log_X2t lag_log_X3t lag_log_X4t  lag_log_X5t, fe vce(cluster cow_code)
    I used predict and kdensity to plot the residuals, shown below:
    [Figure: Density of residuals of logged model]
    However, even after taking the natural logarithm of the outcome variable, it still exhibits excess zeros and remains right-skewed. See below:
    [Figure: histogram of the logged outcome (histogram_log_milaid_21)]
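If the zeros are retained by transforming with log(x+1), every zero maps to log(1) = 0, so the share of zeros cannot change. A minimal Python sketch of that point; the list `aid` is made-up illustrative data (not the poster's) and `share_of_zeros` is a hypothetical helper.

```python
import math


def share_of_zeros(values):
    """Fraction of observations that are exactly zero."""
    return sum(1 for v in values if v == 0) / len(values)


# Made-up aid amounts (not the poster's data): a mass at zero and a long right tail.
aid = [0.0, 0.0, 0.0, 0.0, 1.5, 3.0, 12.0, 40.0, 250.0, 1800.0]

# log(1 + x) maps every 0 to 0, so the mass at zero is carried over unchanged.
log1p_aid = [math.log1p(v) for v in aid]

print(share_of_zeros(aid), share_of_zeros(log1p_aid))

# Share of zeros in the poster's panel: 1,519 zeros out of 3,588 observations.
print(round(1519 / 3588, 3))
```

The transform compresses the right tail, but roughly 42% of the poster's observations remain piled up at zero.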




    This leads me to question whether the Fixed-effects model is a good fit for the data, even though the distribution of residuals appears to be acceptable. This is my first question.

    I began searching for ways to address the excess zeros in the outcome variable. Wooldridge (1999), "Distribution-free estimation of some nonlinear panel data models," suggests that the Fixed-effects Poisson estimator can also be applied to continuous outcome variables. I therefore estimated my model with xtpoisson. Below are two models: the first uses the logged DV and the second the unlogged DV.

    Code:
    xtpoisson log_US_militiary_aidit i.Dummy1##i.Dummy2 lag_log_X1t lag_log_X2t lag_log_X3t lag_log_X4t  lag_log_X5t, fe vce(robust)
    Code:
    xtpoisson US_militiary_aidit i.Dummy1##i.Dummy2 lag_log_X1t lag_log_X2t lag_log_X3t lag_log_X4t  lag_log_X5t, fe vce(robust)
    Things are becoming tricky for me. First, should (or can) I take the log of the outcome variable in an FE Poisson model? Second, can I take the log of any variable in an FE Poisson model? Third, can I use FE Poisson at all, given that the outcome variable is continuous?

    I am grateful for your time and consideration in reading this post. Any help you can provide is appreciated.
    Last edited by Poyung Lin; 15 Sep 2023, 15:04.

  • #2
    Hi,
    I may not have fully understood your questions, but I do have some general advice.
    In your position, I would take a step back: you need to define which coefficient you are looking for. Of course you can take log(outcome) in a Poisson regression, but the interpretation of the coefficient is different! (As in linear regression, regressing y on x has one interpretation while regressing log(y) on x has another. The same applies to Poisson regression.) I would suggest Cameron and Trivedi's (2005) book Microeconometrics for a clearer explanation of how to interpret Poisson regression estimates.
    Second, you say that your dependent variable has a mass at zero. I think you know this, but you need to be careful with logs: log(0) is undefined, so observations with a value of 0 are removed from the sample.
    In the end, if you are clear about which coefficient you are looking for and the problem is the zeros, I suggest you read Chen, J., & Roth, J. (2022). Log-like? ATEs defined with zero outcomes are (arbitrarily) scale-dependent.
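A small numeric illustration of the scale-dependence problem Chen and Roth describe, in Python with made-up numbers (the groups `treated` and `control` and both helper functions are hypothetical). The gap in mean log(1+y) between two groups is computed with y measured first in millions and then in dollars; for contrast, the log ratio of group means, which is what a dummy coefficient in an exponential-mean (Poisson) model targets, is computed on the same data.

```python
import math


def mean(values):
    return sum(values) / len(values)


def log1p_gap(treated, control, scale):
    """Difference in mean log(1 + scale * y) between the two groups."""
    return (mean([math.log1p(scale * y) for y in treated])
            - mean([math.log1p(scale * y) for y in control]))


def log_ratio_of_means(treated, control, scale):
    """log(E[scale * y | treated] / E[scale * y | control]): unaffected by the choice of scale."""
    return math.log(mean([scale * y for y in treated]) / mean([scale * y for y in control]))


# Made-up outcomes with zeros (not the poster's data), measured in millions.
treated = [0.0, 0.0, 4.0, 9.0]
control = [0.0, 1.0, 1.0, 2.0]

for scale in (1.0, 1e6):  # 1.0: millions; 1e6: the same amounts in dollars
    print(scale, log1p_gap(treated, control, scale), log_ratio_of_means(treated, control, scale))
```

With these numbers the log(1+y) gap is positive when y is in millions and negative when y is in dollars, while the log ratio of means equals log(3.25) at both scales.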



    • #3
      I wouldn't take the log before using Poisson. And, as William pointed out, it's a particularly bad idea to add one and then take the log when you have zeros. In addition to the lack of scale invariance of the estimates, and the problems pointed out by Chen and Roth, you will have just as many zeros as you had before. An exponential mean function makes sense whether or not you have "excess zeros." If they are "excess," what benchmark are you comparing against? Use xtpoisson with US_militiary_aidit as the dependent variable; it gives estimates that have percentage-change interpretations, and in particular they are free of the units of measurement of the military aid variable.

      Notice that your outcome variable is neither discrete nor continuous. It's mixed. I call these "corner solutions" because zero is a legitimate outcome, and in some cases the response is a zero.
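Both claims above (FE Poisson is fine for a nonnegative continuous outcome with a mass at zero, and its estimates do not depend on the units of y) can be checked in a small simulation. The Python sketch below is not Stata's xtpoisson code; it is a from-scratch version of the same idea, maximizing the conditional likelihood of the fixed-effects Poisson model for a single regressor, run on made-up data. The functions `fe_poisson_slope` and `simulate` and every parameter value (1,000 groups, 10 periods, b = 0.5, about 40% zeros) are illustrative assumptions.

```python
import math
import random


def fe_poisson_slope(groups, tol=1e-10, max_iter=100):
    """Fixed-effects Poisson estimate of b in E[y_it | x_it, a_i] = exp(a_i + b * x_it).

    Newton's method on the conditional (multinomial) log likelihood
        sum_i sum_t y_it * log(exp(b * x_it) / sum_s exp(b * x_is)),
    from which the fixed effects a_i drop out. Nothing requires y to be an
    integer; groups whose outcomes are all zero contribute nothing.
    """
    b = 0.0
    for _ in range(max_iter):
        score, hessian = 0.0, 0.0
        for xs, ys in groups:
            n = sum(ys)
            if n == 0:
                continue  # all-zero group: no information about b
            top = max(b * x for x in xs)  # shift exponents for numerical stability
            w = [math.exp(b * x - top) for x in xs]
            total = sum(w)
            p = [wi / total for wi in w]
            mean_x = sum(pi * x for pi, x in zip(p, xs))
            mean_x2 = sum(pi * x * x for pi, x in zip(p, xs))
            score += sum(y * x for x, y in zip(xs, ys)) - n * mean_x
            hessian -= n * (mean_x2 - mean_x * mean_x)
        step = score / hessian
        b -= step
        if abs(step) < tol:
            break
    return b


def simulate(n_groups=1000, n_periods=10, b_true=0.5, seed=2023):
    """Hypothetical panel with a continuous, nonnegative outcome and a mass at zero."""
    rng = random.Random(seed)
    groups = []
    for _ in range(n_groups):
        a = rng.gauss(0.0, 0.5)  # group fixed effect
        xs = [rng.gauss(0.0, 1.0) + 0.5 * a for _ in range(n_periods)]  # x correlated with a
        ys = []
        for x in xs:
            # Mean-one multiplicative error: 0 with probability 0.4, otherwise
            # exponential with mean 1/0.6, so roughly 40% of outcomes are zero.
            u = 0.0 if rng.random() < 0.4 else rng.expovariate(0.6)
            ys.append(math.exp(a + b_true * x) * u)
        groups.append((xs, ys))
    return groups


groups = simulate()
b_hat = fe_poisson_slope(groups)

# Rescale the outcome (e.g. dollars instead of thousands of dollars).
rescaled = [(xs, [1000.0 * y for y in ys]) for xs, ys in groups]
b_rescaled = fe_poisson_slope(rescaled)

print(b_hat, b_rescaled)
```

The estimate should land near the true value of 0.5 even though y is not a count, and the two printed numbers agree up to rounding error, because multiplying every y by 1000 multiplies the score and the Hessian by the same factor.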

      A final comment: You will get better answers if you show your output -- as requested in the FAQ.



      • #4
        Thank you so much for your feedback, Jeff Wooldridge and William Rossi.

        The reasons I take the logarithm of the variables are twofold. First, the outcome variable and all independent variables are right-skewed; for example, consider the independent variable 'population'. Second, using the logarithm makes it easier for me to interpret the effects. If a variable contains a mass of zero values, I will use log(x+1) to retain those observations. In my research, the outcome variable—U.S. military aid—contains such a mass of zero values.

        My original research question is: how does ally identity affect U.S. military aid to each country in specific years? Below is my original Fixed-effects model estimating the effect of Dummy1 (ally identity) and Dummy2 (specific years):

        Code:
        xtreg log_milaid_21 i.ab_signal_3##i.ally import export pop gdp military_expenditure, fe vce(cluster cow_code)
        Below is the result. The distribution of residuals for this model is shown in my previous post.

        [Figure: fixed effects model result]

        As I mentioned earlier, I discovered a substantial number of zeros in the outcome variable, so I searched for a model that better accommodates it. I assumed that xtpoisson might be a more suitable fit. However, when I use logged and unlogged variables in xtpoisson, I get puzzling results. Below are the results.

        1. xtpoisson with the logged outcome variable and logged IVs
        Code:
        xtpoisson log_milaid_21 i.ab_signal_3##i.ally lag_log_import_21 lag_log_export_21 lag_log_vdempop lag_log_vdemgdp lag_log_milex, fe vce(robust)
        [Figure: xtpoisson with logged variables]


        2. xtpoisson with unlogged outcome and unlogged IVs
        Code:
        xtpoisson milaid_21 i.ab_signal_3##i.ally lag_import_21 lag_export_21 lag_vdem_pop lag_vdem_gdp lag_milex, fe vce(robust)
        This model produces the following output:

        Iteration 295: log pseudolikelihood = -2.755e+11 (backed up)
        Iteration 296: log pseudolikelihood = -2.755e+11 (backed up)
        Iteration 297: log pseudolikelihood = -2.755e+11 (backed up)
        Iteration 298: log pseudolikelihood = -2.755e+11 (backed up)
        Iteration 299: log pseudolikelihood = -2.755e+11 (backed up)
        Iteration 300: log pseudolikelihood = -2.755e+11 (backed up)
        convergence not achieved

        [Figure: xtpoisson with unlogged variables]


        3. xtpoisson with logged outcome variable and unlogged IVs
        Code:
        xtpoisson log_milaid_21 i.ab_signal_3##i.ally lag_import_21 lag_export_21 lag_vdem_pop lag_vdem_gdp lag_milex, fe vce(robust)
        [Figure: xtpoisson with logged DV and unlogged IVs]

        4. xtpoisson with unlogged dv and logged IVs
        Code:
        xtpoisson milaid_21 i.ab_signal_3##i.ally lag_log_import_21 lag_log_export_21 lag_log_vdempop lag_log_vdemgdp lag_log_milex, fe vce(robust)
        [Figure: xtpoisson with unlogged DV and logged IVs]


        In conclusion, I am attempting to find a model that can accommodate a continuous dependent variable with an excess of zeros, specifically U.S. military aid. I have tried using a fixed-effects model with logged dependent and logged independent variables and have plotted the residuals, which seem acceptable to me. However, when I employ xtpoisson, the model fails to converge if the dependent variable and independent variables are not logged. Furthermore, the results differ from the fixed-effects model when the dependent variable is not logged but the independent variables are. I am uncertain about what is causing these discrepancies. I would greatly appreciate any advice on model selection, as well as any insights into the results I've obtained so far. Thank you very much for your time and guidance.



        • #5
          Dear Poyung Lin,
          As Jeff said, the coefficients of a Poisson regression have a percentage-change interpretation, so regressions 2 and 4 in your output would be what you are looking for.
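For reference, the percentage-change readings come from the exponential mean. A Python sketch of the arithmetic (function names and coefficient values are hypothetical, not taken from the posted output): a dummy coefficient b implies a 100*(exp(b) - 1) percent change in the expected outcome; with an interaction such as ab_signal_3##ally, the effect of one dummy when the other equals 1 uses the sum of its main-effect and interaction coefficients; and the coefficient on a logged regressor is an elasticity.

```python
import math


def pct_change_dummy(b):
    """Percent change in E[y | x] when a 0/1 regressor goes from 0 to 1, with E[y | x] = exp(x * beta)."""
    return 100.0 * (math.exp(b) - 1.0)


def pct_change_logged_regressor(b, pct_increase_in_x):
    """Percent change in E[y | x] when x rises by pct_increase_in_x percent and log(x) has coefficient b.

    Because E[y | x] is proportional to x**b, scaling x by (1 + p/100) scales
    the mean by (1 + p/100)**b; for small p this is roughly b * p percent.
    """
    return 100.0 * ((1.0 + pct_increase_in_x / 100.0) ** b - 1.0)


# Hypothetical coefficients, for illustration only.
b_ally, b_interaction = 0.40, -0.25
print(pct_change_dummy(b_ally))                  # ally effect when the other dummy is 0
print(pct_change_dummy(b_ally + b_interaction))  # ally effect when the other dummy is 1
print(pct_change_logged_regressor(0.5, 1.0))     # 1% higher x with elasticity 0.5
```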

          Now, reading some threads, I believe the reason xtpoisson does not achieve convergence is related to the combination of your independent variables and the mass of zeros. (Read this thread for more on this: https://www.statalist.org/forums/for...e-not-achieved). Try checking how many ones and zeros there are in your dummies.
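One way to follow this advice, sketched in Python on a few made-up rows (the helper `dummy_outcome_table` and the key names are hypothetical). With country fixed effects, countries whose outcome is zero in every year carry no information and are dropped. Among the remaining observations, if one value of a dummy never occurs together with a positive outcome, the Poisson likelihood keeps increasing as that coefficient heads toward minus or plus infinity (separation), which is one common reason for non-convergence.

```python
from collections import Counter


def dummy_outcome_table(rows, group_key, y_key, dummy_key):
    """Cross-tabulate a 0/1 dummy against (y > 0) after dropping all-zero groups.

    Returns the counts and a flag that is True when some value of the dummy
    occurs only with zero outcomes, a pattern that makes the Poisson estimate
    of that dummy's coefficient diverge.
    """
    # Groups with at least one positive outcome; the others drop out of a FE Poisson fit.
    informative = {r[group_key] for r in rows if r[y_key] > 0}
    counts = Counter(
        (r[dummy_key], r[y_key] > 0) for r in rows if r[group_key] in informative
    )
    separated = any(
        counts[(value, False)] > 0 and counts[(value, True)] == 0 for value in (0, 1)
    )
    return counts, separated


# Made-up rows: country 2 never receives aid; ally == 1 only ever occurs with zero aid.
rows = [
    {"cow_code": 2, "milaid": 0.0, "ally": 0},
    {"cow_code": 2, "milaid": 0.0, "ally": 1},
    {"cow_code": 20, "milaid": 5.0, "ally": 0},
    {"cow_code": 20, "milaid": 0.0, "ally": 1},
    {"cow_code": 200, "milaid": 3.0, "ally": 0},
    {"cow_code": 200, "milaid": 0.0, "ally": 1},
]
counts, separated = dummy_outcome_table(rows, "cow_code", "milaid", "ally")
print(counts, separated)
```

This only detects the simplest pattern, a single dummy on its own; separation can also arise from combinations of regressors and fixed effects, which is what ppmlhdfe's separation checks are designed to catch.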

          Regarding which model is better suited for your analysis, I would suggest sticking with the Poisson regression with the outcome not logged (to get the interpretation you want).
          Moreover, it may be of interest to look at the extensive margin: create an indicator equal to 1 if military aid is greater than 0 and 0 if military aid is 0, and use the indicator as the dependent variable.
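A Python sketch of that indicator on made-up rows (the name `any_aid` and the helper are hypothetical). It also lists the countries whose indicator never changes over time; a conditional fixed-effects logit, such as Stata's xtlogit with the fe option, gets no information from them and drops them.

```python
def add_extensive_margin(rows, group_key, y_key, new_key="any_aid"):
    """Return copies of rows with new_key = 1 if y > 0 else 0, plus the groups
    whose indicator never varies (all 0 or all 1); a fixed-effects logit gets
    no information from such groups.
    """
    out = [{**r, new_key: 1 if r[y_key] > 0 else 0} for r in rows]
    values_by_group = {}
    for r in out:
        values_by_group.setdefault(r[group_key], set()).add(r[new_key])
    no_variation = sorted(g for g, vals in values_by_group.items() if len(vals) == 1)
    return out, no_variation


# Made-up rows: country 2 never gets aid, country 200 always does.
rows = [
    {"cow_code": 2, "milaid": 0.0},
    {"cow_code": 2, "milaid": 0.0},
    {"cow_code": 20, "milaid": 5.0},
    {"cow_code": 20, "milaid": 0.0},
    {"cow_code": 200, "milaid": 3.0},
    {"cow_code": 200, "milaid": 7.5},
]
with_indicator, no_variation = add_extensive_margin(rows, "cow_code", "milaid")
print([r["any_aid"] for r in with_indicator], no_variation)
```

When building the same indicator in Stata itself, keep in mind that missing values compare as larger than any number, so a condition like milaid_21 > 0 is true for missing aid unless missing values are excluded explicitly.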



          • #6
            Dear William Rossi and Jeff Wooldridge

            Thank you so much for your valuable advice! ppmlhdfe solves the convergence problem, and I will stick with the Poisson regression (ppmlhdfe) with the outcome not logged. Below are my current code and results.

            Code:
            ppmlhdfe milaid_21 i.ab_signal_3##i.ally lag_log_import_21 lag_log_export_21 lag_log_vdempop lag_log_vdemgdp lag_log_milex, absorb(cow_code) vce(robust)
            [Figure: ppmlhdfe results, logged IVs]

            Code:
            ppmlhdfe milaid_21 i.ab_signal_3##i.ally lag_import_21 lag_export_21 lag_vdem_pop lag_vdem_gdp lag_milex, absorb(cow_code) vce(robust)
            [Figure: ppmlhdfe results, unlogged IVs]

            One thing I am very curious about is why the font color of Interactions 3 to 14 is red. Could you kindly let me know what could be the cause of that? Thank you so much for your time and suggestions again!

            Best,
            Po



            • #7
              I do not see anything in red in your output, and I am not very familiar with the ppmlhdfe package, so I have no idea why this would happen. Try looking at the command's help file and online resources (https://github.com/sergiocorreia/ppmlhdfe).



              • #8
                My guess is that it has to do with checking for separated observations at each fixed effect. Try
                Code:
                sep(fe ir)



                • #9
                  Dear Maxence Morlet and William Rossi,
                  Thank you so much for your reply. It's very helpful!!

