
  • Fixed effects Poisson regression, partialling out, and ppml.

    Dear Statalist,

    I am working with zip-code-level annual crime count data for the period 2000-2008. In the middle of the panel, say in 2004, there was a regulatory change in one state that I suspect affected crime rates. It is therefore natural for me to use a difference-in-differences (DD) or difference-in-difference-in-differences (DDD) design. I have three questions, and any help would be deeply appreciated.

    Because the zip-code-level crime data are counts with a large number of zeros, I would like to use a fixed effects Poisson model with zip code fixed effects. Since unobserved factors affecting crime rates might be correlated within counties, I want to cluster standard errors at the county level rather than the zip code level. However, none of the existing Stata commands, such as xtpoisson and xtnbreg, allows clustering at a level other than the zip code.

    A user-written command, ppml, based on Santos Silva and Tenreyro (2006), may be a good choice in this case. The command supports the standard weighting options but lacks a fixed effects option such as ", fe". I would therefore have to include more than 3,000 zip code dummies explicitly, as in eqn (1):

    ppml crime law i.county*time i.year i.zip [aw=pop2000], cluster(county) --- (1)

    where crime is the number of crimes committed, i.zip are zip code dummies, law == 1 if the year is after the law change and the zip code is in the area affected by the law, and law == 0 in the pre-change period or for unaffected zip codes. Also, time = 1 for year = 2000 through time = 9 for year = 2008.
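Assuming that i.county*time is meant to give each county its own linear trend, and using the fact that ppml fits an exponential conditional mean, eqn (1) can be read as the following model for the expected count. Here z indexes zip codes, t years, and c(z) the county of zip code z; the symbols are introduced only for illustration:

```latex
\[
\mathrm{E}[\text{crime}_{zt} \mid \cdot]
  = \exp\!\big(\alpha_z + \delta_t + \gamma_{c(z)}\,\text{time}_t + \beta\,\text{law}_{zt}\big)
\]
```

Under this reading, exp(beta) is the proportional change in expected crime associated with the law, holding the zip code effect alpha_z, the year effect delta_t, and the county trend fixed.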

    This regression takes too much time, so I would like to remove (partial out) the zip code variation from each variable in eqn (1) before running the regression, to reduce the computing burden:

    areg crime, absorb(zip) --- (2)
    predict crimeR, res

    areg law, absorb(zip) --- (3)
    predict lawR, res


    and finally run:

    ppml crimeR lawR i.county*time i.year [aw=pop2000], cluster(county) --- (4)


    -----------------------------------------------------------------------------------------------------------------------------------------------------------
    My Question 1 is: should I also remove the zip code variation from "i.county*time" and "i.year" in eqn (1)? If so, for example:

    xi i.year --- (5)
    areg _Iyear_2001, absorb(zip) --- (6)


    all the way through

    areg _Iyear_2008, absorb(zip) --- (7)

    and similarly for i.county*time, and then run the regression on the fully partialled-out variables?



    Question 2:

    I have reason to believe that only juveniles are affected by the law change and adults are not, so DDD seems a good strategy. Suppose my crime data now have two observations for each zip code and year: one for juveniles and one for adults. When I run a regression with panel fixed effects as in (8), several variables are automatically omitted because they do not vary over time.

    gen TxPxJ = (treatedZip*post*juveniles)
    gen TxP = (treatedZip*post)
    gen TxJ = (treatedZip*juveniles)
    gen PxJ = (post*juveniles)


    where treatedZip = 1 if the zip code is in the area affected by the 2004 law change and 0 otherwise, and juveniles = 1 for juveniles and 0 for adults, so neither variable varies over time.

    xtreg crime TxPxJ TxP TxJ PxJ T P J i.county*time i.year i.zip [aw=pop2000], fe cluster(county) --- (8)

    How do I perform a DDD with zip code fixed effects? Or should I leave out the zip code fixed effects in DDD regressions and run a pooled OLS regression as in (9)?

    reg crime TxPxJ TxP TxJ PxJ T P J i.county*time i.year [aw=pop2000], cluster(county) --- (9)


    Question 3:


    Is it also valid to use ppml for DDD estimators?
    -----------------------------------------------------------------------------------------------------------------------------------------------------------

    Thank you!





  • #2
    Dear Paul,

    With respect to question 1, I think that what you need is -xtpqml-.

    The other questions are not really in my area of expertise, but, just in case it helps, I have used DD with PPML. What you need to keep in mind is that in this case the DD estimate is really a ratio of ratios (the differences are on the log scale).
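A minimal sketch of the ratio-of-ratios point, using made-up cell means for a plain 2x2 design with no other controls. A Poisson regression of crime on treated, post, and treated*post is saturated in these four cells, so its fitted means equal the cell means and the coefficients can be recovered exactly from logs; exponentiating the DD coefficient then gives the ratio of the treated group's before/after ratio to the control group's.

```python
import math

# Hypothetical mean crime counts in the four DD cells (made-up numbers).
# Keys are (treated, post).
cell_mean = {
    (0, 0): 10.0,  # control zips, before the law
    (0, 1): 12.0,  # control zips, after the law
    (1, 0): 20.0,  # treated zips, before the law
    (1, 1): 18.0,  # treated zips, after the law
}

# Saturated Poisson model: log E[y] = a + b1*treated + b2*post + b3*treated*post.
# Because fitted means equal cell means, each coefficient follows from logs.
a = math.log(cell_mean[(0, 0)])
b1 = math.log(cell_mean[(1, 0)]) - a
b2 = math.log(cell_mean[(0, 1)]) - a
b3 = math.log(cell_mean[(1, 1)]) - a - b1 - b2  # the DD coefficient

ratio_treated = cell_mean[(1, 1)] / cell_mean[(1, 0)]  # 18/20 = 0.9
ratio_control = cell_mean[(0, 1)] / cell_mean[(0, 0)]  # 12/10 = 1.2
ratio_of_ratios = ratio_treated / ratio_control

print(f"DD coefficient b3 = {b3:.4f}")
print(f"exp(b3) = {math.exp(b3):.4f}, ratio of ratios = {ratio_of_ratios:.4f}")
```

With these numbers, exp(b3) = 0.75: crime in treated zip codes fell to 90% of its pre-law level while control zip codes rose to 120%, so the estimated effect is a 25% reduction relative to the control group's proportional change, not a difference in levels.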

    All the best,

    Joao



    • #3
      Dear Professor Silva,

      Thank you for your reply.

      Regarding your first comment, my worry is that -xtpqml- does not allow users to include [aw=pop] or an exposure/offset option. Don't I need my regressions to reflect population size in some way, because highly populated zip codes will have many more crimes than less populated ones?

      May I have a chance to read your paper that uses DD with PPML? I would like to understand better what it means that DD estimates are a ratio of ratios.

      I appreciate your help, Professor!

      Regards,

      Paul



      • #4
        Dear Paul,

        I would not use population as a weight, but I would use (log) population as a regressor. Alternatively, I would consider using crimes per capita as the dependent variable.

        The paper where we used DD with PPML is this one (most of it is not relevant to you). If you do not have access to it, please send me an email and I'll send you a copy.

        All the best,

        Joao



        • #5
          Thank you, Professor Silva, for kindly providing your paper.

          Unfortunately, I only have population data for zip codes at a single point in time (from the decennial census), so log(population) as a regressor will be automatically dropped because it does not vary over time.

          The more detailed reason I want to adjust for population size, via weighting or an exposure/offset option, is that crime counts in small-population zip codes are likely to be zero for the rare types of crime I am looking into. Thus, if the crime rate per capita is used as the dependent variable, there will be a huge jump from a zero rate whenever a small-population zip code experiences one of these rare crimes, so the variance of the per capita crime rate will be very large for small zip codes. This could seriously bias the estimates if the estimator weighs all zip codes equally. Do you think -xtpqml- corrects for this internally (meaning that I do not need to specify exposure/offset or [aw=pop] in my regression)? Is there anything I misunderstand here?
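A one-line calculation makes the variance argument concrete, assuming (for illustration only) that the count in zip code z is Poisson with mean proportional to its population, with a common per capita rate r:

```latex
\[
y_z \sim \operatorname{Poisson}(r\,\mathrm{pop}_z)
\;\Longrightarrow\;
\operatorname{Var}\!\left(\frac{y_z}{\mathrm{pop}_z}\right)
  = \frac{r\,\mathrm{pop}_z}{\mathrm{pop}_z^{2}}
  = \frac{r}{\mathrm{pop}_z}
\]
```

Under this assumption, the per capita rate of a zip code with 500 residents has 40 times the variance of that of a zip code with 20,000 residents (both population figures are hypothetical).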

          If my understanding above is correct, are you aware of an estimator that can both 1) cluster at a level other than the panel level and 2) adjust for population?

          Also, why did you recommend -xtpqml- instead of your -ppml- for my project? I thought -ppml- would be perfect for my project if I partialled out the zip code dummies before running the main regressions, to avoid long computing times.

          Professor Silva, I truly appreciate your help.

          Regards,

          Paul



          • #6
            Dear Paul,

            If population does not vary over time and you are including fixed effects, then you can just ignore population because the fixed effects will take care of it (that is why it drops). I do not think an exposure variable will add anything to what you get with fixed effects, and I also do not see a particularly good reason to use weights. Essentially, the fixed effects will take care of all the issues you are worried about.
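To illustrate why a time-invariant population adds nothing once zip code fixed effects are included, here is a minimal pure-Python sketch of the conditional fixed effects Poisson estimator (the one Stata's xtpoisson, fe computes), simplified to a single regressor (the law indicator) with no year effects. The function name poisson_fe_beta and the data are hypothetical. Within each zip code, the likelihood depends only on each year's share of that zip's total expected crime, so multiplying all of a zip code's means by its (constant) population cancels out.

```python
import math


def poisson_fe_beta(panels, lo=-10.0, hi=10.0, tol=1e-12):
    """Conditional fixed effects Poisson estimate of one slope coefficient.

    panels: list of panels, each a list of (y, x, exposure) tuples.
    The conditional log-likelihood has score
        sum_i sum_t x_it * (y_it - n_i * p_it),
    where n_i = sum_t y_it and
        p_it = e_it * exp(b * x_it) / sum_s e_is * exp(b * x_is).
    The score is non-increasing in b, so bisection finds its root.
    """
    def score(b):
        total = 0.0
        for obs in panels:
            n_i = sum(y for y, _, _ in obs)
            w = [e * math.exp(b * x) for _, x, e in obs]
            denom = sum(w)
            for (y, x, _), w_t in zip(obs, w):
                total += x * (y - n_i * w_t / denom)
        return total

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid  # root lies above mid
        else:
            hi = mid  # root lies at or below mid
    return 0.5 * (lo + hi)


# Hypothetical zip codes: yearly (crime count, law indicator) pairs plus a
# single census population that does not vary over time (made-up numbers).
zips = {
    "small_zip": {"pop": 500, "obs": [(2, 0), (3, 0), (1, 1), (1, 1)]},
    "large_zip": {"pop": 20000, "obs": [(30, 0), (34, 0), (25, 1), (27, 1)]},
}

no_exposure = [[(y, x, 1.0) for y, x in z["obs"]] for z in zips.values()]
with_exposure = [[(y, x, float(z["pop"])) for y, x in z["obs"]] for z in zips.values()]

b_plain = poisson_fe_beta(no_exposure)
b_exposure = poisson_fe_beta(with_exposure)
print(f"beta without exposure:           {b_plain:.6f}")
print(f"beta with population as exposure: {b_exposure:.6f}")
```

The two estimates agree up to floating-point rounding, because the population factor e_it is constant within each zip code and cancels from p_it. The same cancellation is why log(population) drops out as a regressor.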

            In view of this, I would use -xtpqml- simply because it accounts for the fixed effects. You can also use -ppml- with all the dummies, but that will be much slower. I have not thought much about it, but my guess is that the partialling-out trick you suggest won't work because we are talking about a nonlinear model.
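One concrete symptom of the problem, in a minimal sketch with hypothetical counts: partialling zip code fixed effects out of crime (with no other regressors) amounts to subtracting each zip code's own mean, which turns non-negative counts into values that are often negative. The Poisson-type estimators discussed here are designed for non-negative outcomes. More fundamentally, the Frisch-Waugh-Lovell result that justifies partialling out holds for linear least squares, not for nonlinear models like Poisson.

```python
# Hypothetical crime counts for two zip codes over four years (made-up numbers).
crime = {
    "zip_A": [0, 0, 3, 1],
    "zip_B": [12, 15, 9, 14],
}

# Linear partialling out of zip code dummies: subtract each zip code's mean.
crime_resid = {}
for z, counts in crime.items():
    mean = sum(counts) / len(counts)
    crime_resid[z] = [c - mean for c in counts]

for z, r in crime_resid.items():
    print(z, r)

negatives = [v for r in crime_resid.values() for v in r if v < 0]
n_total = sum(len(r) for r in crime_resid.values())
print(f"{len(negatives)} of {n_total} residualized 'counts' are negative")
```

Here half of the residualized values are negative, so eqn (4) would no longer be a regression of a count variable. Even setting the sign issue aside, nothing guarantees that a Poisson fit on residualized variables reproduces the coefficient from the model with zip code dummies.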

            All the best,

            Joao



            • #7
              Dear Professor Silva,

              Thank you for sharing your thoughts and advice!

              Regards,
              Paul



              • #8
                Dear Statalist,

                I have a follow-up question on weighting. There is not much information online about whether weighting is necessary for fixed effects Poisson quasi-ML. Professor Silva commented that he did not see a reason to use weights in the analysis of the crime data, but I do not understand why.

                In standard OLS with fixed effects, most researchers use population weights, in the form of [aw=pop], so that the estimate of interest can be interpreted as a sample-wide mean effect. One example is Wolfers (2006), which investigates the effects of unilateral divorce laws on divorce rates in the United States; he used state population weights in his OLS regressions on panel data.

                Why one would not use weights in a fixed effects Poisson QML specification is not obvious to me. Could anyone help me understand the logic behind it?

                Thank you!



                • #9
                  Dear Paul,

                  Please note that I did not say that there is no reason to use weights with crime data or with PPML. I only said that I would not use weights for the purpose you were suggesting, which is a very different thing. That is, I use population as a regressor, not as a weight.

                  More generally, you should use weights if you fear that your model is not correctly specified; otherwise there is little reason to use them. See the chapter on this in Jeff Wooldridge's book.

                  All the best,

                  Joao

