  • Question about interpretation of didregress for change in prevalence of a binary outcome

    Hello Statalisters,

    *Very* longtime reader (and Stata SE/18.0 user), first-time poster!

    I am a newcomer to DID models. I have been using survey data from two states and three time periods to assess whether the implementation of adult-use cannabis sales (treat) is associated with a change in the prevalence of past 30-day cannabis use (mar30d_new, coded as 0 = no past 30-day cannabis use or 1 = any past 30-day cannabis use), adjusting for some sociodemographic characteristics (sex grade raceeth).

    Here is an example of my code:

    Code:
    didregress (mar30d_new sex grade raceeth) (treat), group(state) time(year)
    And here is my output:

    Code:
    Treatment and time information
    
    Time variable: year
    Control:       treat = 0
    Treatment:     treat = 1
    -----------------------------------
                 |   Control  Treatment
    -------------+---------------------
    Group        |
           state |         1          1
    -------------+---------------------
    Time         |
         Minimum |         1          3
         Maximum |         1          3
    -----------------------------------
    
    Difference-in-differences regression                    Number of obs = 63,033
    Data type: Repeated cross-sectional
    
                                      (Std. err. adjusted for 2 clusters in state)
    ------------------------------------------------------------------------------
                 |               Robust
      mar30d_new | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
    ATET         |
           treat |
       (1 vs 0)  |   .0129726   .0010093    12.85   0.049     .0001476    .0257976
    ------------------------------------------------------------------------------
    Note: ATET estimate adjusted for group effects and time effects.
    My question: is didregress an appropriate command to examine change in state-level prevalence of the outcome, particularly when my outcome data are at the individual level and not aggregated at the state level? The Stata documentation refers to didregress for continuous outcomes only, but a dusty corner of my brain from grad school is tempting me to interpret .013 as a 1.3% increased prevalence (or perhaps 1.3% increased risk) of past 30-day cannabis use in the treated vs. the control state.

    If that dusty corner of my brain is incorrect, is there a binary outcome equivalent for the didregress Stata command that I have missed somehow? I would like to use the wildbootstrap option to compute 95% CIs given my small number of groups.
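
    A minimal sketch of such a call, assuming didregress accepts a wildbootstrap() option with an rseed() suboption (the seed value is arbitrary and only there for reproducibility). Whether the resulting intervals are trustworthy with only two groups is a separate question that this sketch does not settle:

    ```stata
    * Same specification as above, requesting wild-bootstrap inference.
    * rseed(12345) is an arbitrary, hypothetical seed.
    didregress (mar30d_new sex grade raceeth) (treat), ///
        group(state) time(year) wildbootstrap(rseed(12345))
    ```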

    Thanks everyone for reading my post - looking forward to hearing from you!
    Last edited by Jennifer Pearson; 12 Jul 2023, 00:44.

  • #2
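    It's not a percent change but a percentage-point change: .013 is an increase of about 1.3 percentage points in prevalence. Divide it by the mean of the dependent variable to get a percent change.

    LS is fine as long as the prevalence is not too small. Combining DID with nonlinear models is iffy (see Lechner).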
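
    A minimal sketch of that conversion, assuming the untreated observations (treat == 0) are an acceptable baseline for the mean of the outcome (other baselines, such as the treated state's pre-period mean, are equally defensible) and plugging in the ATET from the didregress output in the first post:

    ```stata
    * Baseline: mean of the outcome among untreated observations (an assumption)
    quietly summarize mar30d_new if treat == 0
    scalar base = r(mean)

    * ATET copied from the didregress output, in proportion units
    scalar atet = .0129726

    display "Percentage-point change:     " %5.2f 100*atet
    display "Percent change vs. baseline: " %5.1f 100*atet/base
    ```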

    • #3
      Thank you, George - that is very helpful. The Lechner article is a nice overview and I will check it out. One point of clarification -- what do you mean by "LS"?

      • #4
        Least squares, i.e., the linear probability model in your case.

        • #5
          Thank you very much for your help, George!

          • #6
            I disagree with Lechner on this point. When the parallel trends assumption is stated based on the linear index -- as seems natural in many cases with discrete outcomes -- nonlinear DiD works just fine. I work this all out in my forthcoming paper in the Econometrics Journal -- for now, available here:

            Nonlinear Did

            A logit uses a different PT assumption than the linear model, but that seems natural. One can compute the average treatment effect by careful application of the margins command. Having said that, the linear analysis is probably fine in most cases. But if you want to check the robustness, you can try, say, logit to obtain the average treatment effect on the treated:

            Code:
            logit mar30d_new i.treat i.state i.year sex grade raceeth
            margins, dydx(treat) subpop(if treat == 1)
            The details change a bit depending on whether you have only one or two treated periods, but that wasn't clear to me.

            • #7
              Hi Jeff - Thank you for the dropbox link -- I just started looking at your files there and I can tell they are going to be incredibly helpful. What's the publication timeline on your forthcoming paper?

              re: number of treated periods -- I only have 1 treated period (2019). We do have 2021 data, but I doubt the utility of including it given COVID.

              For comparison to the didregress output and the benefit of future Statalisters, here is the output associated with Jeff's code:

              Code:
              . logit mar30d_new i.treat i.state i.year sex grade raceeth
              
              Iteration 0:  Log likelihood = -19825.303  
              Iteration 1:  Log likelihood = -19009.466  
              Iteration 2:  Log likelihood = -18965.362  
              Iteration 3:  Log likelihood =   -18965.2  
              Iteration 4:  Log likelihood =   -18965.2  
              
              Logistic regression                                    Number of obs =  60,054
                                                                     LR chi2(7)    = 1720.21
                                                                     Prob > chi2   =  0.0000
              Log likelihood = -18965.2                              Pseudo R2     =  0.0434
              
              ------------------------------------------------------------------------------
                mar30d_new | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
              -------------+----------------------------------------------------------------
                   1.treat |    .256056   .0726985     3.52   0.000     .1135695    .3985424
                           |
                     state |
                    1. NV  |   -.603645   .0472461   -12.78   0.000    -.6962456   -.5110445
                           |
                      year |
                  2. 2017  |   .0053034   .0396036     0.13   0.893    -.0723183     .082925
                  3. 2019  |   .1507857   .0422767     3.57   0.000     .0679248    .2336465
                           |
                       sex |  -.0535572    .027379    -1.96   0.050    -.1072191    .0001048
                     grade |   .6594678   .0190432    34.63   0.000     .6221438    .6967919
                   raceeth |  -.1043671   .0068711   -15.19   0.000    -.1178342      -.0909
                     _cons |  -2.968021   .0777298   -38.18   0.000    -3.120369   -2.815673
              ------------------------------------------------------------------------------
              
              . margins, dydx(treat) subpop(if treat == 1)
              
              Average marginal effects                              Number of obs   = 60,054
              Model VCE: OIM                                        Subpop. no. obs =  5,008
              
              Expression: Pr(mar30d_new), predict()
              dy/dx wrt:  1.treat
              
              ------------------------------------------------------------------------------
                           |            Delta-method
                           |      dy/dx   std. err.      z    P>|z|     [95% conf. interval]
              -------------+----------------------------------------------------------------
                     treat |
                        0  |          0  (empty)
                        1  |   .0184698    .005257     3.51   0.000     .0081663    .0287734
              ------------------------------------------------------------------------------
              Note: dy/dx for factor levels is the discrete change from the base level.

              • #8
                Hi Jennifer. I'm glad it's helpful. I just finished the proofs for the paper. Hopefully in the September 2023 issue of The Econometrics Journal.

                You probably should at least try clustering by state. Even though you have a lot of observations per state it should probably work well.

                BTW, I view it as a good thing that the linear and logit estimates tell a similar story. The logit ATT is about 40% larger, and that might matter. You can try the model with lots of interactions, too. You don't even need didregress for the linear case (which does not estimate a flexible model): just replace logit with reg.
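
                A minimal sketch of that linear version, assuming the same variables as in the logit above and clustering by state as in the didregress output. The coefficient on 1.treat is then the linear DiD estimate; with only two clusters, the clustered standard errors should be read with caution:

                ```stata
                * Linear probability model: same specification as the logit, with reg
                regress mar30d_new i.treat i.state i.year sex grade raceeth, ///
                    vce(cluster state)
                ```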

                • #9
                  Hi Jeff - Thank you again - you've been a huge help and I've learned a lot. Looking forward to the publication.
