
  • Clarifying interpretation of marginal effects after reghdfe

    Hello, I am looking for clarification on the post-estimation margins command after reghdfe. I am running a pretty standard Diff-in-Diff (the effect of a state-level insurance policy change on cancer screening rates in younger vs. older individuals) using reghdfe, and I want to confirm that I am using and interpreting the margins command correctly.

    Code:
    reghdfe screen i.age##i.post i.RACE i.EDUC , absorb(state_num year month) vce(cluster state_num)
    (MWFE estimator converged in 4 iterations)
    note: 1bn.post is probably collinear with the fixed effects (all partialled-out values are close to zero; tol = 1.0e-09)
    
    HDFE Linear regression                            Number of obs   =  2,056,819
    Absorbing 3 HDFE groups                           F(  10,     14) =     776.20
    Statistics robust to heteroskedasticity           Prob > F        =     0.0000
                                                      R-squared       =     0.0049
                                                      Adj R-squared   =     0.0049
                                                      Within R-sq.    =     0.0042
    Number of clusters (state_num) =         15       Root MSE        =     0.2780
    
                                         (Std. Err. adjusted for 15 clusters in state_num)
    --------------------------------------------------------------------------------------
                         |               Robust
                  screen |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    ---------------------+----------------------------------------------------------------
                     age |
                  older  |  -.0260963   .0022391   -11.65   0.000    -.0308987   -.0212939
                  1.post |          0  (omitted)
                         |
                age#post |
                older#1  |   .0032939     .00066     4.99   0.000     .0018784    .0047094
                         |
                    RACE |
              Black, NH  |   .0078715   .0007183    10.96   0.000     .0063309    .0094121
               Hispanic  |   .0222411    .001537    14.47   0.000     .0189446    .0255376
                  Asian  |    .043212     .00239    18.08   0.000     .0380859    .0483381
                         |
                         |
                    EDUC |
                     HS  |  -.0067054   .0008325    -8.05   0.000    -.0084909     -.00492
           SOME COLLEGE  |  -.0154122   .0009276   -16.61   0.000    -.0174018   -.0134226
      BACHELOR/GRADUATE  |  -.0364558   .0016695   -21.84   0.000    -.0400365   -.0328751
                         |
                   _cons |   .0947413   .0005795   163.48   0.000     .0934983    .0959843
    --------------------------------------------------------------------------------------
    
    Absorbed degrees of freedom:
    -----------------------------------------------------+
     Absorbed FE | Categories  - Redundant  = Num. Coefs |
    -------------+---------------------------------------|
       state_num |        15          15           0    *|
            year |         9           1           8     |
           month |        12           1          11     |
    -----------------------------------------------------+
    * = FE nested within cluster; treated as redundant for DoF computation
    When I run the margins command (albeit with the 'noestimcheck' option, which I think is kosher for calculating marginal effects), it omits one category of individual. This is because it is collinear with the fixed effects, yes?

    Code:
    . margins age, dydx(post) noestimcheck
    
    Conditional marginal effects                    Number of obs     =  2,056,819
    Model VCE    : Robust
    --------------------------------------------------------------------------------
                   |            Delta-method
                   |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
    ---------------+----------------------------------------------------------------
           1.post  |
               age |
            older  |          0  (omitted)
          younger  |   .0032939     .00066     4.99   0.000     .0020004    .0045875
    --------------------------------------------------------------------------------
    When I do it a different way, it tells me that the predicted change for the older group is 0. This "0" predicted change is garbage, similarly because it is collinear with the fixed effects, yes?

    Code:
    . margins age#post, noestimcheck
    
    Predictive margins                              Number of obs     =  2,056,819
    Model VCE    : Robust
    
    Expression   : Linear prediction, predict()
    --------------------------------------------------------------------------------------
                         |            Delta-method
                         |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
    ---------------------+----------------------------------------------------------------
                age#post |
                older#0  |   .0945361    .000978    96.66   0.000     .0926192     .096453
                older#1  |   .0945361    .000978    96.66   0.000     .0926192     .096453
              younger#0  |   .0684398   .0012712    53.84   0.000     .0659483    .0709314
              younger#1  |   .0717337   .0017257    41.57   0.000     .0683514    .0751161
    --------------------------------------------------------------------------------------
    So, do I understand correctly that there is no way to get the marginal effect of pre vs post for the older group? Thanks

  • #2
    Well, you seem to understand the -margins- output correctly. But either your data or your regression model is messed up. No valid DID model of valid DID data uses a post variable that is also collinear with the fixed effects. In a standard DID model, post is used, but time fixed effects are not (and cannot be). In a generalized DID model, post is not used at all: only the treatment group # post interaction is used, along with time fixed effects.
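    The two specifications contrasted above can be sketched on simulated data. This is a minimal illustration, not the original poster's model: the variables state, treated, post, did, and y are hypothetical, the effect sizes are made up, and -xtreg, fe- stands in for -reghdfe- so that the example runs without community-contributed commands.

```stata
* Simulated data for illustration only; all names and effect sizes are hypothetical
clear
set seed 12345
set obs 4000
gen state   = ceil(runiform() * 10)                  // 10 states
gen treated = state <= 5                             // states 1-5 implement the policy
gen date_ym = tm(2010m1) + floor(runiform() * 48)    // 2010m1 through 2013m12
format %tm date_ym
gen post    = date_ym >= tm(2012m1)                  // same start date in every state
gen y       = 0.10 + 0.02*treated + 0.01*post + 0.03*treated*post + rnormal(0, 0.1)

* Standard DID: group, post, and their interaction; no time fixed effects
regress y i.treated##i.post, vce(cluster state)
margins treated, dydx(post)      // pre/post change is estimable for both groups

* Generalized DID: only the interaction, plus state and time fixed effects
* (with reghdfe installed: reghdfe y did, absorb(state date_ym) vce(cluster state))
gen did = treated * post
xtset state
xtreg y did i.date_ym, fe vce(cluster state)
```

    In the first model -margins- reports a pre/post change for each group because post is not absorbed by anything. In the second, the main effect of post is deliberately absent, and only the coefficient on did is interpreted.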

    To decide what is wrong, you need to show:
    1. Was the policy implemented at the same time in every state that implemented it (standard DID) or did time of implementation vary (generalized DID)?
    2. What was the code you used to calculate the post variable?
    3. Show example data. Be sure to include data from several state_nums, including some implementers and some non-implementers, and for each of those both pre- and post- implementation observations. Use the -dataex- command to do this. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
    4. Explain your month and year variables. If they are what I think they are, then -absorb(state_num month year)- is incorrect and you need to combine them into a single month-year variable and absorb that, not month and year separately.
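    Point 4 can be sketched as follows, assuming year holds the numeric calendar year and month holds 1 through 12 (the thread does not confirm this, so treat it as an assumption). The toy data and the name date_ym are for illustration only.

```stata
* Toy data; assumes numeric calendar year and month (1-12)
clear
input int year byte month
2011 11
2011 12
2012  1
2012  2
end

* One value per calendar month, counted in months since 1960m1
gen date_ym = ym(year, month)
format %tm date_ym
list, clean

* The combined variable would then be absorbed instead of year and month, e.g.
*   absorb(state_num date_ym)
```

    Absorbing year and month separately yields 9 + 12 additive effects, whereas the combined variable yields one effect per calendar month, which is what a time fixed effect in a two-way model requires.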



    • #3
      Hi, thank you so much for your response. Yes, I am running a standard DID, and I was (foolishly, I see now) putting in time FE along with the post variable. I figured post would be dropped as a result, but I didn't think through how this would then affect the margins command. That said, I will go through and answer all your questions for any future readers:

      (1) Yes, as just mentioned, it's a standard DID; the policy was implemented at the same time (Jan 2012) in every implementing state (fwiw, this particular regression is just looking at implementers)
      (2) Here is the code used to calculate the post var
      Code:
      gen post_2012m1=1 if date_ym>=tm(2012m1)
      replace post_2012m1=0 if date_ym<tm(2012m1)
      (3) I have posted the example data below
      (4) Here, you're saying, *if* I was doing a generalized DID, I should be using the combined month-year var (i.e., date_ym from the example data) as the absorb variable, yes? However, I still don't understand the full logic for this. Also, what if I wanted to include month FEs to capture seasonality in my outcome? Could I include the combined month/year variable *and* month FEs?

      Thanks again

      ***Example Data***
      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input double state_num float(post_2012m1 date_ym) double age
       5 0 578 1
       6 0 578 2
      12 0 587 1
      17 0 583 2
      36 0 579 1
      39 0 584 1
      41 0 580 2
      45 0 578 1
      47 0 584 1
       5 0 595 1
       6 0 598 2
      12 0 592 2
      17 0 595 2
      36 0 594 1
      39 0 592 1
      41 0 597 2
      45 0 598 1
      47 0 595 1
       5 0 602 1
       6 0 604 2
      12 0 606 2
      17 0 600 1
      36 0 606 1
      39 0 608 1
      41 0 605 2
      45 0 606 2
      47 0 609 1
       5 0 621 1
       6 0 614 2
      12 0 614 2
      17 0 623 2
      36 0 621 1
      39 0 622 1
      41 0 614 1
      45 0 621 2
      47 0 612 1
       5 1 627 1
       6 1 631 2
      12 1 629 1
      17 1 634 1
      36 1 631 1
      39 1 633 2
      41 1 633 1
      45 1 629 1
      47 1 633 1
       5 1 640 1
       6 1 640 1
      12 1 639 1
      17 1 646 1
      36 1 647 2
      39 1 643 1
      41 1 643 1
      45 1 646 1
      47 1 646 1
       5 1 659 1
       6 1 651 1
      12 1 658 2
      17 1 658 1
      36 1 652 1
      39 1 654 1
      41 1 657 1
      45 1 655 1
      47 1 653 1
       5 1 664 1
       6 1 663 2
      12 1 670 1
      17 1 669 1
      36 1 662 2
      39 1 660 1
      41 1 665 1
      45 1 664 1
      47 1 670 1
       5 1 675 1
       6 1 677 2
      12 1 680 1
      17 1 679 1
      36 1 677 1
      39 1 672 1
      41 1 672 2
      45 1 676 1
      47 1 677 1
       5 0 586 1
       5 0 584 2
       5 0 580 2
       5 0 577 1
       5 0 586 1
       5 0 576 1
       5 0 582 1
       5 0 587 1
       5 0 580 1
       5 0 577 1
       5 0 578 1
       5 0 579 1
       5 0 586 1
       5 0 587 1
       5 0 576 2
       5 0 584 1
       5 0 577 1
       5 0 582 1
       5 0 582 1
      end
      format %tm date_ym
      label values age age
      label def age 1 "older", modify
      label def age 2 "younger", modify
      label var state_num "State of residence - num" 
      label var age "Screening Eligible"



      • #4
        (1) Yes, as just mentioned, it's a standard DID; the policy was implemented at the same time (Jan 2012) in every implementing state (fwiw, this particular regression is just looking at implementers)
        I'm not sure I understand what you are saying here. A DID analysis must include both implementers and non-implementers. But I think you know that. Are you perhaps just saying that this problem arose in an additional analysis of the same data set and it happened to only involve the implementers?

        Here, you're saying, *if* I was doing a generalized DID, I should be using the combined month-year var (i.e., date_ym from the example data) as the absorb variable, yes? However, I still don't understand the full logic for this. Also, what if I wanted to include month FEs to capture seasonality in my outcome? Could I include the combined month/year variable *and* month FEs?
        This is an interesting question. If you did include i.date_ym in your model (as would be needed for a generalized DID analysis), then you would be unable to also include month FEs to capture seasonality, because they would be collinear with the i.date_ym variables. If you attempted to include them, either they or some of the i.date_ym indicators would be omitted, and even if the month FEs survived, their coefficients would not be interpretable as seasonal effects. Offhand, I can't think of a way to incorporate seasonal effects directly into a generalized DID analysis. (Adding them to a standard DID without time FE would not be a problem, assuming that both the pre- and post-intervention periods encompassed all of the seasons.) While I would first try to find examples in the literature (I have never seen any, but have never explicitly looked for them), my instinct would be to first do some kind of seasonal adjustment to the outcome variable and then do the DID analysis of the seasonally adjusted outcome. If anybody following along this thread knows of a way to incorporate seasonal adjustment into a generalized DID (or, more generally, into a two-way fixed effects analysis), I'd be eager to see it and learn.
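        The collinearity described above can be checked on simulated data. This is a minimal sketch: the variables y, date_ym, and month and the seasonal pattern are invented for illustration. Because every calendar month belongs to exactly one month of the year, each month indicator is a sum of date_ym indicators, so adding i.month cannot raise the rank of the model.

```stata
* Simulated data for illustration only; names and the seasonal pattern are hypothetical
clear
set seed 2024
set obs 2000
gen date_ym = tm(2010m1) + floor(runiform() * 48)    // 2010m1 through 2013m12
format %tm date_ym
gen month = month(dofm(date_ym))                     // month of year, 1-12
gen y = 0.05*sin(2*_pi*month/12) + rnormal(0, 0.1)   // outcome with a seasonal cycle

* Month-of-year indicators are linear combinations of the year-month indicators,
* so Stata flags some of the indicators as omitted
regress y i.date_ym i.month

* The model rank equals the number of distinct year-months:
* the month indicators add no information
quietly tabulate date_ym
display "distinct year-months: " r(r) "   model rank: " e(rank)
```

        Whichever indicators Stata keeps, the fitted values are the same as in a model with i.date_ym alone, which is why the surviving coefficients cannot be read as seasonal effects.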
