
  • #16
    I'm sorry if my wording confused you. Let's go back a step.

    In the classical DID analysis, there is one variable that encodes (0/1) intervention vs control group, and another that encodes (0/1) the pre- vs post-intervention time period. Both of these variables and their interaction term appear in the regression for a classical DID. In the generalized DID, these variables are not in the regression. Instead there is only a single variable that is 1 for observations that are both in the intervention group and occur in the post-intervention period (for that unit), and 0 everywhere else. This variable is, if you like, analogous to the interaction term in the classical DID, but there are no corresponding "main" effects to include.

    Another key difference between generalized and classical DID is that in the latter, it may not be necessary to include fixed effects for unit and time (although one might do so anyway for other reasons), but in the generalized DID they are absolutely required.

    The generalized DID does capture the effect of intervention on the treatment group provided the same assumptions required for classical DID to identify the treatment effect are met.
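
    To make the contrast concrete, here is a minimal sketch of the generalized DID setup on simulated data. Everything in it is made up for illustration and is not taken from anyone's actual data: the variable names (id, year, adopt_year, did, y), the staggered adoption years, and the built-in "true" effect of 2.

    ```stata
    * Hypothetical simulated panel: 40 units, 10 years, staggered adoption
    clear
    set seed 12345
    set obs 40
    gen id = _n
    * Units 1-20 eventually adopt; the adoption year varies by unit (2005-2008)
    gen adopt_year = cond(id <= 20, 2005 + mod(id, 4), .)
    gen u_i = rnormal()                        // unit-level effect
    expand 10
    bysort id: gen year = 2000 + _n            // years 2001-2010
    * Single indicator: 1 only for adopting units in their own post period
    gen byte did = !missing(adopt_year) & year >= adopt_year
    * True effect set to 2 purely for illustration
    gen y = 1 + u_i + 0.3*(year - 2000) + 2*did + rnormal()
    xtset id year
    * Unit fixed effects come from -fe-, time fixed effects from i.year
    xtreg y did i.year, fe vce(cluster id)
    ```

    The coefficient on did is the DID estimate. With data simulated this way it should come out close to the value of 2 that was built in.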



    • #17
      Clyde Schechter Thank you for the clarification.



      • #18
        Dear Clyde,
        I hope you are doing great. I ran the regression after creating an interaction variable as you suggested above, following the condition (a single variable that is 1 for observations that are both in the intervention group and occur in the post-intervention period for that unit, and 0 everywhere else). I reduced the treatment group to 30 countries and the control group to 110 countries in order to have a balanced panel data set. Would this affect the results?

        I am attaching the results from the DID regression, but I am not satisfied with the R-squared value, which is too low.

        . xtreg gdpgrowthannual did, fe

        Fixed-effects (within) regression               Number of obs     =      4464
        Group variable: id                              Number of groups  =       144

        R-sq:                                           Obs per group:
             within  = 0.0014                                         min =        31
             between = 0.0031                                         avg =      31.0
             overall = 0.0013                                         max =        31

                                                        F(1,4319)         =      6.02
        corr(u_i, Xb)  = -0.0470                        Prob > F          =    0.0142

        ------------------------------------------------------------------------------
        gdpgrowtha~l |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
                 did |   1.041065   .4242073     2.45   0.014     .2094009    1.872729
               _cons |   3.127226   .0899819    34.75   0.000     2.950815    3.303637
        -------------+----------------------------------------------------------------
             sigma_u |  1.6540493
             sigma_e |  5.5393767
                 rho |  .08186213   (fraction of variance due to u_i)
        ------------------------------------------------------------------------------
        F test that all u_i=0: F(143, 4319) = 2.76                   Prob > F = 0.0000


        My second question: how can I meet the assumption needed to run generalized DID, namely that the trends of both groups are the same before the treatment?



        • #19
          I reduced the treatment group to 30 countries and the control group to 110 countries in order to have a balanced panel data set. Would this affect the results?
          Yes, it can affect the results. There is no reason to do this: there are no advantages to having a balanced data set in this kind of analysis. And by removing some of the data, you are now studying a sample that may well be biased.

          I am attaching the results from the DID regression, but I am not satisfied with the R-squared value, which is too low.
          A low R² is not necessarily a problem. If the data are very noisy, then obtaining a high R² is simply not possible. But that doesn't alter the validity of the estimate of the treatment effect. The two issues are unrelated.
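
          A small simulation illustrates the point. The numbers are assumptions chosen for illustration (a true effect of 1, noise with a standard deviation of 5, 5,000 observations), and d is a hypothetical 0/1 treatment indicator rather than anything from the data in #18.

          ```stata
          * Hypothetical simulation: very noisy outcome, real treatment effect
          clear
          set seed 2021
          set obs 5000
          gen byte d = mod(_n, 2)             // made-up 0/1 treatment indicator
          gen y = 3 + 1*d + rnormal(0, 5)     // true effect 1, noise SD 5 (assumed)
          regress y d
          display "R-squared = " %6.4f e(r2)
          display "estimate  = " %6.4f _b[d] "  (true value 1)"
          ```

          In a setup like this the R² is expected to be around 0.01, yet the coefficient on d should still land near the true value, with a confidence interval of ordinary width.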

          My second question: how can I meet the assumption needed to run generalized DID, namely that the trends of both groups are the same before the treatment?
          My preferred way to do this is to calculate the mean value of the outcome variable in both the treatment group (i.e. those that eventually get the intervention) and the control group in every year before the intervention, and then graph them.
          So something like

          Code:
          collapse (mean) outcome_variable if did == 0, by(year treatment_vs_control)
          reshape wide outcome_variable, i(year) j(treatment_vs_control)
          graph twoway connect outcome_variable* year
          Then you can literally see to what extent the trends look parallel.
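
          For anyone who wants to try this out before touching real data, here is a runnable sketch of the same idea on a simulated panel. All names (id, year, treated, did, y) and the 2006 start date are hypothetical. Since -collapse- replaces the data in memory, the sketch wraps the graphing step in -preserve- and -restore-.

          ```stata
          * Hypothetical simulated panel with parallel pre-trends by construction
          clear
          set seed 42
          set obs 60
          gen id = _n
          gen byte treated = id <= 30              // units 1-30 eventually treated (assumed)
          expand 8
          bysort id: gen year = 2000 + _n          // years 2001-2008
          gen byte did = treated & year >= 2006    // intervention assumed to start in 2006
          gen y = 2 + 0.5*treated + 0.2*(year - 2000) + 1.5*did + rnormal(0, 0.5)

          preserve                                 // -collapse- overwrites the data in memory
          collapse (mean) y if did == 0, by(year treated)
          reshape wide y, i(year) j(treated)
          twoway connected y0 y1 year, legend(order(1 "Control" 2 "Treated"))
          restore                                  // bring back the full panel
          ```

          Because of the if did == 0 restriction, the treated line stops at the last pre-intervention year, while the control line continues through the whole period.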



          • #20
            Dear all
            I plan to check how a certain law (SARFAESI) has affected the debt position of firms in a country. I haven't used a DID specification so far, and my limited understanding is based on this forum, especially https://www.statalist.org/forums/for...in-differences.

            The law is coded as 1 (the law is in place) when the year is 2002, 2003, or 2004, and 0 when the year is 1997, 1998, 1999, 2000, or 2001 (before the law was passed). Hence the time part is defined.
            Since the passage of the law affected all firms, I don't have natural treatment and control groups, so I followed the literature and classified firms into treated and control groups based on the firm's tangible assets (tangibility). Hence all the firms that fall in the bottom tercile form my treatment group and the top decile is the control group. For this classification, I used the pre-law-enforcement years 1998, 1999, 2000.


            My panel ID is firm(denoted by ccode) & I set my panel as

            Code:
            xtset ccode year
            For the time part, I coded

            Code:
            gen sarfaesi=.
            replace sarfaesi=0 if year>1996 & year<2002 //denoting before law period
            replace sarfaesi=1 if year>2001 & year<2005 //denoting after/during law period
            For the group part, first I created a period from 1998-2000 as this denotes pre-law enforcement years

            Code:
            gen treat_years=.
            replace treat_years=1 if year>1997 & year<2001  // treatment period
            replace treat_years=0 if year<1998 | year> 2000 // we will NOT consider this period but simply coded
            Then, I classified firms into treated(high tangible) & control(low tangible) groups for the treatment period from 1998-2000

            Code:
            egen tertiles=xtile(tang1_w), n(3) by(treat_years)
            
            gen tang_group=.
            replace tang_group=1 if treat_years==1 & tertiles==3 // high tangible groups or treatment group
            replace tang_group=0 if treat_years==1 & tertiles==1 // low tangible groups or control group
            My outcome variable is secured borrowings (secborr_ta_w), and I ran the following regression


            Code:
            . xtreg secborr_ta_w i.sarfaesi##i.tang_group i.year ,fe vce(robust)
            note: 0.sarfaesi omitted because of collinearity
            note: 1.tang_group omitted because of collinearity
            note: 0.sarfaesi#1.tang_group omitted because of collinearity
            note: 1997.year omitted because of collinearity
            
            Fixed-effects (within) regression               Number of obs     =      2,587
            Group variable: ccode                           Number of groups  =      2,587
            
            R-sq:                                           Obs per group:
                 within  =      .                                         min =          1
                 between =      .                                         avg =        1.0
                 overall =      .                                         max =          1
            
                                                            F(0,2586)         =          .
            corr(u_i, Xb)  =      .                         Prob > F          =          .
            
                                                 (Std. Err. adjusted for 2,587 clusters in ccode)
            -------------------------------------------------------------------------------------
                                |               Robust
                   secborr_ta_w |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
            --------------------+----------------------------------------------------------------
                     0.sarfaesi |          0  (omitted)
                   1.tang_group |          0  (omitted)
                                |
            sarfaesi#tang_group |
                           0 1  |          0  (omitted)
                                |
                           year |
                          1997  |          0  (omitted)
                                |
                          _cons |   .3474875          .        .       .            .           .
            --------------------+----------------------------------------------------------------
                        sigma_u |  .29553325
                        sigma_e |          .
                            rho |          .   (fraction of variance due to u_i)
            -------------------------------------------------------------------------------------
            .
            If I run the random-effects specification,

            Code:
            . xtreg secborr_ta_w i.sarfaesi##i.tang_group i.year ,re vce(robust)
            note: 0.sarfaesi omitted because of collinearity
            note: 0.sarfaesi#1.tang_group omitted because of collinearity
            note: 1997.year omitted because of collinearity
            insufficient observations
            r(2001);
            In this same thread, Carlo Lazzaro (#4) has demonstrated a similar case. But my questions are:
            1) Given the above setting, how can I estimate the above regression without the time##group terms (i.sarfaesi##i.tang_group) being dropped? I have seen many papers using a similar specification with firm fixed effects.
            2) Is there anything fundamentally wrong in my code, logic, or treatment period that resulted in observations being dropped? Sorry for flagging Clyde Schechter; I have relied on some of your writing on this topic, and if I misunderstood it, I would like to correct that.



            • #21
              Lal:
              focusing on your last post, if you have 2587 groups and 2587 observations, you do not have a panel but a cross-sectional dataset.
              That's why the -xtreg- results are letting you down.
              Kind regards,
              Carlo
              (Stata 18.0 SE)
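
              One way to check this directly is to count how many complete observations each panel contributes. The sketch below uses simulated data in which a hypothetical regressor x is nonmissing in only one year per unit, which reproduces the one-observation-per-group situation. The names id, year, y and x are illustrative, not the variables from #20.

              ```stata
              * Hypothetical panel where a regressor is defined in one year only
              clear
              set seed 7
              set obs 50
              gen id = _n
              expand 6
              bysort id: gen year = 1996 + _n          // years 1997-2002
              gen y = rnormal()
              gen x = rnormal() if year == 1999        // nonmissing in a single year
              xtset id year
              capture noisily xtreg y x, fe            // uninformative with one obs per group
              * Count complete cases per panel, independent of the regression itself
              gen byte complete = !missing(y, x)
              bysort id: egen n_complete = total(complete)
              tabulate n_complete
              ```

              If every panel shows n_complete equal to 1, the estimation sample is effectively a cross-section, and within-panel estimation has nothing to work with.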



              • #22
                Thanks, Carlo Lazzaro. May I ask about some doubts I have been brooding over?
                1) In my case, the pre-event years are, say, 1999, 2000, and 2001, and the post-event years are 2002, 2003, and 2004 (assume we have 3 years in each). The classification of groups into treatment and control is based on some criterion (assets) during the pre-event period; in my case, it was 1998, 1999, 2000. Hence there is an overlap with respect to time. Is this what leads to the multicollinearity? In an ideal case, how should such a group classification be done?
                2) Also, should I use reg or xtreg? In papers I have seen firm fixed effects and time dummies used with the DID specification.



                • #23
                  Lal:
                  take a look at https://www.princeton.edu/~otorres/DID101R.pdf
                  Kind regards,
                  Carlo
                  (Stata 18.0 SE)



                  • #24
                    Since the passage of the law affected all firms, I don't have natural treatment and control groups, so I followed the literature and classified firms into treated and control groups based on the firm's tangible assets (tangibility). Hence all the firms that fall in the bottom tercile form my treatment group and the top decile is the control group.
                    Design first, analysis second. You are not going to be able to do a DID with this data. Unless the nature of the law whose effect you are trying to estimate is that it has no practical effect on firms in the top decile of assets, this designation of treatment and control groups is arbitrary and meaningless. Your "DID analysis" will instead simply be a pre-post comparison of outcomes, with the effects estimated separately in these two asset-based groups. Except it won't even be that, because, the way you have (mis)coded this, the asset-based group variable isn't even defined in the post-law period: tang_group is set only for 1998-2000. So all you have is a division of the firms into asset groups during what you call the treatment period. Consequently, even the weak pre-post estimate of effect has been subverted. You have nothing in this model that tells you anything whatsoever about the effect of the law. Nothing. You are fortunate that Stata's regression output made it obvious that something is wrong.
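
                    Setting the design objection aside, the coding problem on its own (a group variable that exists only in 1998-2000) could be avoided by computing the classification once per firm and carrying it to all of that firm's years. The following is only a sketch on simulated data: ccode and year mirror the names in #20, while base, tang, tang_pre, tag and the cutpoint scalars are hypothetical. Following the code in #20, 1 is the top tercile and 0 the bottom tercile.

                    ```stata
                    * Hypothetical firm panel, 1997-2004
                    clear
                    set seed 99
                    set obs 90
                    gen ccode = _n                               // firm identifier
                    gen base = runiform()
                    expand 8
                    bysort ccode: gen year = 1996 + _n           // years 1997-2004
                    gen tang = base + rnormal(0, 0.05)           // made-up tangibility measure

                    * Firm-level mean over the classification window (1998-2000) only,
                    * stored on every row of the firm
                    bysort ccode: egen tang_pre = mean(cond(inrange(year, 1998, 2000), tang, .))

                    * Tercile cutpoints computed on one row per firm
                    egen byte tag = tag(ccode)
                    _pctile tang_pre if tag, nquantiles(3)
                    scalar cut1 = r(r1)
                    scalar cut2 = r(r2)

                    * Time-invariant group: 1 = top tercile, 0 = bottom tercile, else missing
                    gen byte tang_group = .
                    replace tang_group = 1 if tang_pre > scalar(cut2) & !missing(tang_pre)
                    replace tang_group = 0 if tang_pre <= scalar(cut1)

                    tabulate year tang_group, missing
                    ```

                    The tabulation should show the same firms in each group in every year, including the post-law years. Whether such a grouping is a meaningful treatment/control contrast is a separate question that no amount of coding resolves.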



                    • #25
                      Clyde Schechter thanks for the help. I decided to learn a little bit about diff-in-diff before posting further questions. In my proposed model, there is a time dummy in which the years 2014, 2015, and 2016 denote the period before regulation and 2017, 2018, and 2019 denote the period of regulation. In my setup there are no natural treated and control groups, so, based on the literature as well as anecdotes, I took ownership as the basis for classification. Thus my design is to divide the sample into two groups (above the median and below the median) based on firms’ average pretreatment measure (2014 to 2016) of ownership, where the higher group (above the median) is my treated group and the lower group (below the median) is my control group. Now let me show what I have done.
                      Code:
                      . xtset id year
                             panel variable:  id (unbalanced)
                              time variable:  year, 2014 to 2019, but with gaps
                                      delta:  1 unit
                      
                      . distinct id year
                      
                             |        Observations
                             |      total   distinct
                      -------+----------------------
                          id |      11788       2619
                        year |      11788          6
                      
                      ****Creating Pre-reg and Post-reg period*****************************
                      . gen post=.                // for creating time dummy
                      (11,788 missing values generated)
                      
                      . replace post=0 if year>2013 & year <2017 & year!=. //pre-regulation period
                      (5,432 real changes made)

                      . replace post=1 if year>2016 & year <2020 & year!=. // post-regulation period
                      (6,356 real changes made)
                      
                      
                      
                      *****Creating Treatment and control group based on ownership structure*********************
                      *giving summary of the variable ownership (owner) for the period 2014-2019. The variable is in %
                       univar owner if year >2013 & year <2020
                                                              -------------- Quantiles --------------
                      Variable       n     Mean     S.D.      Min      .25      Mdn      .75      Max
                      -------------------------------------------------------------------------------
                         owner   11562    52.86    18.72     0.00    42.03    56.28    68.49   100.00
                      
                      
                       egen owner_year=xtile(owner ) if year>2013 & year<2017, n(2) by(year) // to classify owner into 2 categories
                      >  based on pre-reg period
                      (6,458 missing values generated)
                      
                      
                      . egen mean_owner=mean(owner_year),by(id) // to get mean by id
                      (1,153 missing values generated)
                      
                      .
                      
                      . univar mean_owner
                                                              -------------- Quantiles --------------
                      Variable       n     Mean     S.D.      Min      .25      Mdn      .75      Max
                      -------------------------------------------------------------------------------
                      mean_owner   10635     1.50     0.48     1.00     1.00     1.67     2.00     2.00
                      -------------------------------------------------------------------------------
                      
                      *since the median is 1.67, we classify treatment group  as those firms which has owner>1.67
                       & control group as firms with owner<1.67
                      
                       gen owner_group=.
                      (11,788 missing values generated)
                      
                      .
                      . replace owner_group=1 if mean_owner>1.67 &  mean_owner!= .              //treated group
                      (5,002 real changes made)

                      . replace owner_group=0 if mean_owner<1.67 & mean_owner!= .        // control group
                      (5,633 real changes made)
                      
                      *Cross checking whether treated group has higher owner or not
                      
                       univar owner,by(owner_group)
                      
                      -> owner_group=0
                                                              -------------- Quantiles --------------
                      Variable       n     Mean     S.D.      Min      .25      Mdn      .75      Max
                      -------------------------------------------------------------------------------
                         owner    5542    39.31    15.56     0.00    29.65    42.87    50.85    94.09
                      -------------------------------------------------------------------------------
                      
                      -> owner_group=1
                                                              -------------- Quantiles --------------
                      Variable       n     Mean     S.D.      Min      .25      Mdn      .75      Max
                      -------------------------------------------------------------------------------
                         owner    4965    66.98     7.69     2.42    61.90    67.81    73.46    99.95
                      -------------------------------------------------------------------------------
                      
                      *Treated group (owner_group 1) has a higher percentage of ownership than control group (owner_group 0)
                      *Trust everything is correct till here hence proceeding with regression
                      
                      **********************Regression*********************************************
                      xtreg cash_ta_w i.owner_group##i.post size_w nfa_ta_w lever_w trade_credit_ta_w sales_grow_w roa_w pbfinal cfo_
                      > ta_w rdcc2_ta_w div_ta_w nw_ta_w i.year, fe vce(robust)
                      note: 1.owner_group omitted because of collinearity
                      note: 2019.year omitted because of collinearity
                      
                      Fixed-effects (within) regression               Number of obs     =      8,485
                      Group variable: id                              Number of groups  =      1,814
                      
                      R-sq:                                           Obs per group:
                           within  = 0.0810                                         min =          1
                           between = 0.1039                                         avg =        4.7
                           overall = 0.1015                                         max =          6
                      
                                                                      F(17,1813)        =       9.34
                      corr(u_i, Xb)  = -0.0132                        Prob > F          =     0.0000
                      
                                                            (Std. Err. adjusted for 1,814 clusters in id)
                      -----------------------------------------------------------------------------------
                                        |               Robust
                              cash_ta_w |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                      ------------------+----------------------------------------------------------------
                          1.owner_group |          0  (omitted)
                                 1.post |  -.0036867   .0027742    -1.33   0.184    -.0091276    .0017542
                                        |
                      owner_group#post |
                                   1 1  |  -.0029889   .0028072    -1.06   0.287    -.0084946    .0025168
                                        |
                                 size_w |    .003986   .0048166     0.83   0.408    -.0054606    .0134326
                               nfa_ta_w |  -.1260332   .0141154    -8.93   0.000    -.1537174   -.0983491
                                lever_w |   .0513303   .0156838     3.27   0.001       .02057    .0820906
                      trade_credit_ta_w |  -.0764345   .0118204    -6.47   0.000    -.0996175   -.0532515
                           sales_grow_w |   .0017605   .0007836     2.25   0.025     .0002236    .0032974
                                  roa_w |  -.0148786   .0177108    -0.84   0.401    -.0496144    .0198571
                                pbfinal |   .0016718   .0007772     2.15   0.032     .0001476     .003196
                               cfo_ta_w |   .0642863   .0095625     6.72   0.000     .0455316    .0830411
                             rdcc2_ta_w |  -.1687321   .1958021    -0.86   0.389    -.5527536    .2152893
                               div_ta_w |   .0019618   .0022745     0.86   0.389    -.0024992    .0064227
                                nw_ta_w |   .0985509   .0137832     7.15   0.000     .0715183    .1255835
                                        |
                                   year |
                                  2015  |   .0002953   .0014899     0.20   0.843    -.0026268    .0032174
                                  2016  |  -.0008791   .0016552    -0.53   0.595    -.0041253    .0023672
                                  2017  |   .0041283   .0018694     2.21   0.027      .000462    .0077946
                                  2018  |   .0024727   .0016666     1.48   0.138     -.000796    .0057415
                                  2019  |          0  (omitted)
                                        |
                                  _cons |   .0050101   .0395974     0.13   0.899    -.0726512    .0826715
                      ------------------+----------------------------------------------------------------
                                sigma_u |  .08184581
                                sigma_e |  .04456335
                                    rho |  .77133251   (fraction of variance due to u_i)
                      -----------------------------------------------------------------------------------
                      Is my research design correct? Also, is my code correct? Can I say that the regulation has had no effect on the dependent variable in the treated group?
                      I have tried to include everything here, and any help would be very important for me at this juncture.



                      • #26
                        Carlo Lazzaro in your post (https://www.statalist.org/forums/for...79#post1450179), you demonstrated that fixed effects won't work. However, in the above example, the fixed-effects model does run. Is there anything wrong in the above? Is my DID specification wrong with respect to the classification, the commands I gave, etc.?



                        • #27
                          Lal:
                          I cannot say whether there's something wrong in your approach.
                          What I previously stated was that (as obviously expected) -fe- won't work when all predictors are time-invariant:
                          Code:
                          use "http://www.stata-press.com/data/r15/nlswork.dta", clear
                          xtreg ln_wage i.race##i.birth_yr, fe
                          Last edited by Carlo Lazzaro; 09 Mar 2021, 09:51.
                          Kind regards,
                          Carlo
                          (Stata 18.0 SE)



                          • #28
                            Carlo Lazzaro Thanks for the prompt reply.
                            You wrote that "-fe- won't work when all predictors are time-invariant". I agree; that is not the case in my setup.



                            • #29
                              Clyde Schechter. Sorry for tagging you. As I have relied heavily on your (as well as Carlo's) resources, I don't want to miss any of your comments. Following your post #19, I ran the following code to check the parallel trends assumption:
                              Code:
                              gen did=post*owner_group
                              collapse (mean) cash_ta_w if did == 0, by(year owner_group)
                              reshape wide cash_ta_w, i(year) j(owner_group)
                              graph twoway connect cash_ta_w* year
                              and I got the following graph Graph.gph
                              Is my assumption valid from the graph?
                              Last edited by lal mohan kumar; 09 Mar 2021, 11:54.



                              • #30
                                Concerning the model in #25, the code appears to be a correct implementation of what is described in its comments. I do have concerns about the design, however. Defining the treatment group by whether a firm's average ownership share is greater than the median for all firms in your sample is questionable under the best of circumstances (Google Harrell dichotomania). It is even more concerning here: are you certain that year-on-year ownership is not affected by the intervention you are trying to study?

                                Concerning your parallel trends issue, it looks roughly valid. With only three years of pre-intervention data, it's a little hard to really draw a firm conclusion. But it looks fair. I would call it plausible, maybe even persuasive, but not convincing. (But almost nothing would be convincing based on just three years.)
