  • How to control for something within a Diff in Diff Graph

    Hello everybody,

    I am currently trying to create the usual diff-in-diff graph, e.g.:

    [Image: Screenshot 2019-06-28 at 17.27.44.png]

    My setting:
    • I have panel data of about 4 million German companies.
    • I regress the yearly log total assets growth of those companies on a business tax levy increase dummy for the municipality in which the respective firm is located. The dummy equals one if the municipality increased the business tax levy in that year.
    • I expect a negative effect on a firm's log total assets growth in the year of the tax increase and in the years after it.
    • I have 16 years, 11,000 municipalities, and 1,500 tax increase events, so I have to standardize the x-axis to event time instead of calendar years.
    I receive the following results in my regression:

    [Image: Screenshot 2019-06-28 at 17.51.31.png]

    Using the following Code:

    Code:
    //Firm  and state  control variables
        xtset bvd_id year
        local independent "F3.hebesatzIncreaseDummy F2.hebesatzIncreaseDummy F1.hebesatzIncreaseDummy hebesatzIncreaseDummy L1.hebesatzIncreaseDummy L2.hebesatzIncreaseDummy L3.hebesatzIncreaseDummy"
        local firmControls "L1.assets_total_million L1.ratio_leverage L1.ratio_leverage_change age" // assets_total_log age
        local stateControls "L1.gspGrowthRate L1.gspGrowthRate_change L1.unemploymentRate L1.unemploymentRate_change" //i.municipalAssetsTotalQuantil population 
            
    //Regression    
        qui eststo spezifikation1: reg growth_assets `independent' `firmControls' `stateControls', vce(cluster statekz)
        qui eststo spezifikation2: reghdfe growth_assets `independent' `firmControls' `stateControls', absorb(i.year i.industry_sic_2_digit) vce(cluster statekz)
        qui eststo spezifikation3: reghdfe growth_assets `independent' `firmControls' `stateControls', absorb(i.year##i.industry_sic_2_digit) vce(cluster statekz)
        qui eststo spezifikation4: reghdfe growth_assets `independent' `firmControls' `stateControls', absorb(i.year##i.industry_sic_2_digit i.municipalId) vce(cluster statekz)
    
    //Regression output    
        esttab spezifikation1 spezifikation2 spezifikation3 spezifikation4, b("%-8.5f") t ///
        stats(N r2_a, labels("N" "Adj. R-Square") fmt("%-8.0fc" "%-8.3f")) ///
        varwidth(22)  ///
        nonumbers mtitles("No FE" "Year Ind" "Year##Ind" "Year##Ind Mun") ///
        nonotes addnote("t-values are reported in parentheses.")


    But somehow my Diff in Diff graph looks like this:

    [Image: Screenshot 2019-06-28 at 17.54.27.png]

    I am trying to understand why the log total assets growth mean of the control group also decreases after the event. I am using the following code to generate the graph. I am new to Stata so please forgive me if there are more effective ways to achieve the same (actually please tell me if there are).

    Code:
    foreach group in 1 0 {
        foreach time in "L5" "L4" "L3" "L2" "L1" "L0" "F1" "F2" "F3" "F4" "F5" { 
           qui: sum `time'.growth_assets if hebesatzIncreaseDummy == `group', de
            scalar group`group'Time`time' = r(mean)
        }    
    }
    
    //Clear Dataset
    drop _all
    
    //Create Graph Dataset out of Scalars
    set obs 22
    
    gen treated = 0
    gen eventtime = 0
    gen growth_assets_mean = 0
    
    scalar obs = 1
    foreach group in 1 0 {
        foreach time in "L5" "L4" "L3" "L2" "L1" "L0" "F1" "F2" "F3" "F4" "F5" {
    
            replace treated = `group' in `=obs'
            
            replace eventtime = real(substr("`time'", 2, 2)) in `=obs'
            if (substr("`time'", 1, 1) == "L") {
                replace eventtime = eventtime *-1 in `=obs'
            }
                
            replace growth_assets_mean = group`group'Time`time' in `=obs'
            
            scalar obs = `=obs' + 1
        }    
    }
    
    //Graph
    twoway (line growth_assets_mean eventtime if treated == 1) (line growth_assets_mean eventtime if treated == 0), legend(lab (1 "Treated firms") lab(2 "Non treated firms")) ylabel(#6) xlabel(#11, grid)
    I think I have to somehow control for years, since most of the tax increases happened in more recent years and the mean log total assets growth also declined in more recent years. But I don't know how to do that.

    I would be very thankful for any help!

    Best regards,
    Andreas

  • #2
    Hello Andreas,

    A few suggestions.

    1. The way your data is set up, your graph is likely not displaying the concepts that you intended. If I understand correctly, your treatment variable (hebesatzIncreaseDummy) is set equal to 1 in the year when the tax increase occurs, and 0 otherwise. But this means that each "treated" firm is also being included in the mean for "control" firms in the periods after treatment. In other words, suppose the data looks like this:
    Code:
    clear
    input firm_id year treated
    1 2005 0
    1 2006 1
    1 2007 0
    2 2005 0
    2 2006 0
    2 2007 0
    end
    We have two firms, where firm 1 is treated in year 2006 and firm 2 is never treated. The problem is that with this data structure, your graph is incorrectly computing the control mean as a weird mix of firm types. For example, the time -1 "control" mean is the average of observations 2, 4, and 5 (because those are the lagged values of rows for which we have treated==0). Note that this supposedly "control" mean includes the value for the treated firm in row 2! Thus, it is no surprise that the two lines almost overlap--they are essentially plotting the same data, just moved over by a year. Such a graph tells you nothing about the actual treatment effect.
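
    To make the mixing in point 1 concrete, here is a minimal sketch on the toy data above. The outcome column y and its values are made up purely for illustration, and obs_no and lag_obs are hypothetical helper variables. The final summarize mirrors how the original loop computes the time -1 "control" mean.

```stata
clear
input firm_id year treated y
1 2005 0 10
1 2006 1 20
1 2007 0 30
2 2005 0 40
2 2006 0 50
2 2007 0 60
end
xtset firm_id year

// record each row's position, then look up which row supplies its lag
gen obs_no = _n
gen lag_obs = L1.obs_no

// rows whose lag feeds the time -1 "control" mean: the lags are observations 2, 4, and 5
list firm_id year treated lag_obs if treated == 0 & !missing(lag_obs)

// same statistic as in the original loop; y = 20 (observation 2) is the treated firm-year
summarize L1.y if treated == 0
```

    With these made-up values the "control" mean at time -1 averages 20, 40, and 50, so the treated firm's treatment-year outcome is counted as a control observation.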

    2. To construct a diff-in-diff graph with time measured in event time, you can do either of the following:
    a. If there is a single treatment time (like year==2006 above), you would do
    Code:
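    gen eventtime = year - 2006
    // bysort sorts by firm_id first, so no separate sort step is needed
    bysort firm_id: egen ever_treated = max(treated)
    // y stands for your outcome variable; collapse to group means per event time
    collapse (mean) y, by(eventtime ever_treated)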
    and then graph the resulting mean values of y. BUT this only works if you have only one time period of treatment. With multiple treatment times, you can't define the "event time" for firms that are never treated--so you can't plot a "control" line. (Unless you do fancy stuff with matched pairs, as discussed here: https://www.statalist.org/forums/for...atment-periods).

    b. Construct an event study style graph rather than a diff-in-diff.
    Code:
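    gen treated_time = year if treated==1
    // bysort sorts by firm_id first; with several events per firm, max() picks the latest one
    bysort firm_id: egen treat_year = max(treated_time)
    gen eventtime = year - treat_year  // missing for never-treated firms
    collapse (mean) y, by(eventtime)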
    This shows outcomes relative to an event time for everyone, but includes only one line on the graph. The control firms that are never treated will not be plotted (since we don't know their counterfactual time of treatment).

    You typically want a graph of the type (b) style, but controlling for other covariates (like time, as you mention).
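
    As a rough illustration of the type (b) graph without controls, here is a small self-contained sketch. The three firms, their staggered treatment years, and the outcome values y are invented for the example; the never-treated firm 3 drops out because it has no event time.

```stata
clear
input firm_id year treated y
1 2005 0 1.0
1 2006 1 0.8
1 2007 0 0.5
2 2005 0 1.1
2 2006 0 1.0
2 2007 1 0.7
3 2005 0 1.2
3 2006 0 1.1
3 2007 0 1.2
end

// event time relative to each firm's own treatment year
gen treated_time = year if treated == 1
bysort firm_id: egen treat_year = max(treated_time)
gen eventtime = year - treat_year

// never-treated firms have no event time, so they cannot be plotted
drop if missing(eventtime)

// one mean per event time, then a single line with a marker at the event
collapse (mean) y, by(eventtime)
twoway line y eventtime, xline(0) xtitle("Event time") ytitle("Mean of y")
```

    Note that the composition of firms changes across event times here (only firm 2 contributes at -2, only firm 1 at +1), which is one more reason to prefer the regression-based version with controls.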

    3. To get an event study graph with controls, just plot the coefficients from your regression. For example:
    Code:
    ssc install coefplot
    coefplot spezifikation4, vertical keep(*hebesatzIncreaseDummy) xtitle(Event Time) ///
         coeflabels(F3.hebesatzIncreaseDummy="-3" F2.hebesatzIncreaseDummy="-2" F.hebesatzIncreaseDummy="-1" ///
         hebesatzIncreaseDummy="0" L.hebesatzIncreaseDummy="1" L2.hebesatzIncreaseDummy="2" L3.hebesatzIncreaseDummy="3")
    will plot the coefficients and label them by event time. This tells you how the dependent variable changes on average before and after the tax increase, controlling for the covariates and fixed effects in that specification (year-by-industry and municipality). The negative coefficient at time -1 is a little concerning: it implies that asset growth in that industry/municipality was already lower before the tax increase happened. But this is the analogue of the regular DiD graph for differing treatment times.

    4. Your code to make the original graph seems to be written from a traditional programming background (working through a loop and saving results in scalars). Stata can do this, but using vectorized code along with collapse and reshape will save you a lot of work.

    (Note: the graph generated with this code will still be incorrect, because you still have the problem of a treatment variable that turns on at different times, so the control line is not meaningful. But I will ignore that just so you can see a more standard Stata way to construct the same graph).

    Code:
    //first get the lags - collapse doesn't recognize L. format variables, so we will make them into new columns first
    foreach time in "L5" "L4" "L3" "L2" "L1" "L0" "F1" "F2" "F3" "F4" "F5" {      
         gen y_`time' = `time'.growth_assets
    }
    
    //now use collapse and reshape to get the graph data, and plot
    preserve
    rename hebesatzIncreaseDummy treated
    collapse (mean) y_*, by(treated)
    
    reshape long y_L@ y_F@, i(treated) j(eventtime)
    reshape long y_@, i(treated eventtime) j(lf) string
    
    replace eventtime = eventtime * -1 if lf=="L" //note: vectorized rather than part of a loop
    drop if lf=="F" & eventtime==0
    drop lf
    rename y_ growth_assets_mean
    
    sort  treated eventtime
    
    //graph code is the same, I just split the lines for legibility
    twoway (line growth_assets_mean eventtime if treated == 1) ///
         (line growth_assets_mean eventtime if treated == 0), ///
         legend(lab (1 "Treated firms") lab(2 "Non treated firms")) ylabel(#6) xlabel(#11, grid)
    
    restore //leave this out if you don't want your old data back right away
    This gets you the same graph in a more efficient way.


    5. I would recommend that you try running the model with firm fixed effects. You have firm-level information, so why not use it?
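
    The firm fixed effects suggestion can be sketched on simulated data. Everything below (the 200 firms, the treatment rule, the effect size of -0.5) is an assumption made for illustration, and the built-in xtreg, fe is used in place of reghdfe so the example runs without extra packages.

```stata
clear
set seed 12345
set obs 200                          // 200 hypothetical firms
gen firm_id = _n
gen firm_effect = rnormal()          // time-invariant firm heterogeneity
expand 10                            // 10 years per firm
bysort firm_id: gen year = 2000 + _n
xtset firm_id year

// firms 1-100 get a one-off tax increase in 2005; the rest are never treated
gen treated = (firm_id <= 100 & year == 2005)
gen y = firm_effect - 0.5*treated + rnormal()

// firm fixed effects (fe) plus year dummies, with one lead and one lag of the event
xtreg y F1.treated treated L1.treated i.year, fe vce(cluster firm_id)
```

    In the original specification the analogous change would presumably be adding bvd_id to absorb() in the reghdfe call. Two things to expect in that case: municipality effects are absorbed by the firm effects for firms that never relocate, and age is collinear with the combination of firm and year effects, so it will be dropped.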


    Hope that helps.



    • #3
      Dear Kye,

      I thank you for your detailed answer. Your remarks have really helped me a lot. I'm currently busy adjusting the graph and model and if that's okay I'll probably get back to you soon with a few follow-up questions.

      Best regards and many thanks again
      Andreas
