Interpretation differences in differences with control variables

Maximilian Oechsner

Join Date: Jun 2017

Posts: 11
#1

15 Jun 2017, 03:28

Dear all,

I ran a difference-in-differences regression with control variables to estimate the effect of asset purchases by the European Central Bank (ECB) on eligible versus non-eligible securities. In more detail, my treatment group consists of covered bonds, for which the central bank announced a purchase program (the treatment: monthly purchases of significant size) on 15.10.2014 (the event date). The control group consists of government bonds, which were not bought under the same program. For each group, I have a sample of approximately 30-40 individual bonds, and I set the periods as follows: before period 02.12.2013-14.10.2014; after period (including the event date) 15.10.2014-20.04.2017. My dependent variable is the relative bid-ask spread, a measure of liquidity (lower bid-ask spread = more liquid; higher bid-ask spread = less liquid). In addition, I included bond-specific control variables such as the swap spread, age, and time to maturity (I take the log of the latter two).

My regression equation:

D : Treatment =1 or not =0
T : Time =1 after, =0 before
D*T: Interaction Term
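
Written out with these definitions, the standard two-group, two-period specification would look as follows. This is a reconstruction from the definitions above: the coefficient symbols, the control vector X and the subscripts (bond i, date t) are assumed notation.

```latex
Y_{it} = \beta_0 + \beta_1 D_i + \beta_2 T_t + \beta_3 \,(D_i \times T_t) + \gamma' X_{it} + \varepsilon_{it}
```

Here Y is the relative bid-ask spread and \beta_3 is the DiD coefficient.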

Can you please help me interpret the results? According to the attached output, all coefficients of interest (time, treated, DiD) are highly significant.
What do the time and treated coefficients mean? For the latter, does it mean that these bonds became more liquid (negative coefficient --> lower bid-ask spread --> more liquid)?
And what about the DiD coefficient?

I would appreciate an explanation since I could not find an elaborate answer so far.

Best,
Max

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17678

15 Jun 2017, 03:42

Maximilian:
- Is the -did- coefficient the interaction between -time- and -treated-, or something else?
- Your problem in getting the meaning of the abovementioned coefficients is amplified by the fact that you did not use -fvvarlist- for creating categorical variables and interactions. You can also add the -allbaselevels- option to make things even clearer, as in the following toy example:

Code:

. sysuse auto.dta
(1978 Automobile Data)

. reg price i.foreign##c.mpg, allbaselevels

      Source |       SS           df       MS      Number of obs   =        74
-------------+----------------------------------   F(3, 70)        =      9.48
       Model |   183435281         3  61145093.6   Prob > F        =    0.0000
    Residual |   451630115        70  6451858.79   R-squared       =    0.2888
-------------+----------------------------------   Adj R-squared   =    0.2584
       Total |   635065396        73  8699525.97   Root MSE        =    2540.1

-------------------------------------------------------------------------------
        price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
      foreign |
    Domestic  |          0  (base)
     Foreign  |  -13.58741   2634.664    -0.01   0.996    -5268.258    5241.084
              |
          mpg |  -329.2551   74.98545    -4.39   0.000    -478.8088   -179.7013
              |
foreign#c.mpg |
    Domestic  |          0  (base)
     Foreign  |   78.88826   112.4812     0.70   0.485    -145.4485     303.225
              |
        _cons |   12600.54   1527.888     8.25   0.000     9553.261    15647.81
-------------------------------------------------------------------------------

Last but not least, please use CODE delimiters to post what you typed and what Stata gave you back (as per FAQ). Thanks.

Kind regards,
Carlo
(Stata 19.0)

Comment

Maximilian Oechsner

Join Date: Jun 2017

Posts: 11
#3

15 Jun 2017, 03:57

Hi Carlo,

thank you for your fast reply. The interaction term is indeed the interaction between the -time- and -treated- dummies. I have never used -fvvarlist-; I built my code based on the diff-in-diff Stata guide by Oscar Torres‐Reyna, Princeton (http://www.princeton.edu/~otorres/DID101.pdf). Therefore, I guess that the interaction term is specified correctly.

Can you tell me how I can interpret the three coefficients (treated, time and DiD)? Does the negative sign of the DiD coefficient mean that the dependent variable (bid-ask spread) was affected more negatively for the treatment group than for the control group? And how about the coefficients of the treated and time dummies?

Best,
Maximilian
Comment
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17678
#4

15 Jun 2017, 04:57

Maximilian:
thanks for providing further details.
The source you quoted reports the use of -fvvarlist- at page 4.
q1) -did- treatment has a negative effect on the -depvar- (that is, when the level of -treated- =1 and level of -time-=1);
q2) -treated- has a negative effect on the -depvar- when -time- =0 (that is, when the level of -treated- =1 and level of -time-=0);
q3) -time- has a positive effect on the -depvar- when -treated- =0 (that is, when the level of -time- =1 and level of -treated-=0).

All effects discussed in q1)-q3) reach statistical significance at 5% (arbitrary) level.

Kind regards,
Carlo
(Stata 19.0)
Comment
Maximilian Oechsner

Join Date: Jun 2017

Posts: 11
#5

15 Jun 2017, 06:21

Thank you for the answer. I still haven't figured out exactly how I can interpret the above displayed coefficients...

Thus, three questions in the same order as your answers are:

In my case, it is about liquidity. Both groups were not treated (-treated- = 0) in the before period (-time- = 0), and the treatment group (-treated- = 1) gets treated in the after period (-time- = 1); the control group remains -treated- = 0 in the after period (-time- = 1).

A positive coefficient means the dependent var becomes less liquid, a negative sign means it becomes more liquid.

Therefore:
Q1) The -did- coefficient says that the treatment group ( -treated-=1) becomes more liquid in the after period (-time- = 1) compared to the control group (-treated- = 0 , -time- =1 )?
Q2) -Treated- coefficient: The treatment group (-treated- = 1) was more liquid (negative sign) in the before period (-time- = 0) than the control group (-treated- = 0, -time- =0)?
Q3) -time- coefficient: The control group (-treated- = 0) became less liquid (positive sign) in the after period (-time- = 1) than in the before period (-time- = 0, -treated- = 0)?

Best,
Maximilian
Comment
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17678
#6

15 Jun 2017, 07:14

Maximilian:
when adjusted for the remaining predictors:
q1) -did- coefficient : represents the difference in the changes between the two groups over time
q2) -Treated- coefficient: represents the difference between the two groups at time 0.
q3) -time- coefficient: represents the time trend in the control group.
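
In terms of the four group-by-period cells, this can be written out as below. It is a sketch under assumed notation (not taken from the output): D is -treated-, T is -time-, \beta_1, \beta_2 and \beta_3 are the coefficients on -treated-, -time- and the interaction, and X stands for the remaining predictors held at fixed values.

```latex
\begin{aligned}
E[Y \mid D=0,\, T=0,\, X] &= \beta_0 + \gamma' X \\
E[Y \mid D=1,\, T=0,\, X] &= \beta_0 + \beta_1 + \gamma' X \\
E[Y \mid D=0,\, T=1,\, X] &= \beta_0 + \beta_2 + \gamma' X \\
E[Y \mid D=1,\, T=1,\, X] &= \beta_0 + \beta_1 + \beta_2 + \beta_3 + \gamma' X \\[4pt]
\underbrace{(\beta_2 + \beta_3)}_{\text{change, treated group}} - \underbrace{\beta_2}_{\text{change, control group}} &= \beta_3
\end{aligned}
```

So a negative \beta_3 means the spread fell more (or rose less) in the treatment group than in the control group, at given values of the other predictors.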

Kind regards,
Carlo
(Stata 19.0)
Comment
Maximilian Oechsner

Join Date: Jun 2017

Posts: 11
#7

17 Jun 2017, 09:55

Hi Carlo,

In the meantime I have figured out how to interpret my results based on your answer above, thanks for that! I would also like to show my results graphically in my thesis and used the Stata command

Code:

mean RELATIVE, over(time treated)

to calculate the means before and after the intervention. RELATIVE is my dependent variable (Y). Then I did the calculation in Excel for the differences in differences estimator as follows:
((Mean of Y: -treated-=1, -time-=1) - (Mean of Y: -treated-=1, -time-=0)) - ((Mean of Y: -treated-=0, -time-=1) - (Mean of Y: -treated-=0, -time-=0))
Surprisingly, the result differs from the estimated coefficients in the regression (please see both attached). The coefficient for -time- (0.00513) does not match the change in the control group's mean, from 0.003349 (-treated-=0, -time-=0) to 0.002794 (-treated-=0, -time-=1), and the estimated coefficient for the interaction term (DiD) does not match either: the coefficient is -0.0001799, whereas my calculated DiD is -0.00015.

1) My regression results:

2) My calculations:

Hence, I would like to know whether the numbers have to be the same and, in particular, where my mistake could be. I also adjusted the number of observations, so that I have the same number of observations for my regression and for calculating the means. That should therefore no longer be the source of the difference.

Thank you and have a nice evening,
Max
Comment
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17678
#8

17 Jun 2017, 10:13

Max:
you seemingly skipped the fact that the conditional mean you've got from Stata is adjusted for the remaining predictors (I mean those different from -treated- and -time-) that you did not (and probably could not successfully) consider in your spreadsheet approach.
As an aside, I strongly discourage trying to repeat Stata statistical analyses with a spreadsheet. It's at best a waste of time.

Kind regards,
Carlo
(Stata 19.0)
Comment
Maximilian Oechsner

Join Date: Jun 2017

Posts: 11
#9

17 Jun 2017, 13:42

Thank you. You are right, I did not consider them in my Excel calculation. I just calculated the means of the treatment group before (-treated-=1 / -time-=0) and after the intervention (-treated-=1 / -time-=1) and the means of the control group before (-treated-=0 / -time-=0) and after (-treated-=0 / -time-=1) in Stata. Afterwards, I did the calculation in Excel as already stated. Do you have a suggestion for how to calculate the -DiD- coefficient correctly and show it in a graph? The coefficient should at least have the same direction (positive or negative) as my calculation in Excel. This would be really helpful for my thesis.

I have thought about two ways:
1. drop all variables except for my dependent variable (Y) -RELATIVE-, -treated- and -time- so that Stata will not adjust for the remaining predictors as you said?
2. Do the calculation manually in Excel (which I definitely do not prefer since the spreadsheet with raw data is very large)

Best,
Max
Comment
Patrick Dickson

Join Date: Jun 2017

Posts: 12
#10

17 Jun 2017, 15:29

Maximilian:

The regressions with and without control variables are different things. The Excel calculation you describe is an unadjusted comparison of a factorial interaction; the model with controls is an adjusted comparison. It is entirely plausible that the treatment effect (the coefficient on the interaction term) changes in sign and size depending on whether you control for these other covariates. Therefore, you should expect to see a difference between the Excel calculation (unadjusted) and the Stata model (adjusted). As Carlo said, there is no good reason to use Excel, though: the coefficient is available from an unadjusted Stata regression if that is what you want, and Stata will also give you confidence intervals and other measures of the uncertainty around that estimate, which your manual calculation in Excel will not (at least not without some tedious effort).
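
A minimal sketch of the two regressions, assuming the variables are named RELATIVE, treated and time as in the earlier posts. The control names in the local macro are placeholders (an assumption), to be replaced with the controls actually used:

Code:

```stata
* Placeholder control names (assumption): replace with your own variables
local controls swap_spread log_age log_ttm

* Unadjusted DiD: with only the two dummies and their interaction, the
* interaction coefficient equals the difference-in-differences of the four
* raw cell means, provided both are computed on the same observations
regress RELATIVE i.treated##i.time, vce(robust)

* Adjusted DiD: the same comparison, holding the controls fixed
regress RELATIVE i.treated##i.time `controls', vce(robust)
```

The first regression should reproduce the spreadsheet calculation from the output of -mean RELATIVE, over(time treated)-; the second generally will not, because the cell comparisons are then made conditional on the controls.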
Comment
Maximilian Oechsner

Join Date: Jun 2017

Posts: 11
#11

17 Jun 2017, 16:23

Hi Patrick,

thank you for your elaborate answer! I am aware of the difference in diff-in-diff regressions with and without controls. Hence, I included controls because the result would be definitely false otherwise. In Excel, I just took the means before and after for both groups obtained from Stata (with the same code as stated above) and did the calculation in Excel based on these numbers. My aim is still to show graphically (in addition to describing the regression outcome) the effects on the treatment group (-treated-=1) in comparison to the control group (-treated- = 0) before (-time-=0) and after the event (-time-=1). According to what we have learned in an econometrics class, I would like to create a chart that looks like this :

In this example, taken from Wikipedia, the outcome in the treatment group is described by the line P, while the line S describes the same for the control group. "DID therefore calculates the "normal" difference in the outcome variable between the two groups (the difference that would still exist if neither group experienced the treatment), represented by the dotted line Q"

If you could help me on solving this issue, I would highly appreciate it! My questions are therefore:

1.) How can I calculate the needed values to create a graph like this? Needed: mean of treatment group before (-treated-=1 / -time-=0) and after (-treated-=1 / -time-=1) as well as the mean of the control group before (-treated-=0 / -time-=0) and after (-treated-=0 / -time- =1).

As an approach: Is it still possible to calculate the above mentioned means in Stata if I drop all other variables except for the variable of interest? As far as I understand Carlo, the mean that Stata calculates (if you do not drop them!) is adjusted for these remaining variables, which I did not know.

2.) Do these values have to match the regression coefficients exactly? For example, does the estimated coefficient for the variable -time- have to be equal to the difference in the mean of the control group before (-treated-=0 / -time-=0) and after (-treated-=0 / -time-=1)? The same question holds for the -DiD- coefficient.

Best,
Max
Comment
Maximilian Oechsner

Join Date: Jun 2017

Posts: 11
#12

18 Jun 2017, 03:31

Hi all,

I found something interesting in the meantime, which I think will help me create the graph and could be of interest to others who are following this lively discussion.

Erick Gong, an assistant professor from the University of Middlebury, gave a presentation on differences in differences during his time at the University of California, Berkeley. In his presentation, he describes the calculations to get the four points needed for the graph (means of treatment and control group, each before and after the intervention) based on the estimated regression coefficients.

See the following link: https://www.ocf.berkeley.edu/~garret..._slides09.pptx

This is a screenshot of slide 28, which is of interest to me and describes the calculations based on estimated regression coefficients:

Let me know what you think...

Best and have a nice Sunday,
Max
Comment
Patrick Dickson

Join Date: Jun 2017

Posts: 12
#13

18 Jun 2017, 03:56

A connected line (or perhaps just a simple line) graph will give you what you need. Below, I've used generic names for the variables and assumed you want a vertical line indicating the before/after cut-off point at time 13 in your time variable, so replace this with whatever indicates the before/after cut-off in your data. See

Code:

help textbox

to add in text as required.

Code:

twoway connected outcome_variable time_period, by(treatment) ytitle("Outcome") xtitle("Time period") xline(13, lstyle(foreground))

PS re your other questions above - everything you need to create the graphs and interpret coefficients is in the Stata regression and your data.
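
One possible way to get the four points for such a graph from the fitted model rather than by hand is -margins- followed by -marginsplot-. This is only a sketch: it assumes the variables are named RELATIVE, treated and time as in the earlier posts, and the control names in the local macro are placeholders (an assumption).

Code:

```stata
* Placeholder control names (assumption): replace with your own variables
local controls swap_spread log_age log_ttm

* DiD model with factor-variable notation
regress RELATIVE i.treated##i.time `controls', vce(robust)

* Before-after change in the treatment group (time effect plus interaction)
lincom 1.time + 1.treated#1.time

* Adjusted mean of the outcome in each of the four group-by-period cells
margins treated#time

* Plot the four cells: period on the x-axis, one line per group
marginsplot, xdimension(time) ytitle("Relative bid-ask spread (adjusted mean)")
```

Dropping the controls from the -regress- line gives the raw cell means instead, so the same commands cover both the adjusted and the unadjusted version of the graph.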

Last edited by Patrick Dickson; 18 Jun 2017, 03:59.
Comment
Maximilian Oechsner

Join Date: Jun 2017

Posts: 11
#14

23 Jun 2017, 04:48

Hi all,

first of all thank you for your most recent answer Patrick, I have not figured it out yet because I am busy with robustness checks, which leads me to my next questions:

I could come up with the following tests so far:
1. Regress the -dependent variable- (Y) on the -time- dummy, but only for the control group, to see whether the control group was affected by the treatment: this endogeneity check was successful in my case

The code looks as follows:

Code:

reg RELATIVE time treated did SWAP TIMETOMATURITY AGE log_SIZE VSTOXX RETURN, robust

Y: -RELATIVE-
X: -time- dummy
Rest: controls

2. Moreover, I did a placebo test (assume the intervention took place at a random date in the pre-treatment period: the -did- coefficient should be insignificant). I set the event date to be approx. one year before the actual onset of the intervention, but the coefficient was still significant, so unfortunately the test was not successful.
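
For reference, the two checks could be sketched roughly as follows. This assumes a daily Stata date variable called -date- (a hypothetical name), and the placebo date of 15 May 2014 is an arbitrary date inside the before period, chosen only for illustration:

Code:

```stata
local controls SWAP TIMETOMATURITY AGE log_SIZE VSTOXX RETURN

* Check 1: control group only, outcome regressed on the -time- dummy
regress RELATIVE i.time `controls' if treated == 0, vce(robust)

* Check 2: placebo test on pre-treatment observations only
preserve
keep if date < td(15oct2014)            // drop the true after period
* Hypothetical placebo event date (assumption, for illustration only)
generate byte placebo_time = date >= td(15may2014)
regress RELATIVE i.treated##i.placebo_time `controls', vce(robust)
restore
```

In the second regression, the coefficient on 1.treated#1.placebo_time is the placebo -did- estimate.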

My questions are thus:
Q1) Which additional tests can I do?
Q2) How can I deal with them if they reveal that my model is not robust?
Q3) How can I overcome the placebo test outcome or how shall I rate the outcome?

I am looking forward to your answers and appreciate your elaborate help as always!

Best,
Max
Comment
