Interpreting difference-in-differences regression result

Daniel Hansson

Join Date: Oct 2015

Posts: 2
#1

25 Nov 2015, 16:41

I have performed a difference-in-differences analysis but I'm not sure how to interpret the results. I have a regression of the form:
Y = α + β1(treatment) + β2(time) + β3(treatment∗time)

The thing is that none of the coefficients is significant, but the F-test is significant at the 0.01 level. I guess this has to do with correlation between the independent variables? If I regress Y on only the treatment variable (β1) or only the interaction term (β3), I get significant coefficients. How can I interpret these results? I'm hesitant to just remove variables to get a significant coefficient. Do I just leave it as is and conclude that there are no significant coefficients? I feel I need to explain the significant result from the F-test. Any suggestions?

Really appreciate it. Thanks in advance.

Roman Mostazir

Join Date: Apr 2014

Posts: 876
#2

25 Nov 2015, 18:37

Daniel, it would be easier to help you if you posted the results and the Stata code you used. If you decide to do so, please use the code delimiters (the # button on the right-hand side of the toolbar) when posting any Stata output.

Regarding your problem: if the interaction effect is not significant, perhaps you simply have an absence of evidence for an interaction effect (β3). Try 'marginsplot' after 'margins' to see what the interaction plot looks like. Also try a model with treatment and time but without the interaction to see whether either of them is significant (I suppose at least one of them will be). Note that you cannot regress Y on the interaction term alone: for the interaction to be in the model, both treatment and time must be in the model as well.
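
A minimal sketch of that sequence, run as a do-file on simulated data because the original data were not posted. The variable names (y, treatment, time) and every parameter value are assumptions for illustration only, not Daniel's actual setup:

```stata
* Simulated balanced 2x2 design; all names and values are illustrative
clear
set seed 12345
set obs 400
generate byte treatment = mod(_n, 2)             // 0 = control, 1 = treated
generate byte time = mod(floor((_n - 1)/2), 2)   // 0 = pre, 1 = post
generate y = 10 + 2*treatment + 5*time + 3*treatment*time + rnormal(0, 4)

* Main effects only, no interaction
regress y i.treatment i.time

* Full model with the interaction, then the interaction plot
regress y i.treatment##i.time
margins treatment#time
marginsplot
```

Roughly parallel lines in the plot would be consistent with little or no interaction; lines that diverge after the intervention would suggest one.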

Roman
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17724
#3

26 Nov 2015, 02:00

Daniel:
as an aside to Roman's helpful insights, please consider that a significant F-test with non-significant coefficients may be a warning sign of multicollinearity.
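
One way to look into this, sketched on simulated data run as a do-file. The variable names, the hand-made interaction term tx, and all parameter values are assumptions for illustration, not the original poster's data:

```stata
* Simulated 2x2 data; names and values are illustrative only
clear
set seed 2015
set obs 200
generate byte treatment = mod(_n, 2)
generate byte time = mod(floor((_n - 1)/2), 2)
generate byte tx = treatment*time        // hand-made interaction term
generate y = 10 + 2*treatment + 1*time + 2*tx + rnormal(0, 6)

correlate treatment time tx              // tx is correlated with both of its components
regress y treatment time tx
estat vif                                // variance inflation factors for each regressor
test treatment tx                        // joint test that both coefficients are zero
```

A joint test can reject even when each individual t-test does not, which is the pattern described in #1.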

Kind regards,
Carlo
(Stata 19.0)
Clyde Schechter

Join Date: Apr 2014

Posts: 30147
#4

27 Nov 2015, 10:20

In interpreting results like this, it is important to remember what each coefficient means. I'll assume that your treatment variable is coded 1 = active treatment/0 = control, and that your time variable is also a dichotomy with 0 = era prior to intervention and 1 = era following intervention. So your regression is designed to estimate difference in differences.

The coefficient of the treatment variable, β1, is the estimated mean difference in Y between the treatment and control groups prior to the intervention: it represents whatever "baseline" differences existed between the groups before the intervention was applied to the treatment group.

β2 is the expected mean change in outcome from before to after the onset of the intervention era among the control group. It reflects, if you will, the pure effect of the passage of time in the absence of the actual intervention.

β3 by itself is the difference in differences estimator. In most contexts, it is β3 that is the focus of interest. It tells us whether the expected mean change in outcome from before to after was different in the two groups. (That would typically be the hallmark of an effective intervention, assuming adequate power, etc.)

To get the estimated mean difference in Y between the treatment and control groups after the intervention, you need to look at β1 + β3. It is possible that you will find that β1 + β3 is significantly different from zero, even though neither β1, nor β3 by itself is.
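
To see where these combinations come from, it can help to write out the four expected cell means implied by the model in #1, under the 0/1 coding assumed above:

```latex
\begin{aligned}
E[Y \mid \text{treatment}=0,\ \text{time}=0] &= \alpha \\
E[Y \mid \text{treatment}=1,\ \text{time}=0] &= \alpha + \beta_1 \\
E[Y \mid \text{treatment}=0,\ \text{time}=1] &= \alpha + \beta_2 \\
E[Y \mid \text{treatment}=1,\ \text{time}=1] &= \alpha + \beta_1 + \beta_2 + \beta_3 \\[6pt]
\text{group difference, pre:}\quad & (\alpha + \beta_1) - \alpha = \beta_1 \\
\text{group difference, post:}\quad & (\alpha + \beta_1 + \beta_2 + \beta_3) - (\alpha + \beta_2) = \beta_1 + \beta_3 \\
\text{difference in differences:}\quad & (\beta_1 + \beta_3) - \beta_1 = \beta_3
\end{aligned}
```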

Occasionally, one wants to test the null hypothesis that nothing changed from before to after intervention in either group and both groups had the same expected outcomes at baseline. In that case, you would rely on the overall regression F-test--and your data seem to reject that null hypothesis.

If you used factor variable notation (see -help fvvarlist- if you did not) in your regression command, you can often get a better feel for your results by running the -margins- command after your regression to see the expected mean outcomes in each group in both time periods (-margins treatment#time-). And the change from before to after in each group can be most easily seen with -margins, dydx(time) at(treatment = (0 1))-. This is usually easier than trying to add the appropriate combinations of coefficients.
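
A minimal sketch of those commands, run as a do-file on simulated data. The variable names (y, treatment, time) and all parameter values are assumptions for illustration; the -lincom- and second -dydx()- lines are an additional way to obtain the post-period group difference β1 + β3:

```stata
* Simulated balanced 2x2 design; all names and values are illustrative
clear
set seed 12345
set obs 400
generate byte treatment = mod(_n, 2)             // 0 = control, 1 = treated
generate byte time = mod(floor((_n - 1)/2), 2)   // 0 = pre, 1 = post
generate y = 10 + 2*treatment + 5*time + 3*treatment*time + rnormal(0, 4)

regress y i.treatment##i.time

margins treatment#time                       // expected mean in each of the four cells
margins, dydx(time) at(treatment = (0 1))    // pre-to-post change within each group
margins, dydx(treatment) at(time = (0 1))    // group difference within each period
lincom 1.treatment + 1.treatment#1.time      // post-period group difference, beta1 + beta3
```

The -lincom- estimate should match the time = 1 row of the second -dydx()- call, since both are the same linear combination of coefficients.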

The other thing to think about is that you don't tell us anything about your study design, nor do you show us the specific regression commands you used. If you have longitudinal data, so that the units of analysis in each group during the pre-intervention era continue to be observed in the post-intervention era, then your analysis must account for these repeated, non-independent observations by using either fixed-effects or random-effects regression. The standard errors (and hence the p-values) of a simple OLS regression would be incorrect with longitudinal data.
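
For the longitudinal case, a sketch along these lines might apply. It is run as a do-file on a simulated two-period panel; the unit identifier id, the other variable names, and all parameter values are assumptions for illustration:

```stata
* Simulated two-period panel; all names and values are illustrative
clear
set seed 2015
set obs 100                                // 100 hypothetical units
generate id = _n
generate byte treatment = id > 50          // group membership is fixed within unit
generate u = rnormal(0, 3)                 // unit-level effect shared across periods
expand 2                                   // two observations per unit
bysort id: generate byte time = _n - 1     // 0 = pre, 1 = post
generate y = 10 + 2*treatment + 5*time + 3*treatment*time + u + rnormal(0, 2)

xtset id time
xtreg y i.treatment##i.time, fe            // 1.treatment is omitted: it does not vary within unit
xtreg y i.treatment##i.time, re            // random effects keeps the treatment main effect
```

In the fixed-effects model the treatment main effect is absorbed by the unit effects, but the interaction coefficient, which is the difference-in-differences estimate, is still estimated.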
Jim Nieb

Join Date: Sep 2016

Posts: 9
#5

23 Sep 2016, 13:33

Greetings everyone. This has been a helpful thread and I have a question. I am estimating a DiD model like Clyde described above, with a control time series and treatment time series before and after an event. The monthly data represent production and I'm wondering how to correctly interpret the DiD estimator (β3 in the above discussion). For example, if it is -10 (statistically significant), presumably this means that average monthly production is lower by 10 units for the treatment firm after the event vis-a-vis the control firm. Does this hold true for every month in the post period, so that one could multiply it by the number of months in the post period for a total effect (i.e., if there are 30 months in the post period, then 10 x 30 = 300)? Thanks!
Clyde Schechter

Join Date: Apr 2014

Posts: 30147
#6

23 Sep 2016, 14:09

Your question can't be answered fully without seeing the output of the model you actually ran, so we can see the way that time is represented in the model. Please provide the exact command used and the exact output you got from Stata. This is best done by copy/pasting directly from Stata's Results window or your log file directly into a code block here on the forum. (See FAQ #12 for instructions on creating a code block if you don't know how that works.)

But in general terms, I think you have it wrong. The interaction coefficient does not represent the difference in outcome between groups in the post period. Rather, as we go from pre- to post-treatment, each group experiences some change (possibly 0, but typically not) in the outcome. The interaction coefficient represents the difference between those changes. So, for example, if in the control group production went up by 5 units between the pre- and post-periods, an interaction coefficient of 10 would mean that in the intervention group production went up by 5 + 10 = 15 units.
Jim Nieb

Join Date: Sep 2016

Posts: 9
#7

23 Sep 2016, 15:39

Thanks for the quick reply, Clyde. Please see the table below. I agree that the interaction coefficient (-10) represents the difference from pre- to post-event that each group experienced, and that the treatment firm experienced lower growth going from pre- to post-event (10) than did the control firm (20). The regression model also returns -10 for the interaction coefficient. The estimation sample is 30 months of data pre-event and 30 months post-event. My confusion at the moment is, assuming that the treatment firm should have experienced the same growth in output as the control firm going from pre- to post-event (i.e., +10), does this imply that for EACH MONTH the treatment firm's output should have been higher by 10 units (a scale effect) so that overall, it "lost" 300 units (i.e., 10 x 30)? Thanks again for your time on this.

Jim Nieb

Join Date: Sep 2016

Posts: 9
#8

23 Sep 2016, 15:40

Oops. Sorry for multiple attachments of same table.
Clyde Schechter

Join Date: Apr 2014

Posts: 30147
#9

23 Sep 2016, 18:46

Again, without seeing the actual regression model you used and knowing how time is represented in your equation, I can't answer your question. Whether we are talking about aggregate changes in outcome or changes in rates of outcome (that could then be multiplied by duration) depends on how time appears in the regression equation and the specific command used.
Jim Nieb

Join Date: Sep 2016

Posts: 9
#10

25 Sep 2016, 15:12

Here you go with the actual results attached (along with the summary DiD table). I ran the same model two different ways (the coefficient of interest is the -19.62). I "stacked" my data with the first "panel" being the treatment data and the second panel being the control data. In particular, data are monthly running from Jan 2009 through Dec 2015 in both panels (84 obs in each panel). I -tsset obs- in the data before stacking it via -stack-. Whether or not I -tsset obs-, the same DiD results obtain. Thanks again Clyde for helping me here.

Attached Files

Stata output.pdf (16.6 KB, 1 view)
Clyde Schechter

Join Date: Apr 2014

Posts: 30147
#11

25 Sep 2016, 16:22

The table shown in the post doesn't help me, but the regression output shown in the PDF does. In your model time is represented by a dichotomous variable dum_05_12 which, I'm guessing, is 1 between 2005-2012 and 0 in other years. So your regression coefficient dimensions are sales, not sales per unit of time. The expected sales in the treatment group were 54.75 units higher at each observation in the baseline period than those of the control group. Following the onset of the intervention, that difference dropped by 19.62 units to 35.13 at each observation. Now, if you want to aggregate sales over time, if one group is 35 units higher than the other at each time period and there are 10 time periods, then the total expected difference is 10 x 35 = 350 units. But it is still just a difference of 35 units in each time period: the difference in each time period remains the same in this model.
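
Written out, the arithmetic is as follows. The symbol n for the number of post-period observations is introduced here only for illustration, and the coefficients are those quoted from the PDF output:

```latex
\begin{aligned}
\text{group difference, baseline period:}\quad & 54.75 \\
\text{group difference, post period:}\quad & 54.75 + (-19.62) = 35.13 \\
\text{aggregate post-period difference over } n \text{ observations:}\quad & n \times 35.13 \\
\text{aggregate change relative to the baseline gap:}\quad & n \times (-19.62)
\end{aligned}
```

With n = 10 the aggregate post-period difference is 351.3, which is the 350 above before rounding 35.13 down to 35.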
Jim Nieb

Join Date: Sep 2016

Posts: 9
#12

25 Sep 2016, 16:59

Thanks Clyde. This is helpful. Sorry for my ambiguity: dum_05_12 = 1 starting May 2012; 0 earlier. Cheers!
HENIA BEN AMOR

Join Date: Sep 2016

Posts: 1
#13

29 Sep 2016, 07:29

I work on inflation targeting and use the "differences in differences" method as my econometric approach. I would like a document that explains the method step by step, knowing that I will be using Stata. Thanks.

Asawari Sathe

Join Date: Dec 2016
Posts: 2

#14

20 Dec 2016, 00:33

Hi, I am working on a paper in which I am trying to estimate the period effect on the log weekly wages of natives and immigrants. I use the recent recession to divide my pooled cross-section data into pre- and post-recession groups. I use DID where the treatment is whether a person is a native or an immigrant, and the time periods observed are before and after the recession. I run a regression with the recession and immigrant dummies and the recession-immigrant interaction variable. Apart from these, the other explanatory variables are state unemployment, a skill dummy, education, and a cohort vector. The results of the regression are below. I need some help interpreting the coefficients on the recession and immigrant indicators.

Regression by Census Cross Section

Models: (1) DID-Recession & Immigrant; (2) DID-Recession & Skill; (3) DDD-Recession, Skill, Immigrant

Variables                                   (1)           (2)           (3)
Recession-Immigrant Interaction Term        -0.0779***                  -0.1669***
                                            (0.01384)                   (0.01688)
Immigrant Indicator                         0.0931        -0.1652***    0.1027
                                            (0.06071)     (0.03617)     (0.06784)
Recession Indicator                         0.1532***     0.2345***     0.2505***
                                            (0.0016)      (0.00698)     (0.01826)
Skill Dummy                                 0.5998***     0.6603***     0.6615***
                                            (0.03849)     (0.04262)     (0.04365)
Age                                         0.0647***     0.0644***     0.0644***
                                            (0.00162)     (0.00156)     (0.00161)
Age-2nd Degree Poly                         -0.0023***    -0.0022***    -0.0022***
                                            (0.00006)     (0.00005)     (0.00006)
Age-3rd Degree Poly                         0.0000***     0.0000***     0.0000***
                                            (0)           (0)           (0)
Unemployment-State level                    0.0028        0.0029        0.0026
                                            (0.00226)     (0.0019)      (0.00221)
<1950 Arrivals                              -0.0275       0.2149***     -0.0489
                                            (0.05844)     (0.03754)     (0.04683)
1950-59 Arrivals                            0.0238        0.2057***     0.0167
                                            (0.03927)     (0.02302)     (0.04284)
1960-64 Arrivals                            -0.0159       0.1688***     -0.024
                                            (0.04128)     (0.02475)     (0.04432)
1965-69 Arrivals                            -0.0569       0.1246***     -0.0635
                                            (0.04256)     (0.02622)     (0.04744)
1970-74 Arrivals                            -0.1120*      0.0778*       -0.1172*
                                            (0.04567)     (0.0287)      (0.05238)
1975-79 Arrivals                            -0.1709**     0.0409        -0.1765**
                                            (0.04849)     (0.02978)     (0.05465)
1980-84 Arrivals                            -0.2267***    -0.0012       -0.2300**
                                            (0.04886)     (0.02875)     (0.05546)
1985-89 Arrivals                            -0.2550***    -0.0204       -0.2558***
                                            (0.04928)     (0.02795)     (0.05588)
1990-94 Arrivals                            -0.2624***    -0.0166       -0.2637***
                                            (0.05199)     (0.02972)     (0.05847)
1995-99 Arrivals                            -0.2715***    -0.0198       -0.2724***
                                            (0.05454)     (0.03169)     (0.06078)
2000-04 Arrivals                            -0.3250***    -0.0572       -0.3229***
                                            (0.05697)     (0.03241)     (0.06304)
2005-11 Arrivals                            -0.2964***    -0.0316       -0.2977***
                                            (0.06085)     (0.03733)     (0.06738)
2012-15 Arrivals                            -0.2860***    -0.0317       -0.2981**
                                            (0.06629)     (0.04493)     (0.07383)
Recession-Skill Interaction Term                          -0.1857***    -0.2057***
                                                          (0.02483)     (0.04242)
Recession-Skill-Immigrant Interaction Term                              0.1890***

Constant                                    5.6786***     5.6475***     5.6482***
                                            (0.01453)     (0.01121)     (0.01156)

Thank you!


Clyde Schechter

Join Date: Apr 2014

Posts: 30147
#15

20 Dec 2016, 09:39

Your model is somewhat complicated because you actually have a three way interaction among period, immigrant, and skill.

I'm not sure what the output you're showing is. It is not ordinary Stata regression output. It looks like perhaps it is some partial output from three different models that has been put together using -estout- or -outreg- or something like that. In any case, since I don't know what these numbers are, I'm not going to refer to them or interpret them specifically. I will just give you some general advice.

The focus of your attention should be on the three way and two way interaction terms. Those are the key results in an analysis like this. The "main effect" terms that you are asking about are of secondary importance, and their meanings are somewhat obscure. Here they are:

Immigrant indicator: the expected difference in log wages between immigrants and non-immigrants during the period that was coded 0 (which I imagine was the pre-recession period, though nothing would prevent you from doing it the other way around) and only in the skill category that was coded 0. It says nothing about anything else.

Recession indicator: the expected difference in log wages between the pre- and post-recession periods among those people coded 0 on the immigrant indicator (again, I would guess that is the non-immigrants, but you could have done it the other way) and only in the skill category that was coded 0.

Skill indicator: the expected difference in log wages between those observations coded 1 on the skill variable and those coded 0 on the skill variable among those people coded 0 on the immigrant indicator and only in the period coded 0 on the recession variable.
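
Writing the model out may make these conditional interpretations easier to see. This is a sketch for a full three-way factorial of 0/1 indicators; the symbols (γ for coefficients, I for immigrant, R for recession, S for skill) are notation introduced here for illustration and do not come from the posted output:

```latex
\begin{aligned}
E[\log w] &= \gamma_0 + \gamma_1 I + \gamma_2 R + \gamma_3 S
           + \gamma_4 IR + \gamma_5 RS + \gamma_6 IS + \gamma_7 IRS + \dots \\[6pt]
\text{immigrant difference:}\quad & \gamma_1 + \gamma_4 R + \gamma_6 S + \gamma_7 RS
  \quad (= \gamma_1 \text{ only when } R = 0,\ S = 0) \\
\text{recession difference:}\quad & \gamma_2 + \gamma_4 I + \gamma_5 S + \gamma_7 IS
  \quad (= \gamma_2 \text{ only when } I = 0,\ S = 0) \\
\text{skill difference:}\quad & \gamma_3 + \gamma_5 R + \gamma_6 I + \gamma_7 IR
  \quad (= \gamma_3 \text{ only when } R = 0,\ I = 0)
\end{aligned}
```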

Three-way interaction models are pretty difficult to understand from the regression coefficients. I recommend that you go back and rerun your model(s) using factor variable notation (-help fvvarlist-) so that you can then run the -margins- command to easily get estimates of the expected log wages in each combination of the skill, recession, and immigrant variables. When doing that, don't forget to also code the polynomial terms in age using factor-variable notation as well--if you don't do that the -margins- output will be wrong. So something like this:

Code:

regress /* or -xtreg- or whatever */ log_wages i.skill##i.immigrant##i.recession c.age##c.age##c.age i.cohort state_unemployment
margins skill#immigrant#recession

These results will be easier to understand than what you have generated.
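
To try the approach before applying it to the real data, something like the following do-file could be used. It runs on simulated data: the data-generating values are invented, the cohort and unemployment covariates are left out, and only a quadratic in age is used for brevity, so this is a sketch rather than the model from #14:

```stata
* Simulated data; every value below is illustrative only
clear
set seed 2016
set obs 4000
generate byte skill = runiform() < 0.5
generate byte immigrant = runiform() < 0.3
generate byte recession = runiform() < 0.5
generate age = 20 + floor(40*runiform())
generate log_wages = 5.5 + 0.6*skill + 0.1*immigrant + 0.2*recession
replace log_wages = log_wages - 0.1*immigrant*recession - 0.2*skill*recession
replace log_wages = log_wages + 0.15*skill*immigrant*recession
replace log_wages = log_wages + 0.06*age - 0.0007*age^2 + rnormal(0, 0.5)

regress log_wages i.skill##i.immigrant##i.recession c.age##c.age

margins skill#immigrant#recession          // expected log wages in all eight cells
margins skill#immigrant, dydx(recession)   // recession difference within each skill-by-immigrant cell
```

The second -margins- call reports the pre-to-post recession difference separately for each of the four skill-by-immigrant groups, which is usually the comparison of interest in a design like this.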

While the -margins- chapter of the online manuals is quite comprehensive and replete with good examples, it is a bit of a heavy lift. A gentler introduction to -margins-, which will well prepare you to read the manual chapter, is https://www3.nd.edu/~rwilliam/stats/Margins01.pdf.