
  • #16
    Hello. I would like to know how I can plot the common trend graph for a DID analysis.

    Comment


    • #17
      This question is tangential to the topic of the thread. It is also a question that has been answered frequently in other Statalist threads. I suggest you first search the Forum for other threads on this. If, after reading those, you are not sure how to proceed, then I suggest you start a new thread. Also, if you do post back, be sure to include an example of your data (generated with the -dataex- command). And be sure to explain what the variables in your data example are and what role they play in your DID analysis.

      If you are running version 15.1 or a fully updated version 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

      When asking for help with code, always show example data. When showing example data, always use -dataex-.
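
      For example, a minimal illustration (the variable list here is hypothetical):

      Code:
      ssc install dataex                     // only if -dataex- is not already installed
      help dataex                            // simple instructions for using it
      dataex outcome treatment post in 1/20  // paste the result into your post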

      Comment


      • #18
        Dear all,
        What kind of test is appropriate for a DID with a binary dependent variable? Is it a t-test or a z-test?

        With regards,
        Tanjina

        Comment


        • #19
          It depends on the analytic model you use. The commonest approach with dichotomous outcome variables is to use -logit- or -probit-, and both of those give you z-statistics because they are based on asymptotic theory. If you use a linear probability model with -regress-, you will get a t-statistic, which comes from small-sample estimation. Unless you are working with a pretty small sample (one that is likely too small to support a valid DID analysis in any case), this is a difference of no importance. The z-statistic is just the limit of the t-statistic as the sample size becomes large. Once you have more than about 30 degrees of freedom, the t- and z-distributions are barely distinguishable.
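
          For example (a minimal sketch; outcome, treatment, and post are placeholder variable names):

          Code:
          logit outcome i.treatment##i.post    // z-statistics (asymptotic)
          probit outcome i.treatment##i.post   // also z-statistics
          regress outcome i.treatment##i.post  // linear probability model, t-statistics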

          Comment


          • #20
            Dear Schechter,
            Thanks for your valuable information.

            Comment


            • #21
              In Stata, when we deal with a binary dependent variable, we estimate the proportion of that variable rather than its mean. But the -diff- command only gives a t-test p-value; it does not display a z-test p-value for proportions. I have a large sample size and a binary dependent variable, i.e., delivered by a skilled provider: 1 for 'yes' and 0 for 'no'. I want to calculate the difference-in-differences estimate from baseline to endline for the control and intervention data. Is there any way to calculate a z-statistic p-value?

              Comment


              • #22
                The -diff- command is a user-written command and it is designed to deal with continuous outcome variables. It has the advantage of providing a simplified output that clearly explains what is what, but it does not handle non-linear modeling. So it gives you a linear probability model, rather than a logistic model. That may be fine for your purposes: linear probability models are only problematic when you have to deal with probabilities that are close to zero or one.

                Anyway, if you want to do this as a logistic model, you can't use -diff-. You have to build your own model. I'll assume that your data are suitable for the classic DID design: you have a 0/1 variable that distinguishes observations that precede the intervention/policy change/whatever (0) from those that follow it (1). Let's call that one post. You have another variable, let's call it treatment, coded 0 for those that never receive the intervention/undergo the policy change/whatever, and 1 for those that do. (The 0 and 1 codes for treatment are the same for all observations of a given entity, regardless of whether they precede or follow the actual implementation.) Then the basic analysis is:
                Code:
                xtlogit outcome i.treatment##i.post, fe
                You may want to embellish the command with the inclusion of some covariates, and perhaps cluster robust variance estimation. The coefficient of 1.treatment#1.post provides the DID estimator of the intervention effect on the log-odds scale, and is accompanied by a z-test.
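
                For example, a sketch of the embellished command (x1 and x2 are hypothetical covariates; this assumes the data have already been -xtset- by the entity identifier):

                Code:
                xtlogit outcome i.treatment##i.post x1 x2, fe
                lincom 1.treatment#1.post, or   // DID effect on the odds-ratio scale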

                If you need additional assistance with implementing this, when posting back be sure to use the -dataex- command to show example data, and also show all of the code you tried and all of the output you got from that. Read forum FAQ #12 for advice on the most helpful ways to do that.

                Comment


                • #23
                  Dear all,
                  The DID estimate comes with a z-statistic and is reported as an odds ratio, but how can we interpret this odds ratio for the outcome variable?


                  Thanks,
                  Tanjina

                  Comment


                  • #24
                    So, for the interaction term, the "odds ratio" is actually a ratio of odds ratios: the endline:baseline odds ratio in the treated group divided by the endline:baseline odds ratio in the control group. The DID estimate of the treatment effect is given by this ratio of odds ratios. In the odds metric, effects are multiplicative rather than additive, so DID becomes ROR (ratio of odds ratios).
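
                    To make that concrete, a minimal sketch (outcome, post, and treat are placeholder names):

                    Code:
                    logit outcome i.post##i.treat, or
                    * the "odds ratio" reported for 1.post#1.treat equals
                    * [odds(post=1, treat=1)/odds(post=0, treat=1)] divided by
                    * [odds(post=1, treat=0)/odds(post=0, treat=0)]
                    di exp(_b[1.post#1.treat])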

                    If you want more specific guidance, I think you need to show your results.

                    Comment


                    • #25
                      Dear Schechter,
                      Thanks for your co-operation. The outcome variable is antenatal care received from a skilled or unskilled provider. The study variable indicates 1 for endline and 0 for baseline. The area variable indicates 1 for intervention and 0 for control. How can I interpret these results for the "Skilled Antenatal care" DID, i.e., the area*study interaction? Could you please instruct me?

                      ------------------------------------------------------------------------------
                           ANC_Y_N | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
                      -------------+----------------------------------------------------------------
                         _Istudy_1 |   1.893166    0.156249    7.73   0.000     1.610409    2.225569
                              area |   0.617872    0.058956   -5.05   0.000     0.512483    0.744934
                      _IstuXarea_1 |   1.876006    0.214535    5.50   0.000     1.499316    2.347336
                             _cons |   2.421603    0.169913   12.60   0.000     2.110465    2.778611
                      ------------------------------------------------------------------------------

                      Regards,
                      Tanjina

                      Comment


                      • #26
                        The odds ratio (endline:baseline) of receiving antenatal care is 1.88 (to 2 decimal places; 95% CI 1.50-2.35) times as large in the intervention area as it is in the control area. That is, whatever happened between the baseline and endline periods is associated with an 88% greater increase in the odds of receiving antenatal care in the intervention area than was seen in the control area.

                        Comment


                        • #27
                          Difference in difference using survey data

                          Dear Clyde,

                          I'm using difference-in-differences to estimate the impact of a policy reform on child labor using survey data. My unit of analysis is a child between the ages of 10 and 17. The problem is that I do not observe the same child over the two periods, before and after, since the data come from the National Labor Force Survey.

                          I have been thinking of using a child's year of birth to get the time dimension in my analysis, so that children born in a given year are included in both the before and after periods. The problem with this approach is that I would end up having only two years of birth common to the two periods. So I tried to average all variables by child age to overcome the problem, and to change the unit of analysis to child age and year of birth separately, as indicated below. But there is a significant loss of observations, and I'm also not sure whether I'm doing it right.


                          Code:
                          gen yearXtreated = year*treated

                          * unit of analysis: child age
                          collapse id07 id08 weight childwhour zemporment fathereduc nchildren ///
                              fatherage yearXtreated, by(childage)

                          * unit of analysis: year of birth (run separately, starting again from the original data)
                          collapse id07 id08 weight childage childwhour zemporment fathereduc nchildren ///
                              fatherage yearXtreated, by(ybirth)

                          svy: reg childwhour yearXtreated zemporment nchildren fathereduc childfemale childage i.region


                          So, can I do the individual-level analysis on the assumption that the household-level variation will not cancel out entirely, given that this is a representative National Labor Force Survey?
                          What should I do to overcome this problem?

                          Thank you very much for your help!

                          Code:
                          * Example generated by -dataex-. To install: ssc install dataex
                          clear
                          input byte region int id07 byte id08 double weight float(childage ybirth childwhour nchildren fathereduc childfemale zemporment yearXtreated)
                          1 5  3 223.69 12 1987 . 4 . 1 -.10025822 1999
                          1 3 24  183.4 14 1985 1 6 . 1   -1.54133 1999
                          1 3 30  183.4 15 1994 . 2 . 1 -.14047733 1999
                          1 3 33  183.4 17 1982 . 7 . 1   -.470427 1999
                          1 3  4  31.08 12 1987 2 5 . 1 -.10025822 1999
                          1 3  4  31.08 16 1983 . 5 . 1 -.14047733 1999
                          1 3  9  31.08 17 1982 1 2 6 1  -.8965764 1999
                          1 3 35  31.08 11 1988 2 4 . 1   .4156623 1999
                          1 4  3  30.65 17 1982 2 2 . 1  -.8965764 1999
                          1 4  8  30.65 12 1987 . 5 . 1    .521284 1999
                          end
                          label values region lf01
                          label def lf01 1 "tigray", modify

                          Comment


                          • #28
                            Hello, I'm also doing a differences-in-differences-in-differences analysis (DDD) using Stata 15.

                            I'm measuring the impact of removing financial incentives on quality indicator performance, in ~450 healthcare clinics from 2016 to 2017. Financial incentives remained for two quality indicators during that period (my two controls).

                            I'd like to estimate the precise effect of removing the incentives on quality achievement, whilst adjusting for time-invariant factors pertaining to individual practices and quality indicators.

                            So far, I've done a basic DiD (with dummy variables for pre- vs post-incentive removal, and for treatment vs control).

                            My Stata command is:

                            Code:
                            reg Achievement Time Treated Time##Treated i.indicator i.practice

                            My output is:

                            Code:
                            Achievement | Coef. Std. Err. t P>|t| [95% Conf. Interval]
                            -------------+----------------------------------------------------------------
                            Time | -1.063959 .5073119 -2.10 0.036 -2.058322 -.0695965
                            Treated | 20.38859 .5700967 35.76 0.000 19.27117 21.50601
                            1.Time | 0 (omitted)
                            1.Treated | 0 (omitted)
                            |
                            Time#Treated |
                            1 1 | -4.78109 .5263227 -9.08 0.000 -5.812714 -3.749465
                            |
                            indicator |
                            2 | -10.22745 .5072466 -20.16 0.000 -11.22169 -9.233218
                            3 | -19.64932 .5071011 -38.75 0.000 -20.64327 -18.65537
                            4 | -8.730226 .5071011 -17.22 0.000 -9.724175 -7.736276
                            5 | -6.353807 .5075383 -12.52 0.000 -7.348613 -5.359001
                            6 | -9.002199 .5073981 -17.74 0.000 -9.99673 -8.007668
                            7 | -8.68239 .5071011 -17.12 0.000 -9.676339 -7.68844
                            8 | -1.745006 .5073923 -3.44 0.001 -2.739526 -.7504858
                            9 | -15.92869 .5071011 -31.41 0.000 -16.92264 -14.93474
                            10 | -18.55426 .5069612 -36.60 0.000 -19.54793 -17.56058
                            11 | -5.963069 .5071011 -11.76 0.000 -6.957018 -4.969119
                            12 | -21.0394 .5071011 -41.49 0.000 -22.03335 -20.04545
                            13 | -26.1869 .5071011 -51.64 0.000 -27.18085 -25.19295
                            14 | -8.664169 .5072464 -17.08 0.000 -9.658403 -7.669934
                            15 | -8.35708 .5073923 -16.47 0.000 -9.3516 -7.36256
                            16 | -7.64188 .5076896 -15.05 0.000 -8.636983 -6.646777
                            17 | 1.926681 .5071011 3.80 0.000 .9327313 2.92063
                            18 | 0 (omitted)
                            19 | -16.04299 .5072464 -31.63 0.000 -17.03723 -15.04876
                            20 | -13.34192 .5071011 -26.31 0.000 -14.33587 -12.34797
                            21 | -7.2508 .5071011 -14.30 0.000 -8.244749 -6.25685
                            22 | -1.456801 .5105569 -2.85 0.004 -2.457524 -.4560785
                            ..... (and so on...)


                            The effect of removing financial incentives was a drop of 4.8 percentage points in performance.

                            I think this approach needs refining, but I'd appreciate advice:

                            1) I'm using (unbalanced) panel data, so does this need a different regression command?

                            2) Within the 'treated' group of indicators, there are two types of indicator (ones that measure health outcomes, and ones that measure clinical processes), so I'd like to do a DDD analysis to examine whether there was a significant difference in outcome between the two types. Please can you advise how to do this? (Is it a case of using '0' for the control-group dummy, '1' for one type of indicator, and '2' for the second type?)

                            3) One of the quality indicators gets omitted due to collinearity. Why would this be? Does it matter?

                            4) Is it OK that I'm combining two quality indicators as a control? (they are both '0' in the Treatment dummy variable)

                            5) I'd also like to assess whether there was any difference in relative variance between the treatment and control group. i.e. were quality scores less consistent between practices, once the financial incentive was removed (relative to the control group). How can I calculate this on Stata, with confidence intervals?

                            Many thanks for your help.

                            Comment


                            • #29
                              1. Balanced or unbalanced makes no difference. But if you are working with panel data, you should be using a panel-data estimator. If practice is your panel variable, then you are OK, because including i.practice in the model makes this an emulation of the panel estimator -xtreg, fe-.

                              2. Create a new variable, indicator_type, set to 1 for health outcomes and 2 for processes. Then interact that with Time##Treated (see the sketch after this list).

                              3. Because all of your indicators are in either the treatment or the control group, right? So there is collinearity between the Treated variable and the indicator dummies even when 1.indicator is dropped as the reference. So one more indicator dummy has to drop to resolve the collinearity.

                              4. You can have one indicator as a control, two, or any number at all. What matters is that a) they not be affected by the treatment, and b) they behave similarly to each other and to the treated indicators prior to the start of treatment (the parallel trends assumption).

                              5. I don't understand the question. Perhaps because I don't understand the data. I'm inferring, perhaps wrongly, that you have only one observation per combination of practice, indicator, and time. In that case, there is nothing to calculate the variance of. I suppose I'm missing something here, so perhaps you can explain in greater depth.
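
                              For point 2, a minimal sketch (indicator_type is a hypothetical new variable coded 0 for the control indicators, 1 for health-outcome indicators, and 2 for clinical-process indicators; Stata will automatically omit collinear terms, e.g. between Treated and indicator_type):

                              Code:
                              reg Achievement i.Time##i.Treated##i.indicator_type i.indicator i.practice
                              * the surviving Time#Treated#indicator_type terms contrast the effect of
                              * incentive removal between the two treated indicator types (the DDD estimate)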

                              Comment


                              • #30
                                Many thanks Clyde, that's very helpful.

                                Regarding question 5: I have one observation per practice, for each indicator and for each time.

                                e.g.

                                Practice Indicator Achievement Treated Time
                                W95009 AST004 85.714286 1 0
                                W95009 COPD002 95.876289 1 0
                                W95009 COPD003 95.918367 1 0
                                W95009 COPD005 100 1 0
                                W95009 DM002 87.665198 1 0
                                W95009 DM003 73.827534 1 0
                                W95009 DM007 72.542373 1 0
                                W95009 DM012 96 1 0
                                W95009 EP003W 94.736842 1 0
                                W95009 MH002 100 1 0
                                W95009 MH007 100 1 0
                                W95009 RA002 94.642857 1 0
                                W95010 AST004 86.904762 1 0
                                W95010 COPD002 81 1 0
                                W95010 COPD003 91.6 1 0
                                W95010 COPD005 94.047619 1 0


                                When you have financial incentives on a quality indicator, there tends to be a reduction in the variance of performance between different clinics (they all cluster round the same score). So I'd like to test whether, when the incentive is removed, the variance increases again (relative to the control group, where incentives remain). And can one work out a confidence interval for that difference in variance, and determine whether it is significant?

                                Thanks again.

                                Comment
