
  • Difference in Differences

    Hello All
    I am using a large data set and have used the difference-in-differences method for the first time. I took the command for this from the World Bank's impact evaluation book. I am having difficulty interpreting four dichotomous variables, namely mgnregadmy, time, ca2, and ca3. The variable named interaction is the interaction of time and mgnregadmy, and its coefficient is the DID estimator. Kindly help me interpret this. Also, will the interpretation of the variables RO5 and NPERSONS be the same as in a simple OLS regression?
    Attached Files

  • #2
    Your screenshot is not readable on my computer. That is why the use of screenshots in this forum is discouraged. The best way to show Stata output is to copy directly from Stata's Results window or your log file and paste into the Forum editor between code delimiters. The use of code delimiters is explained in FAQ #12.

    That said, without seeing your results, I have a question and a suggestion. Question: is time a continuous variable, or is it a pre-post variable? The latter is more common in difference-in-differences models, but continuous-time DID models are used as well. The suggestion is that you re-run the analysis using factor-variable notation and then use margins afterward. (The code shown here requires you to replace the placeholder parts, the regression command and the list of other covariates, with your actual command and variables; in addition, the code assumes time is a discrete pre-post variable. If it isn't, replace i.time with c.time.)

    Code:
    regression_command i.mgnregadmy##i.time other_covariates_if_any
    margins mgnregadmy#time
    margins mgnregadmy, dydx(time)
    This output will be easier to understand and interpret than the regression table alone.
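
    For instance, if, say, your outcome variable were called sharemeatandegg and you were fitting an ordinary linear regression, the filled-in commands might look like the following sketch (the covariate list is only an assumption based on the variable names you mentioned):

    Code:
    regress sharemeatandegg i.mgnregadmy##i.time RO5 NPERSONS ca2 ca3
    margins mgnregadmy#time
    margins mgnregadmy, dydx(time)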

    If you do that and post the output between code delimiters, I will be happy to review the interpretation of the results with you, if somebody else doesn't get to it first.



    • #3
      Thanks for such a quick response. I am using two-year panel data, 2004-05 and 2011-12. I am attaching a png file now. I hope it will be readable.
      Thanks
      Attached Files



      • #4
        Yes, the png file came out better. I still want to emphasize that the use of screen shots on this forum is strongly discouraged and they really should only be used to show things like graphs, or other content that has little or no meaningful text.

        You still didn't answer my question about whether time is discrete or continuous in your model. I understand that you have two year panel data, but you could still model that as either discrete or continuous. I will assume your time variable is a simple pre-post dichotomy coded 0/1.

        So, the interpretation of the interaction term and the time term is that in the group that did not receive the mgnregadmy intervention, the expected value of sharemeatandegg increased by 0.0059... between the pre- and post-intervention observations. In the group that did receive the intervention, the expected value of sharemeatandegg increased by 0.002... more than that. One could also say that the increase in the group that did receive the intervention was 0.0059... + 0.002..., or approximately 0.0079.... The intervention effect is 0.002..., with a 95% CI of 0.00016... to 0.0038...
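
        If you want that sum (and its confidence interval) directly from your current specification, -lincom- will give it to you. A minimal sketch, assuming your regressors are named time and interaction as in #1:

        Code:
        * change from pre to post in the mgnregadmy = 1 group = time coefficient + DID coefficient
        lincom time + interaction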

        It is of some interest to note that, relative to the pre-intervention-period expected value of sharemeatandegg in the non-intervention group, 0.059, this effect is over an order of magnitude smaller. In fact, it seems that the intervention effect is about 5% of the original value.

        If you were to heed my advice in #2 and re-do this analysis with factor-variable notation and -margins-, a more detailed interpretation could be provided.



        • #5
          Thanks for your prompt reply. Yes, you are right that the time variable is a simple pre-post dichotomy coded 0/1. I am now attaching the results that I got following the commands you recommended in #2. I have some doubts about the results: what does mgnregadmy#time represent? Is it the interaction of mgnregadmy and time? Since I already have an interaction variable (mgnregadmy*time) called interaction, can't we use that variable instead of mgnregadmy#time?
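
          To make the question concrete, the two ways of writing the model would look something like the sketch below (the other covariates shown are only illustrative):

          Code:
          * hand-made interaction variable, as in my original regression
          generate interaction = mgnregadmy*time
          regress sharefruitandsugar mgnregadmy time interaction RO5 NPERSONS ca2 ca3
          * factor-variable version recommended in #2
          regress sharefruitandsugar i.mgnregadmy##i.time RO5 NPERSONS ca2 ca3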
          Attached Files



          • #6
            Sorry, I caught my mistake: I was including the same quantity twice under two different names, the factor-variable interaction of mgnregadmy and time and the hand-made variable interaction. With the correct command and no duplicated variables, I am now attaching the results files. Please help me with the interpretation. Thank you.
            Attached Files



            • #7
              So the first -margins- table tells you that in the control (mgnregadmy = 0) group during the pre-period (time = 0), the expected value of sharefruitandsugar, adjusted for everything else in your model, was 0.096... (95% CI 0.095-0.097). In that same group, during the post-period, it was 0.104 (95% CI 0.103-0.105). The difference between those is shown in the first line of the second -margins- table: 0.007... (95% CI 0.006-0.0089...). This last figure shows you the secular trend in sharefruitandsugar, the change that occurred independently of the actual mgnregadmy intervention.

              Returning to the first -margins- output table, the last two lines show the expected value of sharefruitandsugar in the mgnregadmy = 1 group during the pre- and post-periods. In the pre-period it was 0.09 (95% CI 0.089... to 0.091...). During the post-period it was 0.102... (95% CI 0.101...-0.103...). The difference is, again, found in the final line of the second -margins- output table: 0.012 (95% CI 0.010...-0.013...). This difference represents the combination of the secular trend, which also operated in the control group, with the effects of the intervention and anything directly associated with it.

              The pure intervention effect, therefore, is the difference between those two differences. That is found in the mgnregadmy#time row of the regression output table. It is 0.004... (95% CI 0.002... to 0.006...).
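
              You can verify that by hand from the rounded adjusted means above; a quick, approximate check:

              Code:
              * (post minus pre in the treated group) minus (post minus pre in the control group)
              display (0.102 - 0.090) - (0.104 - 0.096)   // roughly .004, the mgnregadmy#time coefficient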

              Again, all of these expected values, differences, and differences in differences are adjusted for the other model variables.

              I would note that the pure intervention effect (difference in differences) is about 5% of the starting value in the control group.



              • #8
                Hello, thanks for your suggestion. I have one more question: will the interpretation of RO5, ca2, and ca3 be the same as in a simple OLS regression? Also, regarding the statement in #7, "I would note that the pure intervention effect (difference in differences) is about 5% of the starting value in the control group", would this pure intervention effect be 5 percent or 0.5 percent?



                • #9
                  Yes, the interpretation of RO5, ca2, and ca3 is the same as in a simple regression with no interaction terms.

                  The "pure intervention effect" is .0045832. The baseline values are .0963203 and .1040018 in the two groups, so as a percentage of baseline, the intervention effect is 100*.0045832/.0963203 or 100*.0045832/.1040018, which are 4.76% and 4.61%, respectively.



                  • #10
                    I am very grateful for your suggestions. One question comes to my mind: is there a robustness test for these results, and is it necessary to carry out one?



                    • #11
                      Dear all, I am trying to understand which command I should use for the marginal-effect contribution of each covariate. So, given the probit model below:


                      probit becamenonpoor09 r_debt r_children01 r_notoilet r_rural r_highesteduc r_noprimary r_plain, robust


                      In order to get meaningful marginal effects, should I use the MEM (Marginal Effects at the Means) or the AME (Average Marginal Effects) approach? I personally feel that the second option would be most appropriate, since I really only need to know whether these determinants contribute to the probability of people exiting poverty, not whether this happens at specific means of some particular variables. So I would rather use this command:

                      margins, dydx(r_debt r_children01 r_notoilet r_rural r_highesteduc r_noprimary r_plain)

                      rather than

                      mchange, amount(binary one) stat(all) at(r_debt=478587.3 r_children01=1 r_notoilet=1 r_rural=1 r_highesteduc=1 r_noprimary=1 r_plain=1)

                      Does anyone have a different take on this matter? Am I making any sense?



                      • #12
                        In order to get meaningful marginal effects, should I use the MEM (Marginal Effects at the Means) or the AME (Average Marginal Effects) approach? I personally feel that the second option would be most appropriate, since I really only need to know whether these determinants contribute to the probability of people exiting poverty, not whether this happens at specific means of some particular variables.
                        If that is your goal, you do not need marginal effects at all. Just look at the probit output directly to see whether these determinants are associated with differences in the probability of your outcome.

                        Marginal effects give you an estimate of how large these effects are, measured in probability of outcome. But if all you are interested in is hypothesis tests of whether the effects are non-zero, you get that from the regression output itself.
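
                        For example, the z-statistics in the probit output already test each coefficient against zero, and a joint test is available afterwards with -test-; a minimal sketch, assuming the model you fitted in #11:

                        Code:
                        * joint Wald test that all of the listed determinants have zero coefficients
                        test r_debt r_children01 r_notoilet r_rural r_highesteduc r_noprimary r_plain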



                        • #13
                          Thank you, Clyde, for your suggestion. I would totally agree with you, but I have been asked for the marginal effect of each single covariate, and I am wondering which command would best give the answer my client is seeking.



                          • #14
                            Part of me is tempted to tell you to go back to ask your client for clarification of what he/she wants. But in the meantime, let's just go over the difference between AME and MEM and then perhaps you can read your client's mind better than I can, since neither kind of marginal effect is the best way to determine "whether the determinants contribute to" the outcome.

                            The MEM is simpler. Stata sets all the model variables to their estimation-sample mean values and then calculates model-predicted probabilities for each observation and averages them. The next step differs slightly for categorical and continuous variables. Let's discuss how it's done for categorical variables: continuous variables are more complicated but conceptually not all that different. The variable whose marginal effect is being estimated is then increased by 1 and the predicted probabilities are re-calculated (still with all other variables set to their means) and they are averaged. The difference between this average and the previous one is the MEM. It can be thought of as the marginal effect of the variable in a hypothetical person who is exactly average in all his/her attributes. Evidently one of the limitations of the MEM is that usually there is no such person, so it is a hypothetical marginal effect. Also if you calculate marginal effects at various levels of a single variable, but at means for all other variables, the variation in those marginal effects is somewhat restricted because variation in other variables has been eliminated. The MEMs are not adjusted for population variation on other variables: they characterize only specific values of those variables (namely their means).
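
                            In Stata terms, a minimal sketch of the MEM for the probit model from #11 (see also the -atmeans- note at the end of this post):

                            Code:
                            * MEM: other covariates held at their estimation-sample means
                            margins, dydx(r_debt r_children01 r_notoilet r_rural r_highesteduc r_noprimary r_plain) atmeans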

                            The AME is a bit more complicated. Again let's talk about the average marginal effect of a categorical variable. Stata first sets the focused variable to its baseline value in all observations and calculates predicted probabilities leaving all other variables at their observed values. The results are averaged. The focused variable is then increased by 1, and the predicted probabilities, again leaving all other variables at their observed values, are re-calculated. The difference is the AME. This, too, does not correspond to any actual person, but it does reflect the effects on the population as a whole. That is, given a population whose characteristics match those of the sample in all other respects, a difference of 1 unit in the focused variable will actually be associated with a change in overall average population outcome equal to the AME. If you vary the base level of the focused variable and calculate AME's at each of those levels, the variation will be greater than that of the MEM because the variation in other attributes is included in the calculation. The AME's are adjusted for variation on other variables.
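
                            The corresponding AME is simply the default behaviour of -margins-, i.e. the -margins, dydx()- command you already proposed in #11:

                            Code:
                            * AME: marginal effects averaged over the observed covariate values
                            margins, dydx(r_debt r_children01 r_notoilet r_rural r_highesteduc r_noprimary r_plain)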

                            So the difference is this: the AME is a good reflection of the total marginal effect in the population, while the MEM is a good reflection of the marginal effect on a hypothetical person (though a hypothetical person who is, in a sense, typical of the population).

                            So you need to figure out, or ask your client, which of these is more suitable for present purposes. In most situations, the AME is more useful, but not always.

                            By the way, if you do go for the MEM, you do not need to (and should not, because you might get it wrong) calculate the means yourself. Instead of specifying a long -at()- option with each mean filled in, just write -atmeans- and Stata will do it for you.

