Binary Dependent variable in difference in difference method

Neeraj Kumar

Join Date: Jul 2017

Posts: 98
#1

28 Aug 2017, 11:34

Hi everyone, I am running the commands below for a difference-in-differences analysis. I believe they implement the linear difference-in-differences method, but my dependent variable is binary (0/1), so I suspect this analysis is wrong. Are the commands below correct for estimating a nonlinear difference-in-differences?

reg formal i.mgnregadmy##i.time RO5 ca2 ca3 scholar NPERSONS COPC POOR
margins mgnregadmy#time
margins mgnregadmy, dydx(time)

My question is: is there a command for nonlinear difference-in-differences, and if so, how should its results be interpreted? Can they be interpreted in the same way as linear difference-in-differences results? Please help soon, as I am in the middle of a project and badly stuck. My time variable is discrete, taking the values 0 and 1, and I would like to know the average effect of the program. mgnregadmy is also binary, taking the values 0 and 1.
Clyde Schechter

Join Date: Apr 2014

Posts: 30119
#2

28 Aug 2017, 13:54

"but my dependent variable is binary(0|1) in nature so this would be wrong to run the analysis"

No, it is not necessarily wrong to run this analysis. The use of linear regression with a dichotomous outcome presents two possible problems:

1. The model may predict outcomes that are outside the 0-1 range, and,
2. Heteroscedasticity is almost guaranteed, which may invalidate the standard errors, confidence intervals, and p-values.

However, the individual predicted values from the model may or may not be relevant to your research goals, and if they aren't, as long as the predictive margins are in the 0-1 interval, the use of linear regression provides a simple direct estimation of probability differences. This is often very useful. As for heteroscedasticity, this is only a problem if the predicted probabilities differ considerably from each other, the variance being a function of the probabilities themselves. But this is easy enough to overcome by using the -vce(robust)- option in the regression command.
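
A minimal sketch of the linear probability approach described above. It runs on the lbw example data that ships with Stata rather than on the original poster's data, so low, smoke, and white are only stand-ins for the outcome, the program indicator, and the time indicator; that mapping is an assumption made purely for illustration.

```stata
* Illustration only: lbw is a cross-section, not a real DiD design.
webuse lbw, clear
generate byte white = race == 1

* Linear probability model with heteroscedasticity-robust standard errors
regress low i.smoke##i.white age lwt, vce(robust)

* Predictive margins for each cell; check that these lie between 0 and 1
margins smoke#white

* Effect of smoke on Pr(low) at each level of white
margins white, dydx(smoke)

* In the linear model the interaction coefficient is the difference
* between those two effects, i.e. the DiD on the probability scale
lincom _b[1.smoke#1.white]
```

With the variables from this thread, the analogous quantity would be the coefficient on 1.mgnregadmy#1.time.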

That said, you can also model dichotomous outcomes using the -logit- or -probit- regression models. Bear in mind that in -logit- you are estimating group differences in log odds (or, equivalently after exponentiation, odds ratios), not differences in probabilities. The interpretation of probit regression coefficients is not simple to explain. But following either model, you can estimate marginal effects, which give you estimated differences in probability. It is important to remember, however, that with the logit or probit models, because they are non-linear, the marginal effect becomes a function of the base probability rate itself. Consequently average marginal effects may fail to give an adequate picture of what is going on if the range of probabilities is wide.
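
As a hedged sketch of the marginal-effects route, again using the lbw example data as a stand-in (smoke playing the role of the program indicator and white the role of the time indicator, which is an assumption for illustration only):

```stata
webuse lbw, clear
generate byte white = race == 1

logit low i.smoke##i.white age lwt

* Average marginal effect of smoke on Pr(low) at each level of white
margins white, dydx(smoke)

* Difference between those two effects on the probability scale:
* the smoke contrast at white==1 minus the smoke contrast at white==0
margins r.smoke#r.white, contrast(nowald effects)

* The same marginal effects from a probit model
probit low i.smoke##i.white age lwt
margins white, dydx(smoke)
```

Because these models are nonlinear, the probability-scale contrast need not agree in size, sign, or significance with the interaction coefficient itself, so it is worth looking at both.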

Added: In the future, please do not use screenshots to show Stata output. The one you posted is just barely legible on my computer; frequently screenshots come out completely unreadable. The helpful way to show Stata commands and output is to bind them between code delimiters. Please read FAQ #12 for instructions on the use of code delimiters.
Neeraj Kumar

Join Date: Jul 2017

Posts: 98
#3

29 Aug 2017, 13:24

Thanks for your prompt reply. I have run a logistic regression in the difference-in-differences setting, where my dependent variable is dichotomous, equal to 1 for formal loans and 0 for informal loans. The commands are given in the attached PNG files; I hope that this time it is a PNG file rather than a screenshot, as I tried my best to make it legible and to follow FAQ #12 as you recommended. Most importantly, my data set is a two-year panel. I would like your help in interpreting the results and in checking whether the command is right for a panel data set.

One more thing: for the linear difference-in-differences command, should the xt prefix be used with panel data? In the Handbook on Impact Evaluation, linked below, the linear regression command appears without xt on pages 189 and 190 of Chapter 14. Please tell me the correct command for both cases.
https://openknowledge.worldbank.org/...0Use0Only1.pdf
Clyde Schechter

Join Date: Apr 2014

Posts: 30119
#4

29 Aug 2017, 13:48

With panel data you must, at least initially, use the -xt- commands. So you need to -xtset- your data and then run either -xtreg- for a linear difference model or -xtlogit- for a logistic model. You will have to decide whether you want to use fixed or random effects with these. If you are not familiar with these models, before proceeding I suggest you consult a good econometrics textbook so you understand the ideal conditions for using each technique, their pros and cons, and various approaches to choosing between them. -margins- works the same way after the -xt- commands as it does here.
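
A rough sketch of that workflow follows. It uses the union example data that ships with Stata rather than the data from this thread, so idcode, year, union, south, and not_smsa are stand-ins, and the choice of fixed effects for -xtreg- and random effects for -xtlogit- is an assumption made only to keep the example short.

```stata
webuse union, clear
xtset idcode year

* Linear panel model, fixed effects, SEs clustered on the panel id
xtreg union i.south##i.not_smsa age grade, fe vce(cluster idcode)
margins south#not_smsa

* Random-effects logistic model, reported as odds ratios
xtlogit union i.south##i.not_smsa age grade, re or

* pu0 = predicted probability with the random effect set to zero
margins south#not_smsa, predict(pu0)
margins not_smsa, dydx(south) predict(pu0)
```

The prediction that -margins- uses by default after -xtlogit- has differed across Stata versions, which is why predict(pu0) is spelled out here. For the data in this thread, one would presumably -xtset- on the household identifier (its name is not given in the thread) and the 0/1 time variable.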

Bruce Weaver

Join Date: May 2014
Posts: 1133
#5

29 Aug 2017, 13:50

If I follow, you could use the approach shown in this example.

Code:

clear
use http://www.stata-press.com/data/r15/lbw.dta
generate byte white = race==1
fre low smoke white
* Jann, B. (2007). fre: Stata module to display one-way frequency table.
* Available from http://ideas.repec.org/c/boc/bocode/s456835.html.

* Estimate logit model
logit low i.smoke##i.white age lwt
* Get smoker vs non-smoker contrasts at each level of white;
* use predict(xb) option to get them on the log-odds scale.
margins r.smoke@white, vsquish predict(xb) contrast(nowald effects)
*return list
matrix table1 = r(table)' // This is the margins table shown above
*matrix list table1
matrix table2 = /// extract columns with B and lower & upper bounds of CI
table1[1..rowsof(table1), 1], ///
table1[1..rowsof(table1), 5..6]
*matrix list table2 // This shows B with 95% CI
* Use mata to exponentiate table2 to get ORs with CIs
mata : st_matrix("ORtable", exp(st_matrix("table2")))
matrix colnames ORtable = "OR" "Lower" "Upper"
local rnames : rowfullnames table2 // rnames = row names from table2 matrix
matrix rownames ORtable = `rnames' // Assign table2 row names to ORtable
matrix list ORtable

Apart from the logit command, I think the only change you would have to make is to the margins command, as follows:

Code:

margins r.mgnregadmy@time, vsquish predict(xb) contrast(nowald effects)

While figuring out how the code works, it might help to uncomment some of the commands I've commented out. Once you have it working, and understand what it's doing, you may want to remove those lines entirely.

HTH.

PS - Crossed with Clyde's post in #4. If you use -xtlogit-, my code may need to be edited--I've not tried it with -xtlogit-.

Last edited by Bruce Weaver; 29 Aug 2017, 13:53. Reason: Added postscript.

--
Bruce Weaver
Version: Stata/MP 19.5 (Windows)


Neeraj Kumar

Join Date: Jul 2017

Posts: 98
#6

29 Aug 2017, 19:58

Thank you so much Dr Clyde and Mr. Bruce.

Bruce Weaver

Join Date: May 2014
Posts: 1133
#7

30 Aug 2017, 09:30

Re the code in #5, it later occurred to me that the same ORs for smoking within each group can be obtained fairly easily via lincom commands.

Code:

. matrix list ORtable

ORtable[2,3]
                OR      Lower      Upper
1.smoke#
0.white  2.0388036  .75342553  5.5170946
1.smoke#
1.white  4.8554781  1.4693517  16.044945

.
. * The same ORs can be obtained via -lincom- commands.
. quietly logit low i.smoke##i.white age lwt

. * logit, coeflegend // Uncomment to see coefficient legend
. * Get OR for smoker:nonsmoker in non-white group
. lincom _b[1.smoke], or

 ( 1)  [low]1.smoke = 0

------------------------------------------------------------------------------
         low | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |   2.038804   1.035532     1.40   0.161     .7534255    5.517095
------------------------------------------------------------------------------

. * Get OR for smoker:nonsmoker in white group
. lincom _b[1.smoke]+_b[1.smoke#1.white], or

 ( 1)  [low]1.smoke + [low]1.smoke#1.white = 0

------------------------------------------------------------------------------
         low | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |   4.855478   2.961119     2.59   0.010     1.469352    16.04495
------------------------------------------------------------------------------
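
One further step that may help with a difference-in-differences reading (a sketch added for illustration, not part of the log above): the ratio of the two odds ratios is the exponentiated interaction coefficient, so it can be obtained the same way. Going by the table above, it should come out at roughly 4.855/2.039, or about 2.38.

```stata
use http://www.stata-press.com/data/r15/lbw.dta, clear
generate byte white = race==1
quietly logit low i.smoke##i.white age lwt

* Ratio of the smoking ORs (white group : non-white group)
* = exp(interaction coefficient)
lincom _b[1.smoke#1.white], or
```

With the variables from this thread, the corresponding term would be _b[1.mgnregadmy#1.time], which is the difference-in-differences on the log-odds scale.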

HTH.



Neeraj Kumar

Join Date: Jul 2017

Posts: 98
#8

03 Sep 2017, 11:07

Thank you, Mr. Bruce, for your prompt and helpful replies. As I mentioned above, I am using panel data. I am attaching my results together with the commands I used; please tell me whether I am right or wrong. Here is the information about my work.
I am running the following logit commands for the difference-in-differences method. My dependent variable (Loan20) is binary (0/1). I am using a two-year panel (2004-05 = 0, 2011-12 = 1), so my time variable is discrete, taking the values 0 and 1. mgnregadmy is also binary, taking the values 0 and 1.

xtlogit Loan20 i.mgnregadmy##i.time RO5 ca2 ca3 education1 NPERSONS COPC, or
margins mgnregadmy#time
margins mgnregadmy, dydx(time)

I am having trouble interpreting the results, and I would also like to know the average effect of the program. I am attaching a PNG file of my results; kindly help me interpret them.
Seema Gupta

Join Date: Sep 2016

Posts: 2
#9

18 Apr 2018, 07:50

Hi Neeraj,

I think you are working with the IHDS dataset and trying to assess the impact of MNREGA. I am also working with IHDS. Did you finally manage to get the desired results?