  • Binary Dependent variable in difference in difference method

    Hi everyone, I am running the commands below for a difference-in-differences analysis. I believe they fit a linear difference-in-differences model, but my dependent variable is binary (0|1), so I think it would be wrong to run the analysis this way. Are these commands correct for estimating a nonlinear difference-in-differences model?

    reg formal i.mgnregadmy##i.time RO5 ca2 ca3 scholar NPERSONS COPC POOR
    margins mgnregadmy#time
    margins mgnregadmy, dydx(time)

    My questions: is there a command for nonlinear difference-in-differences, and if so, how should its results be interpreted? Can they be interpreted the same way as linear difference-in-differences results? Please help me soon, as I am in the middle of a project and badly stuck. My time variable is discrete, taking the values 0 and 1, and mgnregadmy is also binary (0/1). I would like to know the average effect of the program.

  • #2
    "but my dependent variable is binary(0|1) in nature so this would be wrong to run the analysis"
    No, it is not necessarily wrong to run this analysis. The use of linear regression with a dichotomous outcome presents two possible problems:

    1. The model may predict outcomes that are outside the 0-1 range, and,
    2. Heteroscedasticity is almost guaranteed, which may invalidate the standard errors, confidence intervals, and p-values.

    However, the individual predicted values from the model may or may not be relevant to your research goals, and if they aren't, as long as the predictive margins are in the 0-1 interval, the use of linear regression provides a simple direct estimation of probability differences. This is often very useful. As for heteroscedasticity, this is only a problem if the predicted probabilities differ considerably from each other, the variance being a function of the probabilities themselves. But this is easy enough to overcome by using the -vce(robust)- option in the regression command.
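    As a minimal sketch of the linear-probability approach just described, the following uses Stata's lbw example data. Here smoke and white are stand-ins, chosen purely for illustration, for the treatment and time indicators; this is not the original poster's data or model.

    ```stata
    * Illustration only: smoke and white stand in for treatment and time
    webuse lbw, clear
    generate byte white = race == 1

    * Linear probability model with heteroscedasticity-robust standard errors
    regress low i.smoke##i.white age lwt, vce(robust)

    * Predictive margins for the four cells; check that they lie in the 0-1 range
    margins smoke#white

    * The interaction coefficient is the difference-in-differences in probabilities
    lincom 1.smoke#1.white
    ```

    If any of the predictive margins fell outside the 0-1 interval, that would be a signal to consider one of the non-linear models instead.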

    That said, you can also model dichotomous outcomes using the -logit- or -probit- regression models. Bear in mind that in -logit- you are estimating group differences in log odds (or, equivalently after exponentiation, odds ratios), not differences in probabilities. The interpretation of probit regression coefficients is not simple to explain. But following either model, you can estimate marginal effects, which give you estimated differences in probability. It is important to remember, however, that with the logit or probit models, because they are non-linear, the marginal effect becomes a function of the base probability rate itself. Consequently average marginal effects may fail to give an adequate picture of what is going on if the range of probabilities is wide.
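    A hedged sketch of the logit route, again on Stata's lbw example data with smoke and white as illustrative stand-ins for treatment and time (an assumption, not the original poster's variables). It shows how -margins- can bring the results back to the probability scale, and how the marginal effect shifts with the covariates:

    ```stata
    * Illustration only: smoke and white stand in for treatment and time
    webuse lbw, clear
    generate byte white = race == 1

    logit low i.smoke##i.white age lwt

    * Average predicted probabilities in the four cells
    margins smoke#white

    * Effect of smoke on the probability scale, separately at each level of white
    margins white, dydx(smoke)

    * Contrast of those two effects: the interaction expressed in probabilities
    margins r.white, dydx(smoke)

    * The marginal effect depends on the base probability: compare across lwt values
    margins, dydx(smoke) at(lwt=(100 150 200))
    ```

    The last command is one way to see whether the average marginal effect hides substantial variation across the range of predicted probabilities.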

    Added: In the future, please do not use screenshots to show Stata output. The one you posted is just barely legible on my computer; frequently screenshots come out completely unreadable. The helpful way to show Stata commands and output is to bind them between code delimiters. Please read FAQ #12 for instructions on the use of code delimiters.



    • #3
      Thanks for your prompt reply. I have run a logistic regression in the difference-in-differences setting, where my dependent variable is dichotomous: 1 for formal loans and 0 for informal loans. The commands are shown in the attached PNG files; I hope they are legible this time, as I tried my best to make them readable and to follow FAQ #12 as you recommended. Most importantly, my data set is a two-year panel. I would like your help interpreting the results and checking whether the command is right for panel data.

      One more thing: for the linear difference-in-differences model, should an xt command be used with panel data? In the Handbook on Impact Evaluation, linked below, the linear regression command appears without xt on pages 189 and 190 of chapter 14. Please tell me the correct command for both cases.
      https://openknowledge.worldbank.org/...0Use0Only1.pdf


      • #4
        With panel data you must, at least initially, use the -xt- commands. So you need to -xtset- your data and then run either -xtreg- for a linear difference model or -xtlogit- for a logistic model. You will have to decide whether you want to use fixed or random effects with these. If you are not familiar with these models, before proceeding I suggest you consult a good econometrics textbook so you understand the ideal conditions for using each technique, their pros and cons, and various approaches to choosing between them. -margins- works the same way after the -xt- commands as it does here.
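        The workflow above might look roughly like the following sketch, which uses Stata's union example panel rather than the poster's data. The variables south and post are stand-ins for the treatment and time indicators, and post is a hypothetical period dummy created here only for illustration; the choice of fixed effects for the linear model and random effects for the logit is likewise arbitrary.

        ```stata
        * Illustration only: south and post stand in for treatment and time
        webuse union, clear
        xtset idcode year

        * Hypothetical two-period indicator: later half of the survey years
        summarize year, detail
        generate byte post = year > r(p50)

        * Linear model with fixed effects and cluster-robust standard errors
        xtreg union i.south##i.post age grade, fe vce(cluster idcode)
        margins south#post

        * Logistic model with random effects
        xtlogit union i.south##i.post age grade, re
        margins south#post
        margins south, dydx(post)
        ```

        Whether fixed or random effects is appropriate for a given study is exactly the question the textbook reading is meant to settle; the sketch only shows the mechanics.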



        • #5
          If I follow, you could use the approach shown in this example.

          Code:
          clear
          use http://www.stata-press.com/data/r15/lbw.dta
          generate byte white = race==1
          fre low smoke white
          * Jann, B. (2007). fre: Stata module to display one-way frequency table.
          * Available from http://ideas.repec.org/c/boc/bocode/s456835.html.
          
          * Estimate logit model
          logit low i.smoke##i.white age lwt
          * Get smoker vs non-smoker contrasts at each level of white;
          * use predict(xb) option to get them on the log-odds scale.
          margins r.smoke@white, vsquish predict(xb) contrast(nowald effects)
          *return list
          matrix table1 = r(table)' // This is the margins table shown above
          *matrix list table1
          matrix table2 = /// extract columns with B and lower & upper bounds of CI
          table1[1..rowsof(table1), 1], ///
          table1[1..rowsof(table1), 5..6]
          *matrix list table2 // This shows B with 95% CI
          * Use mata to exponentiate table2 to get ORs with CIs
          mata : st_matrix("ORtable", exp(st_matrix("table2")))
          matrix colnames ORtable = "OR" "Lower" "Upper"
          local rnames : rowfullnames table2 // rnames = row names from table2 matrix
          matrix rownames ORtable = `rnames' // Assign table2 row names to ORtable
          matrix list ORtable
          Apart from the logit command, I think the only change you would have to make is to the margins command, as follows:

          Code:
          margins r.mgnregadmy@time, vsquish predict(xb) contrast(nowald effects)
          While figuring out how the code works, it might help to uncomment some of the commands I've commented out. Once you have it working, and understand what it's doing, you may want to remove those lines entirely.

          HTH.

          PS - Crossed with Clyde's post in #4. If you use -xtlogit-, my code may need to be edited--I've not tried it with -xtlogit-.
          Last edited by Bruce Weaver; 29 Aug 2017, 13:53. Reason: Added postscript.
          --
          Bruce Weaver
          Email: [email protected]
          Version: Stata/MP 18.5 (Windows)



          • #6
            Thank you so much Dr Clyde and Mr. Bruce.



            • #7
              Re the code in #5, it later occurred to me that the same ORs for smoking within each group can be obtained fairly easily via lincom commands.

              Code:
              . matrix list ORtable
              
              ORtable[2,3]
                              OR      Lower      Upper
              1.smoke#
              0.white  2.0388036  .75342553  5.5170946
              1.smoke#
              1.white  4.8554781  1.4693517  16.044945
              
              .
              . * The same ORs can be obtained via -lincom- commands.
              . quietly logit low i.smoke##i.white age lwt
              
              . * logit, coeflegend // Uncomment to see coefficient legend
              . * Get OR for smoker:nonsmoker in non-white group
              . lincom _b[1.smoke], or
              
               ( 1)  [low]1.smoke = 0
              
              ------------------------------------------------------------------------------
                       low | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
              -------------+----------------------------------------------------------------
                       (1) |   2.038804   1.035532     1.40   0.161     .7534255    5.517095
              ------------------------------------------------------------------------------
              
              . * Get OR for smoker:nonsmoker in white group
              . lincom _b[1.smoke]+_b[1.smoke#1.white], or
              
               ( 1)  [low]1.smoke + [low]1.smoke#1.white = 0
              
              ------------------------------------------------------------------------------
                       low | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
              -------------+----------------------------------------------------------------
                       (1) |   4.855478   2.961119     2.59   0.010     1.469352    16.04495
              ------------------------------------------------------------------------------
              HTH.


              • #8
                Thank you, Mr. Bruce, for your prompt and helpful replies. As I mentioned above, I am using panel data. I am attaching my results along with the commands I used; please tell me whether I am doing this correctly. Here is the information about my work.
                I am running the following logit commands for the difference-in-differences method. My dependent variable (Loan20) is binary (0|1). I am using two-year panel data (2004-05 = 0, 2011-12 = 1), so my time variable is discrete with values 0 and 1. mgnregadmy is also binary, with values 0 and 1.

                xtlogit Loan20 i.mgnregadmy##i.time RO5 ca2 ca3 education1 NPERSONS COPC, or
                margins mgnregadmy#time
                margins mgnregadmy, dydx(time)

                I am having trouble interpreting the results, and I would also like to know the average effect of the program. I am attaching a PNG file of my results; kindly help me interpret them.


                • #9
                  Hi Neeraj,

                  I think you are working with the IHDS dataset and trying to estimate the impact of MNREGA. I am also working with IHDS. Did you finally manage to get the desired results?
