  • Categorical dependent variable in a difference-in-difference model

    Is it possible to run a difference-in-difference model if the dependent variable is categorical and the independent variables are either categorical/continuous/binary?

  • #2
    Yes. For a dichotomous outcome you just use a logistic or probit model; the use of the interaction between treatment group and time is the same. For a polychotomous outcome you can use -mlogit-.
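
A minimal sketch of the dichotomous case, using simulated data so that it runs on its own. The variable names treat, post, and y, and the coefficients used to generate y, are hypothetical and not taken from any poster's data:

```stata
* Simulated two-group, two-period data (all names and values are illustrative)
clear
set seed 12345
set obs 2000
generate byte treat = runiform() < 0.5
generate byte post  = runiform() < 0.5
generate byte y = runiform() < invlogit(-0.5 + 0.2*treat + 0.3*post + 0.6*treat*post)

* Logistic DID: the treat#post interaction is the DID term on the log-odds scale
logit y i.treat##i.post

* Predicted probability of y = 1 in each group-by-period cell
margins treat#post
```

For a multi-category outcome, the same right-hand side would go into -mlogit- in place of -logit-.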

    • #3
      Similarly you should be able to use ologit for ordinal outcomes.

      • #4
        I am trying to run DID using both Ordinal and Nominal variables in the estimation. How can I interpret the result? Specifically, if the estimation is based on multinomial logit and ordered logit, how should I interpret the coefficients?

        In the results shown, gb_dummy is the treatment indicator, time_recode is the time indicator, and gb_dummy#time_recode is the interaction term. The outcome variable is v743a in the mlogit case and v457 in the ologit case.

        Code:
        mlogit v743a i.gb_dummy##i.time_recode, base(4)
        [Attached image: did results.JPG]

        Code:
        ologit v457 i.gb_dummy##i.time_recode
        [Attached image: ologit.JPG]

        • #5
          "Specifically, if the estimation is based on multinomial logit and ordered logit, how should I interpret the coefficients?"
          I wouldn't even try. It's very complicated and even experienced users get it wrong much of the time.

          Instead, run the -margins- command after each of your regressions to get the probabilities of each outcome under each combination of gb_dummy and time_recode. And then graph them so you can see what is going on.

          Code:
          forvalues i = 1/4 {
              margins gb_dummy#time_recode, predict(outcome(`i'))
              marginsplot, name(outcome`i', replace)
          }
          (It appears from your outputs that both outcome variables have four levels. If I have that wrong, change the -forvalues- command accordingly.)
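
One possible variation, in case the outcome codes do not run exactly from 1 to 4: let -levelsof- collect the values actually present in the estimation sample and loop over those. This is only a sketch. It reuses the variable names from #4 and assumes the outcome values are non-negative integers, since they are used in the graph names:

```stata
* Fit the model from #4, then loop over the outcome values found in the data
mlogit v743a i.gb_dummy##i.time_recode, base(4)
levelsof v743a if e(sample), local(levels)
foreach i of local levels {
    * Predicted probability of outcome `i' in each group-by-period cell
    margins gb_dummy#time_recode, predict(outcome(`i'))
    marginsplot, name(outcome`i', replace)
}
```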

          • #6
            Thank you for your response. As I understand it, in diff-in-diff the coefficient of the interaction term denotes the pure effect of the treatment. I tried interpreting the interaction term in the mlogit case as follows:

            In the group that received the treatment, the expected value of 'respondent_alone' increased by .3628 units,
            'respondent_and_husband_partner' increased by .3553 units, and 'someone_else' decreased by .1981 units
            relative to the base outcome 'husband_partner_alone', while keeping the other variables constant.

            Is this the right way to interpret it? I would appreciate your feedback before I proceed with -margins- as you suggested.
            Last edited by Anand Sunny; 15 Mar 2021, 17:11.

            • #7
              First, I don't see where those numbers you mention in #6 come from. Perhaps you can explain how you arrived at them.

              Second, it is almost impossible to say anything from the coefficients of an -mlogit- about the impact of a change in any variable. To begin with, the coefficients describe changes in the logarithm of a probability ratio (each outcome relative to the base outcome), not changes in the probabilities themselves. Then there is the fact that the effect of a probability ratio depends on the starting probability. Finally, the probabilities across the outcome categories must sum to 1, so you can see situations where a coefficient is negative for an outcome but its probability increases (or vice versa) because the decrease in some other category was even larger! I really never try to do this myself, and I've seen people with lots of experience and expertise get it wrong when they do.

              The -margins- command makes it simple by showing you the actual predicted probabilities of each outcome at the specified values of your predictors. These are numbers that you can look at, understand, and interpret.
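
A small numeric illustration of the sum-to-1 point, using made-up numbers rather than anything estimated in this thread. It assumes a hypothetical three-outcome multinomial logit with outcome 3 as the base and a single 0/1 predictor x. The intercepts a1, a2 and slopes b1, b2 are invented for the example:

```stata
* Hypothetical coefficients: outcome 1 has a negative slope (b1 < 0),
* outcome 2 has a much more negative slope, outcome 3 is the base
scalar a1 = 0
scalar b1 = -0.2
scalar a2 = 2
scalar b2 = -3

forvalues x = 0/1 {
    * Denominator of the multinomial logit probabilities (base contributes exp(0) = 1)
    scalar den = 1 + exp(a1 + b1*`x') + exp(a2 + b2*`x')
    display "x = `x':  P(outcome 1) = " %6.4f exp(a1 + b1*`x')/den
}
```

With these numbers the probability of outcome 1 rises from roughly 0.11 to roughly 0.37 as x goes from 0 to 1, even though its coefficient is negative, because outcome 2 loses far more probability.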

              • #8
                Sorry, I made a mistake in the interpretation. The coefficients are in log-odds terms, and I reported them directly. I used the -margins- commands as you suggested and got the following results. How should I interpret them?
                [Attached image: margins1.JPG]

                [Attached image: margins2.JPG]

                [Attached image: margins3.JPG]

                • #9
                  These numbers interpret themselves. The outputs are showing you the predicted probability of being in each outcome, given the values of gb_dummy and time_recode. Do you have a more specific question?

                  • #10
                    Thank you for your response. If we take the case of the last margins table, is it correct to say that,

                    for the control group, the expected value of the outcome 'someone_else' was 5.67...% in the pre-intervention period and 2.1..% in the post-intervention period, i.e. a decrease of 5.67 - 2.1 = 3.57 percentage points?

                    Similarly, for the treatment group the expected value of the outcome 'someone_else' was 6.78..% in the pre-intervention period and 1.63..% in the post-intervention period, i.e. a decrease of 6.78 - 1.63 = 5.15 percentage points. Does estimating the average marginal effects mean the same thing? How can I modify the code to estimate the average marginal effects?

                    • #11
                      Your interpretations in #10 are correct.

                      You state that you generated those results using the commands I suggested in #5. If that is true, to get average marginal effects of time (the differences you calculated in #10) you can do

                      Code:
                      forvalues i = 1/4 {
                          margins gb_dummy, predict(outcome(`i')) dydx(time_recode)
                      }
                      Added: And if you would like the difference in differences in the probability metric:

                      Code:
                      forvalues i = 1/4 {
                          margins gb_dummy, predict(outcome(`i')) dydx(time_recode) pwcompare(effects)
                      }
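
In symbols, for outcome k, with T the treatment indicator and t the time indicator, the difference in differences in the probability metric can be written as below. The second line plugs in the truncated 'someone_else' figures quoted in #10, so it is only approximate:

```latex
\[
\mathrm{DID}_k = \bigl[P(y=k \mid T=1,\,t=1) - P(y=k \mid T=1,\,t=0)\bigr]
               - \bigl[P(y=k \mid T=0,\,t=1) - P(y=k \mid T=0,\,t=0)\bigr]
\]
\[
\mathrm{DID}_{\text{someone\_else}} \approx (1.63 - 6.78) - (2.1 - 5.67)
  = -5.15 + 3.57 = -1.58 \ \text{percentage points}
\]
```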
                      Last edited by Clyde Schechter; 21 Mar 2021, 17:16.

                      • #12
                        [Attached image: v743a avg marg.JPG]

                        Thank you for your response. I used the code suggested in #11 and got the above results. In the probability metric, the average difference in difference in the probability of the chosen outcome is -0.015894 or 1.5894%. This is the difference-in-differences estimate of the effect of the treatment, or the true intervention effect of the treatment. Is this interpretation correct?

                        • #13
                          "In the probability metric, the average difference in difference in the probability of the chosen outcome is -0.015894 or 1.5894%. This is the difference in differences estimate of the effect of the treatment or the true intervention effect of the treatment. Is this interpretation correct?"
                          Almost right. The DID in the probability is, indeed, -0.015894 (which I would round to -0.016 or even -0.02). But that is not 1.5894%; it is 1.5894 percentage points. Absolute differences in figures that are percents are denominated in percentage points, not percents. When you say something changes by x%, that language refers to a multiplicative change, which is not what you have here.
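
To make the distinction concrete, here is a short sketch that computes both quantities for the treatment-group 'someone_else' figures quoted in #10. The scalars p_pre and p_post are illustrative names, and the inputs are the truncated values as posted, so the results are approximate:

```stata
* Truncated figures from #10: treatment group, outcome 'someone_else'
scalar p_pre  = 0.0678
scalar p_post = 0.0163

* Absolute change, expressed in percentage points
display "Change in percentage points: " %6.2f 100*(p_post - p_pre)

* Relative (multiplicative) change, expressed in percent of the starting value
display "Relative change in percent:  " %6.1f 100*(p_post - p_pre)/p_pre
```

The first figure is a drop of about 5.15 percentage points. The second is a drop of roughly 76 percent of the starting probability, which is a very different statement.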

                          • #14
                            Hi,

                            My paper uses a quintile Difference-in-Differences estimator with a categorical outcome variable. My specification is as follows:

                            oprobit cs_1_num post_q1 post_q2 post_q3 post_q4

                            Here cs_1_num takes three values: 1 = Better, 2 = Same, and 3 = Worse. `post_q1' is a dummy that takes the value 1 for observations in the lowest quintile in the post period, and the other regressors are defined similarly.

                            What version of the margins command do I use to obtain the difference in difference in the probability of each outcome for each quintile? Would margins, dydx(post_q`i') subpop(post_q`i') post do the job?

                            Additionally, how do I generate an event study graph that plots the difference-in-differences estimate for each outcome and quintile combination across survey waves (defined by the variable wave_no)?
                            Last edited by Sagara Ann; 11 Aug 2024, 12:15.

                            • #15
                              You can't use the -margins- command because you did not use factor-variable notation in the regression itself. The post_q* variables are not useful. What you need instead is a single variable, let's call it post_quintile, that takes on the values 1 through 5 corresponding to the five quintiles. Then rerun as follows:
                              Code:
                              oprobit cs_1_num ib5.post_quintile
                              margins, dydx(post_quintile)
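
If the effects are wanted separately for each outcome, one option is to loop over the three values of cs_1_num listed in #14. This is a sketch that assumes post_quintile has already been built as described above and that cs_1_num is coded 1, 2, 3:

```stata
oprobit cs_1_num ib5.post_quintile
forvalues k = 1/3 {
    * Effect of each post_quintile level (relative to level 5) on P(cs_1_num == `k')
    margins, dydx(post_quintile) predict(outcome(`k'))
}
```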
