Categorical dependent variable in a difference-in-difference model

Karishma DSouza

Join Date: Oct 2016

Posts: 111
#1

22 Nov 2016, 17:24

Is it possible to run a difference-in-difference model if the dependent variable is categorical and the independent variables are either categorical/continuous/binary?
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#2

22 Nov 2016, 20:36

Yes. For a dichotomous outcome you just use a logistic or probit model; the use of the interaction between treatment group and time is the same. For a polychotomous outcome you can use -mlogit-.
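A minimal sketch of the dichotomous case, assuming hypothetical variable names that do not appear in this thread (y for the 0/1 outcome, treat for the treatment-group indicator, post for the post-period indicator). It only illustrates that the interaction is specified the same way as in a linear DID model, with -margins- used to read the results on the probability scale:

```stata
* Hypothetical names: y = 0/1 outcome, treat = treatment group, post = post period
logit y i.treat##i.post

* Predicted probability of y = 1 in each group-by-period cell
margins treat#post

* Change over time within each group, then the contrast between groups,
* which is a difference in differences in the probability metric
margins treat, dydx(post) pwcompare(effects)
```

Swapping -logit- for -probit- leaves the -margins- lines unchanged.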
wbuchanan

Join Date: Mar 2014

Posts: 1362
#3

23 Nov 2016, 02:20

Similarly you should be able to use ologit for ordinal outcomes.
Anand Sunny

Join Date: Feb 2021

Posts: 10
#4

13 Mar 2021, 11:52

I am trying to run DID using both Ordinal and Nominal variables in the estimation. How can I interpret the result? Specifically, if the estimation is based on multinomial logit and ordered logit, how should I interpret the coefficients?

In the results shown, gb_dummy = treatment indicator, time_recode = time indicator, gb_dummy#time_recode= interaction variable and v743a = outcome variable in mlogit case and v457 = outcome variable in ologit case.

Code:

mlogit v743a i.gb_dummy##i.time_recode, base(4)

Code:

ologit v457 i.gb_dummy##i.time_recode
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#5

13 Mar 2021, 12:24

Specifically, if the estimation is based on multinomial logit and ordered logit, how should I interpret the coefficients?

I wouldn't even try. It's very complicated and even experienced users get it wrong much of the time.

Instead, run the -margins- command after each of your regressions to get the probabilities of each outcome under each combination of gb_dummy and time_recode. And then graph them so you can see what is going on.

Code:

forvalues i = 1/4 {
    margins gb_dummy#time_recode, predict(outcome(`i'))
    marginsplot, name(outcome`i', replace)
}

(It appears from your outputs that both outcome variables have four levels. If I have that wrong, change the -forvalues- command accordingly.)
Anand Sunny

Join Date: Feb 2021

Posts: 10
#6

15 Mar 2021, 17:08

Thank you for your response. As I understand it, in diff-in-diff the coefficient of the interaction term denotes the pure effect of the treatment. I tried interpreting the interaction term in the mlogit case as follows:

In the group that received the treatment, the expected value of 'respondent_alone' increased by .3628 units,
'respondent_and_husband_partner' increased by .3553 units, and 'someone_else' decreased by .1981 units
relative to the base outcome 'husband_partner_alone', keeping the other variables constant.

Is this the right way to interpret it? I would appreciate your feedback before I proceed with the margins as you suggested.

Last edited by Anand Sunny; 15 Mar 2021, 17:11.
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#7

15 Mar 2021, 17:36

First, I don't see where those numbers you mention in #6 come from. Perhaps you can explain how you arrived at them.

Second, it is almost impossible to say anything from the coefficients of an -mlogit- about the impact of a change in any variable. First, there is the fact that the coefficients are logarithms of probability ratios, not themselves probabilities. Then there is the fact that the effect of a probability ratio depends on the starting probability. Then there is the fact that the probabilities across the outcome categories must sum to 1, so that you can see situations where a coefficient is negative for an outcome but its probability increases (or vice versa) because the decrease in some other category was even larger! I really never try to do this myself, and I've seen people with lots of experience and expertise get it wrong when they do.

The -margins- command makes it simple by showing you the actual predicted probabilities of each outcome at the specified values of your predictors. These are numbers that you can look at, understand, and interpret.
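To make the sign problem concrete, here is a small numerical sketch. The coefficients are invented for illustration and are not taken from any output in this thread: a three-outcome model with base outcome A, all intercepts set to 0, and a single 0/1 predictor x whose coefficients are -0.5 for outcome B and -3 for outcome C.

```stata
* Hypothetical -mlogit- coefficients on a 0/1 predictor x (base outcome A)
* bB is the coefficient for outcome B, bC the coefficient for outcome C
scalar bB = -0.5
scalar bC = -3

* With all intercepts equal to 0, the three outcomes are equally likely at x = 0
scalar pB0 = 1/(1 + 1 + 1)

* At x = 1, each non-base linear predictor shifts by its coefficient
scalar denom = 1 + exp(bB) + exp(bC)
scalar pB1 = exp(bB)/denom

display "P(B | x = 0) = " %5.3f pB0
display "P(B | x = 1) = " %5.3f pB1
```

The coefficient for B is negative, yet the probability of B rises from about 0.333 to about 0.366, because the probability of C falls by much more and the three probabilities must still sum to 1.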
Anand Sunny

Join Date: Feb 2021

Posts: 10
#8

17 Mar 2021, 08:59

Sorry, I made a mistake in the interpretation. The coefficients are in log-odds terms, and I reported those directly. I used the margins commands as you suggested and got the following results. How should I interpret them?
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#9

17 Mar 2021, 10:30

These numbers interpret themselves. The outputs are showing you the predicted probability of being in each outcome, given the values of gb_dummy and time_recode. Do you have a more specific question?
Anand Sunny

Join Date: Feb 2021

Posts: 10
#10

21 Mar 2021, 16:29

Thank you for your response. If we take the case of the last margins table, is it correct to say that,

for the control group, the expected value of the outcome 'someone_else' was 5.67...% in the pre-intervention period and 2.1..% in the post-intervention period, i.e. a decrease of 5.67 - 2.1 = 3.57%.

Similarly, for the treatment group the expected value of the outcome 'someone_else' was 6.78..% in the pre-intervention period and 1.63..% in the post-intervention period, i.e. a decrease of 6.78 - 1.63 = 5.15%. Does estimating the average marginal effects mean the same thing? How can I modify the code to estimate the average marginal effects?
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#11

21 Mar 2021, 17:10

Your interpretations in #10 are correct.

You state that you generated those results using the commands I suggested in #5. If that is true, to get average marginal effects of time (the differences you calculated in #10) you can do

Code:

forvalues i = 1/4 {
    margins gb_dummy, predict(outcome(`i')) dydx(time_recode)
}

Added: And if you would like the difference in differences in the probability metric:

Code:

forvalues i = 1/4 {
    margins gb_dummy, predict(outcome(`i')) dydx(time_recode) pwcompare(effects)
}

Last edited by Clyde Schechter; 21 Mar 2021, 17:16.
Anand Sunny

Join Date: Feb 2021

Posts: 10
#12

21 Mar 2021, 19:57

Thank you for your response. I used the code suggested in #11 and got the above results. In the probability metric, the average difference in difference in the probability of the chosen outcome is -0.015894 or 1.5894%. This is the difference-in-differences estimate of the effect of the treatment, or the true intervention effect of the treatment. Is this interpretation correct?
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#13

22 Mar 2021, 11:10

In the probability metric, the average difference in difference in the probability of the chosen outcome is -0.015894 or 1.5894%. This is the difference-in-differences estimate of the effect of the treatment, or the true intervention effect of the treatment. Is this interpretation correct?

Almost right. The DID in the probability is, indeed, -0.015894 (which I would round to -0.016 or even -0.02). But that is not 1.5894%; it is 1.5894 percentage points. Absolute differences in figures that are percents are denominated in percentage points, not percents. When you say something changes by x%, that language refers to a multiplicative change, which is not what you have here.
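The distinction can be checked with simple arithmetic. The sketch below uses the rounded probabilities quoted in #10 purely as illustrative inputs, so its results only approximate what -margins- reports; the scalar names are made up for the example.

```stata
* Rounded probabilities (in percent) as quoted in #10; illustrative inputs only
scalar c_pre  = 5.67
scalar c_post = 2.1
scalar t_pre  = 6.78
scalar t_post = 1.63

* Absolute changes and their difference are measured in percentage points
scalar d_control = c_post - c_pre
scalar d_treat   = t_post - t_pre
scalar did       = d_treat - d_control

* A relative (multiplicative) change is what "changes by x%" refers to
scalar rel_treat = 100*(t_post - t_pre)/t_pre

display "Control change:   " %6.2f d_control " percentage points"
display "Treatment change: " %6.2f d_treat " percentage points"
display "DID:              " %6.2f did " percentage points"
display "Treatment relative change: " %6.1f rel_treat " percent"
```

With these inputs the treatment group's probability falls by 5.15 percentage points, which is roughly a 76 percent relative decrease, and the DID is -1.58 percentage points. The two ways of expressing a change give very different numbers, which is why the wording matters.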
Sagara Ann

Join Date: Aug 2024

Posts: 2
#14

11 Aug 2024, 12:11

Hi,

My paper uses a quintile Difference-in-Differences estimator with a categorical outcome variable. My specification is as follows:

oprobit cs_1_num post_q1 post_q2 post_q3 post_q4

Here cs_1_num takes three values: 1 = Better, 2 = Same and 3 = Worse. `post_q1' is the dummy which takes the value 1 for observations in the lowest quintile in the post period, and the other regressors are defined similarly.

What version of the margins command do I use to obtain the difference in difference in the probability of each outcome for each quintile? Would margins, dydx(post_q`i') subpop(post_q`i') post do the job?

Additionally, how do I generate an event study graph which plots the difference in difference estimate for each outcome and quintile combination across survey waves (defined by the variable wave_no)?

Last edited by Sagara Ann; 11 Aug 2024, 12:15.
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#15

11 Aug 2024, 14:31

You can't use the -margins- command because you did not use factor-variable notation in the regression itself. The post_q* variables are not useful. What you need instead is a single variable, let's call it post_quintile, that takes on the values 1 through 5 corresponding to the five quintiles. Then rerun as follows:

Code:

oprobit cs_1_num ib5.post_quintile
margins, dydx(post_quintile)
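Because the question in #14 asks about the probability of each outcome, one possible extension is to request the marginal effects outcome by outcome. This is a sketch resting on two assumptions: that cs_1_num is coded 1, 2, 3 as described in #14, and that post_quintile has already been created as described above. The loop itself is an illustration, not part of the original reply.

```stata
* Assumes post_quintile (values 1-5) already exists and cs_1_num is coded 1/2/3
oprobit cs_1_num ib5.post_quintile

* Marginal effect of each quintile (relative to base level 5) on the
* predicted probability of each outcome value in turn
forvalues k = 1/3 {
    margins, dydx(post_quintile) predict(outcome(`k'))
}
```

Each pass reports, for one outcome value, how the predicted probability differs between a given quintile and the base level.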