Hi all,
I am running a difference-in-differences (DID) regression on two different outcomes: (i) the probability of having a specialist visit and (ii) the number of specialist visits in the last month. Outcome (i) is a binary variable, whereas outcome (ii) is a count variable.
At this point I am confused about the correct way to implement the DID regressions in Stata.
For outcome (i) I could run a logit regression in Stata of the form:
1) logit probvis treatment time interaction covariates, robust
where 'interaction' represents the interaction between a dummy for the treatment group and a dummy for the period in which the treatment of interest is active. In particular, the coefficient on interaction is the difference-in-differences effect that I am interested in.
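To make specification 1) concrete, here is a sketch of the model I have in mind. The symbols are my own notation, not anything taken from Stata output: D_i stands for treatment, T_t for time, their product for interaction, and X_it for the covariates.

```latex
\Pr(y_{it} = 1 \mid D_i, T_t, X_{it})
  = \Lambda\!\left(\beta_0 + \beta_1 D_i + \beta_2 T_t + \beta_3 D_i T_t + X_{it}'\gamma\right),
\qquad
\Lambda(z) = \frac{e^{z}}{1 + e^{z}}
```

Here \beta_3 is the coefficient on interaction. Because \Lambda is nonlinear, \beta_3 is an effect on the log-odds scale rather than a change in the probability itself.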
Alternatively, I know about the command diff, which I would use in the following way:
2) diff probvis, t(treatment) p(time) cov(covariates) robust
In this case, the regression results are displayed in a way that highlights the time differences and also the final difference-in-differences coefficient.
Similarly, I can perform the same regressions for outcome (ii) with the difference that, in this case, I should replace the command logit with an appropriate count data model.
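As an illustration only, and assuming a Poisson regression as the count data model (a negative binomial model would be another candidate), the analogue of specification 1) for outcome (ii) would be the following, again in my own notation with D_i for treatment, T_t for time, and X_it for the covariates:

```latex
\mathrm{E}\!\left[y_{it} \mid D_i, T_t, X_{it}\right]
  = \exp\!\left(\beta_0 + \beta_1 D_i + \beta_2 T_t + \beta_3 D_i T_t + X_{it}'\gamma\right)
```

Under this assumption \exp(\beta_3) is a ratio of ratios of expected counts, so the interaction effect is multiplicative rather than an additive difference in the number of visits.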
Could someone please explain the difference between the above two methods? Am I right in thinking that with command 2) I am simply implementing a linear DID regression, and that I should therefore prefer model 1) for a binary outcome? Does the same argument apply to outcome (ii)?
Thank you. Please feel free to correct any mistakes that may appear in my post.
Magherita Neri