Differences in differences cox regression

Paul Mueller

Join Date: Jan 2015

Posts: 55
#1

Differences in differences cox regression

14 Aug 2015, 07:56

Hi, I have some conceptional difficulties with running a DiD cox regression. The linear DiD model is just y=ß0+ß1*x1+ß2*x2*ß3*x1*x2, where ß3 is the difference of x1 in x2 (perhaps this expression is not entirely correct?). However, in cox regression my original approach to DiD was to calculate the hazard ratios separately for the treatment and control groups with bivariate cox regressions and to manually calculate the ratios of those two hazard ratios afterwards. My simple school math tells my that this is should result in the adjusted hazard ratios provided that the measured effect in the treatment group is a superposition of the overall time trend and the treatment effect.

However, if I then compare my results to a more strict application of the DiD approach where I put the indicator variables of group and time and the interaction term in one model (h=ß0+exp(ß1*time+ß2*group+ß3*time*group)), the estimates for exp(ß3) are somewhat similar to the first results but not nearly the same.

Can somebody please explain how this difference arises and which estimate is the correct one?

Last edited by Paul Mueller; 14 Aug 2015, 08:00.
Tags: None
Steve Samuels

Join Date: Mar 2014

Posts: 1786
#2

14 Aug 2015, 12:52

Please, as the FAQ request, show us the commands for both models that you used and the results, putting them between CODE delimiters (FAQ section 13). While you are at it, explain the design and show your stset statement. What are x1 and x2? Your linear equation has a typo (the third "*" shoud be "+") and your interpretation of the DID parameter b3 ("difference of x1 in x2") conveys no meaning. This is the downside of describing results that the rest of us can't see.

For Cox regression, a DID for log hazard ratios must be translated to a ratio of hazard ratios. I find these difficult to interpretor present, because it is not easy to get predicted probabilities after stcox. I suggest a DID analysis of survival probabilities at particular points or of restricted mean survival time, using, respectively, the stpsurv and stpmean commands of Parner and Anderson (findit), described in their Stata Journal article.

Reference:
Parner, Erik T, and Per K Andersen. 2010. Regression analysis of censored data using pseudo-observations. Stata Journal 10, no. 3: 408,

available on p 408 of:
http://www.researchgate.net/profile/...0.pdf#page=102

Last edited by Steve Samuels; 14 Aug 2015, 12:59.

Steve Samuels
Statistical Consulting
[email protected]

Stata 14.2
Comment
Paul Mueller

Join Date: Jan 2015

Posts: 55
#3

15 Aug 2015, 01:53

Hello Steve, thanks for your reply.

x1 is the qualification level (0=high-skilled, 1=low-skilled, since I'm interested in the low-skilled and the high-skilled are my control group) and x2 is a time dummy that changes at the point of time of an event. I would like to measure the effect of x2 for different sub-samples to find out for which groups there is an effect of the event. Since there is considerable variation over time I include high-skilled workers as a control group and adjust the measured change in my treatment group for the change in the control group (that's what DiD essentially means in my understanding).

As I already mentioned I both calculated models with interaction terms (as DiD regressions usually do) and ratios of the hazard ratios of the two groups, simply because I'm doing this for the first time and I wanted to feel my way. This are my results for one subsample (young, male, service-sector; I'm sorry, I don't have the original output available at the moment):

Code:

stset duration, failure(failed) stcox i.time_dummy i.skill_level i.time_dummy#i.skill_level if male==1 & service==1 & senior==0 Hazard ratios (stars mark significance levels):
time_dummy 0.498 ***

skill_level 1.606 ***

time_dummy*skill_level 1.337 ***

stcox i.time_dummy if male==1 & service==1 & senior==0 & skill_level==1 Hazard ratio:
time_dummy 0.706 ***

stcox i.time_dummy if male==1 & service==1 & senior==0 & skill_level==0 Hazard ratio:
time_dummy 0.489 ***

The ratio between the two hazard ratios below is 1.444. This is of course close to the estimate of the interaction term but nevertheless I need to understand how this is to be interpreted. Clearly, to use the interaction term is safer in terms of a valid application of the DiD approach. On the other hand, my own understanding of the difference in the difference can't avoid to look at the ratio of the two separately estimated hazard ratios.

Regarding your objection about using hazard ratios for DiD estimation: I'm explicitly interested in the average change and don't want to limit results to certain survival times. That's why I use overall hazard ratios and assume the pre- and post-event hazards to be proportional. I understand that this goes at the cost of a good interpretability of the DiD estimates but I prefer this to further limiting or fragmenting my results.

Last edited by Paul Mueller; 15 Aug 2015, 02:10.
Comment
Steve Samuels

Join Date: Mar 2014

Posts: 1786
#4

16 Aug 2015, 18:32

Thanks for formatting your results with CODE delimiters. I'll answer the original question below, but then comment on the actual data setup as you just described.

1. Your original question

Why are you seeing slightly different DIDs? The crucial data units in Cox regression are the "risk sets" at each failure time t: those who had not failed prior to t. In the combined regression, there is a risk set for each failure in the combined data set, and the denominator consists of observations in both groups. In the separate Cox regressions, the times may differ; and the risk sets in one regression have no connection with those in the other. The consequence is that the DIDs in the combined and separate models differ.

In symbols, let G be a 0-1 group indicator and P be a 0-1 indicator of "period" (no longer applicable here).

Model 1 (Combined)
y = log h(t| G, P) = a(t) + b1 G + b2 P + b3 G P
DID = b3

Model 2 (Separate models

Group = 0 y = a0(t) + c1 P
Group = 1 y = a1(t) + c2 P

DID = c2 - c1. But since c1 and c2 were estimated in different subgroups, they need not be identical to b3.

This is in contrast to ordinary LS regression, in which means and differences in means are the same whether
a combined DID momdel is run or two different models are run as you did.

Which is the correct model? Well, if you want to characterize the data, aside from the DID, then you need to check that the model is correct. Models 1 & 2 differ, for example, in that model 2 does not assume that the group effect is proportional. So, at a minimum, you'd need to check the proportional hazards assumptions

2. Setup and Analysis of a time-varying binary covariate that changes only once:

Your stset statement is for what is known as "single record data", with one data line per person. With time-varying covariates, you need "multiple record" data. At the minimum, for each person, you must have a record with x2= 0 at entry, then a record with x2=1 at the time, if any, that an event happens. Then you must stset the data with the id() option, followed, possibly, by stsplit , at(failures). See :http://www.statalist.org/forums/foru...es-please-help.

For graphical representation of the effects of such a covariate, you can compute what is known as the "Simon-Makuch" curve. See: http://www.stata.com/statalist/archi.../msg00809.html

With time-varying binary covariate, you have to ask if the effect depends on the time since it was "switched on". You can model this with a time-varying continuous covariate. See the first post I linked to for an example.

Steve Samuels
Statistical Consulting
[email protected]

Stata 14.2
Comment
Paul Mueller

Join Date: Jan 2015

Posts: 55
#5

17 Aug 2015, 02:06

Thank you very much for the comprehensive explanation!

I had the suspicion that the difference has something to do with the estimation procedure but there was a missing link. I actually have issues with the proportional hazards condition for many subgroups but I decided to take the hazard ratios as average within-group effects. I understand that this makes proportional hazards rather a "nice to have" than a necessary condition for a valid analysis. In this case it would make more sense to calculate the hazard ratios separately and then compare them "manually", right? I would present the multivariate model in addition to show the significance of the interaction terms but would use the post-estimation differences as numerically correct estimates for the average effect (everage meaning in everage for all durations) of time_dummy.

Regarding your advice about splitting my data: I considered that before but decided that splitting would improve the data quality while not splitting them would still result in interpretable results. Would you strongly recommend that I split my data?
Comment
Paul Mueller

Join Date: Jan 2015

Posts: 55
#6

17 Aug 2015, 05:57

And maybe it would be a more elegant method to make it a multivariate model with stratification on groups?
Comment
Steve Samuels

Join Date: Mar 2014

Posts: 1786
#7

17 Aug 2015, 19:01

To answer your question: stsplit can be valuable if you are doing a Cox analysis and the data set contains many followup records for an individual. You may need these records to create time-varying covariates, but you need the values of those covariates only at failures. So, it can reduce the size of analysis data sets. It's also useful if you want to examine the risk sets, as when there are convergence problems or you want to sample the risks sets. If you have neither of these situations, then there is no need to use stsplit. One side effect: stsplit will add observations to the data, and if you already have large data sets, the final size can be huge. Finally, if you want do a "flexible" survival analysis with time-varying predictors (i.e. with Patrick Royston's stpm or with Paul Lambert's stpm2, both from SSC), then you must not stsplit the data: you are likely to need values of the time-varying covariates at all followup times available.

About the pooled-data estimate as an "average". I think you are better off doing individual log HRs averaging those, then exponentiating. Suppose that someone wants to replicate your study and the hazard ratio for each subgroup in their data is identical to the HR in the same subgroup in your data, so they should get the same "average" as you, right? They won't, unless the relative sizes of their subgroups are the same as yours. So if the choice is "manual average" or pooling, I'd go with manual average.

About stratification on groups You obviously cannot test a "group" mean effect with that setup, but, what is not commonly known, you can add the interaction of group with event to the Cox model and test that.

Steve Samuels
Statistical Consulting
[email protected]

Stata 14.2
Comment
Paul Mueller

Join Date: Jan 2015

Posts: 55
#8

20 Aug 2015, 06:22

Hi again,

apparently both methods have identical results if I use stratification on (qualification) groups in the multivariate model.

Code:

stset duration, failure(failed) stcox i.time_dummy i.skill_level i.time_dummy#i.skill_level if male==1 & service==1 & senior==0, strata(skill_level) Hazard ratios (stars mark significance levels):
time_dummy 0.489 ***

skill_level 1

time_dummy*skill_level 1,443 ***

On the one hand this is good news because it looks as if I can get correct estimates in one model that also includes a significance test for the interaction term. On the other hand, I get a little confused: Clearly this specification doesn't control for time-independent group differences, which I need. But shouldn't my "manual" approach (computing ratios of the hazard ratios of the two groups) do this? If I calculate the average treatment effect as HR1/HR2 there should be no effect of the time-independent group difference in my understanding.
Comment
Steve Samuels

Join Date: Mar 2014

Posts: 1786
#9

20 Aug 2015, 21:51

Hmm. I see that you misunderstand what a "stratified" Cox model is:

Code:

stcox i.time_dummy i.time_dummy#i.skill_level, stratum(i.skill_level).

This fits a separate hazard function for each stratum. Your stset statement still lacks the id() variable, needed when you have more than one observation per person.

Steve Samuels
Statistical Consulting
[email protected]

Stata 14.2
Comment
Paul Mueller

Join Date: Jan 2015

Posts: 55
#10

21 Aug 2015, 04:19

Stratification fits separate functions for each stratum but estimates one set of coefficients for all strata, right? Doesn't that imply that the coefficients are fixed-effects (="within-strata") values? My question is: How does this model estimate an interaction effect between stata and the time dummy? An why are those estimates identical with the ratios between the separately estimated coefficients? If I understand correctly I just need a set of those stratified models for correct DID estimates. I'm just struggling with the exact reason why.

Regarding the id() option of stset: What's the use in my case? For my analysis it doesn't matter whether episodes belong to one individual or another. From stset's manual I understand that this option only matters when I want to stsplit my data. (If I have time left I will do this but right now I'm fine with the correct specification of my model without splitted spells).

Last edited by Paul Mueller; 21 Aug 2015, 04:22.
Comment
Paul Mueller

Join Date: Jan 2015

Posts: 55
#11

22 Aug 2015, 05:28

I also just discovered that I don't exactly know how stsplit should be of any use in my case. I thought about splitting spells at the point between the two periods so that I have spells that started in the first period but ended in the second would become part of my risk set in the first period. But I think that's not even possible with stsplit. However, stsplit, at(failure) wouldn't do that.
Comment
Steve Samuels

Join Date: Mar 2014

Posts: 1786
#12

22 Aug 2015, 14:30

You need multiple records for those with the event to indicate to Stata when during followup "exposure" to the event started. In the data below, exposure to the event for id 15 begins on day 28 and ends on day 100; there is no exposure before day 28.

Suppose patient id 15 entered the study (on day 1) and had an "event" on day 28 and exited alive on day 100; patient 16 had the event on day 30 and died on day 50; and patient 17 never had the event, but died on day 50. there is the input records for those three patients would look like. Compare to the examples on pp 386-387 of the Survival Manual (stset entry) and example 3 of the stcox entry. I'll call the event indicator "ever_event".

Code:

input id died ever_event stime 15 0 0 28 15 0 1 100 16 0 0 30 16 1 1 50 17 1 0 60 end

The stset and stcox statements would be:

Code:

stset stime, failed(died) id(id) stcox ever_event ever_event#skill_level stratum(skill_level)

You can also do the model-only stcox run or the two independent stcox runs, one per group. For the latter, you'll need to compute the CI for the DID by hand.

Last edited by Steve Samuels; 22 Aug 2015, 14:55.

Steve Samuels
Statistical Consulting
[email protected]

Stata 14.2
Comment
Paul Mueller

Join Date: Jan 2015

Posts: 55
#13

25 Aug 2015, 01:42

Okay, now I got it. Thanks again!

One last question: Can you think of any practical way of testing for another level of differences? For example: Is the DID that I got so far different between male and female persons?

Of course I could try

Code:

stcox event event#qualification event#qualification#sex, strata(qualification,sex)

But that results in a matrix of a lot of group comparisons and I don't see how any of them gives me the necessary information.
Comment

Paul Mueller

Join Date: Jan 2015
Posts: 55

#14

25 Aug 2015, 08:38

To be more specific: The result with an additonal variable (differences in differences in differerences) looks like that (service being a dummy for industries).

Code:

stox event event#qualification#service, strata(qualification service)

[...]

--------------------------------------------------------------------------------------------
                        _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
---------------------------+----------------------------------------------------------------
                  1.event |   .7939818   .0035707   -51.30   0.000     .7870141    .8010112
                           |
event#qualification#service |
                    0 0 1  |   .9090457   .0046766   -18.54   0.000     .8999258     .918258
                    0 1 0  |   .9483852          .        .       .            .           .
                    0 1 1  |   .7835774   .0052978   -36.07   0.000     .7732625    .7940299
                    1 0 1  |   1.002048          .        .       .            .           .
                    1 1 0  |          1   .0091701     0.00   1.000     .9821875    1.018136
                    1 1 1  |    1.00382          .        .       .            .           .
--------------------------------------------------------------------------------------------
                                             Stratified by qualification service

Is the ratio between the reference category and the category 1-1-1 the effect I'm asking for (i. e. the effect of service on the effect of qualification on the effect of event)? And if so, why don't I get values for the SE and error probability?

Comment

Steve Samuels

Join Date: Mar 2014

Posts: 1786
#15

25 Aug 2015, 15:35

You are much better off doing the DID analysis in the three strata you define and comparing the DIDs. Beyond that, I can't help you any further. Good luck!

Steve Samuels
Statistical Consulting
[email protected]

Stata 14.2
Comment

time_dummy	0.498	***
skill_level	1.606	***
time_dummy*skill_level	1.337	***

time_dummy	0.489	***
skill_level	1
time_dummy*skill_level	1,443	***

Announcement

Differences in differences cox regression

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment