Predictive margins by age category after stcox seem intuitively correct, but are they statistically correct?

Dan Holman

#1

23 May 2017, 08:57

Hi all

I have an stcox regression where the event is work exit. I am estimating the impact of diabetes, controlling for a set of confounders, and testing whether certain factors mediate the relationship between diabetes and work exit. The command is as follows:

Code:

stcox rdiabe##scworkj i.ragey_6cat ragender i.rcohort_e_waves, efron

rdiabe = diabetes yes/no, scworkj = feel supported at work yes/no, ragey_6cat = age in six categories, rcohort_e_waves = study cohort.

This gives a diabetes HR of 1.5 (P = .001), an scworkj HR of .868 (P = .005), and an HR of .752 (P = .076) for the interaction term.

I have tried plotting the margins over age after this model, using the following commands:

Code:

margins if firstobs, at(rdiabe=(0 1) scworkj=(0 1) ragey_6cat=(1 2 3 4 5 6))
marginsplot, xdimension(ragey_6cat) noci

Which gives: [first graph: image not preserved]

This makes intuitive sense: the hazard is distinctly higher for those who have diabetes and who don't feel supported at work.

However, I have since read that margins do not make sense after stcox because they depend on the baseline hazard, which stcox is designed to avoid estimating (see: http://www.statalist.org/forums/foru...ns-after-stcox).

Yet, it seems to me that the graph provides a reasonable picture of what is happening in the model.

A further issue: I wondered whether this should in fact be modelled with a three-way interaction between diabetes, support at work, and age, since the graph intuitively seems to show this relationship. If I run the model again with this three-way interaction and graph the result, I get: [second graph: image not preserved]

Is this in fact the more valid way of presenting what is going on?

Are margins even valid at all here? If not, what is the 'proper' way to represent what is going on in the model?

Any advice gratefully received.

Dan
Clyde Schechter

#2

23 May 2017, 23:23

Notice that the -margins- output gives you relative hazards, so it does not depend on the baseline hazard. It's perfectly reasonable to use these.
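A minimal Python sketch (illustrative only, not Stata code) of why the baseline hazard drops out. It assumes the proportional-hazards form h(t|x) = h0(t) * exp(xb) and uses the hazard ratios reported in #1, with age, sex, and cohort held at their reference levels; the baseline hazard values are arbitrary. The default prediction after stcox, and therefore what -margins- averages, is the relative hazard exp(xb).

```python
import math

# Hazard ratios reported in post #1 (other covariates at reference levels).
HR_DIABETES = 1.5
HR_SUPPORT = 0.868
HR_INTERACTION = 0.752

# The corresponding log-hazard coefficients.
B_DIABETES = math.log(HR_DIABETES)
B_SUPPORT = math.log(HR_SUPPORT)
B_INTERACTION = math.log(HR_INTERACTION)


def linear_predictor(diabetes: int, support: int) -> float:
    """xb for rdiabe##scworkj, with age, sex and cohort at reference."""
    return (B_DIABETES * diabetes
            + B_SUPPORT * support
            + B_INTERACTION * diabetes * support)


def relative_hazard(diabetes: int, support: int) -> float:
    """exp(xb): the relative hazard, which involves no baseline hazard."""
    return math.exp(linear_predictor(diabetes, support))


def hazard(baseline: float, diabetes: int, support: int) -> float:
    """Full hazard h(t|x) = h0(t) * exp(xb) at a time where h0(t) = baseline."""
    return baseline * relative_hazard(diabetes, support)


if __name__ == "__main__":
    for d in (0, 1):
        for s in (0, 1):
            print(f"rdiabe={d} scworkj={s}: relative hazard {relative_hazard(d, s):.3f}")
    # Whatever the (arbitrary) baseline hazard, h(t|x) / h0(t) is unchanged.
    for h0 in (0.01, 0.2, 3.0):
        print(h0, hazard(h0, 1, 1) / h0)
```

With these numbers the relative hazard for rdiabe = 1, scworkj = 1 is 1.5 * 0.868 * 0.752, about 0.979, whatever h0(t) is.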

However, I don't understand the -if firstobs- clause in your -margins- command. What does firstobs mean, and why do you want to restrict the calculation to that subset? Also, your -margins- command would probably be better written as:

Code:

margins rdiabe#scworkj, at(ragey_6cat = (1 2 3 4 5 6))
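Predictive margins fix rdiabe, scworkj, and age at the requested values for every observation in the estimation sample (or the -if- subset, such as firstobs), predict the relative hazard for each observation, and average those predictions. The Python sketch below mimics that averaging under stated assumptions: the three two-way coefficients are the log hazard ratios from #1 rounded to three decimals, while the sex coefficient, the age-category coefficients, and the four-person sample are hypothetical, and cohort is left out for brevity.

```python
import math
from statistics import mean

# Log-HRs roughly matching post #1; the ragender value is hypothetical.
COEF = {
    "rdiabe": 0.405,
    "scworkj": -0.142,
    "rdiabe#scworkj": -0.285,
    "ragender": 0.10,
}
# Hypothetical log-HRs for age categories 1..6 (category 1 is the base).
AGE_COEF = {1: 0.0, 2: 0.2, 3: 0.5, 4: 0.9, 5: 1.4, 6: 2.0}

# Hypothetical sample: only covariates NOT fixed by the margins request.
SAMPLE = [{"ragender": 0}, {"ragender": 1}, {"ragender": 1}, {"ragender": 0}]


def xb(row, diabetes, support, age):
    """Linear predictor with diabetes, support and age overridden."""
    return (COEF["rdiabe"] * diabetes
            + COEF["scworkj"] * support
            + COEF["rdiabe#scworkj"] * diabetes * support
            + AGE_COEF[age]
            + COEF["ragender"] * row["ragender"])


def predictive_margin(diabetes, support, age, sample=SAMPLE):
    """Average relative hazard with diabetes, support and age set for everyone."""
    return mean(math.exp(xb(row, diabetes, support, age)) for row in sample)


if __name__ == "__main__":
    for age in AGE_COEF:
        cells = [predictive_margin(d, s, age) for d in (0, 1) for s in (0, 1)]
        print(age, " ".join(f"{c:.3f}" for c in cells))
```

Because sex does not interact with anything in this sketch, the ratio of any two margins at the same age equals the corresponding hazard ratio; only their differences change with age.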

Quote:

...and testing whether certain factors mediate the relationship between diabetes and work exit

Nothing in your model tests mediation. What it does test is whether the effect of diabetes is moderated by scworkj.

Quote:

Is this in fact the more valid way of presenting what is going on?

It's not clear what you are asking here. If you are asking whether it is valid to use a three-way interaction for this research, that depends on your goals and your understanding of the underlying science. It's not a question with a purely statistical answer.

If you are asking whether you have properly analyzed the data for a model with a three-way interaction, that, too, is unanswerable, but this time because you do not show the code you ran, so one can only imagine how you have actually done it.
Dan Holman

#3

24 May 2017, 03:26

Hi Clyde

Thank you for your help, it is very much appreciated!

The firstobs follows the suggestion in Cleves, Gould, and Marchenko (2016), An Introduction to Survival Analysis Using Stata, p. 311. These are multiple-record data, so unless I restrict the calculation to each person's first observation, people with more records carry more weight in the averages than people with fewer. However, the authors state that the covariates should be constant; in my case, they vary slightly across a person's records. I think I need to acknowledge this as an assumption and limitation, as I can't see a way round it if I want to visualise the effects the way I have done.
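A minimal Python sketch of what the -if firstobs- restriction changes, using hypothetical multiple-record data (person id, record start time, predicted value). Averaging over all records counts each person once per record; keeping only each person's earliest record counts everyone once. Predictions are constant within person here, matching the book's assumption of constant covariates. In Stata the flag would typically be built with something like -bysort id (_t0): gen byte firstobs = _n == 1-, where id is a hypothetical person identifier.

```python
from statistics import mean

# Hypothetical multiple-record data: (person_id, record_start_time, prediction).
# Person 1 contributes three records, person 2 only one.
RECORDS = [
    (1, 0.0, 2.0),
    (1, 2.0, 2.0),
    (1, 4.0, 2.0),
    (2, 0.0, 1.0),
]


def first_record_flags(records):
    """Flag each person's earliest record (like _n == 1 within id, sorted by time)."""
    order = sorted(range(len(records)), key=lambda i: (records[i][0], records[i][1]))
    flags = [False] * len(records)
    seen = set()
    for i in order:
        pid = records[i][0]
        if pid not in seen:
            flags[i] = True
            seen.add(pid)
    return flags


def average_prediction(records, flags=None):
    """Mean prediction over all records, or over flagged records only."""
    if flags is None:
        return mean(r[2] for r in records)
    return mean(r[2] for r, keep in zip(records, flags) if keep)


if __name__ == "__main__":
    flags = first_record_flags(RECORDS)
    print("all records:", average_prediction(RECORDS))           # person 1 counted 3 times
    print("first records:", average_prediction(RECORDS, flags))  # each person counted once
```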

Thanks for the reminder that the interaction tests moderation. Initially I tested for mediation by fitting a model with diabetes alone and then adding the support variable, but this did not reduce the diabetes coefficient, so a moderation effect seems more plausible.

Sorry, yes: the code for the three-way interaction is

Code:

stcox rdiabe##scworkj##i.ragey_6cat ragender i.rcohort_e_waves, efron
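The three-way model lets the diabetes-by-support interaction itself change with age on the log-hazard scale. A minimal Python sketch, assuming the two-way coefficients roughly match #1 on the log scale and inventing the age and three-way terms, computes the ratio of hazard ratios HR(1,1) * HR(0,0) / (HR(1,0) * HR(0,1)) at each age. In the two-way model it is the same at every age; in the three-way model it moves only through the rdiabe#scworkj#ragey_6cat terms.

```python
import math

AGES = range(1, 7)
# Hypothetical age-category log-HRs (category 1 is the base).
AGE_MAIN = {1: 0.0, 2: 0.2, 3: 0.5, 4: 0.9, 5: 1.4, 6: 2.0}


def make_xb(b_d, b_s, b_ds, age_main, d_age=None, s_age=None, ds_age=None):
    """Return xb(d, s, age) for rdiabe##scworkj##i.ragey_6cat, or the two-way
    model when the age-interaction dicts are omitted (missing ages count as 0)."""
    d_age = d_age or {}
    s_age = s_age or {}
    ds_age = ds_age or {}

    def xb(d, s, age):
        return (b_d * d + b_s * s + b_ds * d * s + age_main[age]
                + d_age.get(age, 0.0) * d
                + s_age.get(age, 0.0) * s
                + ds_age.get(age, 0.0) * d * s)

    return xb


def interaction_ratio(xb, age):
    """Ratio of hazard ratios (diabetes x support) at one age category."""
    return math.exp(xb(1, 1, age) + xb(0, 0, age) - xb(1, 0, age) - xb(0, 1, age))


two_way = make_xb(0.405, -0.142, -0.285, AGE_MAIN)
three_way = make_xb(0.405, -0.142, -0.285, AGE_MAIN,
                    ds_age={4: -0.3, 5: -0.5, 6: -0.6})  # hypothetical 3-way terms

if __name__ == "__main__":
    for age in AGES:
        print(age, round(interaction_ratio(two_way, age), 3),
              round(interaction_ratio(three_way, age), 3))
```

The age main effects and the diabetes-by-age and support-by-age terms cancel in this ratio, which is why only the three-way terms can make it vary.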

I ask because the first graph looks as though it shows a three-way interaction, i.e., how the interaction of diabetes and support varies over age. But when I model that explicitly, with the code above, I get the second graph. So my question boils down to this: what is the correct way to show how this interaction varies over age, the first graph or the second? And which research questions correspond to which model?

Best
Dan
Clyde Schechter

#4

24 May 2017, 08:51

OK, thanks for clarifying. I get the -firstobs- thing now.

The three-way interaction code looks correct. The first graph is not showing a three-way interaction. It shows the predicted relative hazards at various ages, but the only interaction in that model is between rdiabe and scworkj. The reason it looks as if there is also an interaction with age is that the Cox model is non-linear: the predictors enter through a log link, so the hazard is proportional to exp(xb). With non-linear models, every effect, when expressed as a difference in predictions, automatically varies with the values of every other variable in the model. In effect, non-linear models build in some degree of apparent interaction among all the variables. Including explicit interaction terms lets you model interaction in the log-hazard metric. That's a different species of interaction, and a more substantive one. So if you want to exhibit the substantive three-way interaction among diabetes, social support, and age, your second graph is the way to do that. The first graph shows only the apparent age interaction that is an artifact of non-linearity, which is not usually of substantive interest.
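A minimal Python sketch of the non-linearity point, using the diabetes HR from #1 and hypothetical age-category coefficients, in a model with no age interaction (support held at 0). On the relative-hazard scale the gap between people with and without diabetes grows with age, which is what makes the lines in the first graph fan out; the hazard ratio, the contrast on the log-hazard scale, is identical at every age.

```python
import math

B_DIABETES = math.log(1.5)  # diabetes log-HR from post #1
# Hypothetical age-category log-HRs (category 1 = base); no age interactions.
AGE_COEF = {1: 0.0, 2: 0.2, 3: 0.5, 4: 0.9, 5: 1.4, 6: 2.0}


def relative_hazard(diabetes, age):
    """exp(xb) with scworkj = 0 and other covariates at reference."""
    return math.exp(B_DIABETES * diabetes + AGE_COEF[age])


def gap_and_ratio(age):
    """Difference and ratio of relative hazards, diabetes vs. no diabetes."""
    with_d = relative_hazard(1, age)
    without_d = relative_hazard(0, age)
    return with_d - without_d, with_d / without_d


if __name__ == "__main__":
    for age in AGE_COEF:
        gap, ratio = gap_and_ratio(age)
        print(f"age cat {age}: difference {gap:.3f}, ratio {ratio:.3f}")
```

Here the difference equals 0.5 * exp(age coefficient), so it rises with age even though no age interaction was specified.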
Dan Holman

#5

24 May 2017, 09:32

Aha! That is exactly what I wanted to know, Clyde. Thanks once more for sharing your wisdom. Dan