Issues with missing data, N, Cox models

Eduard López

Join Date: Dec 2014

Posts: 48
#16

24 Jan 2016, 15:22

Sorry for being so persistent with this, guys, but it's really bugging me.

I tried a different route to check the number of observations and the HR: instead of running "drop if", I ran "keep if" with the values reversed. So in one case I used:

Code:

preserve
keep if selforsalaried==2
keep if typecondition==1
gen evento1=.
replace evento1=1 if conditiondays>0 & sex==1 & conditiondays!=.
stset conditiondays if conditiondays<=550, failure(evento1==1) scale(1)
xi: stcox i.year i.agegroup i.state i.industry i.contracttype i.incomegroup i.icd9, nolog
restore

As expected, this gave me the same number of observations and HR as when I ran "drop if" with the values reversed, but not the same as when I ran the syntax I had been using all along, namely:

Code:

preserve
gen evento1=.
replace evento1=1 if conditiondays>0 & typecondition==1 & selforsalaried==2 & sex==1 & conditiondays!=.
stset conditiondays if conditiondays<=550, failure(evento1==1) scale(1)
xi: stcox i.year i.agegroup i.state i.industry i.contracttype i.incomegroup i.icd9, nolog
restore

You might be asking "well, why the hell are you using a database that includes a group of people you don't want to study and a type of condition you're not interested in?", but the first version of the paper did include that type of condition and that group of people. Now that I'm not interested in them, however, they seem to be messing up my attempt to report the number of observations.

So it's kind of a catch-22: if I drop them, I'm afraid I will get a different number of observations and a different HR in the Cox model. But if I don't drop that type of condition and that group of people, I'm afraid the number of observations I'm getting corresponds to the whole database, not to the Cox model based on the type of condition and group of people I'm interested in.

Is there any way that type of condition and group of people are affecting my Cox model when building it using the if clause when generating the failure event?

PS: I posted the same issue here, http://stats.stackexchange.com/quest...if-clause-or-k and here https://www.reddit.com/r/stata/comme...when_using_if/

Yes, I'm that desperate... apologies for testing your patience, because I'm sure I'm missing something simple.

Last edited by Eduard López; 24 Jan 2016, 15:35.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30104
#17

24 Jan 2016, 15:30

Well, in your -keep if- version, the failure event evento1 is restricted to sex == 2, whereas in the second version, it's sex == 1. So I wouldn't expect them to be the same.
1 like
Comment
Eduard López

Join Date: Dec 2014

Posts: 48
#18

24 Jan 2016, 15:35

Sorry, that was my mistake when transcribing it for this post (I'm actually using Spanish category names, so I changed the variable names to English terms so it wouldn't be confusing to you guys). I've edited it.

In any case, I run the model for both sexes, just changing the sex condition, and still get the same problem.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30104
#19

24 Jan 2016, 20:41

I think I see the problem in #16. In the first version, only observations with typecondition == 1 are retained and analyzed. In the second version, all values of typecondition are represented. While you attempted to exclude them from the analysis by saying that you have to be typecondition == 1 to have a failure event, what you have actually achieved is including them with _d = 0, i.e., as censored observations. So your second analysis has a larger data sample and includes censored observations that are not present at all in the first.

Even if this happened to work in your case, it would be bad style, and setting yourself up to make mistakes, to include sample selection as part of the definition of failure event. It is more transparent to identify the analysis sample by using -if- or -in- clauses in the -stset- or -stcox- commands (or with -keep if-, -drop if- commands). That way, when you review your code later you don't have to look for the sample selection in a place (the failure event definition) where you generally wouldn't think to look for it.
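
A minimal sketch of that difference, using the cancer.dta example dataset that ships with Stata rather than the data in this thread. The variables studytime, died, drug and age are stand-ins for the poster's variables, and eventA is a hypothetical name introduced only for illustration:

```stata
sysuse cancer, clear

* Version A: sample selection hidden in the failure definition.
* Observations with drug != 1 remain in the sample as censored (_d == 0).
gen byte eventA = 1 if died == 1 & drug == 1
stset studytime, failure(eventA == 1)
stcox age, nolog
display "N in version A: " e(N)

* Version B: sample selection stated explicitly in -stset-.
* Only observations with drug == 1 enter the analysis.
stset studytime if drug == 1, failure(died == 1)
stcox age, nolog
display "N in version B: " e(N)
```

Comparing the two e(N) values (and running -tab _d- after each -stset-) should show whether the unwanted group is being counted as censored observations rather than excluded.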

One more thing: unless you are using an old version of Stata from before the introduction of factor variables, the xi: prefixes do nothing except clutter up your data set with a bunch of oddly named variables that you don't need for anything. Just omit the xi: prefixes and your commands will run the way you intend them to.
1 like
Comment
Eduard López

Join Date: Dec 2014

Posts: 48
#20

25 Jan 2016, 13:08

Thanks a lot Clyde. That makes sense now, will go with the "keep if" method or if clauses on stset or stcox.

Going back to what I was asking before: if I were to use complete case analysis, is there any way to compare the distribution of subjects included in the analysis with those that were excluded in order to check there is no selection bias happening?

Thanks a lot!
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30104
#21

25 Jan 2016, 13:31

So, after you run the Cox regression, you can capture which observations were included and which were not in a variable:

Code:

gen byte included = e(sample)

Then you can contrast the values of the variables for the included observations with the excluded observations with commands like:

Code:

tab var included
tabstat var, by(included) statistics(whatever)

etc.
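
As a worked sketch of those steps, again on Stata's shipped cancer.dta rather than the thread's data. The missing values in age are created artificially here purely so that some observations drop out of the estimation sample; that step is an assumption for illustration, not something you would do with real data:

```stata
sysuse cancer, clear

* Artificially blank out a few ages so complete-case analysis excludes them
replace age = . in 1/5

stset studytime, failure(died)
stcox age i.drug, nolog

* Flag the observations actually used by the last estimation command
gen byte included = e(sample)

* Compare included vs. excluded observations
tab drug included, m
tabstat studytime, by(included) statistics(n mean sd)
```

Note that e(sample) refers to the most recent estimation command, so the -gen- line has to come right after the -stcox- whose sample you want to examine.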
1 like
Comment
Eduard López

Join Date: Dec 2014

Posts: 48
#22

25 Jan 2016, 14:23

Much appreciated!

So when I for example run:

Code:

tab sex included, m

The interpretation should be that the counts in the 1 column are the episodes, for each value of the variable, that were included in the model, whereas those in the 0 column were excluded (due to having a missing value in at least one variable), right?
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30104
#23

25 Jan 2016, 14:35

Correct.