stcox for categorical variables

Michael Tricodur

Join Date: Jan 2015
Posts: 9


21 Sep 2015, 15:57

Dear all,

I'd like to run a Cox regression on a categorical variable (consclustlist, coded 1, 2, 3). I therefore use stcox with xi, which works fine, and get the following output:

Code:

stset  os_months, failure(os_cens == 1)
xi: stcox i.consclustlist, vce(boot, rep(1000) seed(123))

i.consclustlist   _Iconsclust_1-3     (naturally coded; _Iconsclust_1 omitted)
(running stcox on estimation sample)

Bootstrap replications (1000)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50
..................................................   100
..................................................   150
..................................................   200
..................................................   250
..................................................   300
..................................................   350
..................................................   400
..................................................   450
..................................................   500
..................................................   550
..................................................   600
..................................................   650
..................................................   700
..................................................   750
..................................................   800
..................................................   850
..................................................   900
..................................................   950
..................................................  1000

Cox regression -- Breslow method for ties

No. of subjects =          129                     Number of obs   =       129
No. of failures =          115
Time at risk    =  1362.852457
                                                   Wald chi2(2)    =      6.87
Log likelihood  =   -461.99888                     Prob > chi2     =    0.0322

-------------------------------------------------------------------------------
              |   Observed   Bootstrap                         Normal-based
           _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
_Iconsclust_2 |   .6745748   .1558646    -1.70   0.088     .4288989    1.060976
_Iconsclust_3 |   1.430974   .4134675     1.24   0.215     .8122403    2.521036
-------------------------------------------------------------------------------

The overall p-value of the model is p = 0.0322. How should I interpret the lack of significance for the individual dummy variables (_Iconsclust_2, _Iconsclust_3)?

For comparison, I also ran a simple log-rank test, which was significant for consclustlist as well as for the individual dummy variables (_Iconsclust_2, _Iconsclust_3).

Code:

. sts test   consclustlist, logrank

         failure _d:  os_cens == 1
   analysis time _t:  os_months


Log-rank test for equality of survivor functions

             |   Events         Events
consclustl~t |  observed       expected
-------------+-------------------------
1            |        58          55.38
2            |        29          40.89
3            |        28          18.73
-------------+-------------------------
Total        |       115         115.00

                   chi2(2) =       8.26
                   Pr>chi2 =     0.0161

. sts test  _Iconsclust_2, logrank

         failure _d:  os_cens == 1
   analysis time _t:  os_months


Log-rank test for equality of survivor functions

             |   Events         Events
_Iconsclus~2 |  observed       expected
-------------+-------------------------
0            |        86          74.11
1            |        29          40.89
-------------+-------------------------
Total        |       115         115.00

                   chi2(1) =       5.44
                   Pr>chi2 =     0.0197

. sts test  _Iconsclust_3, logrank

         failure _d:  os_cens == 1
   analysis time _t:  os_months


Log-rank test for equality of survivor functions

             |   Events         Events
_Iconsclus~3 |  observed       expected
-------------+-------------------------
0            |        87          96.27
1            |        28          18.73
-------------+-------------------------
Total        |       115         115.00

                   chi2(1) =       5.53
                   Pr>chi2 =     0.0187


Thanks in advance for your help and suggestions!

Last edited by Michael Tricodur; 21 Sep 2015, 16:00.


Carlo Lazzaro

Join Date: Apr 2014

Posts: 17714
#2

22 Sep 2015, 00:04

Michael:
- for categorical variables you can simply rely on -fvvarlist- notation, as -stcox- allows factor variables;
- the lack of significance for the individual dummies means that there is no evidence of longer or shorter survival for _Iconsclust_2 and _Iconsclust_3 when each is contrasted against the reference category _Iconsclust_1;
- the statistical significances reported by log-rank for categories are possibly spurious, in that you have actually performed multiple comparisons. With three categories, the p<0.05 threshold should be adjusted to p<0.05/3≈0.0167. Against that adjusted threshold, the log-rank results for the individual dummy variables (p=0.0197 and p=0.0187) are no longer statistically significant, even though they fall below the usual arbitrary threshold of 0.05.
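
The adjusted threshold in the last point can be checked with a few lines of arithmetic. This is only a sketch: the family size of three is the assumption made in the point above, and the two p-values are copied from the log-rank output for _Iconsclust_2 and _Iconsclust_3 in the first post.

```stata
* Bonferroni-adjusted threshold, assuming a family of three comparisons
display 0.05/3

* Log-rank p-values for the two dummy variables, copied from the first post;
* each line displays 1 if the p-value is below the adjusted threshold, else 0
display 0.0197 < 0.05/3
display 0.0187 < 0.05/3
```

Both comparisons evaluate to 0 (false), so neither dummy-variable test passes the adjusted threshold.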

Last edited by Carlo Lazzaro; 22 Sep 2015, 00:11.

Kind regards,
Carlo
(Stata 19.0)
Michael Tricodur

Join Date: Jan 2015

Posts: 9
#3

22 Sep 2015, 03:23

Dear Carlo,

Thank you for your answer!
So I can drop "xi:" since stcox handles factor variables, i.e. it's fine to just write "stcox i.consclustlist, vce(boot, rep(1000) seed(123))"?
Regarding "....the statistical significances reported by log-rank for categories are possibly spurious...": how can I be sure about that?
I am still confused about why the p-values for the two dummy variables are not significant (p=0.088 and p=0.215) whereas the overall p-value of the Cox model is (p=0.0322), and how I should interpret this.
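
One way to look at the gap between the joint test and the individual coefficients, as a sketch that assumes the data are still -stset- as in the first post: the model's Wald chi2(2) tests both coefficients jointly, so it also reflects the difference between categories 2 and 3, which neither coefficient reports on its own. -testparm- and -lincom- are standard postestimation commands; the spelled-out option names bootstrap and reps() are used here in place of the abbreviations in the post.

```stata
* Refit with factor-variable notation (data assumed to be stset already)
stcox i.consclustlist, vce(bootstrap, reps(1000) seed(123))

* Joint Wald test that both category coefficients are zero
testparm i.consclustlist

* Hazard ratio for category 3 versus category 2,
* a comparison that neither reported coefficient covers
lincom 3.consclustlist - 2.consclustlist, hr
```

From the hazard ratios reported in the first post, the point estimate for 3 versus 2 would be about 1.431/0.675 ≈ 2.12; whether that contrast is statistically significant has to be read from the -lincom- output.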

Kind regards,
Michael
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17714
#4

22 Sep 2015, 04:26

Michael:
have you ruled out a multicollinearity issue in your Cox regression?
You can check it yourself by typing -estat vce, corr- after -stcox-.

Kind regards,
Carlo
(Stata 19.0)

Michael Tricodur

Join Date: Jan 2015
Posts: 9
#5

22 Sep 2015, 04:33

Dear Carlo:

-estat vce, corr- gives me the following output:

Code:

. stcox i.consclustlist, vce(boot, rep(1000) seed(123))
(running stcox on estimation sample)

Bootstrap replications (1000)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50
..................................................   100
..................................................   150
..................................................   200
..................................................   250
..................................................   300
..................................................   350
..................................................   400
..................................................   450
..................................................   500
..................................................   550
..................................................   600
..................................................   650
..................................................   700
..................................................   750
..................................................   800
..................................................   850
..................................................   900
..................................................   950
..................................................  1000

Cox regression -- Breslow method for ties

No. of subjects =          129                     Number of obs   =       129
No. of failures =          115
Time at risk    =  1362.852457
                                                   Wald chi2(2)    =      6.87
Log likelihood  =   -461.99888                     Prob > chi2     =    0.0322

-------------------------------------------------------------------------------
              |   Observed   Bootstrap                         Normal-based
           _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
consclustlist |
           2  |   .6745748   .1558646    -1.70   0.088     .4288989    1.060976
           3  |   1.430974   .4134675     1.24   0.215     .8122403    2.521036
-------------------------------------------------------------------------------

. estat vce, corr

Correlation matrix of coefficients of cox model

             |        2.        3.
        e(V) | conscl~t  conscl~t
-------------+--------------------
2.consclus~t |   1.0000          
3.consclus~t |   0.3620    1.0000

Last edited by Michael Tricodur; 22 Sep 2015, 04:37.


Carlo Lazzaro

Join Date: Apr 2014

Posts: 17714
#6

22 Sep 2015, 04:43

Michael:
there's no multicollinearity issue in your data.
I would guess that the lack of statistical significance of your coefficients is due to the three categories having very different numbers of observations.

Kind regards,
Carlo
(Stata 19.0)
Michael Tricodur

Join Date: Jan 2015

Posts: 9
#7

22 Sep 2015, 05:15

Thank you Carlo.

Would it then be correct to conclude the following: because the overall p-value of the model is significant, I should not care too much about the non-significant p-values of the individual dummy variables?
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17714
#8

22 Sep 2015, 05:26

Michael:
I would rather focus on the fact that the lack of statistical significance of the coefficients is probably due to the quite different numbers of observations in categories 2 and 3 compared with the reference category (i.e., 1). Put differently, the total number of observations is enough to obtain an overall p-value<0.05, but the unbalanced number of observations per category makes the magic of the p-value disappear as far as the individual coefficients are concerned.
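
A quick way to see how unbalanced the categories are, sketched under the assumption that the Cox model from the earlier posts has just been fitted and the data are still -stset-:

```stata
* Observations per category in the estimation sample (run after -stcox-)
tabulate consclustlist if e(sample)

* Subjects, time at risk and incidence rates by category
stsum, by(consclustlist)
```

The log-rank table in the first post already shows the observed events per category (58, 29 and 28), so these commands mainly add the number of subjects and the time at risk behind those counts.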

Kind regards,
Carlo
(Stata 19.0)
Michael Tricodur

Join Date: Jan 2015

Posts: 9
#9

22 Sep 2015, 07:08

Carlo, the different number of observations was exactly the problem. If I switch the reference category (e.g. by swapping the codes 1 and 2), both coefficients as well as the overall model are significant.
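
The reference category can also be changed without recoding the variable. A minimal sketch, assuming factor-variable notation is used as discussed earlier in the thread; the ib2. prefix is standard factor-variable syntax for choosing the base level, and the spelled-out bootstrap and reps() replace the abbreviations used above:

```stata
* ib2. makes category 2 the base level without recoding consclustlist
stcox ib2.consclustlist, vce(bootstrap, reps(1000) seed(123))
```

Going by the hazard ratios in the earlier output, the point estimates against category 2 would be about 1/0.675 ≈ 1.48 for category 1 and 1.431/0.675 ≈ 2.12 for category 3; the standard errors and p-values have to come from the refitted model.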

Last edited by Michael Tricodur; 22 Sep 2015, 07:11.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17714
#10

22 Sep 2015, 08:56

Michael:
glad to read that the solution was more trivial than expected.

Kind regards,
Carlo
(Stata 19.0)
