  • stcox for categorical variables

    Dear all,

    I'd like to perform a Cox regression on a categorical variable (consclustlist, coded as 1, 2, 3). I therefore use stcox with xi, which works fine. I get the following output:

    Code:
    stset  os_months, failure(os_cens == 1)
    xi: stcox i.consclustlist, vce(boot, rep(1000) seed(123))
    
    i.consclustlist   _Iconsclust_1-3     (naturally coded; _Iconsclust_1 omitted)
    (running stcox on estimation sample)
    
    Bootstrap replications (1000)
    ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
    ..................................................    50
    ..................................................   100
    ..................................................   150
    ..................................................   200
    ..................................................   250
    ..................................................   300
    ..................................................   350
    ..................................................   400
    ..................................................   450
    ..................................................   500
    ..................................................   550
    ..................................................   600
    ..................................................   650
    ..................................................   700
    ..................................................   750
    ..................................................   800
    ..................................................   850
    ..................................................   900
    ..................................................   950
    ..................................................  1000
    
    Cox regression -- Breslow method for ties
    
    No. of subjects =          129                     Number of obs   =       129
    No. of failures =          115
    Time at risk    =  1362.852457
                                                       Wald chi2(2)    =      6.87
    Log likelihood  =   -461.99888                     Prob > chi2     =    0.0322
    
    -------------------------------------------------------------------------------
                  |   Observed   Bootstrap                         Normal-based
               _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    --------------+----------------------------------------------------------------
    _Iconsclust_2 |   .6745748   .1558646    -1.70   0.088     .4288989    1.060976
    _Iconsclust_3 |   1.430974   .4134675     1.24   0.215     .8122403    2.521036
    -------------------------------------------------------------------------------
    The overall p-value of the model is p = 0.0322. How should I interpret the lack of significance for the individual dummy variables (_Iconsclust_2, _Iconsclust_3)?

    For comparison, I also performed a simple log-rank test, which was significant for consclustlist as well as for the individual dummy variables (_Iconsclust_2, _Iconsclust_3).

    Code:
    . sts test   consclustlist, logrank
    
             failure _d:  os_cens == 1
       analysis time _t:  os_months
    
    
    Log-rank test for equality of survivor functions
    
                 |   Events         Events
    consclustl~t |  observed       expected
    -------------+-------------------------
    1            |        58          55.38
    2            |        29          40.89
    3            |        28          18.73
    -------------+-------------------------
    Total        |       115         115.00
    
                       chi2(2) =       8.26
                       Pr>chi2 =     0.0161
    
    . sts test  _Iconsclust_2, logrank
    
             failure _d:  os_cens == 1
       analysis time _t:  os_months
    
    
    Log-rank test for equality of survivor functions
    
                 |   Events         Events
    _Iconsclus~2 |  observed       expected
    -------------+-------------------------
    0            |        86          74.11
    1            |        29          40.89
    -------------+-------------------------
    Total        |       115         115.00
    
                       chi2(1) =       5.44
                       Pr>chi2 =     0.0197
    
    . sts test  _Iconsclust_3, logrank
    
             failure _d:  os_cens == 1
       analysis time _t:  os_months
    
    
    Log-rank test for equality of survivor functions
    
                 |   Events         Events
    _Iconsclus~3 |  observed       expected
    -------------+-------------------------
    0            |        87          96.27
    1            |        28          18.73
    -------------+-------------------------
    Total        |       115         115.00
    
                       chi2(1) =       5.53
                       Pr>chi2 =     0.0187
    
    .
    Thanks in advance for your help and suggestions!
    Last edited by Michael Tricodur; 21 Sep 2015, 16:00.

  • #2
    Michael:
    - as far as categorical variables are concerned, you should simply rely on -fvvarlist- capabilities, as -stcox- allows factor variables;
    - the lack of significance for the individual dummies means that there is an absence of evidence of longer/shorter survival for _Iconsclust_2 and _Iconsclust_3 when each is contrasted against the reference category _Iconsclust_1;
    - the statistical significance reported by the log-rank tests for the individual categories is possibly spurious, in that you have actually performed multiple comparisons. With three different categories, the p<0.05 threshold should be adjusted to p<0.05/3=0.0167. When contrasted against the adjusted threshold of 0.0167 (derived from the usual arbitrary 0.05 level), the log-rank results for the individual dummy variables (p=0.0197 and p=0.0187) are no longer statistically significant.
    Last edited by Carlo Lazzaro; 22 Sep 2015, 00:11.
    Kind regards,
    Carlo
    (Stata 19.0)
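
    The adjustment described in this reply can be sketched in Stata as follows. This is only an illustration, assuming the poster's data are in memory and already -stset-; the pairwise restrictions via -inlist()- are an addition that does not appear in the thread. Note that -sts test _Iconsclust_2- compares category 2 with categories 1 and 3 pooled (86 = 58 + 28 observed events in the "0" row), which is a different contrast from the Cox coefficient for category 2 versus the reference category 1.

    ```stata
    * Bonferroni-adjusted threshold for three comparisons
    display 0.05/3

    * Pairwise log-rank tests, each restricted to two categories, so that
    * the first two mirror the Cox contrasts against reference category 1
    sts test consclustlist if inlist(consclustlist, 1, 2), logrank
    sts test consclustlist if inlist(consclustlist, 1, 3), logrank
    sts test consclustlist if inlist(consclustlist, 2, 3), logrank
    ```

    Each resulting p-value would then be compared with the adjusted threshold rather than with 0.05.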

    • #3
      Dear Carlo,

      Thank you for your answer!
      So I can drop "xi:" since stcox can deal with factor variables, i.e. it's fine to just write "stcox i.consclustlist, vce(boot, rep(1000) seed(123))".
      Regarding "....the statistical significances reported by log-rank for categories are possibly spurious..." - how can I be sure about that?
      I am still confused about why the p-values for the two dummy variables are not significant (p=0.088 and 0.215) whereas the overall p-value of the Cox model is significant (p=0.0322), and how I should interpret this.


      Kind regards,
      Michael
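
      One way to look at this question, offered as a hedged sketch rather than a definitive answer: the overall test asks whether the three categories differ at all, while each displayed coefficient only compares one category with the reference category 1. The comparison between categories 2 and 3 is not shown in the table; the ratio of the two reported hazard ratios is 1.430974/.6745748, roughly 2.12. Assuming the data are -stset- as above, the joint test and the undisplayed contrast could be requested like this:

      ```stata
      * Refit with factor-variable notation (no xi: prefix needed)
      stcox i.consclustlist, vce(bootstrap, reps(1000) seed(123))

      * Joint Wald test of both coefficients, i.e. the same hypothesis
      * as the Wald chi2(2) reported in the output header
      testparm i.consclustlist

      * Category 3 versus category 2, reported as a hazard ratio
      lincom 3.consclustlist - 2.consclustlist, hr
      ```

      If the 3-versus-2 contrast turns out to be the largest one, that would be consistent with a significant overall test alongside non-significant contrasts against category 1.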

      • #4
        Michael:
        have you ruled out a multicollinearity issue in your Cox regression?
        You can check it yourself by typing -estat vce, corr- after -stcox-.
        Kind regards,
        Carlo
        (Stata 19.0)

        • #5
          Dear Carlo:

          -estat vce, corr- gives me the following output:

          Code:
          . stcox i.consclustlist, vce(boot, rep(1000) seed(123))
          (running stcox on estimation sample)
          
          Bootstrap replications (1000)
          ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
          ..................................................    50
          ..................................................   100
          ..................................................   150
          ..................................................   200
          ..................................................   250
          ..................................................   300
          ..................................................   350
          ..................................................   400
          ..................................................   450
          ..................................................   500
          ..................................................   550
          ..................................................   600
          ..................................................   650
          ..................................................   700
          ..................................................   750
          ..................................................   800
          ..................................................   850
          ..................................................   900
          ..................................................   950
          ..................................................  1000
          
          Cox regression -- Breslow method for ties
          
          No. of subjects =          129                     Number of obs   =       129
          No. of failures =          115
          Time at risk    =  1362.852457
                                                             Wald chi2(2)    =      6.87
          Log likelihood  =   -461.99888                     Prob > chi2     =    0.0322
          
          -------------------------------------------------------------------------------
                        |   Observed   Bootstrap                         Normal-based
                     _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
          --------------+----------------------------------------------------------------
          consclustlist |
                     2  |   .6745748   .1558646    -1.70   0.088     .4288989    1.060976
                     3  |   1.430974   .4134675     1.24   0.215     .8122403    2.521036
          -------------------------------------------------------------------------------
          
          . estat vce, corr
          
          Correlation matrix of coefficients of cox model
          
                       |        2.        3.
                  e(V) | conscl~t  conscl~t
          -------------+--------------------
          2.consclus~t |   1.0000          
          3.consclus~t |   0.3620    1.0000
          Last edited by Michael Tricodur; 22 Sep 2015, 04:37.

          • #6
            Michael:
            there's no multicollinearity issue in your data.
            I would guess that the lack of statistical significance in your coefficients is due to the fact that the three categories have quite different numbers of observations.
            Kind regards,
            Carlo
            (Stata 19.0)

            • #7
              Thank you Carlo.

              Would it then be correct to conclude the following: because the overall p-value of the model is significant, I should not care too much about the non-significant p-values of the individual dummy variables?

              • #8
                Michael:
                I would rather focus on the fact that the lack of statistical significance of the coefficients is probably due to the quite different number of observations for categories 2 and 3 when contrasted against the reference one (i.e., 1). Put differently, the total number of observations is enough to obtain an overall p-value<0.05, but the unbalanced number of observations in each category makes the magic of the p-value disappear as far as the coefficients are concerned.
                Kind regards,
                Carlo
                (Stata 19.0)
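
                The number of subjects and events in each category can be inspected directly. A short sketch, assuming the data are -stset- as in the first post; _d is the failure indicator that -stset- creates:

                ```stata
                * Subjects, time at risk, and incidence rate by category
                stsum, by(consclustlist)

                * Events (_d == 1) and censored observations (_d == 0) by category
                tabulate consclustlist _d
                ```

                The event counts from -tabulate- should match the "Events observed" column of the log-rank output in the first post (58, 29 and 28).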

                • #9
                  Carlo, the different number of observations was exactly the problem. If I switch the reference category (e.g. by switching 1 and 2), I get significant results for both coefficients as well as for the overall model.
                  Last edited by Michael Tricodur; 22 Sep 2015, 07:11.
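
                  With factor-variable notation, the reference category can be changed without recoding the variable, using the ib#. operator. A minimal sketch, assuming category 2 is wanted as the base level and the data are -stset- as above. In principle, changing the base level leaves the fitted model and the overall test unchanged; only the contrasts displayed in the table change, and the comparison of categories 1 and 2 is the same contrast with its direction reversed.

                  ```stata
                  * Use category 2 as the base level instead of category 1
                  stcox ib2.consclustlist, vce(bootstrap, reps(1000) seed(123))

                  * All pairwise comparisons from one fit, with Bonferroni-adjusted
                  * p-values (differences are reported on the log-hazard scale)
                  pwcompare consclustlist, effects mcompare(bonferroni)
                  ```

                  Requesting all pairwise comparisons at once avoids having to refit the model for each choice of reference category.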

                  • #10
                    Michael:
                    glad to read that the solution was more trivial than expected.
                    Kind regards,
                    Carlo
                    (Stata 19.0)
