  • Package confreg - The confusion matrix estimated using regression models

    Thanks to Kit Baum, a new package -confreg- is now available on SSC.

    Description
    The command -confreg- estimates sensitivity and specificity for a single modality by regressing the binary modality values on the pathology, using OLS with robust variance estimation.
    The area under the ROC curve (AUC) is estimated here as the mean of sensitivity and specificity for the modality.
    There are non-linear formulas for estimating the PPV, NPV, and accuracy using prevalence, sensitivity, and specificity (Bland, 2015, subsection 20.6).
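
    The single-modality idea can be illustrated with a plain regression. This is only a sketch on the hanley example data, not the code inside -confreg-; the variable name rating2 follows the example further down. Without a constant, the regression returns one mean of the binary test result per disease group, so the coefficient for the diseased is the sensitivity and the coefficient for the non-diseased is 1 - specificity:

    ```stata
    * Sketch only: illustrates the regression idea, not confreg's internals.
    webuse hanley, clear
    * Test positive at cutoff 2
    generate byte rating2 = rating >= 2

    * No constant + ibn. gives one group mean per disease status, robust SEs:
    *   1.disease -> sensitivity, P(T+|D+)
    *   0.disease -> 1 - specificity, P(T+|D-)
    regress rating2 ibn.disease, noconstant vce(robust)

    * Specificity and (sens + spec)/2, with delta-method standard errors
    nlcom (spec: 1 - _b[0.disease]) (auc: (_b[1.disease] + 1 - _b[0.disease])/2)
    ```

    The point estimates should agree with the rating 2 rows reported by -confreg- below; the confidence limits may differ slightly, depending on the variance estimator and reference distribution used.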

    To model more modalities, confreg stacks the values of each modality and the pathology and adds a categorical modality variable.
    Using the stacked dataset, sensitivity and specificity are estimated by regressing the modality values on the pathology values and the categorical modality variable, with robust variance estimation.
    If modalities are measured on the same patients, estimation uses random intercepts by ID.
    The AUC, PPV, NPV, and accuracy are estimated from the prevalence, sensitivity, and specificity as described.
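
    As a worked check of these formulas, the sketch below plugs the rating 2 point estimates from the example output in this post into the standard Bayes-rule expressions. These are assumed here to be the formulas meant; the scalar names are made up for the illustration:

    ```stata
    * Point estimates copied from the rating 2 rows of the output in this post
    scalar sens = 0.9411765
    scalar spec = 0.5689655
    * Sample prevalence: 51 abnormal out of 109
    scalar prev = 51/109

    scalar auc = (sens + spec)/2
    scalar acc = sens*prev + spec*(1 - prev)
    scalar ppv = sens*prev / (sens*prev + (1 - spec)*(1 - prev))
    scalar npv = spec*(1 - prev) / (spec*(1 - prev) + (1 - sens)*prev)

    display "AUC      " %9.5f 100*auc
    display "Accuracy " %9.5f 100*acc
    display "PPV      " %9.5f 100*ppv
    display "NPV      " %9.5f 100*npv
    ```

    The displayed values should match the rating 2 point estimates in the example output up to rounding (about 75.51, 74.31, 65.75, and 91.67). Setting prev to another value, such as 0.25, shows how accuracy, PPV, and NPV move with prevalence while sensitivity, specificity, and AUC stay fixed.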

    Examples (Stata example dataset)
    A reviewer classified 109 tomographic images using a 5-point scale, from 1 = definitely normal to 5 = definitely abnormal.
    Patients: 58 normal, 51 abnormal.

    Data must be in long format.
    Code:
    . webuse hanley, clear
    (Tomographic images)
    . generate id = _n
    . generate rating2 = rating >= 2
    . generate rating3 = rating >= 3
    . drop rating
    . reshape long rating, i(id) j(point)
    (j = 2 3)
    
    Data                               Wide   ->   Long
    -----------------------------------------------------------------------------
    Number of observations              109   ->   218         
    Number of variables                   5   ->   5           
    j variable (2 values)                     ->   point
    xij variables:
                            rating2 rating3   ->   rating
    -----------------------------------------------------------------------------
    
    . label define point 2 "rating 2" 3 "rating 3" 
    . label values point point
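    With the data stacked like this, the multi-modality regression from the Description can be sketched as follows. Note the assumption: the sketch uses OLS with standard errors clustered on id as a simple stand-in for the random-intercept model, so it is not what -confreg- runs internally and its standard errors need not match:

    ```stata
    * Sketch only: cluster-robust OLS as a stand-in, not confreg's internals.
    webuse hanley, clear
    generate id = _n
    generate rating2 = rating >= 2
    generate rating3 = rating >= 3
    drop rating
    quietly reshape long rating, i(id) j(point)

    * One mean per cutoff-by-status cell; SEs clustered on patient:
    *   point#1.disease -> sensitivity at that cutoff
    *   point#0.disease -> 1 - specificity at that cutoff
    regress rating ibn.point#ibn.disease, noconstant vce(cluster id)
    ```

    Clustering on id accounts for both cutoffs being evaluated on the same 109 images, which is what makes the estimates across cutoffs correlated.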
    Using -confreg- to report sensitivities, specificities, and AUCs at values 2 and 3:
    Code:
    . confreg disease rating point, id(id) vce(robust)
    
                                     |         N          p       [95%        CI] 
    ---------------------------------+-------------------------------------------
    rating 2                         |                                           
               Sensitivity, P(TP|C+) |        51   94.11765   87.63018   100.6051 
               Specificity, P(TN|C-) |        58   56.89655   44.09288   69.70022 
                  AUC, (sens+spec)/2 |       109    75.5071   68.33038   82.68382 
    ---------------------------------+-------------------------------------------
    rating 3                         |                                           
               Sensitivity, P(TP|C+) |        51   90.19608   81.99713   98.39503 
               Specificity, P(TN|C-) |        58   67.24138   55.10703   79.37573 
                  AUC, (sens+spec)/2 |       109   78.71873   71.39641   86.04104 
    (results _se_sp_auc are active now)
    Report all the accuracy measures:
    Code:
    . matlist r(confreg), tw(32)
    
                                     |         N          p       [95%        CI] 
    ---------------------------------+-------------------------------------------
                                     |                                           
                    Prevalence, C+/N |         .   .4678899          .          . 
    ---------------------------------+-------------------------------------------
    rating 2                         |                                           
               Sensitivity, P(TP|C+) |        51   94.11765   87.63018   100.6051 
               Specificity, P(TN|C-) |        58   56.89655   44.09288   69.70022 
                  AUC, (sens+spec)/2 |       109    75.5071   68.33038   82.68382 
                Accuracy, P(TP + TN) |        73   74.31193   66.85336   81.77049 
                       PPV, P(TP|P+) |        36   65.75342   58.88674    72.6201 
                       NPV, P(TN|P-) |       109   91.66667   83.06838    100.265 
    ---------------------------------+-------------------------------------------
    rating 3                         |                                           
               Sensitivity, P(TP|C+) |        51   90.19608   81.99713   98.39503 
               Specificity, P(TN|C-) |        58   67.24138   55.10703   79.37573 
                  AUC, (sens+spec)/2 |       109   78.71873   71.39641   86.04104 
                Accuracy, P(TP + TN) |        65   77.98165    70.4712    85.4921 
                       PPV, P(TP|P+) |        44   70.76923   62.87928   78.65918 
                       NPV, P(TN|P-) |       109   88.63636   80.01908   97.25365
    The sensitivities at the two cutoffs are highly correlated (0.76), as are the specificities (0.80).
    Note that sensitivities and specificities are uncorrelated with each other; those entries are shown as "." in the matrix.
    Code:
    . matlist r(se_sp_auc_corr), tw(32)
    
                                     | rating 2                        | rating 3                       
                                     | Sensiti~)  Specifi~)  AUC, (s~2 | Sensiti~)  Specifi~)  AUC, (s~2 
    ---------------------------------+---------------------------------+--------------------------------
    rating 2                         |                                 |                                
               Sensitivity, P(TP|C+) |         1          .          . |         .          .          . 
               Specificity, P(TN|C-) |         .          1          . |         .          .          . 
                  AUC, (sens+spec)/2 |  .4519802   .8920279          1 |         .          .          . 
    ---------------------------------+---------------------------------+--------------------------------
    rating 3                         |                                 |                                
               Sensitivity, P(TP|C+) |  .7582875          .    .342731 |         1          .          . 
               Specificity, P(TN|C-) |         .   .8019208   .7153357 |         .          1          . 
                  AUC, (sens+spec)/2 |  .4245351   .6644611   .7845994 |  .5598603    .828587          1

    For comparison, the estimates at the sample prevalence once more:
    Code:
    . matlist r(confreg), tw(32)
    
                                     |         N          p       [95%        CI] 
    ---------------------------------+-------------------------------------------
                                     |                                           
                    Prevalence, C+/N |         .   .4678899          .          . 
    ---------------------------------+-------------------------------------------
    rating 2                         |                                           
               Sensitivity, P(TP|C+) |        51   94.11765   87.63018   100.6051 
               Specificity, P(TN|C-) |        58   56.89655   44.09288   69.70022 
                  AUC, (sens+spec)/2 |       109    75.5071   68.33038   82.68382 
                Accuracy, P(TP + TN) |        73   74.31193   66.85336   81.77049 
                       PPV, P(TP|P+) |        36   65.75342   58.88674    72.6201 
                       NPV, P(TN|P-) |       109   91.66667   83.06838    100.265 
    ---------------------------------+-------------------------------------------
    rating 3                         |                                           
               Sensitivity, P(TP|C+) |        51   90.19608   81.99713   98.39503 
               Specificity, P(TN|C-) |        58   67.24138   55.10703   79.37573 
                  AUC, (sens+spec)/2 |       109   78.71873   71.39641   86.04104 
                Accuracy, P(TP + TN) |        65   77.98165    70.4712    85.4921 
                       PPV, P(TP|P+) |        44   70.76923   62.87928   78.65918 
                       NPV, P(TN|P-) |       109   88.63636   80.01908   97.25365
    What if the population prevalence is 0.25 instead of the sample prevalence of 0.4678899?
    (Changes in Accuracy, PPV, and NPV)
    Code:
    . qui confreg disease rating point, id(id) vce(robust) prevalence(0.25)
    
    . matlist r(confreg), tw(32)
    
                                     |         N          p       [95%        CI] 
    ---------------------------------+-------------------------------------------
                                     |                                           
                    Prevalence, C+/N |         .        .25          .          . 
    ---------------------------------+-------------------------------------------
    rating 2                         |                                           
               Sensitivity, P(TP|C+) |        51   94.11765   87.63018   100.6051 
               Specificity, P(TN|C-) |        58   56.89655   44.09288   69.70022 
                  AUC, (sens+spec)/2 |       109    75.5071   68.33038   82.68382 
                Accuracy, P(TP + TN) |        73   66.20183   56.46307   75.94058 
                       PPV, P(TP|P+) |        36   42.12438   34.69007   49.55868 
                       NPV, P(TN|P-) |       109   96.66858   93.04368   100.2935 
    ---------------------------------+-------------------------------------------
    rating 3                         |                                           
               Sensitivity, P(TP|C+) |        51   90.19608   81.99713   98.39503 
               Specificity, P(TN|C-) |        58   67.24138   55.10703   79.37573 
                  AUC, (sens+spec)/2 |       109   78.71873   71.39641   86.04104 
                Accuracy, P(TP + TN) |        65   72.98005   63.65132   82.30879 
                       PPV, P(TP|P+) |        44    47.8565   38.33883   57.37417 
                       NPV, P(TN|P-) |       109   95.36519    91.5837   99.14668
    Enjoy
    Kind regards

    nhb

  • #2
    Thanks to Kit Baum, there is an update to -confreg-. As seen above, the previously reported order of the N values for Accuracy, PPV, and NPV was incorrect, a consequence of copying code without paying attention. Sorry.
    Last edited by Niels Henrik Bruun; 03 Mar 2026, 22:34.
    Kind regards

    nhb
