  • Interviewing Candidates for a Job

    Non-statistician here, so if what I am asking is ridiculous, you have my permission to call me out.

    My hypothesis is that the interview process for a medical residency position is not helpful at distinguishing candidates, and I want to explain to a group of clinical educators why this is the case. We routinely interview 40 people every year for 4 positions, and each interviewer scores the applicant 1-10. (These are medical students applying for a residency position; they are chosen by a ranking system, so small changes in how we rank make a large difference to the student applicant. Nationwide, the interview score is usually a highly weighted part of the ranking.) I want to show that, because of the poor reliability of scoring, we should not weight it this heavily. Our data clearly show that the inter-rater reliability (IRR) is low, 0.25-0.35, probably because there are systemic problems in our interview process: the number of interviews per candidate is unbalanced, and the full pool of interviewers does not interview all the candidates. Also, the scores are skewed, with a lot of clustering at the top scores, and the data show heteroskedasticity (probably because low scores had more variability). Each interviewer also seems to have a different variance, and some are statistically more lenient than others.

    Seeing these problems, in the second year I had all the interviewers interview everyone, and I picked only interviewers who were not overly lenient. I was hoping to show that understanding and correcting for these statistical biases improves reliability (as measured by the ICC).

    I wanted to first show that using raw means cannot discriminate between applicants (this seems obvious just from looking at the box plots). In AY2022, Cohen's kappa was only 0.28. I then ran a meologit with the applicant as a random effect. I hoped I could use the fixed-effect coefficients for the applicants as a corrected measure of interview score (while controlling for the evaluator), instead of a raw mean score.
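
    For reference, here is a minimal sketch of how a multi-rater kappa can be computed in Stata; the file name, wide-format layout, and rater variable names below are hypothetical stand-ins for my data:
    Code:
    * one row per candidate, one column per interviewer (hypothetical names)
    use ay2022_scores_wide, clear
    
    * kap accepts two or more rater variables and reports overall agreement
    kap rater1 rater2 rater3 rater4 rater5 rater6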

    My questions:
    Am I full of crap? Did I make something more complicated than it needed to be? I am trying to use this as an educational tool on the pitfalls of these employment interviews, how to analyze them correctly, and how to correct for their biases.
    Is this an appropriate use of a mixed model? (I thought of using a metobit model, but I think the meologit makes more sense.) ANOVA will not work, especially in AY2022, because we had unequal groups.
    Is there a way to state the number of statistically distinct categories the scoring method can support? I think that would be easily understandable: "Statistically, we only have X distinguishable categories for 40 candidates in AY2022, and Y in AY2023."
    Does it make sense to use the fixed-effect coefficients of the applicants as a "corrected" score?

    Uploaded are the databases for 2022 and 2023 scores.

    AY2022
    Code:
    . meologit OVERALLSCORE i.evaluator_id i.candidate_id || candidate_id: , vce(r) cov(un)
    
    Mixed-effects ologit regression                 Number of obs     =        236
    Group variable: candidate_id                    Number of groups  =         40
    
                                                    Obs per group:
                                                                  min =          4
                                                                  avg =        5.9
                                                                  max =          6
    
    Integration method: mvaghermite                 Integration pts.  =          7
    
                                                    Wald chi2(16)     =          .
    Log pseudolikelihood = -406.202                 Prob > chi2       =          .
                                  (Std. err. adjusted for 40 clusters in candidate_id)
    ------------------------------------------------------------------------------
    | Robust
    OVERALLSCORE | Coefficient std. err. z P>|z| [95% conf. interval]
    -------------+----------------------------------------------------------------
    evaluator_id |
    2 | .899008 .7403409 1.21 0.225 -.5520336 2.350049
    3 | -.2738794 .7361285 -0.37 0.710 -1.716665 1.168906
    4 | 2.548347 .9080481 2.81 0.005 .7686056 4.328089
    5 | .1197076 .6667587 0.18 0.858 -1.187115 1.426531
    6 | -.575595 .9599034 -0.60 0.549 -2.456971 1.305781
    7 | .1540699 .7040065 0.22 0.827 -1.225758 1.533897
    8 | 1.581396 .6575229 2.41 0.016 .2926749 2.870117
    |
    candidate_id |
    2 | -.5524405 .1384652 -3.99 0.000 -.8238272 -.2810537
    3 | 4.237737 .2971043 14.26 0.000 3.655423 4.820051
    4 | 1.959876 .2339169 8.38 0.000 1.501408 2.418345
    5 | 4.285109 .2369932 18.08 0.000 3.820611 4.749608
    6 | 3.117948 .215282 14.48 0.000 2.696003 3.539893
    7 | 5.354264 .3195147 16.76 0.000 4.728027 5.980502
    8 | 2.593385 .2106314 12.31 0.000 2.180555 3.006215
    9 | 1.982455 .2659333 7.45 0.000 1.461236 2.503675
    10 | .6370161 .1097089 5.81 0.000 .4219906 .8520416
    11 | .1589699 .1756929 0.90 0.366 -.1853818 .5033217
    12 | .4914535 .1159634 4.24 0.000 .2641694 .7187377
    13 | 4.838748 .3188922 15.17 0.000 4.213731 5.463765
    14 | 3.11466 .2035036 15.31 0.000 2.7158 3.51352
    15 | 1.988984 .2266827 8.77 0.000 1.544694 2.433274
    16 | 4.849235 .3347594 14.49 0.000 4.193119 5.505352
    17 | 1.618268 .1177063 13.75 0.000 1.387568 1.848968
    18 | 3.720724 .2758492 13.49 0.000 3.18007 4.261379
    19 | 3.289512 .2336927 14.08 0.000 2.831483 3.747542
    20 | 1.74793 .1634167 10.70 0.000 1.427639 2.068221
    21 | 5.999918 .3439894 17.44 0.000 5.325711 6.674125
    22 | 3.409479 .1964246 17.36 0.000 3.024494 3.794464
    23 | 4.040119 .2660433 15.19 0.000 3.518684 4.561554
    24 | 3.817888 .2936838 13.00 0.000 3.242278 4.393497
    25 | 7.448121 .3739559 19.92 0.000 6.715181 8.181061
    26 | 2.661355 .2681447 9.93 0.000 2.135801 3.186909
    27 | 1.524864 .1806956 8.44 0.000 1.170707 1.879021
    28 | 2.023749 .264241 7.66 0.000 1.505846 2.541652
    29 | 3.145881 .2778922 11.32 0.000 2.601222 3.69054
    30 | 3.103646 .2720242 11.41 0.000 2.570488 3.636804
    31 | 4.803786 .345465 13.91 0.000 4.126687 5.480885
    32 | 2.382988 .2105832 11.32 0.000 1.970253 2.795724
    33 | 3.722913 .2719621 13.69 0.000 3.189877 4.255949
    34 | 3.272464 .2617664 12.50 0.000 2.759411 3.785517
    35 | 4.04141 .2446491 16.52 0.000 3.561906 4.520913
    36 | 2.804329 .3240327 8.65 0.000 2.169236 3.439421
    37 | .9424762 .0932577 10.11 0.000 .7596945 1.125258
    38 | 3.293601 .2321937 14.18 0.000 2.83851 3.748692
    39 | -.7052869 .183582 -3.84 0.000 -1.065101 -.3454729
    40 | 3.04541 .2681207 11.36 0.000 2.519903 3.570917
    -------------+----------------------------------------------------------------
    /cut1 | -1.480647 .7245819 -2.900801 -.0604925
    /cut2 | -.8894804 .7173751 -2.29551 .516549
    /cut3 | -.3795946 .6970253 -1.745739 .9865499
    /cut4 | .0330022 .6857886 -1.311119 1.377123
    /cut5 | 1.237011 .683864 -.103338 2.57736
    /cut6 | 2.399001 .7103179 1.006803 3.791198
    /cut7 | 3.379394 .7090018 1.989776 4.769012
    /cut8 | 4.380659 .7463816 2.917778 5.843541
    /cut9 | 5.815383 .7222603 4.399779 7.230988
    -------------+----------------------------------------------------------------
    candidate_id |
    var(_cons)| 1.22e-33 5.34e-34 5.18e-34 2.88e-33
    ------------------------------------------------------------------------------



    AY2023
    
    Code:
    . meologit Score i.SurgeonID i.StudentID || StudentID: , vce(r) cov(un)
    
    Mixed-effects ologit regression                 Number of obs     =        160
    Group variable: StudentID                       Number of groups  =         40
    
                                                    Obs per group:
                                                                  min =          4
                                                                  avg =        4.0
                                                                  max =          4
    
    Integration method: mvaghermite                 Integration pts.  =          7
    
                                                    Wald chi2(13)     =          .
    Log pseudolikelihood = -260.06804               Prob > chi2       =          .
                                     (Std. err. adjusted for 40 clusters in StudentID)
    ------------------------------------------------------------------------------
    | Robust
    Score | Coefficient std. err. z P>|z| [95% conf. interval]
    -------------+----------------------------------------------------------------
    SurgeonID |
    2 | -.2331704 .4329376 -0.54 0.590 -1.081712 .6153718
    3 | .2070364 .4610952 0.45 0.653 -.6966936 1.110766
    4 | -.4671865 .5548703 -0.84 0.400 -1.554712 .6203393
    |
    StudentID |
    2 | 3.091381 .3273527 9.44 0.000 2.449781 3.73298
    3 | 2.282311 .2726246 8.37 0.000 1.747976 2.816645
    4 | 2.516177 .3116528 8.07 0.000 1.905348 3.127005
    5 | 2.827693 .3013976 9.38 0.000 2.236964 3.418421
    6 | 3.832582 .4144127 9.25 0.000 3.020348 4.644816
    7 | 2.369511 .3075211 7.71 0.000 1.766781 2.972242
    8 | 2.50721 .2681125 9.35 0.000 1.981719 3.032701
    9 | 6.311666 .6035483 10.46 0.000 5.128733 7.494599
    10 | 1.607775 .2001366 8.03 0.000 1.215515 2.000036
    11 | .9208777 .1234015 7.46 0.000 .6790152 1.16274
    12 | 3.770999 .4267267 8.84 0.000 2.93463 4.607368
    13 | 1.329507 .2684374 4.95 0.000 .8033795 1.855635
    14 | -2.8562 .3572448 -8.00 0.000 -3.556387 -2.156013
    15 | 2.156988 .21854 9.87 0.000 1.728658 2.585319
    16 | 1.148029 .1482822 7.74 0.000 .8574014 1.438657
    17 | 2.115842 .2533469 8.35 0.000 1.619292 2.612393
    18 | 1.761543 .2169952 8.12 0.000 1.33624 2.186846
    19 | 4.568019 .4781002 9.55 0.000 3.630959 5.505078
    20 | 2.116829 .2591079 8.17 0.000 1.608987 2.624671
    21 | .3541152 .1232558 2.87 0.004 .1125383 .595692
    22 | 2.7769 .3293046 8.43 0.000 2.131475 3.422325
    23 | 1.76258 .2105154 8.37 0.000 1.349977 2.175182
    24 | .1269294 .1087379 1.17 0.243 -.086193 .3400518
    25 | .8874248 .1327275 6.69 0.000 .6272837 1.147566
    26 | 5.464097 .553735 9.87 0.000 4.378796 6.549397
    27 | 1.872128 .2263589 8.27 0.000 1.428472 2.315783
    28 | 1.898707 .2178189 8.72 0.000 1.471789 2.325624
    29 | 2.13649 .2670036 8.00 0.000 1.613173 2.659807
    30 | -.5222354 .1636605 -3.19 0.001 -.8430041 -.2014667
    31 | -1.399335 .2407365 -5.81 0.000 -1.87117 -.9274998
    32 | -.7674789 .1830605 -4.19 0.000 -1.126271 -.4086869
    33 | 4.413593 .4777482 9.24 0.000 3.477224 5.349962
    34 | -.4893793 .1747552 -2.80 0.005 -.8318932 -.1468655
    35 | 2.843556 .3412455 8.33 0.000 2.174727 3.512385
    36 | 1.219798 .1714068 7.12 0.000 .8838467 1.555749
    37 | 2.772394 .3237463 8.56 0.000 2.137863 3.406925
    38 | 4.555018 .4708126 9.67 0.000 3.632242 5.477793
    39 | 5.421951 .5180434 10.47 0.000 4.406604 6.437297
    40 | 3.525434 .3880814 9.08 0.000 2.764808 4.286059
    -------------+----------------------------------------------------------------
    /cut1 | -4.932303 .8493545 -6.597007 -3.267599
    /cut2 | -3.632993 .8563627 -5.311433 -1.954553
    /cut3 | -2.418725 .6167812 -3.627594 -1.209856
    /cut4 | -1.114382 .3078888 -1.717834 -.5109315
    /cut5 | .2712292 .2439076 -.206821 .7492794
    /cut6 | 1.265936 .2727214 .7314117 1.80046
    /cut7 | 2.605524 .3886421 1.8438 3.367249
    /cut8 | 4.184127 .5424499 3.120945 5.24731
    /cut9 | 6.393414 .6457801 5.127708 7.65912
    -------------+----------------------------------------------------------------
    StudentID |
    var(_cons)| 3.22e-34 2.22e-34 8.36e-35 1.24e-33
    ------------------------------------------------------------------------------



  • #2
    Your analysis is incorrect. By including the fixed effect of the student, you have removed from the random-effects part of the model most of the variance that the student random effects might explain. So it is no surprise that your student-level variance components are, for practical purposes, zero. With this analysis, even a very good rating system would appear useless.

    Here's a simpler illustration of how this goes wrong:
    Code:
    . webuse rate2no3, clear
    
    .
    . kap ratera raterb
    
                 Expected
    Agreement   agreement     Kappa   Std. err.         Z      Prob>Z
    -----------------------------------------------------------------
      67.31%      42.16%     0.4348     0.1010       4.31      0.0000
    
    .
    . rename rater* rating#, addnumber
    
    . reshape long rating, i(id) j(rater)
    (j = 1 2)
    
    Data                               Wide   ->   Long
    -----------------------------------------------------------------------------
    Number of observations               52   ->   104         
    Number of variables                   3   ->   3           
    j variable (2 values)                     ->   rater
    xij variables:
                            rating1 rating2   ->   rating
    -----------------------------------------------------------------------------
    
    .
    . mixed rating i.rater || id:
    
    Performing EM optimization ...
    
    Performing gradient-based optimization:
    Iteration 0:   log likelihood = -160.17176  
    Iteration 1:   log likelihood = -160.17176  
    
    Computing standard errors ...
    
    Mixed-effects ML regression                     Number of obs     =        104
    Group variable: id                              Number of groups  =         52
                                                    Obs per group:
                                                                  min =          2
                                                                  avg =        2.0
                                                                  max =          2
                                                    Wald chi2(1)      =       1.36
    Log likelihood = -160.17176                     Prob > chi2       =     0.2431
    
    ------------------------------------------------------------------------------
          rating | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
         2.rater |   .1730769    .148267     1.17   0.243     -.117521    .4636749
           _cons |   2.826923   .1811383    15.61   0.000     2.471899    3.181948
    ------------------------------------------------------------------------------
    
    ------------------------------------------------------------------------------
      Random-effects parameters  |   Estimate   Std. err.     [95% conf. interval]
    -----------------------------+------------------------------------------------
    id: Identity                 |
                      var(_cons) |   1.134616   .2841448      .6945126    1.853606
    -----------------------------+------------------------------------------------
                   var(Residual) |   .5715605   .1120922      .3891597    .8394533
    ------------------------------------------------------------------------------
    LR test vs. linear model: chibar2(01) = 30.36         Prob >= chibar2 = 0.0000
    
    . estat icc
    
    Residual intraclass correlation
    
    ------------------------------------------------------------------------------
                           Level |        ICC   Std. err.     [95% conf. interval]
    -----------------------------+------------------------------------------------
                              id |    .665005   .0773486      .5012913    .7967644
    ------------------------------------------------------------------------------
    
    .
    . mixed rating i.rater i.id || id:
    
    Performing EM optimization ...
    
    Performing gradient-based optimization:
    Iteration 0:   log likelihood = -83.697849  
    Iteration 1:   log likelihood = -82.453952  
    Iteration 2:   log likelihood = -82.437951  
    Iteration 3:   log likelihood = -82.437951  
    
    Computing standard errors ...
    
    Mixed-effects ML regression                     Number of obs     =        104
    Group variable: id                              Number of groups  =         52
                                                    Obs per group:
                                                                  min =          2
                                                                  avg =        2.0
                                                                  max =          2
                                                    Wald chi2(52)     =     519.63
    Log likelihood = -82.437951                     Prob > chi2       =     0.0000
    
    ------------------------------------------------------------------------------
          rating | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
         2.rater |   .1730769   .1048406     1.65   0.099    -.0324069    .3785607
                 |
              id |
              2  |  -2.25e-15   .5345843    -0.00   1.000    -1.047766    1.047766
              3  |          0   .5345843     0.00   1.000    -1.047766    1.047766
              4  |   1.59e-15   .5345843     0.00   1.000    -1.047766    1.047766
              5  |   3.17e-15   .5345843     0.00   1.000    -1.047766    1.047766
              6  |   1.27e-15   .5345843     0.00   1.000    -1.047766    1.047766
              7  |         .5   .5345843     0.94   0.350    -.5477659    1.547766
              8  |         .5   .5345843     0.94   0.350    -.5477659    1.547766
              9  |         .5   .5345843     0.94   0.350    -.5477659    1.547766
             10  |         .5   .5345843     0.94   0.350    -.5477659    1.547766
             11  |        1.5   .5345843     2.81   0.005     .4522341    2.547766
             12  |        1.5   .5345843     2.81   0.005     .4522341    2.547766
             13  |        1.5   .5345843     2.81   0.005     .4522341    2.547766
             14  |         .5   .5345843     0.94   0.350    -.5477659    1.547766
             15  |         .5   .5345843     0.94   0.350    -.5477659    1.547766
             16  |         .5   .5345843     0.94   0.350    -.5477659    1.547766
             17  |         .5   .5345843     0.94   0.350    -.5477659    1.547766
             18  |         .5   .5345843     0.94   0.350    -.5477659    1.547766
             19  |          1   .5345843     1.87   0.061    -.0477659    2.047766
             20  |          1   .5345843     1.87   0.061    -.0477659    2.047766
             21  |          1   .5345843     1.87   0.061    -.0477659    2.047766
             22  |          2   .5345843     3.74   0.000     .9522341    3.047766
             23  |          2   .5345843     3.74   0.000     .9522341    3.047766
             24  |          2   .5345843     3.74   0.000     .9522341    3.047766
             25  |        1.5   .5345843     2.81   0.005     .4522341    2.547766
             26  |          2   .5345843     3.74   0.000     .9522341    3.047766
             27  |          3   .5345843     5.61   0.000     1.952234    4.047766
             28  |          3   .5345843     5.61   0.000     1.952234    4.047766
             29  |          3   .5345843     5.61   0.000     1.952234    4.047766
             30  |          3   .5345843     5.61   0.000     1.952234    4.047766
             31  |          3   .5345843     5.61   0.000     1.952234    4.047766
             32  |          3   .5345843     5.61   0.000     1.952234    4.047766
             33  |          3   .5345843     5.61   0.000     1.952234    4.047766
             34  |          3   .5345843     5.61   0.000     1.952234    4.047766
             35  |          3   .5345843     5.61   0.000     1.952234    4.047766
             36  |          3   .5345843     5.61   0.000     1.952234    4.047766
             37  |          3   .5345843     5.61   0.000     1.952234    4.047766
             38  |          3   .5345843     5.61   0.000     1.952234    4.047766
             39  |          3   .5345843     5.61   0.000     1.952234    4.047766
             40  |          3   .5345843     5.61   0.000     1.952234    4.047766
             41  |          3   .5345843     5.61   0.000     1.952234    4.047766
             42  |          3   .5345843     5.61   0.000     1.952234    4.047766
             43  |          3   .5345843     5.61   0.000     1.952234    4.047766
             44  |          3   .5345843     5.61   0.000     1.952234    4.047766
             45  |          3   .5345843     5.61   0.000     1.952234    4.047766
             46  |          3   .5345843     5.61   0.000     1.952234    4.047766
             47  |          3   .5345843     5.61   0.000     1.952234    4.047766
             48  |          3   .5345843     5.61   0.000     1.952234    4.047766
             49  |          3   .5345843     5.61   0.000     1.952234    4.047766
             50  |          3   .5345843     5.61   0.000     1.952234    4.047766
             51  |          3   .5345843     5.61   0.000     1.952234    4.047766
             52  |          3   .5345843     5.61   0.000     1.952234    4.047766
                 |
           _cons |   .9134615   .3816255     2.39   0.017     .1654892    1.661434
    ------------------------------------------------------------------------------
    
    ------------------------------------------------------------------------------
      Random-effects parameters  |   Estimate   Std. err.     [95% conf. interval]
    -----------------------------+------------------------------------------------
    id: Identity                 |
                      var(_cons) |   2.45e-11   5.45e-08             0           .
    -----------------------------+------------------------------------------------
                   var(Residual) |   .2857803   .0396306      .2177669    .3750358
    ------------------------------------------------------------------------------
    LR test vs. linear model: chibar2(01) = 0.00          Prob >= chibar2 = 1.0000
    
    . estat icc
    
    Residual intraclass correlation
    
    ------------------------------------------------------------------------------
                           Level |        ICC   Std. err.     [95% conf. interval]
    -----------------------------+------------------------------------------------
                              id |   8.56e-11          0      8.56e-11    8.56e-11
    ------------------------------------------------------------------------------
    Notice how the ICC simply collapses from a respectable, though not outstanding, 0.67 down to essentially zero when you add the ratee fixed effect in. Note also that the kappa statistic of 0.43 is in the range of respectable but not outstanding. The disconnect between the kappa and the effectively zero ICC of the model with both fixed and random student effects should be a tip-off that something is amiss.

    So I think you need to redo your analyses, omitting the student fixed effects if you want to rely on ICCs.
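
    Concretely, a minimal sketch of that corrected specification, reusing your AY2022 variable names (I am assuming the ICC is what you want to report afterward):
    Code:
    * evaluator as a fixed effect only; candidate enters only as a random effect
    meologit OVERALLSCORE i.evaluator_id || candidate_id: , vce(robust)
    
    * latent-scale ICC; in an ologit model the residual variance is fixed at pi^2/3
    estat icc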

    That said, while the redone analyses, assuming the results come out with a low ICC, would support your desired change in practice, as a matter of persuasiveness, I think this will fall flat on its face. Your target audience is the residency selection committee, and perhaps the department chair or vice-chair. I have served on such committees myself from time to time in the past. The members usually are mostly clinicians or clinician-educators with little or no statistical background. I think that trying to explain an intra-class correlation coefficient derived from a multi-level ordered logistic regression model to such an audience in the course of, say, a one-hour meeting is pretty much a lost cause. If you have the kind of personality that can prevail through charisma or intimidation with an audience that doesn't understand what you're talking about, then you can go with a complicated approach. But I suspect there will be big enough egos on the committee that you will need to actually convince them.

    So, I would keep it simple. Kappa statistics, perhaps. Or, notwithstanding the discrete rating scale and the bunching up of scores at one end of the scale, if I were going to try to get fancy, I would use a linear model, as sketched below. At least you can explain that as a somewhat extended version of ordinary linear regression--they are likely to have at least heard of that. And it's not hard, in a linear multi-level model, to explain the ICC as the proportion of variance explained by the student, as opposed to the parts attributable to the rater and to chance. (By contrast, it is unlikely that your audience will have ever heard of, let alone comprehend, an ordinal logistic regression, and explaining it to them will surely boggle their minds. Worse still, in a logistic model the residual variance is fixed at pi^2/3. That introduces scaling artifacts into the interpretation, which defeats comparison of models with different variables. And the ICC can no longer be simply explained as a proportion of variance--even though it is, in fact, that.)
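
    A minimal sketch of that simpler linear version, under the same variable-name assumptions as above:
    Code:
    * treat the 1-10 score as continuous; evaluator fixed, candidate random
    mixed OVERALLSCORE i.evaluator_id || candidate_id:
    
    * here the ICC is the share of score variance attributable to the candidate
    estat icc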


    FWIW, I sympathize with your cause. I think that it is very difficult, maybe impossible, to learn much useful about these candidates in interviews. They do serve a minimal purpose of helping weed out applicants who are psychotic or have very severe personality disorders that they cannot disguise during an interview. And since it is almost impossible to flunk out of medical school in the US, you do have to look for those things. But other than those extreme situations, it would surprise me if any interview protocol proved helpful. But that's just my opinion. You've got data. You can analyze it. The hard part will be explaining it to your target audience.

    Good luck!
