  • Interviewing Candidates for a Job

    Non-statistician here, so if what I am asking is ridiculous, you have my permission to call me out.

    My hypothesis is that the interview process for a medical residency position is not helpful at distinguishing candidates, and I want to explain to a group of clinical educators why this is the case. We routinely interview 40 people every year for 4 positions, and each interviewer scores the applicant 1-10. (These are medical students applying for a residency position; they are chosen by a ranking system, so small changes in how we rank make a large difference to the student applicant. Nationwide, the interview score is usually a highly weighted part of the ranking.) I want to show that, because of the poor reliability of scoring, we should not weight it this heavily. Our data clearly show that the inter-rater reliability (IRR) is low, 0.25-0.35, probably because there are systemic problems in our interview process: the number of interviews per candidate is unbalanced, and the full pool of interviewers does not interview all the candidates. Also, the scores are skewed, with a lot of clustering at the top scores, and the data show heteroskedasticity (probably because low scores had more variability). Each interviewer also seems to have a different variance, and some are statistically more lenient than others.

    Seeing these problems, in the second year I had all the interviewers interview everyone, and I picked only interviewers who were not overly lenient. I was hoping to show that understanding and correcting for these statistical biases improves reliability (as measured by the ICC).

    I wanted to first show that using raw means cannot discriminate between applicants (this seems obvious just from looking at the box plots). In AY2022, Cohen's kappa was only 0.28. I then ran a meologit with the applicant as a random effect. I hoped I could use the fixed-effect coefficients for the applicants as a corrected measure of interview score (while controlling for the evaluator), instead of a raw mean score.
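
    For reference, here is a minimal sketch of how a multi-rater kappa can be computed in Stata; the file name, wide-format layout, and rater variable names below are hypothetical stand-ins for my data:
    Code:
    * one row per candidate, one column per interviewer (hypothetical names)
    use ay2022_scores_wide, clear
    
    * kap accepts two or more rater variables and reports overall agreement
    kap rater1 rater2 rater3 rater4 rater5 rater6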

    My questions:
    Am I full of crap? Did I make something more complicated than it needed to be? I am trying to use this as an educational tool on the pitfalls of these employment interviews, how to analyze them correctly, and how to correct for their biases.
    Is this an appropriate use of a mixed model? (I thought of using a metobit model, but I think the meologit makes more sense.) ANOVA will not work, especially in AY2022, because we had unequal groups.
    Is there a way to state the number of statistically distinct categories the scoring method can support? I think that would be easily understandable: "Statistically, we only have X distinguishable categories for 40 candidates in AY2022, and Y in AY2023."
    Does it make sense to use the fixed-effect coefficients of the applicants as a "corrected" score?

    Uploaded are the databases for 2022 and 2023 scores.

    AY2022
    Code:
    . meologit OVERALLSCORE i.evaluator_id i.candidate_id || candidate_id: , vce(r) cov(un)
    
    Mixed-effects ologit regression                 Number of obs     =        236
    Group variable: candidate_id                    Number of groups  =         40
    
                                                    Obs per group:
                                                                  min =          4
                                                                  avg =        5.9
                                                                  max =          6
    
    Integration method: mvaghermite                 Integration pts.  =          7
    
                                                    Wald chi2(16)     =          .
    Log pseudolikelihood = -406.202                 Prob > chi2       =          .
                                  (Std. err. adjusted for 40 clusters in candidate_id)
    ------------------------------------------------------------------------------
    | Robust
    OVERALLSCORE | Coefficient std. err. z P>|z| [95% conf. interval]
    -------------+----------------------------------------------------------------
    evaluator_id |
    2 | .899008 .7403409 1.21 0.225 -.5520336 2.350049
    3 | -.2738794 .7361285 -0.37 0.710 -1.716665 1.168906
    4 | 2.548347 .9080481 2.81 0.005 .7686056 4.328089
    5 | .1197076 .6667587 0.18 0.858 -1.187115 1.426531
    6 | -.575595 .9599034 -0.60 0.549 -2.456971 1.305781
    7 | .1540699 .7040065 0.22 0.827 -1.225758 1.533897
    8 | 1.581396 .6575229 2.41 0.016 .2926749 2.870117
    |
    candidate_id |
    2 | -.5524405 .1384652 -3.99 0.000 -.8238272 -.2810537
    3 | 4.237737 .2971043 14.26 0.000 3.655423 4.820051
    4 | 1.959876 .2339169 8.38 0.000 1.501408 2.418345
    5 | 4.285109 .2369932 18.08 0.000 3.820611 4.749608
    6 | 3.117948 .215282 14.48 0.000 2.696003 3.539893
    7 | 5.354264 .3195147 16.76 0.000 4.728027 5.980502
    8 | 2.593385 .2106314 12.31 0.000 2.180555 3.006215
    9 | 1.982455 .2659333 7.45 0.000 1.461236 2.503675
    10 | .6370161 .1097089 5.81 0.000 .4219906 .8520416
    11 | .1589699 .1756929 0.90 0.366 -.1853818 .5033217
    12 | .4914535 .1159634 4.24 0.000 .2641694 .7187377
    13 | 4.838748 .3188922 15.17 0.000 4.213731 5.463765
    14 | 3.11466 .2035036 15.31 0.000 2.7158 3.51352
    15 | 1.988984 .2266827 8.77 0.000 1.544694 2.433274
    16 | 4.849235 .3347594 14.49 0.000 4.193119 5.505352
    17 | 1.618268 .1177063 13.75 0.000 1.387568 1.848968
    18 | 3.720724 .2758492 13.49 0.000 3.18007 4.261379
    19 | 3.289512 .2336927 14.08 0.000 2.831483 3.747542
    20 | 1.74793 .1634167 10.70 0.000 1.427639 2.068221
    21 | 5.999918 .3439894 17.44 0.000 5.325711 6.674125
    22 | 3.409479 .1964246 17.36 0.000 3.024494 3.794464
    23 | 4.040119 .2660433 15.19 0.000 3.518684 4.561554
    24 | 3.817888 .2936838 13.00 0.000 3.242278 4.393497
    25 | 7.448121 .3739559 19.92 0.000 6.715181 8.181061
    26 | 2.661355 .2681447 9.93 0.000 2.135801 3.186909
    27 | 1.524864 .1806956 8.44 0.000 1.170707 1.879021
    28 | 2.023749 .264241 7.66 0.000 1.505846 2.541652
    29 | 3.145881 .2778922 11.32 0.000 2.601222 3.69054
    30 | 3.103646 .2720242 11.41 0.000 2.570488 3.636804
    31 | 4.803786 .345465 13.91 0.000 4.126687 5.480885
    32 | 2.382988 .2105832 11.32 0.000 1.970253 2.795724
    33 | 3.722913 .2719621 13.69 0.000 3.189877 4.255949
    34 | 3.272464 .2617664 12.50 0.000 2.759411 3.785517
    35 | 4.04141 .2446491 16.52 0.000 3.561906 4.520913
    36 | 2.804329 .3240327 8.65 0.000 2.169236 3.439421
    37 | .9424762 .0932577 10.11 0.000 .7596945 1.125258
    38 | 3.293601 .2321937 14.18 0.000 2.83851 3.748692
    39 | -.7052869 .183582 -3.84 0.000 -1.065101 -.3454729
    40 | 3.04541 .2681207 11.36 0.000 2.519903 3.570917
    -------------+----------------------------------------------------------------
    /cut1 | -1.480647 .7245819 -2.900801 -.0604925
    /cut2 | -.8894804 .7173751 -2.29551 .516549
    /cut3 | -.3795946 .6970253 -1.745739 .9865499
    /cut4 | .0330022 .6857886 -1.311119 1.377123
    /cut5 | 1.237011 .683864 -.103338 2.57736
    /cut6 | 2.399001 .7103179 1.006803 3.791198
    /cut7 | 3.379394 .7090018 1.989776 4.769012
    /cut8 | 4.380659 .7463816 2.917778 5.843541
    /cut9 | 5.815383 .7222603 4.399779 7.230988
    -------------+----------------------------------------------------------------
    candidate_id |
    var(_cons)| 1.22e-33 5.34e-34 5.18e-34 2.88e-33
    ------------------------------------------------------------------------------



    AY2023
    
    Code:
    . meologit Score i.SurgeonID i.StudentID || StudentID: , vce(r) cov(un)
    
    Mixed-effects ologit regression                 Number of obs     =        160
    Group variable: StudentID                       Number of groups  =         40
    
                                                    Obs per group:
                                                                  min =          4
                                                                  avg =        4.0
                                                                  max =          4
    
    Integration method: mvaghermite                 Integration pts.  =          7
    
                                                    Wald chi2(13)     =          .
    Log pseudolikelihood = -260.06804               Prob > chi2       =          .
                                     (Std. err. adjusted for 40 clusters in StudentID)
    ------------------------------------------------------------------------------
    | Robust
    Score | Coefficient std. err. z P>|z| [95% conf. interval]
    -------------+----------------------------------------------------------------
    SurgeonID |
    2 | -.2331704 .4329376 -0.54 0.590 -1.081712 .6153718
    3 | .2070364 .4610952 0.45 0.653 -.6966936 1.110766
    4 | -.4671865 .5548703 -0.84 0.400 -1.554712 .6203393
    |
    StudentID |
    2 | 3.091381 .3273527 9.44 0.000 2.449781 3.73298
    3 | 2.282311 .2726246 8.37 0.000 1.747976 2.816645
    4 | 2.516177 .3116528 8.07 0.000 1.905348 3.127005
    5 | 2.827693 .3013976 9.38 0.000 2.236964 3.418421
    6 | 3.832582 .4144127 9.25 0.000 3.020348 4.644816
    7 | 2.369511 .3075211 7.71 0.000 1.766781 2.972242
    8 | 2.50721 .2681125 9.35 0.000 1.981719 3.032701
    9 | 6.311666 .6035483 10.46 0.000 5.128733 7.494599
    10 | 1.607775 .2001366 8.03 0.000 1.215515 2.000036
    11 | .9208777 .1234015 7.46 0.000 .6790152 1.16274
    12 | 3.770999 .4267267 8.84 0.000 2.93463 4.607368
    13 | 1.329507 .2684374 4.95 0.000 .8033795 1.855635
    14 | -2.8562 .3572448 -8.00 0.000 -3.556387 -2.156013
    15 | 2.156988 .21854 9.87 0.000 1.728658 2.585319
    16 | 1.148029 .1482822 7.74 0.000 .8574014 1.438657
    17 | 2.115842 .2533469 8.35 0.000 1.619292 2.612393
    18 | 1.761543 .2169952 8.12 0.000 1.33624 2.186846
    19 | 4.568019 .4781002 9.55 0.000 3.630959 5.505078
    20 | 2.116829 .2591079 8.17 0.000 1.608987 2.624671
    21 | .3541152 .1232558 2.87 0.004 .1125383 .595692
    22 | 2.7769 .3293046 8.43 0.000 2.131475 3.422325
    23 | 1.76258 .2105154 8.37 0.000 1.349977 2.175182
    24 | .1269294 .1087379 1.17 0.243 -.086193 .3400518
    25 | .8874248 .1327275 6.69 0.000 .6272837 1.147566
    26 | 5.464097 .553735 9.87 0.000 4.378796 6.549397
    27 | 1.872128 .2263589 8.27 0.000 1.428472 2.315783
    28 | 1.898707 .2178189 8.72 0.000 1.471789 2.325624
    29 | 2.13649 .2670036 8.00 0.000 1.613173 2.659807
    30 | -.5222354 .1636605 -3.19 0.001 -.8430041 -.2014667
    31 | -1.399335 .2407365 -5.81 0.000 -1.87117 -.9274998
    32 | -.7674789 .1830605 -4.19 0.000 -1.126271 -.4086869
    33 | 4.413593 .4777482 9.24 0.000 3.477224 5.349962
    34 | -.4893793 .1747552 -2.80 0.005 -.8318932 -.1468655
    35 | 2.843556 .3412455 8.33 0.000 2.174727 3.512385
    36 | 1.219798 .1714068 7.12 0.000 .8838467 1.555749
    37 | 2.772394 .3237463 8.56 0.000 2.137863 3.406925
    38 | 4.555018 .4708126 9.67 0.000 3.632242 5.477793
    39 | 5.421951 .5180434 10.47 0.000 4.406604 6.437297
    40 | 3.525434 .3880814 9.08 0.000 2.764808 4.286059
    -------------+----------------------------------------------------------------
    /cut1 | -4.932303 .8493545 -6.597007 -3.267599
    /cut2 | -3.632993 .8563627 -5.311433 -1.954553
    /cut3 | -2.418725 .6167812 -3.627594 -1.209856
    /cut4 | -1.114382 .3078888 -1.717834 -.5109315
    /cut5 | .2712292 .2439076 -.206821 .7492794
    /cut6 | 1.265936 .2727214 .7314117 1.80046
    /cut7 | 2.605524 .3886421 1.8438 3.367249
    /cut8 | 4.184127 .5424499 3.120945 5.24731
    /cut9 | 6.393414 .6457801 5.127708 7.65912
    -------------+----------------------------------------------------------------
    StudentID |
    var(_cons)| 3.22e-34 2.22e-34 8.36e-35 1.24e-33
    ------------------------------------------------------------------------------



  • #2
    Your analysis is incorrect. By including the fixed effect of the student, you have removed from the random-effects part of the model most of the variance that the student random effects might explain. So it is no surprise that your student-level variance components are, for practical purposes, zero. With this analysis, even a very good rating system would appear useless.

    Here's a simpler illustration of how this goes wrong:
    Code:
    . webuse rate2no3, clear
    
    .
    . kap ratera raterb
    
                 Expected
    Agreement   agreement     Kappa   Std. err.         Z      Prob>Z
    -----------------------------------------------------------------
      67.31%      42.16%     0.4348     0.1010       4.31      0.0000
    
    .
    . rename rater* rating#, addnumber
    
    . reshape long rating, i(id) j(rater)
    (j = 1 2)
    
    Data                               Wide   ->   Long
    -----------------------------------------------------------------------------
    Number of observations               52   ->   104         
    Number of variables                   3   ->   3           
    j variable (2 values)                     ->   rater
    xij variables:
                            rating1 rating2   ->   rating
    -----------------------------------------------------------------------------
    
    .
    . mixed rating i.rater || id:
    
    Performing EM optimization ...
    
    Performing gradient-based optimization:
    Iteration 0:   log likelihood = -160.17176  
    Iteration 1:   log likelihood = -160.17176  
    
    Computing standard errors ...
    
    Mixed-effects ML regression                     Number of obs     =        104
    Group variable: id                              Number of groups  =         52
                                                    Obs per group:
                                                                  min =          2
                                                                  avg =        2.0
                                                                  max =          2
                                                    Wald chi2(1)      =       1.36
    Log likelihood = -160.17176                     Prob > chi2       =     0.2431
    
    ------------------------------------------------------------------------------
          rating | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
         2.rater |   .1730769    .148267     1.17   0.243     -.117521    .4636749
           _cons |   2.826923   .1811383    15.61   0.000     2.471899    3.181948
    ------------------------------------------------------------------------------
    
    ------------------------------------------------------------------------------
      Random-effects parameters  |   Estimate   Std. err.     [95% conf. interval]
    -----------------------------+------------------------------------------------
    id: Identity                 |
                      var(_cons) |   1.134616   .2841448      .6945126    1.853606
    -----------------------------+------------------------------------------------
                   var(Residual) |   .5715605   .1120922      .3891597    .8394533
    ------------------------------------------------------------------------------
    LR test vs. linear model: chibar2(01) = 30.36         Prob >= chibar2 = 0.0000
    
    . estat icc
    
    Residual intraclass correlation
    
    ------------------------------------------------------------------------------
                           Level |        ICC   Std. err.     [95% conf. interval]
    -----------------------------+------------------------------------------------
                              id |    .665005   .0773486      .5012913    .7967644
    ------------------------------------------------------------------------------
    
    .
    . mixed rating i.rater i.id || id:
    
    Performing EM optimization ...
    
    Performing gradient-based optimization:
    Iteration 0:   log likelihood = -83.697849  
    Iteration 1:   log likelihood = -82.453952  
    Iteration 2:   log likelihood = -82.437951  
    Iteration 3:   log likelihood = -82.437951  
    
    Computing standard errors ...
    
    Mixed-effects ML regression                     Number of obs     =        104
    Group variable: id                              Number of groups  =         52
                                                    Obs per group:
                                                                  min =          2
                                                                  avg =        2.0
                                                                  max =          2
                                                    Wald chi2(52)     =     519.63
    Log likelihood = -82.437951                     Prob > chi2       =     0.0000
    
    ------------------------------------------------------------------------------
          rating | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
         2.rater |   .1730769   .1048406     1.65   0.099    -.0324069    .3785607
                 |
              id |
              2  |  -2.25e-15   .5345843    -0.00   1.000    -1.047766    1.047766
              3  |          0   .5345843     0.00   1.000    -1.047766    1.047766
              4  |   1.59e-15   .5345843     0.00   1.000    -1.047766    1.047766
              5  |   3.17e-15   .5345843     0.00   1.000    -1.047766    1.047766
              6  |   1.27e-15   .5345843     0.00   1.000    -1.047766    1.047766
              7  |         .5   .5345843     0.94   0.350    -.5477659    1.547766
              8  |         .5   .5345843     0.94   0.350    -.5477659    1.547766
              9  |         .5   .5345843     0.94   0.350    -.5477659    1.547766
             10  |         .5   .5345843     0.94   0.350    -.5477659    1.547766
             11  |        1.5   .5345843     2.81   0.005     .4522341    2.547766
             12  |        1.5   .5345843     2.81   0.005     .4522341    2.547766
             13  |        1.5   .5345843     2.81   0.005     .4522341    2.547766
             14  |         .5   .5345843     0.94   0.350    -.5477659    1.547766
             15  |         .5   .5345843     0.94   0.350    -.5477659    1.547766
             16  |         .5   .5345843     0.94   0.350    -.5477659    1.547766
             17  |         .5   .5345843     0.94   0.350    -.5477659    1.547766
             18  |         .5   .5345843     0.94   0.350    -.5477659    1.547766
             19  |          1   .5345843     1.87   0.061    -.0477659    2.047766
             20  |          1   .5345843     1.87   0.061    -.0477659    2.047766
             21  |          1   .5345843     1.87   0.061    -.0477659    2.047766
             22  |          2   .5345843     3.74   0.000     .9522341    3.047766
             23  |          2   .5345843     3.74   0.000     .9522341    3.047766
             24  |          2   .5345843     3.74   0.000     .9522341    3.047766
             25  |        1.5   .5345843     2.81   0.005     .4522341    2.547766
             26  |          2   .5345843     3.74   0.000     .9522341    3.047766
             27  |          3   .5345843     5.61   0.000     1.952234    4.047766
             28  |          3   .5345843     5.61   0.000     1.952234    4.047766
             29  |          3   .5345843     5.61   0.000     1.952234    4.047766
             30  |          3   .5345843     5.61   0.000     1.952234    4.047766
             31  |          3   .5345843     5.61   0.000     1.952234    4.047766
             32  |          3   .5345843     5.61   0.000     1.952234    4.047766
             33  |          3   .5345843     5.61   0.000     1.952234    4.047766
             34  |          3   .5345843     5.61   0.000     1.952234    4.047766
             35  |          3   .5345843     5.61   0.000     1.952234    4.047766
             36  |          3   .5345843     5.61   0.000     1.952234    4.047766
             37  |          3   .5345843     5.61   0.000     1.952234    4.047766
             38  |          3   .5345843     5.61   0.000     1.952234    4.047766
             39  |          3   .5345843     5.61   0.000     1.952234    4.047766
             40  |          3   .5345843     5.61   0.000     1.952234    4.047766
             41  |          3   .5345843     5.61   0.000     1.952234    4.047766
             42  |          3   .5345843     5.61   0.000     1.952234    4.047766
             43  |          3   .5345843     5.61   0.000     1.952234    4.047766
             44  |          3   .5345843     5.61   0.000     1.952234    4.047766
             45  |          3   .5345843     5.61   0.000     1.952234    4.047766
             46  |          3   .5345843     5.61   0.000     1.952234    4.047766
             47  |          3   .5345843     5.61   0.000     1.952234    4.047766
             48  |          3   .5345843     5.61   0.000     1.952234    4.047766
             49  |          3   .5345843     5.61   0.000     1.952234    4.047766
             50  |          3   .5345843     5.61   0.000     1.952234    4.047766
             51  |          3   .5345843     5.61   0.000     1.952234    4.047766
             52  |          3   .5345843     5.61   0.000     1.952234    4.047766
                 |
           _cons |   .9134615   .3816255     2.39   0.017     .1654892    1.661434
    ------------------------------------------------------------------------------
    
    ------------------------------------------------------------------------------
      Random-effects parameters  |   Estimate   Std. err.     [95% conf. interval]
    -----------------------------+------------------------------------------------
    id: Identity                 |
                      var(_cons) |   2.45e-11   5.45e-08             0           .
    -----------------------------+------------------------------------------------
                   var(Residual) |   .2857803   .0396306      .2177669    .3750358
    ------------------------------------------------------------------------------
    LR test vs. linear model: chibar2(01) = 0.00          Prob >= chibar2 = 1.0000
    
    . estat icc
    
    Residual intraclass correlation
    
    ------------------------------------------------------------------------------
                           Level |        ICC   Std. err.     [95% conf. interval]
    -----------------------------+------------------------------------------------
                              id |   8.56e-11          0      8.56e-11    8.56e-11
    ------------------------------------------------------------------------------
    Notice how the ICC simply collapses from a respectable, though not outstanding, 0.67 down to essentially zero when you add the ratee fixed effect in. Note also that the kappa statistic of 0.43 is in the range of respectable but not outstanding. The disconnect between the kappa and the effectively zero ICC of the model with both fixed and random student effects should be a tip-off that something is amiss.

    So I think you need to redo your analyses, omitting the student fixed effects if you want to rely on ICCs.
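
    Concretely, a minimal sketch of that corrected specification, reusing your AY2022 variable names (I am assuming the ICC is what you want to report afterward):
    Code:
    * evaluator as a fixed effect only; candidate enters only as a random effect
    meologit OVERALLSCORE i.evaluator_id || candidate_id: , vce(robust)
    
    * latent-scale ICC; in an ologit model the residual variance is fixed at pi^2/3
    estat icc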

    That said, while the redone analyses, assuming the results come out with a low ICC, would support your desired change in practice, as a matter of persuasiveness, I think this will fall flat on its face. Your target audience is the residency selection committee, and perhaps the department chair or vice-chair. I have served on such committees myself from time to time in the past. The members usually are mostly clinicians or clinician-educators with little or no statistical background. I think that trying to explain an intra-class correlation coefficient derived from a multi-level ordered logistic regression model to such an audience in the course of, say, a one-hour meeting is pretty much a lost cause. If you have the kind of personality that can prevail through charisma or intimidation with an audience that doesn't understand what you're talking about, then you can go with a complicated approach. But I suspect there will be big enough egos on the committee that you will need to actually convince them.

    So, I would keep it simple. Kappa statistics, perhaps. Or, notwithstanding the discrete rating scale and the bunching up of scores at one end of the scale, if I were going to try to get fancy, I would use a linear model, as sketched below. At least you can explain that as a somewhat extended version of ordinary linear regression--they are likely to have at least heard of that. And it's not hard, in a linear multi-level model, to explain the ICC as the proportion of variance explained by the student, as opposed to the parts attributable to the rater and to chance. (By contrast, it is unlikely that your audience will have ever heard of, let alone comprehend, an ordinal logistic regression, and explaining it to them will surely boggle their minds. Worse still, in a logistic model the residual variance is fixed at pi^2/3. That introduces scaling artifacts into the interpretation, which defeats comparison of models with different variables. And the ICC can no longer be simply explained as a proportion of variance--even though it is, in fact, that.)
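
    A minimal sketch of that simpler linear version, under the same variable-name assumptions as above:
    Code:
    * treat the 1-10 score as continuous; evaluator fixed, candidate random
    mixed OVERALLSCORE i.evaluator_id || candidate_id:
    
    * here the ICC is the share of score variance attributable to the candidate
    estat icc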


    FWIW, I sympathize with your cause. I think that it is very difficult, maybe impossible, to learn much useful about these candidates in interviews. They do serve a minimal purpose of helping weed out applicants who are psychotic or have very severe personality disorders that they cannot disguise during an interview. And since it is almost impossible to flunk out of medical school in the US, you do have to look for those things. But other than those extreme situations, it would surprise me if any interview protocol proved helpful. But that's just my opinion. You've got data. You can analyze it. The hard part will be explaining it to your target audience.

    Good luck!
