Non-statistician here, so if what I am asking is ridiculous, you have my permission to call me out.
My hypothesis is that the interview process for a medical residency position is not helpful for distinguishing between candidates, and I want to explain to a group of clinical educators why this is the case. We routinely interview 40 people every year for 4 positions, and each interviewer scores the applicant from 1 to 10. (The applicants are medical students applying for a residency position and are chosen by a ranking system, so small changes in how we rank make a large difference to the applicant. Nationwide, the interview score is usually a heavily weighted part of the ranking.) I want to show that, because the scoring has poor reliability, we should not weight it this heavily. Our data clearly show that the inter-rater reliability (IRR) is low, 0.25-0.35, probably because there are systemic problems in our interview process: the number of interviews per candidate is unbalanced, and not every interviewer in the pool interviews every candidate. The scores are also skewed, with a lot of clustering at the top scores, and the data are heteroskedastic (probably because low scores had more variability). Each interviewer also seems to have a different variance, and some are statistically more lenient than others.
Having seen some of these problems, in the second year I had every interviewer interview every candidate, and I picked only interviewers who were not overly lenient. I was hoping to show, using the ICC, that understanding and correcting for these statistical biases improves reliability.
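For concreteness, the ICC comparison described above could be run along the following lines. This is only a sketch: it assumes the data are in long format (one row per candidate-interviewer score) with the variable names that appear in the model output below, and that each year's dataset is loaded separately. The choice between the one-way and two-way forms is an assumption about the design, not something settled in this post.

```stata
* Sketch only; run each part with the corresponding year's data in memory.

* AY2022: interviewers did not all see every candidate (unbalanced),
* so only a one-way random-effects ICC (target = candidate) is attempted.
icc OVERALLSCORE candidate_id

* AY2023: every interviewer scored every candidate (balanced, fully crossed),
* so the two-way forms are available.
icc Score StudentID SurgeonID            // two-way random effects, absolute agreement
icc Score StudentID SurgeonID, mixed     // two-way mixed effects, consistency of agreement
```

Each call reports both the ICC for an individual rating and the ICC for the average of the ratings. The average-rating ICC is the one that bears on how much the mean interview score can be trusted for ranking.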
I first wanted to show that raw mean scores cannot discriminate between applicants (this seems obvious just from looking at the box plots). In AY2022, Conger's kappa was only 0.28. I then fit a meologit model with the applicant as a random effect. I hoped that I could use the fixed-effect coefficients for the applicants as a corrected measure of the interview score (controlling for the evaluator) instead of the raw mean score.
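For comparison, here is a sketch of a variant in which the applicant enters only as a random intercept, with the interviewer indicators kept as fixed effects. It is not the model whose output is shown below: in that output the applicant appears both as fixed indicators and as the grouping variable, and the reported random-intercept variance is essentially zero (1.22e-33 and 3.22e-34). The names re_cand, se_cand, and first are hypothetical, and the predict syntax assumes a recent Stata version.

```stata
* Sketch only (AY2022 variable names): candidate is a random intercept,
* interviewer leniency is absorbed by the fixed interviewer indicators.
meologit OVERALLSCORE i.evaluator_id || candidate_id:

* Latent-scale ICC: var(candidate) / (var(candidate) + _pi^2/3)
estat icc

* Empirical Bayes (shrunken) candidate effects and their standard errors
predict re_cand, reffects reses(se_cand)

* One row per candidate, ordered from lowest to highest adjusted score
egen byte first = tag(candidate_id)
sort re_cand
list candidate_id re_cand se_cand if first
```

The predicted candidate effects are on the logit (latent) scale and are adjusted for which interviewers each candidate happened to see, so they could stand in for the "corrected" score, with the standard errors indicating how well neighbouring candidates are separated.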
My questions:
Am I full of crap? Did I make this more complicated than it needed to be? I am trying to use this as an educational tool to show others the pitfalls of these employment interviews, how to analyze them correctly, and how to correct for biases.
Is this an appropriate use of a mixed model? (I thought of using a metobit model, but I think meologit makes more sense.) ANOVA will not work, especially for AY22, because we had unequal groups.
Is there a way to list the number of statistically distinct categories of the scoring method? I think that is something easily understandable: "Statistically, we only have X distinguishable categories for 40 candidates in academic year 2022, and Y in AY23."
Does it make sense to use the fixed-effect coefficients of the applicants as a "corrected" score?
The datasets with the 2022 and 2023 scores are uploaded.
***AY2022
meologit OVERALLSCORE i.evaluator_id i.candidate_id ||candidate_id: , vce(r) cov(un)
Mixed-effects ologit regression Number of obs = 236
Group variable: candidate_id Number of groups = 40
Obs per group:
min = 4
avg = 5.9
max = 6
Integration method: mvaghermite Integration pts. = 7
Wald chi2(16) = .
Log pseudolikelihood = -406.202 Prob > chi2 = .
(Std. err. adjusted for 40 clusters in candidate_id)
------------------------------------------------------------------------------
| Robust
OVERALLSCORE | Coefficient std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
evaluator_id |
2 | .899008 .7403409 1.21 0.225 -.5520336 2.350049
3 | -.2738794 .7361285 -0.37 0.710 -1.716665 1.168906
4 | 2.548347 .9080481 2.81 0.005 .7686056 4.328089
5 | .1197076 .6667587 0.18 0.858 -1.187115 1.426531
6 | -.575595 .9599034 -0.60 0.549 -2.456971 1.305781
7 | .1540699 .7040065 0.22 0.827 -1.225758 1.533897
8 | 1.581396 .6575229 2.41 0.016 .2926749 2.870117
|
candidate_id |
2 | -.5524405 .1384652 -3.99 0.000 -.8238272 -.2810537
3 | 4.237737 .2971043 14.26 0.000 3.655423 4.820051
4 | 1.959876 .2339169 8.38 0.000 1.501408 2.418345
5 | 4.285109 .2369932 18.08 0.000 3.820611 4.749608
6 | 3.117948 .215282 14.48 0.000 2.696003 3.539893
7 | 5.354264 .3195147 16.76 0.000 4.728027 5.980502
8 | 2.593385 .2106314 12.31 0.000 2.180555 3.006215
9 | 1.982455 .2659333 7.45 0.000 1.461236 2.503675
10 | .6370161 .1097089 5.81 0.000 .4219906 .8520416
11 | .1589699 .1756929 0.90 0.366 -.1853818 .5033217
12 | .4914535 .1159634 4.24 0.000 .2641694 .7187377
13 | 4.838748 .3188922 15.17 0.000 4.213731 5.463765
14 | 3.11466 .2035036 15.31 0.000 2.7158 3.51352
15 | 1.988984 .2266827 8.77 0.000 1.544694 2.433274
16 | 4.849235 .3347594 14.49 0.000 4.193119 5.505352
17 | 1.618268 .1177063 13.75 0.000 1.387568 1.848968
18 | 3.720724 .2758492 13.49 0.000 3.18007 4.261379
19 | 3.289512 .2336927 14.08 0.000 2.831483 3.747542
20 | 1.74793 .1634167 10.70 0.000 1.427639 2.068221
21 | 5.999918 .3439894 17.44 0.000 5.325711 6.674125
22 | 3.409479 .1964246 17.36 0.000 3.024494 3.794464
23 | 4.040119 .2660433 15.19 0.000 3.518684 4.561554
24 | 3.817888 .2936838 13.00 0.000 3.242278 4.393497
25 | 7.448121 .3739559 19.92 0.000 6.715181 8.181061
26 | 2.661355 .2681447 9.93 0.000 2.135801 3.186909
27 | 1.524864 .1806956 8.44 0.000 1.170707 1.879021
28 | 2.023749 .264241 7.66 0.000 1.505846 2.541652
29 | 3.145881 .2778922 11.32 0.000 2.601222 3.69054
30 | 3.103646 .2720242 11.41 0.000 2.570488 3.636804
31 | 4.803786 .345465 13.91 0.000 4.126687 5.480885
32 | 2.382988 .2105832 11.32 0.000 1.970253 2.795724
33 | 3.722913 .2719621 13.69 0.000 3.189877 4.255949
34 | 3.272464 .2617664 12.50 0.000 2.759411 3.785517
35 | 4.04141 .2446491 16.52 0.000 3.561906 4.520913
36 | 2.804329 .3240327 8.65 0.000 2.169236 3.439421
37 | .9424762 .0932577 10.11 0.000 .7596945 1.125258
38 | 3.293601 .2321937 14.18 0.000 2.83851 3.748692
39 | -.7052869 .183582 -3.84 0.000 -1.065101 -.3454729
40 | 3.04541 .2681207 11.36 0.000 2.519903 3.570917
-------------+----------------------------------------------------------------
/cut1 | -1.480647 .7245819 -2.900801 -.0604925
/cut2 | -.8894804 .7173751 -2.29551 .516549
/cut3 | -.3795946 .6970253 -1.745739 .9865499
/cut4 | .0330022 .6857886 -1.311119 1.377123
/cut5 | 1.237011 .683864 -.103338 2.57736
/cut6 | 2.399001 .7103179 1.006803 3.791198
/cut7 | 3.379394 .7090018 1.989776 4.769012
/cut8 | 4.380659 .7463816 2.917778 5.843541
/cut9 | 5.815383 .7222603 4.399779 7.230988
-------------+----------------------------------------------------------------
candidate_id |
var(_cons)| 1.22e-33 5.34e-34 5.18e-34 2.88e-33
------------------------------------------------------------------------------
AY23
meologit Score i.SurgeonID i.StudentID || StudentID: , vce(r) cov(un)
Mixed-effects ologit regression Number of obs = 160
Group variable: StudentID Number of groups = 40
Obs per group:
min = 4
avg = 4.0
max = 4
Integration method: mvaghermite Integration pts. = 7
Wald chi2(13) = .
Log pseudolikelihood = -260.06804 Prob > chi2 = .
(Std. err. adjusted for 40 clusters in StudentID)
------------------------------------------------------------------------------
| Robust
Score | Coefficient std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
SurgeonID |
2 | -.2331704 .4329376 -0.54 0.590 -1.081712 .6153718
3 | .2070364 .4610952 0.45 0.653 -.6966936 1.110766
4 | -.4671865 .5548703 -0.84 0.400 -1.554712 .6203393
|
StudentID |
2 | 3.091381 .3273527 9.44 0.000 2.449781 3.73298
3 | 2.282311 .2726246 8.37 0.000 1.747976 2.816645
4 | 2.516177 .3116528 8.07 0.000 1.905348 3.127005
5 | 2.827693 .3013976 9.38 0.000 2.236964 3.418421
6 | 3.832582 .4144127 9.25 0.000 3.020348 4.644816
7 | 2.369511 .3075211 7.71 0.000 1.766781 2.972242
8 | 2.50721 .2681125 9.35 0.000 1.981719 3.032701
9 | 6.311666 .6035483 10.46 0.000 5.128733 7.494599
10 | 1.607775 .2001366 8.03 0.000 1.215515 2.000036
11 | .9208777 .1234015 7.46 0.000 .6790152 1.16274
12 | 3.770999 .4267267 8.84 0.000 2.93463 4.607368
13 | 1.329507 .2684374 4.95 0.000 .8033795 1.855635
14 | -2.8562 .3572448 -8.00 0.000 -3.556387 -2.156013
15 | 2.156988 .21854 9.87 0.000 1.728658 2.585319
16 | 1.148029 .1482822 7.74 0.000 .8574014 1.438657
17 | 2.115842 .2533469 8.35 0.000 1.619292 2.612393
18 | 1.761543 .2169952 8.12 0.000 1.33624 2.186846
19 | 4.568019 .4781002 9.55 0.000 3.630959 5.505078
20 | 2.116829 .2591079 8.17 0.000 1.608987 2.624671
21 | .3541152 .1232558 2.87 0.004 .1125383 .595692
22 | 2.7769 .3293046 8.43 0.000 2.131475 3.422325
23 | 1.76258 .2105154 8.37 0.000 1.349977 2.175182
24 | .1269294 .1087379 1.17 0.243 -.086193 .3400518
25 | .8874248 .1327275 6.69 0.000 .6272837 1.147566
26 | 5.464097 .553735 9.87 0.000 4.378796 6.549397
27 | 1.872128 .2263589 8.27 0.000 1.428472 2.315783
28 | 1.898707 .2178189 8.72 0.000 1.471789 2.325624
29 | 2.13649 .2670036 8.00 0.000 1.613173 2.659807
30 | -.5222354 .1636605 -3.19 0.001 -.8430041 -.2014667
31 | -1.399335 .2407365 -5.81 0.000 -1.87117 -.9274998
32 | -.7674789 .1830605 -4.19 0.000 -1.126271 -.4086869
33 | 4.413593 .4777482 9.24 0.000 3.477224 5.349962
34 | -.4893793 .1747552 -2.80 0.005 -.8318932 -.1468655
35 | 2.843556 .3412455 8.33 0.000 2.174727 3.512385
36 | 1.219798 .1714068 7.12 0.000 .8838467 1.555749
37 | 2.772394 .3237463 8.56 0.000 2.137863 3.406925
38 | 4.555018 .4708126 9.67 0.000 3.632242 5.477793
39 | 5.421951 .5180434 10.47 0.000 4.406604 6.437297
40 | 3.525434 .3880814 9.08 0.000 2.764808 4.286059
-------------+----------------------------------------------------------------
/cut1 | -4.932303 .8493545 -6.597007 -3.267599
/cut2 | -3.632993 .8563627 -5.311433 -1.954553
/cut3 | -2.418725 .6167812 -3.627594 -1.209856
/cut4 | -1.114382 .3078888 -1.717834 -.5109315
/cut5 | .2712292 .2439076 -.206821 .7492794
/cut6 | 1.265936 .2727214 .7314117 1.80046
/cut7 | 2.605524 .3886421 1.8438 3.367249
/cut8 | 4.184127 .5424499 3.120945 5.24731
/cut9 | 6.393414 .6457801 5.127708 7.65912
-------------+----------------------------------------------------------------
StudentID |
var(_cons)| 3.22e-34 2.22e-34 8.36e-35 1.24e-33
------------------------------------------------------------------------------