100% sensitivity in my estat test, no data classified as negative & I'm only seeing 16/1771 cases on my ROC printout and scatterplot

Thomas Edbrooke

Join Date: Dec 2020

Posts: 5
#1

06 Dec 2020, 06:13

Good afternoon,

When I run my logistic regression, everything comes out OK with no strange data. However, when I run the estat classification command, all my data are classified as positive. I'm very new to Stata, and our class encourages us to copy and paste from the tutor's .do file and insert our own variables. I've tried researching, but I'm unsure whether this is the output I'm supposed to get or whether there's an error in my code, as my tutor's printout populates all four categories. I'm using the following code for the dummy variables:

*Q14 Social Media usage*
gen SMuse =1 if Q14 == 4 | Q14== 5 | Q14 == 2 | Q14 == 3
replace SMuse = 0 if Q14 == 1
label define SMuse 1 "Use" 0 "Don't use"
label values SMuse SMuse

*Q22a Transfer Powers to or from the EU*
gen NationalPower = 1 if Q22a == 2
replace NationalPower = 0 if Q22a == 1 | Q22a == 97
label define NationalPower 1 "Agree" 0 "Disagree"
label values NationalPower NationalPower

*Q23a EU Membership is Good or Bad*
gen MembershipEU = 1 if Q23a == 2
replace MembershipEU = 0 if Q23a == 1 | Q23a == 97
label define MembershipEU 1 "Agree" 0 "Disagree"
label values MembershipEU MembershipEU

*Q28a Immigrants should adopt UK Values*

gen UKValues = 1 if Q28a == 1
replace UKValues = 0 if Q28a == 2 | Q28a == 97
label define UKValues 1 "Agree" 0 "Disagree"
label values UKValues UKValues

*Q30a Immigrants are good for the UK Economy*
gen ImmigrationEco = 1 if Q30a == 2
replace ImmigrationEco = 0 if Q30a == 1 | Q30a == 97
label define ImmigrationEco 1 "Agree" 0 "Disagree"
label values ImmigrationEco ImmigrationEco

logit NationalPower SMuse UKValues MembershipEU ImmigrationEco
logit NationalPower SMuse UKValues MembershipEU ImmigrationEco, or

estat classification
lroc

*** Generate predicted probabilities
predict p

*** Predict standardized residuals
predict stdres, rstand

*** Scatterplot of standardized residuals and predicted probabilities
scatter stdres p, mlabel(NationalPower) yline(0)

end

The ROC output shows only 16 cases even though the model reports 1771 observations. Additionally, my Agree/Disagree cases overlap on the scatterplot. Below are the printouts:

Also, when I start running diagnostics on the model, all my cases seem to overlap. If you require more information, I'll send it through ASAP.

Thank you for your time. - Tom

Attached Files

Graph fail.gph (26.9 KB, 1 view)
Clyde Schechter

Join Date: Apr 2014

Posts: 30103
#2

06 Dec 2020, 11:10

-estat classification-, by default, considers an outcome to be a predicted positive if the predicted probability exceeds 0.5. That's an arbitrary threshold, and as your graph shows, it's completely inappropriate for your data, where Pr(NationalPower) is always greater than 0.5. So you need to override that default by picking an appropriate cutoff to define positive prediction. -estat classification- has a -cutoff()- option that permits you to do that.
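As a minimal sketch of how that option might be used with the model from #1: the choice of the sample share of Agree responses as the cutoff is only an assumption for illustration (it is one common starting point, not the only defensible threshold), and phat and cut are hypothetical names.

```stata
* Refit the model so the postestimation commands below refer to it
logit NationalPower SMuse UKValues MembershipEU ImmigrationEco

* Look at the range of the predicted probabilities (phat is a hypothetical name)
predict phat if e(sample)
summarize phat

* Share of Agree (1) responses in the estimation sample, stored in a local macro
summarize NationalPower if e(sample)
local cut = r(mean)

* Classification table using that share as the threshold instead of the default 0.5
estat classification, cutoff(`cut')
```

Because the cutoff is held in a local macro, these lines need to be run together from the do-file rather than one at a time. Any other value between 0 and 1 can be passed to cutoff() directly, for example a threshold read off the summarize output.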

Concerning your scatterplot, it seems that for every combination of predictors that yields any given predicted probability of NationalPower, there are both people who Agree and who Disagree on Q22a, so you get the picture you see. That's not really very surprising, since if any combination of predictors always led to disagree or always led to agree as the outcome, then its predicted probability would be equal to, or very close to, 0 or 1. When the predicted probability is not at the extreme ends of the 0-1 interval, you will usually see both Agree and Disagree outcomes at that level of predicted probability (except in very small samples or poorly fitting models).
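One way to check this directly is to tabulate the outcome within each combination of predictors. The sketch below assumes the logit model from #1 has just been fit; pattern is a hypothetical variable name. Four 0/1 predictors allow at most 2^4 = 16 distinct combinations, which would be consistent with only 16 distinct points appearing in the graphs even though 1771 observations are used.

```stata
* Number each distinct combination of the four predictors (pattern is a hypothetical name)
egen pattern = group(SMuse UKValues MembershipEU ImmigrationEco) if e(sample)

* Counts of Disagree (0) and Agree (1) within each combination
tabulate pattern NationalPower if e(sample)
```

If every row of that table has nonzero counts in both columns, each predicted probability corresponds to a mix of Agree and Disagree respondents, which would produce the overlap seen in the scatterplot.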