split-sample internal validation logistic regression

Murph Ngo

Join Date: Apr 2017

Posts: 11
#1

14 Apr 2017, 23:01

Hello all,

I am attempting to perform split-sample internal validation on a logistic regression model I have created (logistic command). I understand there are limitations to this approach, but would like to perform it nonetheless.

Here is my understanding of how to approach the problem:

1) split dataset into two groups - 2/3 "training group", 1/3 "validation group"
2) create logistic model with "training group"
3) predict outcome for "validation group" using logistic model
4) compare predicted outcome with actual outcome in "validation group"

I am unsure what code is needed to complete step 3, and what tests or code are required to complete step 4.

Apologies for my lack of statistical knowledge (which I hope to improve), and apologies if this question is more of a "how to do" than Stata troubleshooting. I have searched this forum for similar questions but came up empty.

Any help would be greatly appreciated.

Kind regards
wbuchanan

Join Date: Mar 2014

Posts: 1362
#2

15 Apr 2017, 02:51

Assuming you fit the model with an if statement to identify the group (e.g., if testing == 0), you could:

Code:

predict yhat if !e(sample)
ta outcome yhat

or any other equivalent statement. After an estimation command, e(sample) can be used to identify the estimation sample. You could also use predict without an if statement to get predictions for the entire data set, then compare them with the outcome in the testing sample.
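
To make the steps concrete, here is a minimal sketch of the whole workflow. The variable names outcome, x1, and x2, the seed, and the 0.5 cutoff are placeholders assumed for illustration, not anything taken from your data. Note that predict after logistic returns probabilities by default, so it usually helps to dichotomize them before tabulating.

Code:

* hypothetical names: outcome (0/1), x1, x2; testing == 1 marks the validation group
set seed 12345
* flag roughly 1/3 of observations at random as the validation group
generate byte testing = runiform() > 2/3
* fit the model on the training group only
logistic outcome x1 x2 if testing == 0
* predicted probabilities for the validation group
predict yhat if testing == 1
* assumed cutoff of 0.5 to turn probabilities into predicted classes
generate byte yhat_class = yhat >= 0.5 if !missing(yhat)
tabulate outcome yhat_class if testing == 1

Because the split is random, the validation group will be close to, but not exactly, one third of the sample.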
Rich Goldstein

Join Date: Mar 2014

Posts: 4466
#3

15 Apr 2017, 06:31

While I agree with William's first command, the second may not even work (depending on your N) and, in my opinion, will be much less useful than

Code:

lowess depvar yhat, addplot(function y=x, range(0 1)) legend(off)

where you should replace "depvar" with the name of your outcome variable.
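
If the predictions were generated only for the validation group, yhat is missing elsewhere and the plot above uses just those observations; otherwise it is worth restricting it explicitly. A sketch, assuming a hypothetical indicator testing equal to 1 for the validation group and predicted probabilities stored in yhat. The roctab line is an addition of mine, offered as one way to summarize discrimination alongside the calibration plot.

Code:

* calibration: smoothed observed outcome against predicted probability, validation group only
lowess depvar yhat if testing == 1, addplot(function y=x, range(0 1)) legend(off)
* discrimination: area under the ROC curve in the validation group
roctab depvar yhat if testing == 1

A smoothed curve that stays close to the 45-degree line suggests the predicted probabilities are well calibrated in the validation group.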