Validating logit model on another data frame

Warren Gee

Join Date: Nov 2023

Posts: 2
#1

07 Nov 2023, 17:22

Hello, I have 2 data frames, one is a training set, one is a validation set.

I want to create the logit model from the training set, and then run it against the validation set to get the classification results.

I ran:

Code:

frame change beer_training
logit BeerPreference Gender Married Income Age
estat classification

Now I want to apply that model to beer_validation and run estat classification.

I am super new to Stata, so please be gentle. 😂
Clyde Schechter

Join Date: Apr 2014

Posts: 30147
#2

07 Nov 2023, 17:57

I think you have just made things more complicated by having the two data sets in different frames. As far as I can tell, the way Stata does out-of-sample prediction calculations (which is what you need to see how the model works in the validation sample) does not work across different frames. It probably has to do with some difficulties defining e(sample) in a frame where the regression was not carried out--that's just my speculation.

Be that as it may, the simplest approach is to append the two data sets together, with an indicator variable distinguishing the training and validation data sets. Then you run the regression with a restriction to the training data set, and get your -estat classification-. Then rerun -estat classification- in the other data set. Actually, I think of all the statistics one can use to do this, -estat classification-, especially with the default cutoff of 0.5 (which is almost never a useful cutoff value) is the least useful. I would strongly recommend looking at the ROC area and the Hosmer-Lemeshow statistics instead.

Here's an example of how to do this. I've done it by taking the auto.dta and randomly splitting it into two halves. In your case, I imagine, it will instead be a matter of appending two data sets together. Anyway:

Code:

clear*
sysuse auto
set seed 1234

label define dataset 0 "training" 1 "validation"
gen byte dataset:dataset = runiformint(0, 1)

logit foreign price mpg if dataset == "training":dataset

lroc if e(sample), nograph
estat gof if e(sample), group(10) table
estat classification if e(sample)

lroc if !e(sample), nograph
estat gof if !e(sample), group(10) table
estat classification if !e(sample)
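
Adapted to the two-frame setup in #1, the append step might look roughly like the following. This is only a sketch under assumptions: it supposes that the frames beer_training and beer_validation hold the same variables, and the indicator name dataset and the tempfile name valid are made up for illustration.

```stata
* Sketch only: assumes frames beer_training and beer_validation exist
* and contain the same variables. "dataset" and "valid" are illustrative names.
tempfile valid
frame beer_validation: save `valid'

frame change beer_training
* generate() marks the source: 0 = data already in memory, 1 = appended file
append using `valid', generate(dataset)
label define dataset 0 "training" 1 "validation"
label values dataset dataset

* Fit on the training rows only
logit BeerPreference Gender Married Income Age if dataset == 0

* In-sample performance, then out-of-sample performance
lroc if e(sample), nograph
estat classification if e(sample)
lroc if dataset == 1, nograph
estat classification if dataset == 1
```

Selecting the validation rows with dataset == 1 rather than !e(sample) keeps out any training rows that were dropped from the fit because of missing values. Note that the append changes the data held in beer_training, so work on a copy if that frame needs to stay as it is.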

By the way, in interpreting your results, bear in mind that cross validation by a split data set, like I have shown here, is a much weaker form of validation than cross validation between two data sets collected independently of each other. So titrate your enthusiasm for whatever you find accordingly when you write up your results.
2 likes
daniel klein

Join Date: Mar 2014

Posts: 3872
#3

08 Nov 2023, 01:53

From a purely technical perspective, and assuming your validation dataset is in the frame default, you can do something like

Code:

frame change beer_training
logit BeerPreference Gender Married Income Age
estat classification

frame change default
estimates esample :
estat classification

The line

Code:

estimates esample :

marks all observations as the estimation sample.
1 like