Dear Statalist,
I am running a logistic regression model on a subset of my data (training data set). Afterwards I use this model to predict probabilities for a different subset of my data (test data set).
What I am interested in is evaluating the goodness of fit of the model in the test data set. I can of course assess the success of the classification (using a cutoff value), but I think a measure that involves the predicted probabilities would be more precise. Ordinarily I would go for McFadden's R-squared or the AIC or something like that, but since I don't get a likelihood value for my test data set I can't compute those measures. Am I missing something here? Is there a way to get the likelihood value for out-of-sample predictions?
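To make concrete what I am after: since the outcome is binary, I assume the out-of-sample log-likelihood could in principle be summed up by hand from the predicted probabilities as sum of y*ln(p) + (1-y)*ln(1-p), along the lines of this Python sketch (the data values are invented; `y_test` and `p_hat` stand for my test-set outcomes and predictions), but I don't know whether this is a sound way to do it:

```python
import math

def bernoulli_loglik(y, p):
    """Log-likelihood of observed 0/1 outcomes y under predicted
    probabilities p: sum of y*ln(p) + (1 - y)*ln(1 - p)."""
    return sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
               for yi, pi in zip(y, p))

# made-up test-set outcomes and out-of-sample predicted probabilities
y_test = [1, 0, 1, 1, 0]
p_hat = [0.8, 0.3, 0.6, 0.9, 0.2]
ll_model = bernoulli_loglik(y_test, p_hat)

# null benchmark: predict the test-set mean outcome for every observation
p_bar = sum(y_test) / len(y_test)
ll_null = bernoulli_loglik(y_test, [p_bar] * len(y_test))

# out-of-sample analogue of McFadden's R-squared (my own construction)
mcfadden_r2 = 1 - ll_model / ll_null
```

With both log-likelihoods in hand, a McFadden-style ratio seems computable even out of sample; whether the null benchmark should use the training-set or test-set mean is one of the things I am unsure about.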
I read that there are pseudo R-squared measures that don't depend on the likelihood value, e.g. Efron's R-squared and McKelvey & Zavoina's R-squared, but they don't adjust for the number of variables used. Is there a way to calculate them with such an adjustment?
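For reference, Efron's R-squared needs only the predicted probabilities, and the adjustment I have in mind is the OLS-style degrees-of-freedom correction carried over by analogy. Whether that analogy is legitimate is part of my question; the sketch below (with invented data) just shows the arithmetic I mean:

```python
def efron_r2(y, p):
    """Efron's pseudo R-squared: 1 - sum((y - p)^2) / sum((y - ybar)^2),
    computed from predicted probabilities alone (no likelihood needed)."""
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, p))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

def adjusted(r2, n, k):
    """OLS-style degrees-of-freedom correction, carried over by analogy:
    n observations, k predictors. I am not sure this is appropriate
    for a pseudo R-squared."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# made-up test-set outcomes and out-of-sample predictions
y_test = [1, 0, 1, 1, 0]
p_hat = [0.8, 0.3, 0.6, 0.9, 0.2]
r2 = efron_r2(y_test, p_hat)
r2_adj = adjusted(r2, n=len(y_test), k=2)
```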
Also, I would like to simulate different distributions of my dependent variable in my test data set by using weights (in order to estimate how well my model would fit under different circumstances). I found that I can calculate the success of the classifications with weights using the roctab command. However, I would like a goodness-of-fit measure that uses the predicted probabilities while allowing weights. Is there a way to calculate a pseudo R-squared measure from the predicted probabilities using weights?
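What I have tentatively tried by hand is a weighted version of Efron's R-squared, weighting both the squared errors and the benchmark mean. This is my own construction by analogy, not an established measure as far as I know (data and weights below are invented):

```python
def weighted_efron_r2(y, p, w):
    """Efron's pseudo R-squared with observation weights: weighted squared
    errors of the predicted probabilities against a weighted-mean benchmark.
    My own construction by analogy, not an established measure."""
    w_sum = sum(w)
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / w_sum
    ss_res = sum(wi * (yi - pi) ** 2 for wi, yi, pi in zip(w, y, p))
    ss_tot = sum(wi * (yi - y_bar) ** 2 for wi, yi in zip(w, y))
    return 1 - ss_res / ss_tot

# made-up data: the weights up-weight the y == 0 cases to mimic a
# different distribution of the dependent variable in the test data set
y_test = [1, 0, 1, 1, 0]
p_hat = [0.8, 0.3, 0.6, 0.9, 0.2]
w = [1, 3, 1, 1, 3]
r2_w = weighted_efron_r2(y_test, p_hat, w)
```

With unit weights this reduces to the ordinary Efron's R-squared, which at least makes it internally consistent, but I don't know whether it is a defensible measure.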
Or is there a different way to approach this altogether?
Any help will be greatly appreciated!
Best regards,
Max Hörl