Calculation of predicted probabilities

Punar

Join Date: Nov 2014

Posts: 4
#1

Calculation of predicted probabilities

19 Nov 2014, 01:04

When calculating the predicted probabilities in a logistic regression model, do we consider all the variables or just the significant ones?
For example: let's say my model has a dependent variable Y and 3 independent variables X_i, of which the coefficients of X₁ and X₂ are significant whereas that of X₃ is not. When calculating the predicted probabilities, will I use just X₁*beta₁ + X₂*beta₂, or include X₃*beta₃ as well?
Tags: logistic, logistic regression, logit, predicted probability, regression
Maarten Buis

Join Date: Mar 2014

Posts: 3456
#2

19 Nov 2014, 02:56

Stata takes you seriously. So if you ask Stata to compute a model that includes x1, x2, and x3, then it will compute a model that contains those variables. The predicted probabilities are just a representation of that model, so they too include all those variables.
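
To make the arithmetic concrete, here is a minimal Python sketch (not Stata code, and not what Stata runs internally). The intercept, the three coefficients, and the covariate values are all made-up numbers for illustration. It computes the predicted probability as the inverse logit of the full linear predictor, and also shows how the answer would shift if the x3 term were set to zero by hand.

```python
import math

def predicted_probability(intercept, coefs, x):
    """Inverse logit of the full linear predictor (every fitted term included)."""
    xb = intercept + sum(b * v for b, v in zip(coefs, x))
    return 1.0 / (1.0 + math.exp(-xb))

# Hypothetical fitted values, for illustration only
intercept = -1.0
coefs = [0.8, -0.5, 0.3]   # x1, x2, x3 (pretend x3 is "not significant")
x = [1.0, 2.0, 1.5]        # one observation's covariate values

# What the fitted model predicts: all three terms enter the linear predictor
p_full = predicted_probability(intercept, coefs, x)

# What you would get by zeroing the x3 coefficient by hand
p_zeroed = predicted_probability(intercept, [0.8, -0.5, 0.0], x)

print(round(p_full, 4), round(p_zeroed, 4))
```

With these invented numbers the two probabilities differ noticeably, so the hand-edited version is no longer a prediction from the model that was actually fitted.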

---------------------------------
Maarten L. Buis
University of Konstanz
Department of history and sociology
box 40
78457 Konstanz
Germany
http://www.maartenbuis.nl
---------------------------------
Punar

Join Date: Nov 2014

Posts: 4
#3

19 Nov 2014, 03:36

Thanks a lot Maarten! That was very helpful.

From what I understand, the logic behind that is that the variables included in the model should be important and hence are included in the predicted probability calculation. If they are not important, they should not be there in the model.

However, how can we simultaneously say that a variable does not have a statistically significant effect on the dependent variable and include it in the calculation of predicted probabilities? That feels slightly contradictory to me.
Punar

Join Date: Nov 2014

Posts: 4
#4

19 Nov 2014, 03:40

Originally posted by Maarten Buis

Stata takes you seriously. So if you ask Stata to compute a model that includes x1, x2, and x3, then it will compute a model that contains those variables.

Does other software do the same? Are you aware of any software that differs?
Nick Cox

Join Date: Mar 2014

Posts: 35696
#5

19 Nov 2014, 04:01

If there's any software that differs, it is not worth serious attention. In fact, it should be avoided like the proverbial plague.

The key point is simple. If you fit a model and then ask for predictions, Stata uses the model you just fitted. Replacing coefficients of predictors that weren't significant with zero would be contrary to fact, unless by a quite extraordinary coincidence all the coefficients were exactly zero. In any case, how would software know your cut-off?

What you could do is refit your model with just the predictors that satisfy your predilections, but watch out: many researchers regard that as cherry-picking and in particular cases it could easily conflict with other desiderata, such as quantifying effects to the extent possible, keeping predictors together that belong together, consistency with previous work, etc. In fact, it is not even guaranteed that those predictors will remain significant.

Punar: We prefer full real names here. Although people in some cultures have just one name, most cultures work with given names and family names, and we ask that you follow suit.
Svend Juul

Join Date: Apr 2014

Posts: 515
#6

19 Nov 2014, 04:12

If you are serious, you decide - before knowing the result - which model you want to investigate. Stata can - with the stepwise: prefix - include or eliminate predictors based upon significance, but don't do that. See why at:

http://www.stata.com/support/faqs/st...sion-problems/
daniel klein

Join Date: Mar 2014

Posts: 3848
#7

19 Nov 2014, 04:15

Think of it this way: Statistical significance has little to do with importance. It is much more a measure of accuracy.

Best
Daniel
Richard Williams

Join Date: Apr 2014

Posts: 4987
#8

19 Nov 2014, 06:35

If an effect is statistically insignificant, you can't rule out that it is 0. On the other hand, you also can't rule out that the effect is actually larger than what was estimated. If, say, the estimated coefficient was 10, and the confidence interval ran from -10 to 30, it would make about as much sense to treat the value as 20 as it does to treat it as 0. And nobody would seriously consider doing that!
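
The symmetry in that example can be checked with a few lines of Python. This is only a sketch: it assumes the interval is a 95% normal-based (Wald) interval, which the example above does not state, and backs out the standard error from the interval's width. The test statistic for "the coefficient is 0" then has the same magnitude as the one for "the coefficient is 20".

```python
estimate = 10.0
ci_low, ci_high = -10.0, 30.0
z_crit = 1.96  # assumption: 95% normal-based (Wald) interval

# Standard error implied by the width of the interval
se = (ci_high - ci_low) / (2 * z_crit)

def z_stat(null_value):
    """Wald z statistic for H0: coefficient == null_value."""
    return (estimate - null_value) / se

z_zero = z_stat(0.0)     # test against 0
z_twenty = z_stat(20.0)  # test against 20

print(round(se, 3), round(z_zero, 2), round(z_twenty, 2))
```

Both statistics are equally far from zero and neither exceeds the critical value, so under these assumptions the data reject 0 no more and no less than they reject 20.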

Even if effects don't differ significantly from 0, the estimated coefficients of the other variables can be affected by the inclusion of those variables in the model.

In short, if you think the effect of a variable should be treated as 0, then drop it from the model and re-estimate the remaining coefficients. Don't just reset it to 0 yourself while leaving all the other coefficients as is.

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
WWW: https://www3.nd.edu/~rwilliam
