I want to include a variable in my logit model in order to control for it (to avoid omitted variable bias). However, I want to “neutralize” the variable before predicting probabilities, since I do not want the predicted probabilities to reflect differences between my cases on that variable (think of the difference between legitimate and illegitimate risk adjusters in a model explaining, e.g., expenditures). How can I do that?
If my model were a linear regression (OLS), then I would replace everyone’s actual value with the mean value on the variable. This would make sure that the mean of the predicted values equals the mean of the dependent variable.
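To spell out why this works in the linear case, here is a short derivation. The notation is my own shorthand: $a_i$ is the variable to be neutralized, $x_i$ stands for the other independent variables, and the model is assumed to include an intercept.

```latex
\frac{1}{n}\sum_{i=1}^{n}\left(\hat\beta_0 + \hat\beta_1 \bar a + \hat\beta_2 x_i\right)
= \hat\beta_0 + \hat\beta_1 \bar a + \hat\beta_2 \bar x
= \bar y
```

The last equality holds because OLS residuals sum to zero when an intercept is included, so the fitted equation passes through the point of means.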
However, in a logit model this is not necessarily true, due to the non-linear nature of the model (at least as I understand it). I want the mean of the predicted values to equal the mean of the dependent variable, since I want to compare each case’s predicted probability to the "natural" probability among all cases. Normally, the mean of the predicted values equals the mean of the dependent variable when one does not change the values of the independent variables before predicting.
Concretely, the variable in question is a person’s age (measured as an integer, e.g. 3 years, 4 years etc.). My possible solution would be to predict the probabilities multiple times. I would make everyone 1 year old (while keeping the actual values on the other independent variables) and predict probabilities. Then I would make everyone 2 years old and predict probabilities, and so on for all values of my age variable. Finally, I would take the mean of these predicted probabilities for each person. In this way, I predict probabilities at different ages, which could be relevant due to the non-linear relationship between age and the outcome. However, this solution does not guarantee that the mean of the predicted probabilities equals the mean of the dependent variable.
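To make the procedure concrete, here is a minimal sketch of what I have in mind. Everything in it is assumed for illustration only: the coefficients are made up rather than estimated, `x` is a hypothetical stand-in for the other independent variables, the age-squared term is just one way the model might be specified, and each observed age value gets equal weight in the average.

```python
import math

def logistic(z):
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients, assumed for illustration (not estimated from data).
INTERCEPT = -2.0
B_AGE = 0.15
B_AGE_SQ = -0.004
B_X = 0.8

def predict(age, x):
    """Predicted probability for a given age and other covariate x."""
    return logistic(INTERCEPT + B_AGE * age + B_AGE_SQ * age ** 2 + B_X * x)

def age_neutral_probability(x, ages):
    """Mean of one person's predicted probabilities over every age in `ages`."""
    return sum(predict(a, x) for a in ages) / len(ages)

# Toy cases as (age, x) pairs; the values are invented.
people = [(3, 0.2), (4, 1.5), (10, -0.3), (25, 0.9), (40, 0.0)]
age_values = sorted({age for age, _ in people})  # all observed age values

# Predictions at the actual ages versus the age-averaged predictions.
actual = [predict(age, x) for age, x in people]
neutral = [age_neutral_probability(x, age_values) for _, x in people]

mean_actual = sum(actual) / len(actual)
mean_neutral = sum(neutral) / len(neutral)
print(f"mean at actual ages:   {mean_actual:.4f}")
print(f"mean after averaging:  {mean_neutral:.4f}")
```

With a properly fitted logit that includes an intercept, the first mean would equal the mean of the dependent variable. The second mean generally differs from it, which is exactly the problem I describe.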