Logistic regression interactions test

Anton Maslov

Join Date: Sep 2020

Posts: 3
#1

24 Sep 2020, 11:48

Hello!

I have a logistic regression model that keeps failing the goodness-of-fit test. The unweighted model passes the test, but the weighted one does not. There are no issues with multicollinearity (the highest Pearson's r I have is 0.52). I tried different interactions based on the literature, but that didn't help. I suspect there might be more interactions that I am not accounting for. Is there a test in Stata that can detect which interactions are present in the data? Any other ideas on what else I can try, or what's going on?
Tom Scott

Join Date: Apr 2019

Posts: 266
#2

24 Sep 2020, 12:41

Anton Maslov maybe consider lasso (https://www.stata.com/new-in-stata/l...on-prediction/) to build your model? I think you would have to manually create every interaction variable first. There is probably a separate Stata command that does that automatically.
Phil Bromiley

Join Date: Apr 2014

Posts: 4348
#3

25 Sep 2020, 11:25

Welcome to Statalist. You will increase your chances of a useful answer by following the FAQ on asking questions: provide Stata code in code delimiters, readable Stata output, and sample data using dataex.

It's not clear to me exactly which estimator you are using or which goodness-of-fit test the model is failing. Depending on your sample size, it is possible to have a reasonably fitting model that fails fit tests. This has been most commonly discussed in the SEM literature, where one compares the observed covariance matrix to the predicted covariance matrix. With a large enough sample size, one can almost always reject the hypothesis that they are the same.

There are several different philosophical approaches to estimation. In general, most commenters on this website prefer a theory-based model that you test. That is, they're not keen on stepwise regression or other ways to data mine. Part of the problem is that once you do this, you cannot pretend that the statistics in the final model are interpretable as if you had run the model only once.

That said, if you use factor-variable notation, you can easily run as many interactions as you would like. If your right-hand-side variables are x1, x2, and x3, you can run
logit y (c.x1 c.x2 c.x3)##(c.x1 c.x2 c.x3)
It is likely that this will create some interactions that can't be estimated. You need the c. prefix if the variable is continuous; otherwise Stata will treat it as a categorical variable and try to make indicators for each of its values.
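As a minimal sketch of the same idea on a public dataset (lbw is one of Stata's example datasets, used here only for illustration and not the poster's data), the pairwise interactions can also be listed explicitly. Crossing a variable list with itself, as in the line above, additionally produces squared terms such as c.x1#c.x1, which may or may not be wanted. The variable choice below is an assumption made for the example.

```stata
* Illustrative only: lbw is a Stata example dataset, not the data discussed in this thread
webuse lbw, clear

* Main effects plus all pairwise interactions, without squared terms
logit low c.age c.lwt i.smoke c.age#c.lwt c.age#i.smoke c.lwt#i.smoke

* Joint Wald test that all three interaction terms are zero
testparm c.age#c.lwt c.age#i.smoke c.lwt#i.smoke
```

A joint test like this asks whether the block of interactions matters as a whole, which is usually easier to defend than picking individual terms by their p-values.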
Anton Maslov

Join Date: Sep 2020

Posts: 3
#4

25 Sep 2020, 14:28

Thanks for the responses!

Tom, I am working in Stata 15. I think Lasso is only available from version 16?

Phil Bromiley, sorry, I should've been clearer. Unfortunately, I cannot yet share the output or code since the data are yet to be made public. The sample size is about 60,000, and I am running weighted bootstrap models. The dependent variable is dichotomous, and I have 11 predictor variables, a mix of dichotomous, categorical, and continuous. After the svy: logistic command, I am running estat gof to obtain the goodness of fit for the model. VIF tests showed no collinearity. I am checking for interactions with plots, but finding no serious interactions. I am indeed building a model that tests a theory, not going the stepwise or other data-driven routes. In this sense, should I not be paying attention to the estat gof results, but rather focus on the explanatory effects of the predictor variables?
Or are there other tests I should be running to determine how well the model fits the data?
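For readers who want to reproduce the general workflow described here, a rough sketch on a public dataset follows. It assumes the nhanes2d example data, which ship already svyset with a linearized variance estimator, so it does not reproduce the bootstrap weights or the variables of the unreleased data; the outcome and predictors are chosen only for illustration.

```stata
* Illustrative only: nhanes2d is a Stata example dataset with its survey design already declared
webuse nhanes2d, clear

* Show the survey design settings stored with the data
svyset

* Survey-weighted logistic regression on example variables
svy: logistic highbp height weight age female

* Goodness-of-fit test for the survey-weighted model
estat gof
```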

Last edited by Anton Maslov; 25 Sep 2020, 14:37.
Tom Scott

Join Date: Apr 2019

Posts: 266
#5

25 Sep 2020, 15:00

Anton Maslov I'm also confused, because it did seem like you were data mining and not testing theoretically derived hypotheses when you asked how to identify significant interactions in the data. The test is probably rejecting the model because of your sample size, like Phil said. With a large enough sample size, a goodness-of-fit test will reject almost any model, even one that fits the data reasonably well.
Anton Maslov

Join Date: Sep 2020

Posts: 3
#6

28 Sep 2020, 10:10

Tom Scott , okay thank you! I will carry on with the paper and not pay attention to the estat gof results. Could you direct me to literature that addresses the issue of large sample size and lack of fit? Just trying to understand the mechanics behind it.
Tom Scott

Join Date: Apr 2019

Posts: 266
#7

28 Sep 2020, 21:30

Anton Maslov I am not a statistician, so I'm not the best person to help you with the literature. The reasoning behind the sample size and lack-of-fit issue is the same for all statistical tests: as sample size increases, so does statistical power. With large sample sizes, small departures from the estimated model will be statistically significant, even if the model fits the data well in practical terms.
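One way to see the mechanics is a small simulation. This is only a sketch under assumed values: the coefficients, the seed, and the size of the omitted quadratic term are all made up for illustration. The fitted model leaves out a small x^2 term, and the same Hosmer-Lemeshow test is run on 1,000 and on 60,000 observations.

```stata
* Simulated data; every number here is an assumption chosen for illustration
clear
set seed 12345
set obs 60000
generate x = rnormal()

* The true model has a small quadratic term that the fitted model omits
generate y = runiform() < invlogit(-0.5 + 0.8*x + 0.1*x^2)

* Slightly misspecified model on the first 1,000 observations
logit y x in 1/1000
estat gof, group(10)

* Same model on all 60,000 observations
logit y x
estat gof, group(10)
```

The misspecification is identical in both fits, so any difference in the two p-values comes from sample size alone. One would generally expect the test on the full sample to be much more likely to reject.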