  • #16
    Clyde, as I already noted here https://www.statalist.org/forums/forum/general-stata-discussion/general/1324886-continuous-interaction-variables-and-margins?p=1710170#post1710170
    the variable cr is a binary outcome (cancer): it takes the value 1 if any solid cancer has been detected by the end of follow-up [among subjects exposed to radiation, measured as cumulative dose], and 0 otherwise (that is the reason for using logit).

    It is undoubtedly true that the model must be based on real (observed) interactions, so we use a likelihood-ratio test (LRT) to compare the estimates stored at each step as we expand the base model with a new parameter.

    With that in mind, as I understand it at the moment, the high-level ## interaction notation is less helpful than the # approach, because with # we can control each interaction component, adding it to the model step by step (a minimal sketch is at the end of this post).
    What I mean is that if any interaction term turns out to be insignificant, we should avoid using the whole model; however, if all interactions [included step by step] prove significant, we can use the ## model.
    Perhaps you would like to comment on this.

    p.s.: There is another question, about how to interpret the results of LR testing of the full and restricted models described at pp. 16-17 here https://www.stata.com/why-use-stata/...ith/linear.pdf -- namely, what decision should be taken when a borderline-significant result is obtained -- but that is off-topic here.
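    To make the step-by-step procedure concrete, here is a minimal sketch of what I have in mind (cr, dose, age and sex are the variables discussed above; the particular sequence of terms is only illustrative):
    Code:
    * minimal sketch: expand the model one interaction at a time and
    * compare the nested fits with likelihood-ratio tests
    logit cr c.dose c.age i.sex
    estimates store base

    logit cr c.dose c.age i.sex c.dose#c.age
    estimates store m1
    lrtest base m1

    logit cr c.dose c.age i.sex c.dose#c.age c.dose#i.sex
    estimates store m2
    lrtest m1 m2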



    • #17
      Yes, if this is an exploratory analysis in which you are fitting various models to determine which, in some sense, is best, then you might do it more easily with # terms than with ##. But just be careful when doing that because it is, in most circumstances, invalid to model an interaction without including all of the lower-level interactions and main terms within it. ## prevents you from making that error. With #, you are on your own. And every valid combination of # terms can be abbreviated using ## notation, though it sometimes takes some extra effort to figure out exactly how. But whichever way you find easier to work with is fine.

      Personally, I am not a fan of hypothesis tests, and I am particularly not fond of them as a means of choosing between models. If I am going to use some kind of test statistic to choose between models, I prefer AIC or BIC to the plain likelihood ratio test. For predictive modeling purposes, you want to see if adding the term to the model, which always improves the log likelihood (assuming you don't lose sample due to missing values on the added term), is "worth" the overfitting that increases with each additional term. Frankly, I don't even consider those dispositive when working with a dichotomous outcome. For that I would also pay attention to the area under the ROC curve and the model calibration (Hosmer-Lemeshow type results). At the end of the day, though, model selection is still as much an art as it is a science.
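      For reference, all of the fit statistics mentioned above are available as standard post-estimation commands after logit; a minimal sketch (the model shown is only a placeholder):
      Code:
      * minimal sketch: standard post-estimation checks after a logit fit
      logit cr c.dose c.age i.sex c.dose#i.sex
      estat ic               // AIC and BIC
      lroc, nograph          // area under the ROC curve
      estat gof, group(10)   // Hosmer-Lemeshow goodness-of-fit test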



      • #18
        If the sex-dependent curves by age have the same form, which model is more likely to be overfitted? The full ## model
        Code:
        i.sex##c.age##c.age##c.dose
        or the mixed model with a partial interaction
        Code:
        c.age##c.age##c.dose c.dose#i.sex
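        For what it is worth, one quick way to compare the two on this point is to fit each and look at the number of estimated parameters and the AIC/BIC (a minimal sketch; the factor-variable prefixes are my assumption about which variables are continuous):
        Code:
        * minimal sketch: compare the two candidate models on parameter
        * counts and information criteria
        quietly logit cr i.sex##c.age##c.age##c.dose
        estat ic
        quietly logit cr c.age##c.age##c.dose c.dose#i.sex
        estat ic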



        • #19
          If the forms of the sex-dependent curves by age are the same, then the second model, c.age##c.age##c.dose c.dose#i.sex, is the appropriate one.
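          A minimal sketch of that model in factor-variable notation, with margins used afterwards to examine the dose effect by sex (the continuous/factor prefixes are an assumption about how the variables are coded):
          Code:
          * minimal sketch: the partial-interaction model, followed by the
          * average marginal effect of dose at each level of sex
          logit cr c.age##c.age##c.dose c.dose#i.sex
          margins sex, dydx(dose)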



          • #20
            Thank you, it was helpful!

