
  • Logistic Regression lrtest help

    Hello,

    I wonder if anyone can help?
    I'm new to Stata and have been playing around with a data set to get the hang of logistic regression.

    I'm attempting to assess the fit of two models: one that includes age as a continuous variable, and one that uses age as a categorical variable (which I created from the data set using generate agecat=autocode(age,4,28,96)).
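
    Roughly what I ran (the outcome name and the extra covariates x1 and x2 are placeholders for my actual variables):

    logit outcome c.age x1 x2
    estimates store b

    logit outcome i.agecat x1 x2
    estimates store a

    lrtest a b, stats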

    But when I compare the two models I get this output:

    Likelihood-ratio test                               LR chi2(2)  =    -14.62
    (Assumption: b nested in a)                         Prob > chi2 =    1.0000

    Akaike's information criterion and Bayesian information criterion

    ------------------------------------------------------------------------------
           Model |        Obs   ll(null)  ll(model)      df         AIC        BIC
    -------------+----------------------------------------------------------------
               b |       1000  -629.8479  -549.6739       4    1107.348   1126.979
               a |       1000  -629.8479  -556.9846       6    1125.969   1155.416
    ------------------------------------------------------------------------------


    The AIC is lower in the model using i.agecat, but the LR chi2 test reads -14.62, which just seems wrong. I seem to have done something wrong but can't figure out what...




  • #2
    Likelihood-ratio test                               LR chi2(2)  =    -14.62
    (Assumption: b nested in a)                         Prob > chi2 =    1.0000
    [emphasis added]
    But that assumption is clearly incorrect here: neither of your two models is a constrained version of the other, and the negative chi2 gives it away, because a model truly nested in a larger one can never have a higher log likelihood than that larger model. So the LR test is not valid.

    If you want to do an LR test of whether polychotomizing age in the way you describe improves the model, you can first fit a model with continuous age, and then another with both continuous age AND agecat. (No, they will not be collinear unless your data are very unusual.) Now you have nested models and you can do an LR test, which tests the incremental improvement from allowing the outcome to jump at each cutpoint of the categorized age variable (not quite what you were looking for, but related to it).
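
    In Stata terms, something like this (the outcome name is a placeholder; add whatever other covariates you have to both models):

    logit outcome c.age
    estimates store cont

    logit outcome c.age i.agecat
    estimates store both

    lrtest cont both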

    That said, categorizing a continuous variable in this way is rarely a good idea: it throws away information. True, a simple linear relationship between age and outcome is often an implausible model. But the use of a step function (which is what agecat gives you) is typically even worse. If you suspect a non-linear relationship, modeling it with a spline or fractional polynomials is usually a better idea.
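
    For example, with a restricted cubic spline, or letting fractional polynomials choose the powers (the outcome name is again a placeholder):

    * restricted cubic spline for age with 4 knots at default percentiles
    mkspline agesp = age, cubic nknots(4)
    logit outcome agesp1 agesp2 agesp3

    * or fractional polynomials in age
    fp <age>: logit outcome <age>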



    • #3
      Billy:
      as an aside to Clyde's sound advice about the risks of categorizing a continuous variable, you may be interested in reading:
      Royston P, Altman DG, Sauerbrei W. Dichotomizing continuous predictors in multiple regression: a bad idea. Statistics in Medicine 2006;25(1):127-41.
      By the way, Patrick Royston was the author of the first user-written programs for performing multiple imputation of missing data in Stata (now built into the official -mi- suite).
      Kind regards,
      Carlo
      (Stata 19.0)



      • #4
        With non-nested models, it is generally best to ignore the likelihood-ratio test and concentrate on the AIC & BIC. These show, as Clyde indicates, that linear age is a better model than age categories. I would add that you may be able to improve the fit of the categories by choosing different boundaries. 28 and 96 are right at the very limits of the age distribution. Categories bounded at 10, 20, 30, 40, 50, 60, 70 would work better - but probably not as well as linear age.
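
        One way to set decade boundaries (the outcome name is a placeholder, and the last cut value just needs to exceed the oldest age in your data):

        egen agecat2 = cut(age), at(0 10 20 30 40 50 60 70 110) icodes
        logit outcome i.agecat2
        estat ic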

