How to take into account repeated measures in logistic regression?

Arielle Tey

#1

13 Apr 2016, 18:58

I am currently doing a research study to predict a health outcome using some biomarkers. These biomarkers are measured at 4 visits, so they are repeated measures. I also want to test whether variables such as age and weight play a role in predicting that outcome. I have considered using GEE, but it doesn't seem to work like logistic regression, where I can add and remove variables via a stepwise process. Alternatively, is there a way to get the logistic command in Stata to take into account subject and visit effects? Thank you for your help!
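One way to keep an ordinary logistic regression while acknowledging the repeated visits is shown in the minimal sketch below. It assumes long-format data (one row per patient per visit) with a patient identifier `id` and a visit number `visit`. `vce(cluster id)` makes the standard errors robust to correlation among a patient's rows; the coefficients are still those of an ordinary logit. `i.visit` adds visit indicators. The data are simulated only so the snippet runs on its own, and `biomarker` is a hypothetical stand-in for the real predictors.

```stata
* Simulated stand-in data (hypothetical): 200 patients, 4 visits each
clear
set seed 2016
set obs 200
generate id = _n
generate u = rnormal()                  // patient-level effect shared across visits
expand 4
bysort id: generate visit = _n
generate biomarker = rnormal(50, 10)    // hypothetical biomarker
generate pestat = runiform() < invlogit(-3 + 0.04*biomarker + u)

* Ordinary logistic regression: clustering on id makes the standard
* errors robust to within-patient correlation; i.visit adds visit effects
logit pestat biomarker i.visit, vce(cluster id)
```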
Clyde Schechter

#2

13 Apr 2016, 19:23

The prevailing opinion in this forum about using stepwise procedures to select variables is: don't do it; it's statistical garbage. I think you are unlikely to find anyone here who will help you figure out how to make that happen.

Also, the prevailing cultural norm here is to use our real first and last names as our username. You cannot change your username by editing your profile, however. To do that, you have to click on Contact Us (lower right-hand corner of your screen) and then send a message to the system administrator requesting that the change be made.
Clyde Schechter

#3

13 Apr 2016, 20:29

By the way, here is a link to a detailed explanation of why you should avoid stepwise variable selection: http://www.stata.com/support/faqs/st...ems/index.html
Dick Campbell

#4

13 Apr 2016, 20:39

Clyde is certainly right about computer-driven stepwise procedures, if that is what you mean. But if you tell us a bit more about your data, e.g. how far apart the repeated measures are and why they were taken, perhaps you will get some helpful advice. One thing you will need to understand, if you don't already, is the difference between wide and long data layouts. See the manual entry for the reshape command.
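The difference between the two layouts is shown in the minimal sketch below. In wide form each patient is one row with one column per visit (hypothetical names `sflt1` to `sflt4`). `reshape long` turns that into one row per patient-visit, which is the layout the xt commands expect. The two-patient dataset is made up for illustration.

```stata
* Made-up wide data: one row per patient, one column per visit
clear
input id sflt1 sflt2 sflt3 sflt4
1 10 12 15 20
2  8  8  9  9
end

* Long layout: one row per patient per visit; the new variable "visit"
* is built from the numeric suffixes 1-4 of sflt1-sflt4
reshape long sflt, i(id) j(visit)

* Declare the panel structure for the xt commands and inspect the result
xtset id visit
list, sepby(id)
```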

Richard T. Campbell
Emeritus Professor of Biostatistics and Sociology
University of Illinois at Chicago
Richard Williams

#5

13 Apr 2016, 21:08

This sounds like it might be a candidate for an xtlogit, clogit, or melogit analysis. For a brief overview of some of these, see

http://www3.nd.edu/~rwilliam/xsoc739...xedEffects.pdf

http://www3.nd.edu/~rwilliam/xsoc739...edVsRandom.pdf
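The sketch below fits two of these models under stated assumptions: long-format data with `id` and `visit`, simulated here only so the code runs, and `biomarker` as a hypothetical predictor. `xtlogit, re` fits a random-intercept (subject-specific) logistic model. `melogit` fits the same model in the multilevel framework, where further random effects can be added. The fixed-effects options (clogit, xtlogit with fe) use only patients whose outcome changes across visits, so they are not shown.

```stata
* Simulated stand-in data (hypothetical): 200 patients, 4 visits each
clear
set seed 2016
set obs 200
generate id = _n
generate u = rnormal()                  // patient-level random intercept
expand 4
bysort id: generate visit = _n
generate biomarker = rnormal(50, 10)    // hypothetical biomarker
generate pestat = runiform() < invlogit(-3 + 0.04*biomarker + u)
xtset id visit

* Random-intercept (subject-specific) logistic model
xtlogit pestat biomarker, re

* The same model as a multilevel model; more random effects
* (for example a random slope on visit) would go after "|| id:"
melogit pestat biomarker || id:
```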

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
WWW: https://academicweb.nd.edu/~rwilliam/
Carlo Lazzaro

#6

14 Apr 2016, 00:30

Stata Noob:
I agree with all the previous comments.
As an aside (echoing the FAQ), please post what you typed and what you got from Stata.
Also, please report exactly what you meant by "it did not work"; otherwise it is almost impossible to give useful advice about what the problem is with the command you invoked.

Kind regards,
Carlo
(Stata 19.0)
Arielle Tey

#7

15 Apr 2016, 01:07

Prof Clyde: Understood and have contacted the administrators! Thank you for your kind advice.
Prof Campbell: I have already restructured the data from wide to long. Thank you for your kind advice.

Dear All

Thank you for your comments! I have read them and understand that stepwise is probably not so wise (pun unintended), but I am still unsure how to go about selecting variables for a model.

I currently have 17 predictors, and I hope to select a few significant ones to predict which people will develop the disease. The predictors include patient demographics and biomarkers measured at 4 clinic visits. The visits are spaced about 1 month apart, but some are only 2-3 weeks apart because patients sometimes come back earlier or later for their follow-up. I am trying to build a model with a smaller subset of significant predictors. Initially, I ran each predictor in a univariable logistic regression and kept only those with p < 0.3. Then I put them into a single model and looked at which predictors remained significant.

However, I realized that I need to take the repeated visits into account, as the biomarker values could be correlated across visits 1 to 4. I am also thinking of looking at whether changes in biomarker levels at visits 1/2/3 predict the subsequent development of the disease.
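Changes in a biomarker can be built directly in long data. The minimal sketch below assumes variables `id`, `visit` and `sflt` as in this thread. `sflt_chg1` (change since visit 1) and `sflt_dlag` (change since the previous visit) are hypothetical names, and the data are made up.

```stata
* Made-up long data: two patients, four visits each
clear
input id visit sflt
1 1 10
1 2 12
1 3 15
1 4 20
2 1  8
2 2  8
2 3  9
2 4  9
end
xtset id visit

* Change since the previous visit via the lag operator L.
* (missing at visit 1, and missing after a skipped visit)
generate sflt_dlag = sflt - L.sflt

* Change since the first visit: sflt[1] is the first row within each id
bysort id (visit): generate sflt_chg1 = sflt - sflt[1]
list, sepby(id)
```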

I am not sure how to do variable selection with xtgee or xtlogit. When I tried xtlogit with just 3 variables, I got the following:

Code:

. xtset id visit
       panel variable:  id (strongly balanced)
        time variable:  visit, 1 to 4
                delta:  1 unit

. xtlogit pestat sflt plgf ratio, pa corr(ar 1)

What Stata returned:

note: observations not equally spaced
modal spacing is delta visit = 1 unit
23 groups omitted from estimation

Iteration 1: tolerance = .01607703
Iteration 2: tolerance = .04811227
Iteration 3: tolerance = .1104987
Iteration 4: tolerance = .07284809
Iteration 5: tolerance = .11579625
Iteration 6: tolerance = .2438446
Iteration 7: tolerance = .23027551
Iteration 8: tolerance = .21965717
Iteration 9: tolerance = .12808858
Iteration 10: tolerance = .19894081
Iteration 11: tolerance = .18179261
Iteration 12: tolerance = .19819226
Iteration 13: tolerance = .14672487
Iteration 14: tolerance = .18966935
Iteration 15: tolerance = .17123398
Iteration 16: tolerance = .19406911
Iteration 17: tolerance = .15258184
Iteration 18: tolerance = .18960178
Iteration 19: tolerance = .16630043
Iteration 20: tolerance = .19241859
Iteration 21: tolerance = .15581119
Iteration 22: tolerance = .18994592
Iteration 23: tolerance = .16364552
Iteration 24: tolerance = .19163735
Iteration 25: tolerance = .15766949
Iteration 26: tolerance = .19024883
Iteration 27: tolerance = .16216475
Iteration 28: tolerance = .19124155
Iteration 29: tolerance = .15874359
Iteration 30: tolerance = .19045455
Iteration 31: tolerance = .16132599
Iteration 32: tolerance = .19103137
Iteration 33: tolerance = .15936382
Iteration 34: tolerance = .19058286
Iteration 35: tolerance = .16084761
Iteration 36: tolerance = .1909163
Iteration 37: tolerance = .15972137
Iteration 38: tolerance = .19065985
Iteration 39: tolerance = .16057385
Iteration 40: tolerance = .19085207
Iteration 41: tolerance = .1599272
Iteration 42: tolerance = .19070516
Iteration 43: tolerance = .16041694
Iteration 44: tolerance = .1908158
Iteration 45: tolerance = .16004558
Iteration 46: tolerance = .19073154
Iteration 47: tolerance = .16032691
Iteration 48: tolerance = .19079516
Iteration 49: tolerance = .16011364
Iteration 50: tolerance = .19074681
Iteration 51: tolerance = .16027524
Iteration 52: tolerance = .19078338
Iteration 53: tolerance = .16015274
Iteration 54: tolerance = .19075562
Iteration 55: tolerance = .16024557
Iteration 56: tolerance = .19077663
Iteration 57: tolerance = .16017521
Iteration 58: tolerance = .1907607
Iteration 59: tolerance = .16022853
Iteration 60: tolerance = .19077277
Iteration 61: tolerance = .16018812
Iteration 62: tolerance = .19076361
Iteration 63: tolerance = .16021874
Iteration 64: tolerance = .19077055
Iteration 65: tolerance = .16019553
Iteration 66: tolerance = .19076529
Iteration 67: tolerance = .16021312
Iteration 68: tolerance = .19076927
Iteration 69: tolerance = .16019979
Iteration 70: tolerance = .19076625
Iteration 71: tolerance = .16020989
Iteration 72: tolerance = .19076854
Iteration 73: tolerance = .16020224
Iteration 74: tolerance = .19076681
Iteration 75: tolerance = .16020804
Iteration 76: tolerance = .19076812
Iteration 77: tolerance = .16020364
Iteration 78: tolerance = .19076713
Iteration 79: tolerance = .16020697
Iteration 80: tolerance = .19076788
Iteration 81: tolerance = .16020445
Iteration 82: tolerance = .19076731
Iteration 83: tolerance = .16020636
Iteration 84: tolerance = .19076774
Iteration 85: tolerance = .16020491
Iteration 86: tolerance = .19076741
Iteration 87: tolerance = .16020601
Iteration 88: tolerance = .19076766
Iteration 89: tolerance = .16020518
Iteration 90: tolerance = .19076747
Iteration 91: tolerance = .16020581
Iteration 92: tolerance = .19076762
Iteration 93: tolerance = .16020533
Iteration 94: tolerance = .19076751
Iteration 95: tolerance = .16020569
Iteration 96: tolerance = .19076759
Iteration 97: tolerance = .16020542
Iteration 98: tolerance = .19076753
Iteration 99: tolerance = .16020563
Iteration 100: tolerance = .19076758

GEE population-averaged model                   Number of obs      =      3471
Group and time vars:             id visit       Number of groups   =       903
Link:                               logit       Obs per group: min =         2
Family:                          binomial                      avg =       3.8
Correlation:                        AR(1)                      max =         4
                                                Wald chi2(3)       =     77.24
Scale parameter:                        1       Prob > chi2        =    0.0000

------------------------------------------------------------------------------
pestat | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
sflt | .0002293 .0000321 7.15 0.000 .0001664 .0002922
plgf | .0001439 .0000852 1.69 0.091 -.0000231 .0003108
ratio | .0040499 .0015769 2.57 0.010 .0009593 .0071405
_cons | -4.43446 .2600456 -17.05 0.000 -4.94414 -3.92478
------------------------------------------------------------------------------
convergence not achieved

I am not quite sure how I can continue. Greatly appreciate any advice! Sorry for the long post.

Last edited by Arielle Tey; 15 Apr 2016, 01:42.
Carlo Lazzaro

#8

15 Apr 2016, 01:27

Stata Noob:
Failure to converge is often a sign of model misspecification.
You should go back to square one, add one predictor at a time, and see when the convergence problem starts to creep in.
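That one-predictor-at-a-time check can be automated with a short loop. The sketch below runs on simulated data and assumes a population-averaged logit with an exchangeable working correlation; `x1` to `x3` are hypothetical candidate predictors. `capture noisily` prints each fit but lets the loop continue past a failure. A nonzero return code (such as the r(430) in post #9) marks the step where trouble starts. Each log should still be read for a "convergence not achieved" note.

```stata
* Simulated stand-in data (hypothetical): x1-x3 are candidate predictors
clear
set seed 2016
set obs 200
generate id = _n
generate u = rnormal()                  // patient-level effect shared across visits
expand 4
bysort id: generate visit = _n
generate x1 = rnormal()
generate x2 = rnormal()
generate x3 = rnormal()
generate pestat = runiform() < invlogit(-1 + 0.5*x1 + 0.3*x2 + u)
xtset id visit

* Add one predictor per step and report the return code of each fit
local model
foreach v of varlist x1 x2 x3 {
    local model `model' `v'
    display as text _n "Predictors: `model'"
    capture noisily xtlogit pestat `model', pa corr(exchangeable)
    display as text "Return code: " _rc
}
```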
As an aside, for the future please use CODE delimiters for posting what you typed and what Stata gave you back. Thanks.

Kind regards,
Carlo
(Stata 19.0)
Arielle Tey

#9

15 Apr 2016, 01:33

Carlo: this problem appears whenever I include the variable sflt, even on its own. Does this mean that I won't be able to use this variable in my model? Thanks for your time.

When I tried another single predictor, I got the following instead. Note: sbp refers to systolic blood pressure.

Code:

xtlogit pestat sbp, pa corr(ar 1)

Results:
note: observations not equally spaced
modal spacing is delta visit = 1 unit
19 groups omitted from estimation

Iteration 1: tolerance = 5570.5754
estimates diverging (missing predictions)
r(430);

Last edited by Arielle Tey; 15 Apr 2016, 01:41.
Carlo Lazzaro

#10

15 Apr 2016, 10:26

Stata NooB:
I think the problem has something to do with the way the visits are scheduled (they are not equally spaced).
I would address this issue as a first step.

Kind regards,
Carlo
(Stata 19.0)
Clyde Schechter

#11

16 Apr 2016, 09:49

The message about observations not being equally spaced arises because you have asked Stata to fit an autoregressive within-patient correlation structure, which is only possible with equally spaced data. So Stata does its best, throws out observations that defy the equal-spacing constraint, and proceeds. It may also be the case (or it may not) that specifying an autoregressive structure is causing your convergence difficulties (especially if it's the wrong structure).

It isn't clear to me why one would expect an autoregressive structure in this data in any case. I'm not saying it's wrong, but my first approach to this kind of situation with repeated measurements of biomarkers would be exchangeable, not autoregressive.
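With four visits per patient, the two working correlation matrices can be written out, with a single parameter $\rho$ estimated from the data. Exchangeable assumes the same correlation for every pair of visits. AR(1) makes the correlation between visits $j$ and $k$ decay as $\rho^{|j-k|}$, i.e. with the number of visit steps between them. That matches a decay over time only when every step is the same length, which is why AR(1) presupposes equally spaced visits.

```latex
R_{\text{exch}} =
\begin{pmatrix}
1 & \rho & \rho & \rho \\
\rho & 1 & \rho & \rho \\
\rho & \rho & 1 & \rho \\
\rho & \rho & \rho & 1
\end{pmatrix},
\qquad
R_{\text{AR(1)}} =
\begin{pmatrix}
1 & \rho & \rho^{2} & \rho^{3} \\
\rho & 1 & \rho & \rho^{2} \\
\rho^{2} & \rho & 1 & \rho \\
\rho^{3} & \rho^{2} & \rho & 1
\end{pmatrix}
```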
Arielle Tey

#12

28 Apr 2016, 02:18

I had thought that the biomarker values would be more closely correlated the closer together the visits were. But I will try again using the exchangeable structure. Thank you for your advice!
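A sketch of that refit follows, on simulated data with a hypothetical `biomarker` predictor. It fits the population-averaged logit with an exchangeable working correlation (equivalent to `xtlogit ..., pa corr(exchangeable)`). `vce(robust)` requests sandwich standard errors as a guard against a misspecified working correlation. `estat wcorrelation` then displays the estimated within-patient correlation matrix. Unlike AR(1), the exchangeable structure does not depend on how far apart the visits are.

```stata
* Simulated stand-in data (hypothetical): 200 patients, 4 visits each
clear
set seed 2016
set obs 200
generate id = _n
generate u = rnormal()                  // patient-level effect shared across visits
expand 4
bysort id: generate visit = _n
generate biomarker = rnormal(50, 10)    // hypothetical biomarker
generate pestat = runiform() < invlogit(-3 + 0.04*biomarker + u)
xtset id visit

* Population-averaged logit with an exchangeable working correlation;
* vce(robust) keeps the standard errors valid if that structure is wrong
xtgee pestat biomarker, family(binomial) link(logit) corr(exchangeable) vce(robust)

* Display the estimated within-patient (working) correlation matrix
estat wcorrelation
```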