
  • How to take into account repeated measures in logistic regression?

    I am currently doing a research study to predict a health outcome using some biomarkers. These biomarkers are obtained over 4 visits, so they are considered repeated measures. I also want to test whether variables such as age and weight play a role in predicting that outcome. I have contemplated using GEE, but it doesn't seem to function like logistic regression, where I can add and remove variables via a stepwise process. Alternatively, is there a way to get the logistic command in Stata to take into account subject and visit effects? Thank you for your help!

  • #2
    The prevailing opinion in this forum about the use of stepwise procedures for selecting variables is: don't do it--it's statistical garbage. I think you are unlikely to find anyone here who will help you figure out how to make that happen.

    Also, the prevailing cultural norm here is to use our real first and last names as our username. You cannot change your username by editing your profile, however. To do that, you have to click on Contact Us (lower right hand corner of your screen) and then send a message to the system administrator requesting that the change be made.



    • #3
      By the way, here is a link to a detailed explanation why you should avoid stepwise variable selection: http://www.stata.com/support/faqs/st...ems/index.html


      • #4
        Clyde is certainly right about computer driven stepwise procedures if that is what you mean. But if you tell us a bit more about your data, e.g. how far apart are the repeated measures and why were they taken, perhaps you will get some helpful advice. One thing you will need to understand if you don't already is the difference between wide and long data layouts. See the manual entry for the reshape command.
        Richard T. Campbell
        Emeritus Professor of Biostatistics and Sociology
        University of Illinois at Chicago
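
        For readers unfamiliar with the wide/long distinction, here is a minimal sketch of the reshape step. It assumes a wide file with one row per patient and one column per visit; the names id, visit, sflt1-sflt4 and plgf1-plgf4 are hypothetical and only illustrate the pattern.

        ```stata
        * Wide layout (assumed): one row per patient, variables sflt1-sflt4 and plgf1-plgf4
        reshape long sflt plgf, i(id) j(visit)

        * Long layout: one row per patient-visit, with a new variable visit holding 1 to 4
        * Declare the panel structure so that xt commands can use it
        xtset id visit
        ```

        The stubs given to reshape long (here sflt and plgf) are the parts of the variable names that come before the visit number.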


        • #5
          This sounds like it might be a candidate for an xtlogit, clogit, or melogit analysis. For a brief overview of some of these, see

          http://www3.nd.edu/~rwilliam/xsoc739...xedEffects.pdf

          http://www3.nd.edu/~rwilliam/xsoc739...edVsRandom.pdf
          -------------------------------------------
          Richard Williams, Notre Dame Dept of Sociology
          StataNow Version: 19.5 MP (2 processor)

          EMAIL: [email protected]
          WWW: https://www3.nd.edu/~rwilliam
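
          As a rough sketch of how the three commands mentioned above are typically called on long-format data: the outcome pestat, the predictors sflt, plgf and ratio, and the identifiers id and visit are borrowed from output posted elsewhere in this thread, so treat them as placeholders for your own variables.

          ```stata
          * Declare patient id as the panel variable and visit as the time variable
          xtset id visit

          * Random-effects (random-intercept) logit
          xtlogit pestat sflt plgf ratio, re

          * Conditional fixed-effects logit; uses only patients whose outcome varies over visits
          xtlogit pestat sflt plgf ratio, fe

          * The same conditional fixed-effects model written with clogit
          clogit pestat sflt plgf ratio, group(id)

          * Mixed-effects logit with a random intercept for each patient
          melogit pestat sflt plgf ratio || id:
          ```

          Which of these is appropriate depends on the research question; the linked handouts discuss the fixed versus random effects choice.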


          • #6
            Stata Noob:
            I do share all previous comments.
            As an aside (echoing the FAQ), please post what you typed and what you got from Stata.
            Also, please report exactly what you meant by "it did not work"; otherwise it is quite impossible to comment positively on what the matter is with the command you invoked.
            Kind regards,
            Carlo
            (Stata 19.0)


            • #7
              Prof Clyde: Understood and have contacted the administrators! Thank you for your kind advice.
              Prof Campbell: I have already restructured the data from wide to long. Thank you for your kind advice.

              Dear All

              Thank you for your comments! I have read and understand that stepwise is probably not so wise (pun unintended), but I am still unsure about how I can go about selecting variables for creating a model.

              I currently have 17 predictors and I hope to select a few significant ones to predict which people will have the disease. The predictors include patient demographics and those biomarkers that have been taken over 4 clinic visits. The visits are spaced around 1 month apart, but some of them might be spaced just 2-3 weeks apart, as patients may sometimes come back earlier or later for their follow-up. I am trying to create a model with a smaller subset of significant predictors. Initially I just ran each predictor in a univariate logistic regression and selected only those with p<0.3. Then I put them into a single model to see which predictors remain significant.

              However, I found that I need to take into account the repeated visits, as there could be some correlation of the biomarkers from visit 1 to visit 4. I am also thinking of looking at whether the change in levels of biomarkers might predict the subsequent development of the disease at visit 1/2/3.

              I am not sure how I can go about doing variable selection using xtgee or xtlogit. When I tried xtlogit with just 3 variables, I got the following:

              Code:
              xtset id visit
              panel variable: id (strongly balanced)
              time variable: visit, 1 to 4
              delta: 1 unit
              
              xtlogit pestat sflt plgf ratio, pa corr(ar 1)
              What Stata returned:

              note: observations not equally spaced
              modal spacing is delta visit = 1 unit
              23 groups omitted from estimation

              Iteration 1: tolerance = .01607703
              Iteration 2: tolerance = .04811227
              Iteration 3: tolerance = .1104987
              Iteration 4: tolerance = .07284809
              Iteration 5: tolerance = .11579625
              Iteration 6: tolerance = .2438446
              Iteration 7: tolerance = .23027551
              Iteration 8: tolerance = .21965717
              Iteration 9: tolerance = .12808858
              Iteration 10: tolerance = .19894081
              Iteration 11: tolerance = .18179261
              Iteration 12: tolerance = .19819226
              Iteration 13: tolerance = .14672487
              Iteration 14: tolerance = .18966935
              Iteration 15: tolerance = .17123398
              Iteration 16: tolerance = .19406911
              Iteration 17: tolerance = .15258184
              Iteration 18: tolerance = .18960178
              Iteration 19: tolerance = .16630043
              Iteration 20: tolerance = .19241859
              Iteration 21: tolerance = .15581119
              Iteration 22: tolerance = .18994592
              Iteration 23: tolerance = .16364552
              Iteration 24: tolerance = .19163735
              Iteration 25: tolerance = .15766949
              Iteration 26: tolerance = .19024883
              Iteration 27: tolerance = .16216475
              Iteration 28: tolerance = .19124155
              Iteration 29: tolerance = .15874359
              Iteration 30: tolerance = .19045455
              Iteration 31: tolerance = .16132599
              Iteration 32: tolerance = .19103137
              Iteration 33: tolerance = .15936382
              Iteration 34: tolerance = .19058286
              Iteration 35: tolerance = .16084761
              Iteration 36: tolerance = .1909163
              Iteration 37: tolerance = .15972137
              Iteration 38: tolerance = .19065985
              Iteration 39: tolerance = .16057385
              Iteration 40: tolerance = .19085207
              Iteration 41: tolerance = .1599272
              Iteration 42: tolerance = .19070516
              Iteration 43: tolerance = .16041694
              Iteration 44: tolerance = .1908158
              Iteration 45: tolerance = .16004558
              Iteration 46: tolerance = .19073154
              Iteration 47: tolerance = .16032691
              Iteration 48: tolerance = .19079516
              Iteration 49: tolerance = .16011364
              Iteration 50: tolerance = .19074681
              Iteration 51: tolerance = .16027524
              Iteration 52: tolerance = .19078338
              Iteration 53: tolerance = .16015274
              Iteration 54: tolerance = .19075562
              Iteration 55: tolerance = .16024557
              Iteration 56: tolerance = .19077663
              Iteration 57: tolerance = .16017521
              Iteration 58: tolerance = .1907607
              Iteration 59: tolerance = .16022853
              Iteration 60: tolerance = .19077277
              Iteration 61: tolerance = .16018812
              Iteration 62: tolerance = .19076361
              Iteration 63: tolerance = .16021874
              Iteration 64: tolerance = .19077055
              Iteration 65: tolerance = .16019553
              Iteration 66: tolerance = .19076529
              Iteration 67: tolerance = .16021312
              Iteration 68: tolerance = .19076927
              Iteration 69: tolerance = .16019979
              Iteration 70: tolerance = .19076625
              Iteration 71: tolerance = .16020989
              Iteration 72: tolerance = .19076854
              Iteration 73: tolerance = .16020224
              Iteration 74: tolerance = .19076681
              Iteration 75: tolerance = .16020804
              Iteration 76: tolerance = .19076812
              Iteration 77: tolerance = .16020364
              Iteration 78: tolerance = .19076713
              Iteration 79: tolerance = .16020697
              Iteration 80: tolerance = .19076788
              Iteration 81: tolerance = .16020445
              Iteration 82: tolerance = .19076731
              Iteration 83: tolerance = .16020636
              Iteration 84: tolerance = .19076774
              Iteration 85: tolerance = .16020491
              Iteration 86: tolerance = .19076741
              Iteration 87: tolerance = .16020601
              Iteration 88: tolerance = .19076766
              Iteration 89: tolerance = .16020518
              Iteration 90: tolerance = .19076747
              Iteration 91: tolerance = .16020581
              Iteration 92: tolerance = .19076762
              Iteration 93: tolerance = .16020533
              Iteration 94: tolerance = .19076751
              Iteration 95: tolerance = .16020569
              Iteration 96: tolerance = .19076759
              Iteration 97: tolerance = .16020542
              Iteration 98: tolerance = .19076753
              Iteration 99: tolerance = .16020563
              Iteration 100: tolerance = .19076758

              GEE population-averaged model                   Number of obs      =      3471
              Group and time vars:            id visit        Number of groups   =       903
              Link:                              logit        Obs per group: min =         2
              Family:                         binomial                       avg =       3.8
              Correlation:                       AR(1)                       max =         4
                                                              Wald chi2(3)       =     77.24
              Scale parameter:                       1        Prob > chi2        =    0.0000

              ------------------------------------------------------------------------------
                    pestat |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
              -------------+----------------------------------------------------------------
                      sflt |   .0002293   .0000321     7.15   0.000     .0001664    .0002922
                      plgf |   .0001439   .0000852     1.69   0.091    -.0000231    .0003108
                     ratio |   .0040499   .0015769     2.57   0.010     .0009593    .0071405
                     _cons |   -4.43446   .2600456   -17.05   0.000     -4.94414    -3.92478
              ------------------------------------------------------------------------------
              convergence not achieved


              I am not quite sure how I can continue. Greatly appreciate any advice! Sorry for the long post.
              Last edited by Arielle Tey; 15 Apr 2016, 01:42.


              • #8
                Stata Noob:
                the failure to converge is often a clue to model misspecification.
                You should go back to square one, add one predictor at a time, and see when the convergence problem starts to creep in.
                As an aside, for the future please use CODE delimiters for posting what you typed and what Stata gave you back. Thanks.
                Kind regards,
                Carlo
                (Stata 19.0)
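
                One way to do this, sketched under the assumption that the outcome and predictors are named as in the output posted in this thread (pestat, sflt, plgf, ratio) and that the data are already xtset, is to grow the predictor list inside a loop and report whether each fit succeeded. The macro name rhs is made up for this illustration.

                ```stata
                * Start with an empty predictor list (rhs is a hypothetical macro name)
                local rhs
                foreach v of varlist sflt plgf ratio {
                    * Add the next predictor to the list
                    local rhs `rhs' `v'
                    * capture noisily shows the output but keeps the loop running on error
                    capture noisily xtlogit pestat `rhs', pa corr(ar 1)
                    if _rc != 0 {
                        display as error "model with `rhs' stopped with return code " _rc
                    }
                    else {
                        display as text "model with `rhs': e(converged) = " e(converged)
                    }
                }
                ```

                The first model that stops with an error, or reports e(converged) equal to 0, points to the predictor worth a closer look.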


                • #9
                  Carlo: this problem appears whenever I include the variable sflt, even on its own. Does this mean that I won't be able to use this variable in my model? Thanks for your time.

                  When I tried another single predictor, I got the following instead. Note: sbp refers to systolic blood pressure.

                  Code:
                  xtlogit pestat sbp, pa corr(ar 1)
                  Results:
                  note: observations not equally spaced
                  modal spacing is delta visit = 1 unit
                  19 groups omitted from estimation

                  Iteration 1: tolerance = 5570.5754
                  estimates diverging (missing predictions)
                  r(430);
                  Last edited by Arielle Tey; 15 Apr 2016, 01:41.


                  • #10
                    Stata NooB:
                    I think that the problem has something to do with the way visits are scheduled.
                    I would take this issue into account as a first step.
                    Kind regards,
                    Carlo
                    (Stata 19.0)
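
                    A minimal sketch for inspecting the visit pattern, assuming the data are in long format with the identifiers id and visit shown in the xtset output in this thread. Because visit is coded 1 to 4, unequal spacing as Stata sees it most likely means a skipped visit number rather than calendar spacing; the variable name gap is hypothetical.

                    ```stata
                    xtset id visit

                    * Participation patterns across visits 1 to 4 (a dot marks a missing visit)
                    xtdescribe

                    * Gap between consecutive recorded visits within each patient
                    bysort id (visit): generate gap = visit - visit[_n-1]

                    * Any value other than 1 marks a skipped visit
                    tabulate gap
                    ```

                    The gap is missing for each patient's first recorded visit, so tabulate leaves those rows out.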


                    • #11
                      The message about observations not being equally spaced arises because you have asked Stata to fit an autoregressive within-patient correlation structure. That is only possible with equally spaced data. So Stata does its best: it throws out the groups that defy the equal spacing constraint and proceeds. It may also be the case (or it may not) that specifying autoregressive is giving you difficulties with convergence (especially if it's the wrong structure).

                      It isn't clear to me why one would expect an autoregressive structure in this data in any case. I'm not saying it's wrong, but my first approach to this kind of situation with repeated measurements of biomarkers would be exchangeable, not autoregressive.
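
                      A sketch of what the exchangeable version might look like, reusing the variable names from the output posted in this thread and assuming the data are already xtset. It uses xtgee with a binomial family and logit link, which fits the same population-averaged model as xtlogit with the pa option; the vce(robust) option is an addition made here for illustration, not something taken from the thread.

                      ```stata
                      * Population-averaged logit with an exchangeable working correlation
                      * vce(robust) is an added assumption, not part of the original command
                      xtgee pestat sflt plgf ratio, family(binomial) link(logit) corr(exchangeable) vce(robust)

                      * Display the estimated within-patient working correlation matrix
                      estat wcorrelation
                      ```

                      An exchangeable structure does not depend on the spacing of the visits, so the note about omitted groups should not appear with this specification.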


                      • #12
                        I had thought that the biomarkers would be more closely correlated if they were from visits closer together. But I will try again using the exchangeable structure. Thank you for your advice!
