  • Is a repeated-measures logistic regression appropriate for my data?

    Hello,

    I am working with data from a two-wave survey in which a subset of participants were observed at both time points, while others were only observed in one wave due to attrition. In both Wave 1 and Wave 2, respondents were asked a binary question: “Have you ever used AI?” (Yes/No).

    The number of respondents reporting AI use increased from 27 in Wave 1 to 71 in Wave 2. I am interested in identifying demographic and socioeconomic covariates (e.g., age, race/ethnicity, education) associated with AI use, as well as evaluating whether the association between these covariates and AI use differs across waves.

    Because some individuals are observed at both time points (i.e., repeated measurements), while others are only observed in one wave, I am unsure whether a repeated-measures logistic regression model (e.g., using generalized estimating equations or a mixed-effects logistic model) would be appropriate for this analysis.

    Specifically:
    • Is it methodologically appropriate to use a repeated-measures logistic regression model when the panel is unbalanced due to attrition (i.e., not all Wave 1 participants are present in Wave 2)?
    • Would a population-averaged model (e.g., GEE) with AI use as the binary outcome and survey wave as a predictor be suitable for evaluating covariate associations with AI use over time in this context?
    • If I keep only participants who were observed in both Wave 1 and Wave 2, my Wave 2 sample size decreases from 1,384 to 720. I am not sure whether restricting the sample to exactly matched pairs is worth that loss.
    Any guidance on the appropriate modeling strategy for this type of partially longitudinal (unbalanced panel) binary outcome data would be greatly appreciated. Would a fixed-effects logistic regression be a better fit?
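    For concreteness, the population-averaged model I have in mind would look something like the sketch below. It assumes the data are in long format (one row per person per wave); the variable names ai, wave, age, education, and idvar are placeholders, and the exchangeable working correlation is only an illustrative choice.
    Code:
    xtset idvar wave
    xtgee ai i.wave c.age i.education, family(binomial) link(logit) corr(exchangeable) vce(robust)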
    Last edited by Luis Mijares Castaneda; 23 Feb 2026, 20:43.

  • #2
    Starting with the multilevel model is probably not a bad idea. You can even fit it as a linear probability model (LPM), using mixed instead of melogit. You could try a two-step approach. First, estimate the general association:
    Code:
    mixed ai c.age i.education || idvar:
    Then look at how the associations change across waves with an interaction model:

    Code:
    mixed ai i.wave##(c.age i.education) || idvar:
    I would say that if you are interested in associations, this is probably fine. Otherwise you might want to apply an FE model to see whether individual changes in the baseline variables influence AI usage. As far as I understand, this is not your goal.
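    If you prefer to stay on the logit scale, the same interaction model can be sketched with melogit. This is only an illustration using the variable names from the code above; the or option reports odds ratios, and testparm gives a joint test of whether the covariate associations differ by wave.
    Code:
    melogit ai i.wave##(c.age i.education) || idvar:, or
    testparm i.wave#c.age i.wave#i.education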
    Best wishes

    Stata 18.0 MP | ORCID | Google Scholar



    • #3
      Could you provide some more information about using an FE model? Would the model use baseline-level variables? Something like the model below, with age and education measured at baseline?

      Code:
      clogit ai c.age i.education, group(idvar)



      • #4
        Would the model use baseline level variables? something like the model below, with age and education measured at baseline?
        No, it would not; in fact, it cannot.

        In a fixed-effects model, any variable that is constant across all observations with the same idvar is dropped from the model because it is collinear with the fixed effect. Only variables that change over time within the same person can have their effects estimated in a fixed-effects model. If you are thinking of including baseline variables to adjust (people often say "control," but that is an abuse of language) for their effects, there is no need to do that in a fixed-effects model, because the fixed effects themselves automatically adjust for those and for all other variables, measured or unmeasured, that vary across but not within persons.
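        One way to see which variables a fixed-effects model could use at all is to look at their within-person variation. A minimal sketch, assuming long-format data with the variable names used earlier in the thread and education stored as a numeric code:
        Code:
        xtset idvar wave
        xtsum age education
        A within standard deviation of zero means the variable never changes within a person, so it would be dropped from a fixed-effects model.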

        But in #1, it sounds like your goal is specifically to estimate the effects of demographic variables on AI use. If I have understood that correctly, then including baseline values in a fixed-effects model would be counterproductive. You might benefit from including the values of these variables as measured in each observation wave in the fixed-effects model. How useful this will be depends on the time interval between waves. If the two waves are close together, then age or education will hardly change at all, and it is unlikely you will be able to detect an effect of those small changes on AI use. But if the waves are, say, 5 years apart, it is reasonable to think that a 5-year age difference could be associated with a noteworthy change in AI use. Similarly, education may change enough over a 5-year period that a corresponding change in AI use might be large enough to detect. Of course, some demographic variables, such as sex, race, and ethnicity, do not change at all, and these can never show any effect in a fixed-effects model. For such variables, you must use a random-effects model, or do a simple between-person analysis using only the baseline observations.
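        As a rough sketch of the options just described, assuming long-format data in which age and education hold each wave's current values, wave is coded 1 and 2, and race is a hypothetical name for a time-invariant variable:
        Code:
        xtset idvar wave
        * fixed effects: uses only within-person change in the covariates
        xtlogit ai c.age i.education, fe
        * random effects: time-invariant variables can be included
        xtlogit ai i.wave c.age i.education i.race, re
        * between-person analysis using only the baseline wave
        logit ai c.age i.education i.race if wave == 1
        Note that the fixed-effects logit keeps only people observed in both waves whose AI use differs between them, so its estimation sample can be much smaller than the set of people observed in both waves.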
