Hi all,
This is a question about which logistic regression models are appropriate for a longitudinal dataset with two features I am not sure how to handle: (1) the same participants (n=2676) were surveyed at multiple time points, but none of the survey questions were repeated, and (2) the participants are clustered, in that they were enrolled at 20 health facilities (the facilities are located in 8 counties, with 2-3 facilities per county). I've been playing with the code, but I realize I first need to better understand what kind of analysis is appropriate for this kind of data.
The data come from a cluster RCT that was randomized at the health facility level. This analysis is restricted to the control group participants, who were enrolled at the 20 health facilities assigned to control. All participants completed at least 2 surveys (a baseline survey conducted while attending prenatal care and a postnatal survey at 6 weeks after delivery); some participants also completed a 3rd survey, conducted during pregnancy between the baseline and postnatal surveys. Only part of the sample has all 3 surveys because the women varied in how far along their pregnancy was at baseline, and women who were already 8 months pregnant at baseline were contacted only for the postnatal survey due to the short timeline.
I am regressing a binary outcome (did they seek postnatal care, yes or no) on various individual-level predictor variables (e.g., age, pregnancy complications) and have been advised to include fixed effects at the county level. Most of the textbook descriptions of longitudinal or panel data that I have read cover repeated measures, so I am confused about which models can be used for data that follow the same people but ask different questions at each time point. I also need to account for the clustering of individuals, and I am not sure how to test whether it should be handled with fixed or random effects, or whether I should account for clustering at the county level, the health facility level, or both. Thank you so much in advance for any thoughts or pointers in the right direction!