Longitudinal survey data - open cohort - appropriate use of lagged variables?

Anita Feller


05 Jan 2024, 06:20

I'm new to this forum and hope to get advice on a possibly "unconventional" use of lagged variables.

The data set I work with is based on three surveys (2012, 2017 and 2022). In each wave, all people of an open, population-based cohort (people with spinal cord injuries) are invited to participate. The aim of the analysis is to examine the determinants of non-response (non-responder analysis), based on the information available for both non-responders and responders. The dataset is in long format.

We know that people who responded to the first invitation are more likely to participate in the follow-up survey(s), and vice versa. We have therefore created the following variables (lagged variables plus a code for "no preceding value"). To keep all observations in the analysis, we additionally introduced a category for "not eligible" at the previous or subsequent wave.

response_prev (lag of -1):
9 (baseline in the analysis): first invitation to participate (not eligible for previous wave)
1: responder in previous wave
2: non-responder in previous wave

response_2prev (lag of -2):
9 (baseline in the analysis): first or second invitation to participate (person not eligible for previous wave(s))
1: responder in 2012 (as we only have three waves)
2: non-responder in 2012

response_subsequent (lead of 1, i.e. lag of +1):
9 (baseline in the analysis): last invitation to participate (person not eligible for subsequent wave, e.g. due to death)
1: responder in subsequent wave
2: non-responder in subsequent wave

response_2subsequent (lead of 2, i.e. lag of +2):
...
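
To make the coding concrete, here is a minimal sketch of how variables of this kind could be built from long-format data. It is an illustration under assumptions, not the code used for the analysis: `id`, `survey` and `responded` are hypothetical names, the data are a made-up toy example, and "previous/subsequent wave" is taken to mean the wave 5 years earlier/later.

```stata
* Toy long-format data: one row per person and wave in which the person was invited.
* id, survey and responded are hypothetical names used only for this sketch.
clear
input id survey responded
1 2012 1
1 2017 0
1 2022 1
2 2017 1
2 2022 0
3 2022 1
end

* Look one row back/ahead within person, but only accept the row if it really
* is the adjacent wave (5 years apart), so gaps are not matched to the wrong wave.
* Coding as in the post: 1 = responder, 2 = non-responder.
bysort id (survey): generate byte response_prev = ///
    cond(responded[_n-1] == 1, 1, 2) if _n > 1 & survey[_n-1] == survey - 5
bysort id (survey): generate byte response_subsequent = ///
    cond(responded[_n+1] == 1, 1, 2) if _n < _N & survey[_n+1] == survey + 5

* Code 9: no invitation at the previous / subsequent wave.
replace response_prev = 9 if missing(response_prev)
replace response_subsequent = 9 if missing(response_subsequent)

list, sepby(id) noobs
```

Because every row receives one of the codes 1, 2 or 9, no observation has a missing value on these variables, which is what keeps first and last invitations in the estimation sample.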

However, I think we have used the lagged variables in an unconventional and possibly inappropriate way, because including lagged variables usually drops the earliest period's observations (see e.g. https://www.statalist.org/forums/for...55#post1486955 and further posts).
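
For comparison, this is a small sketch of the conventional approach with Stata's time-series lag operator, again on made-up data with hypothetical variable names (`id`, `survey`, `responded`). It only illustrates why a person's first invitation would normally be lost: the lag is missing there, and estimation commands drop rows with missing covariates.

```stata
* Toy data with hypothetical variable names; waves are 5 years apart.
clear
input id survey responded
1 2012 1
1 2017 0
1 2022 1
2 2017 1
2 2022 0
3 2022 1
end

* Declare the panel structure; delta(5) makes L. refer to the wave 5 years earlier.
xtset id survey, delta(5)

* Conventional lag: missing whenever there is no observation at the previous wave.
generate lag_responded = L.responded

* Rows with a missing lag are each person's first invitation; a model that
* includes lag_responded as a covariate would exclude them.
count if missing(lag_responded)
list, sepby(id) noobs
```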

So my main question is: Is it wrong to include the lagged variables as described? And if it is wrong, what would be a suitable alternative?

I would like to add that we also want to use the non-responder analysis to generate inverse probability weights for analysis of the survey data.
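
As a rough sketch of that weighting step, assuming a simple response-propensity model: the data below are simulated and the variables `age`, `female` and `responded` are placeholders, so this shows only the mechanics (fit the model, predict the response probability, invert it for responders), not the model specification we would actually use.

```stata
* Simulated toy data; all variable names and coefficients are made up.
clear
set seed 12345
set obs 500
generate id = _n
generate byte female = runiform() < 0.5
generate age = 20 + 60*runiform()
generate byte responded = runiform() < invlogit(-0.5 + 0.02*age + 0.3*female)

* Response-propensity model.
logit responded c.age i.female

* Predicted probability of responding, for observations used in the model.
predict double p_response if e(sample), pr

* Inverse probability weight, defined for responders only:
* a responder with a low response probability stands in for more non-responders.
generate double ipw = 1/p_response if responded == 1

summarize p_response ipw
```

In practice one would probably also inspect the distribution of the weights (very large weights may call for trimming or stabilising), but that is a separate decision from the lagged-variable question.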

Thank you very much for any comments.

Code:

. logit response_module ib9.response_prev ib9.response_2prev ib9.response_subsequent ib9.response_2subsequent ///
>     o.age_invitation age_invitation1 age_invitation2 ///
>     o.years_invitation_since_sci years_invitation_since_sci1 years_invitation_since_sci2 ///
>     i.sex i.sci_type ib2.sci_degree i.sci_cause_type i.language ib13.org_contact_nr i.survey ///
>     ib9.response_prev#i2017.survey ib9.response_subsequent#i2017.survey, or vce(cluster id_swisci)

Iteration 0:   log pseudolikelihood = -6185.8407  
Iteration 1:   log pseudolikelihood = -5045.6617  
Iteration 2:   log pseudolikelihood = -5032.9081  
Iteration 3:   log pseudolikelihood = -5032.8855  
Iteration 4:   log pseudolikelihood = -5032.8855  

Logistic regression                                    Number of obs =   8,965
                                                       Wald chi2(30) = 1174.58
                                                       Prob > chi2   =  0.0000
Log pseudolikelihood = -5032.8855                      Pseudo R2     =  0.1864

                                         (Std. err. adjusted for 4,255 clusters in id_swisci)
---------------------------------------------------------------------------------------------
                            |               Robust
            response_module | Odds ratio   std. err.      z    P>|z|     [95% conf. interval]
----------------------------+----------------------------------------------------------------
              response_prev |
             non-responder  |   .4463852   .0487291    -7.39   0.000     .3604039     .552879
                 responder  |   2.279667   .2389468     7.86   0.000     1.856313     2.79957
                            |
             response_2prev |
             non-responder  |   .2867766   .0364969    -9.81   0.000     .2234676    .3680211
                 responder  |   .8681445   .0926049    -1.33   0.185     .7043596    1.070014
                            |
        response_subsequent |
             non-responder  |   2.406085   .3344575     6.32   0.000     1.832272      3.1596
                 responder  |   10.22923   1.667511    14.26   0.000     7.431641    14.07994
                            |
       response_2subsequent |
             non-responder  |   .5950989   .0696937    -4.43   0.000     .4730451    .7486446
                 responder  |   1.743404   .2761124     3.51   0.000      1.27817    2.377976
                            |
             age_invitation |          1  (omitted)
            age_invitation1 |   1.000027   4.73e-06     5.74   0.000     1.000018    1.000036
            age_invitation2 |    .999998   3.53e-07    -5.73   0.000     .9999973    .9999987
 years_invitation_since_sci |          1  (omitted)
years_invitation_since_sci1 |   .8938668   .0187206    -5.36   0.000     .8579179     .931322
years_invitation_since_sci2 |   1.027287   .0052215     5.30   0.000     1.017104    1.037572
                            |
                        sex |
                    Female  |   .9865501   .0420172    -0.32   0.751     .9075413    1.072437
                            |
                   sci_type |
               tetraplegia  |   .8939945   .0367002    -2.73   0.006     .8248811    .9688987
                            |
                 sci_degree |
           complete lesion  |   1.022028   .0440677     0.51   0.613     .9392054    1.112153
                            |
             sci_cause_type |
             non-traumatic  |   .9620232   .0507128    -0.73   0.463     .8675903    1.066735
                            |
                   language |
                    French  |   1.049187   .0512481     0.98   0.326     .9534006    1.154597
                   Italien  |   1.156533   .1154846     1.46   0.145     .9509593    1.406546
                            |
             org_contact_nr |
             Organistion 1  |   1.006892   .1055564     0.07   0.948     .8198758    1.236567
             Organistion 2  |    .775689   .0935686    -2.11   0.035     .6123649    .9825734
             Organistion 3  |   .7863481   .1181417    -1.60   0.110     .5857732    1.055602
             Organistion 4  |   .7062273   .0757499    -3.24   0.001     .5723276    .8714537
             Organistion 5  |   1.739262   .1596139     6.03   0.000     1.452946       2.082
             Organistion 6  |   1.154596    .264011     0.63   0.530     .7375536    1.807451
                            |
                     survey |
                      2017  |   .9007761   .1604106    -0.59   0.557     .6353814    1.277024
                      2022  |   1.014293   .1579998     0.09   0.927     .7474276    1.376443
                            |
       response_prev#survey |
        non-responder#2017  |   .7131371   .1191534    -2.02   0.043     .5139861    .9894519
            responder#2017  |   .4326824   .0609956    -5.94   0.000     .3282273    .5703794
                            |
 response_subsequent#survey |
        non-responder#2017  |   .4164344   .0725324    -5.03   0.000      .295998    .5858744
            responder#2017  |   .5785477    .115334    -2.75   0.006     .3914275    .8551197
                            |
                      _cons |   .6111134   .1096565    -2.74   0.006     .4299169    .8686785
---------------------------------------------------------------------------------------------
Note: _cons estimates baseline odds.
