I'm new to this forum and hope to get advice on my issue concerning the use of maybe "unconventional" lagged variables.
The data set I work with is based on three surveys (2012, 2017 and 2022). In each wave, all people of an open, population-based cohort (people with spinal cord injuries) are invited to participate. The aim of the analysis is to examine the determinants of non-response (non-responder analysis), based on the information available for both non-responders and responders. The dataset is in long format.
We know that people who responded to the first invitation are more likely to participate in the follow-up survey(s), and vice versa. We have therefore created the following variables (lagged variables plus a code for "no preceding value"). To keep all observations in the analysis, we have additionally introduced a category for "not eligible" at previous / subsequent waves.
response_prev (lag of -1):
9 (baseline in the analysis): first invitation to participate (not eligible for previous wave)
1: responder in previous wave
2: non-responder in previous wave
response_2prev (lag of -2):
9 (baseline in the analysis): first or second invitation to participate (person not eligible for previous wave(s))
1: responder in 2012 (as we only have three waves)
2: non-responder in 2012
response_subsequent (lag of +1, i.e. a lead):
9 (baseline in the analysis): last invitation to participate (person not eligible for subsequent wave, e.g. due to death)
1: responder in subsequent wave
2: non-responder in subsequent wave
response_2subsequent (lag of +2):
...
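For concreteness, the variables above could be constructed roughly as follows. This is only a sketch under assumptions not stated above: the data are assumed to hold one row per person (id_swisci) and wave (survey, taking the values 2012, 2017 and 2022) in which the person was invited, and response_module is assumed to be coded 1 = responder, 0 = non-responder. The label name resp_lbl is made up for illustration.

```stata
* Sketch only: the 1/0 coding of response_module and the row structure are assumptions.
* Declare the panel; waves are 5 years apart, so one "period" is 5 years.
xtset id_swisci survey, delta(5)

* Start every observation in category 9 ("not eligible"), then overwrite
* where the neighbouring wave exists. L./F. evaluate to missing when that
* wave has no row, so those observations simply keep the value 9.
foreach v in response_prev response_2prev response_subsequent response_2subsequent {
    gen byte `v' = 9
}

replace response_prev        = 1 if L.response_module  == 1
replace response_prev        = 2 if L.response_module  == 0
replace response_2prev       = 1 if L2.response_module == 1
replace response_2prev       = 2 if L2.response_module == 0
replace response_subsequent  = 1 if F.response_module  == 1
replace response_subsequent  = 2 if F.response_module  == 0
replace response_2subsequent = 1 if F2.response_module == 1
replace response_2subsequent = 2 if F2.response_module == 0

* Hypothetical value label shared by all four variables.
label define resp_lbl 1 "responder" 2 "non-responder" 9 "not eligible"
label values response_prev response_2prev response_subsequent response_2subsequent resp_lbl
```

Because a missing lag or lead is recoded to 9 rather than left missing, no wave drops out of the estimation sample.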
However, I think we have used the lagged variables in an unconventional and possibly inappropriate way, since including lagged variables usually drops the earliest period's observations (see e.g. https://www.statalist.org/forums/for...55#post1486955 and subsequent posts).
So my main question is: Is it wrong to include the lagged variables as described? And if it is wrong, what would be a suitable alternative?
I would like to add that we also want to use the non-responder analysis to generate inverse probability weights for analysis of the survey data.
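As a rough sketch of that weighting step (an assumption about how it might be done, not something we have settled on): after fitting the logit model shown in this post, the predicted response probability could be inverted for responders. The names p_resp and ipw are hypothetical, and response_module == 1 is assumed to mark responders.

```stata
* Sketch only: run directly after the logit command shown in this post.
* predict with the pr option stores the fitted probability of response.
predict double p_resp, pr

* Inverse probability weight, defined for responders only
* (assumes response_module == 1 marks a responder).
gen double ipw = 1 / p_resp if response_module == 1

* Inspect the distribution to spot extreme weights.
summarize ipw, detail
```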
Thank you very much for any comments.
Code:
. logit response_module ib9.response_prev ib9.response_2prev ib9.response_subsequent
> ib9.response_2subsequent o.age_invitation age_invitation1 age_invitation2
> o.years_invitation_since_sci years_invitation_since_sci1 years_invitation_since_sci2
> i.sex i.sci_type ib2.sci_degree i.sci_cause_type i.language ib13.org_contact_nr i.survey
> ib9.response_prev#i2017.survey ib9.response_subsequent#i2017.survey, or vce(cluster id_swisci)

Iteration 0:  log pseudolikelihood = -6185.8407
Iteration 1:  log pseudolikelihood = -5045.6617
Iteration 2:  log pseudolikelihood = -5032.9081
Iteration 3:  log pseudolikelihood = -5032.8855
Iteration 4:  log pseudolikelihood = -5032.8855

Logistic regression                                     Number of obs =   8,965
                                                        Wald chi2(30) = 1174.58
                                                        Prob > chi2   =  0.0000
Log pseudolikelihood = -5032.8855                       Pseudo R2     =  0.1864

                                 (Std. err. adjusted for 4,255 clusters in id_swisci)
---------------------------------------------------------------------------------------------
                            |               Robust
            response_module | Odds ratio  std. err.        z   P>|z|   [95% conf.   interval]
----------------------------+----------------------------------------------------------------
              response_prev |
              non-responder |   .4463852   .0487291    -7.39   0.000     .3604039     .552879
                  responder |   2.279667   .2389468     7.86   0.000     1.856313     2.79957
                            |
             response_2prev |
              non-responder |   .2867766   .0364969    -9.81   0.000     .2234676    .3680211
                  responder |   .8681445   .0926049    -1.33   0.185     .7043596    1.070014
                            |
        response_subsequent |
              non-responder |   2.406085   .3344575     6.32   0.000     1.832272      3.1596
                  responder |   10.22923   1.667511    14.26   0.000     7.431641    14.07994
                            |
       response_2subsequent |
              non-responder |   .5950989   .0696937    -4.43   0.000     .4730451    .7486446
                  responder |   1.743404   .2761124     3.51   0.000      1.27817    2.377976
                            |
             age_invitation |          1  (omitted)
            age_invitation1 |   1.000027   4.73e-06     5.74   0.000     1.000018    1.000036
            age_invitation2 |    .999998   3.53e-07    -5.73   0.000     .9999973    .9999987
 years_invitation_since_sci |          1  (omitted)
years_invitation_since_sci1 |   .8938668   .0187206    -5.36   0.000     .8579179     .931322
years_invitation_since_sci2 |   1.027287   .0052215     5.30   0.000     1.017104    1.037572
                            |
                        sex |
                     Female |   .9865501   .0420172    -0.32   0.751     .9075413    1.072437
                            |
                   sci_type |
                tetraplegia |   .8939945   .0367002    -2.73   0.006     .8248811    .9688987
                            |
                 sci_degree |
            complete lesion |   1.022028   .0440677     0.51   0.613     .9392054    1.112153
                            |
             sci_cause_type |
              non-traumatic |   .9620232   .0507128    -0.73   0.463     .8675903    1.066735
                            |
                   language |
                     French |   1.049187   .0512481     0.98   0.326     .9534006    1.154597
                    Italien |   1.156533   .1154846     1.46   0.145     .9509593    1.406546
                            |
             org_contact_nr |
              Organistion 1 |   1.006892   .1055564     0.07   0.948     .8198758    1.236567
              Organistion 2 |    .775689   .0935686    -2.11   0.035     .6123649    .9825734
              Organistion 3 |   .7863481   .1181417    -1.60   0.110     .5857732    1.055602
              Organistion 4 |   .7062273   .0757499    -3.24   0.001     .5723276    .8714537
              Organistion 5 |   1.739262   .1596139     6.03   0.000     1.452946       2.082
              Organistion 6 |   1.154596    .264011     0.63   0.530     .7375536    1.807451
                            |
                     survey |
                       2017 |   .9007761   .1604106    -0.59   0.557     .6353814    1.277024
                       2022 |   1.014293   .1579998     0.09   0.927     .7474276    1.376443
                            |
       response_prev#survey |
         non-responder#2017 |   .7131371   .1191534    -2.02   0.043     .5139861    .9894519
             responder#2017 |   .4326824   .0609956    -5.94   0.000     .3282273    .5703794
                            |
 response_subsequent#survey |
         non-responder#2017 |   .4164344   .0725324    -5.03   0.000      .295998    .5858744
             responder#2017 |   .5785477    .115334    -2.75   0.006     .3914275    .8551197
                            |
                      _cons |   .6111134   .1096565    -2.74   0.006     .4299169    .8686785
---------------------------------------------------------------------------------------------
Note: _cons estimates baseline odds.