Dear Statalist,
I'm running an ungoing panel survey where new data are added continuously, so I need to write code to add the new data to existing datasets. I've developed a strategy for merging new data with existing datasets. I've determined that using a long format is best for my situation.
The respondents answer the survey after receiving an intervention with a follow-up survey on the same intervention a year after. The same respondent may receive several interventions and the interventions may be of two different kinds. Sometimes the follow-up data is missing.
I will have a unique identifier for the respondent ID (ID), the intervention date (Date), whether the entry is a follow-up (Followup) and which intervention group the data is on (Group). I’ve made a dataex example below to develop my code on.
What I need to create is a variable identifying the intervention number from the above data. This is to ensure, I can work with just ‘first interventions’ or ‘second interventions’ etc. and their folllowups when I work with the data.
I can almost achieve that with this code:
But it fails when there is no follow-up data for the first intervention, assigning the ‘number’ value of the first intervention to the second intervention follow-up.
I figure a solution could be to restrict the above to the initial surveys and not follow-ups (e.g. &if Follow<1 to the bysort commands above) and then create code for the follow-ups where the Stata finds the matching ID and Date and then copy whatever number is assigned for the intervention. I tried the vlookup ado but it won’t work with my date information.
Solutions?
I'm running an ungoing panel survey where new data are added continuously, so I need to write code to add the new data to existing datasets. I've developed a strategy for merging new data with existing datasets. I've determined that using a long format is best for my situation.
The respondents answer the survey after receiving an intervention with a follow-up survey on the same intervention a year after. The same respondent may receive several interventions and the interventions may be of two different kinds. Sometimes the follow-up data is missing.
I will have a unique identifier for the respondent ID (ID), the intervention date (Date), whether the entry is a follow-up (Followup) and which intervention group the data is on (Group). I’ve made a dataex example below to develop my code on.
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input byte ID int Date byte(Follow-up Group) 1 22738 0 0 1 22312 0 1 1 22312 1 1 2 21947 0 0 2 21947 1 0 3 21785 0 0 3 22758 0 0 3 21785 1 0 3 22758 1 0 4 22166 0 1 4 22404 0 1 5 21530 0 0 5 21530 1 0 6 22670 0 1 6 22670 1 1 7 22706 0 0 7 23013 0 0 7 23013 1 0 end format %td Date
I can almost achieve that with this code:
Code:
sort ID Date Follow , stable bysort ID Follow : generate number =_n if Group<1 bysort ID Follow : replace number =_n if Group
I figure a solution could be to restrict the above to the initial surveys and not follow-ups (e.g. &if Follow<1 to the bysort commands above) and then create code for the follow-ups where the Stata finds the matching ID and Date and then copy whatever number is assigned for the intervention. I tried the vlookup ado but it won’t work with my date information.
Solutions?

Comment