Hello Statalist,
I have a longitudinal dataset, where each individual is identified by a unique ID and observed at multiple time points. The data includes a variable for the procedure performed on each individual, the duration of the procedure in days, and the individual's age in months at each time point.
I want to expand the dataset so that every individual has as many observations as the longest participating individual based on age. For example, if the maximum age in the data is 200 months, each individual should have 200 observations to represent each month from 1 to 200. For each observation, I want to record the individual's age in months, and the procedure performed on them (if any).
If an individual entered the program later or exited earlier than the longest participating individual, they would have missing values for the procedure variable/duration in the corresponding age months, but their age in months would still be recorded. I am not sure where to begin with my code.
For example, my data looks like this:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float id str5 proc float(duration age)
1 "dna"    20 0
1 "phot"   30 1
1 "injec"  30 2
1 "lor"    30 3
2 "dna"     1 2
2 "lor"    40 3
3 "dna"     1 0
3 "phot"   30 1
3 "injec"  30 2
3 "dna"    30 3
3 "phot"   30 4
3 "injec"  30 5
4 "phot"    1 2
4 "injec"  60 4
4 "dna"   120 8
4 "phot"   30 9
end
And I'd like to get to this, where each individual in the example data has ten rows (one for each age from 0 to 9 months, 9 being the maximum age in the data) with the recorded procedure, and missing values for the months in which they weren't observed:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float id str5 proc float(duration age)
1 "dna"   20 0
1 "phot"  30 1
1 "injec" 30 2
1 "lor"   30 3
1 "."      . 4
1 "."      . 5
1 "."      . 6
1 "."      . 7
1 "."      . 8
1 "."      . 9
2 "."      . 0
2 "."      . 1
2 "dna"    1 2
2 "lor"   40 3
2 "."      . 4
2 "."      . 5
2 "."      . 6
2 "."      . 7
2 "."      . 8
2 "."      . 9
3 "dna"    1 0
3 "phot"  30 1
3 "injec" 30 2
3 "dna"   30 3
3 "phot"  30 4
3 "injec" 30 5
3 "."      . 6
3 "."      . 7
3 "."      . 8
3 "."      . 9
4 "."      . 0
4 "."      . 1
4 "phot"   . 2
4 "injec"  . 3
4 "injec"  . 4
4 "dna"    . 5
4 "dna"    . 6
4 "dna"    . 7
4 "dna"    . 8
4 "phot"   . 9
end
(The procedures happen consecutively. For example, individual 4 had phot for 1 day and then injec until the age of 4 months.)
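One possible starting point, offered only as a rough sketch: it assumes that id and age hold integer values, that each id-age pair occurs at most once, and that a gap between two recorded ages should take the procedure recorded at the later age (as in the individual 4 example). The variables first_age and negage are hypothetical helper names, and duration is left untouched. Newly added rows get an empty string in proc rather than ".".

Code:
* Sketch only: declare the panel structure (needs integer ages, unique id-age pairs)
xtset id age
* Add rows so every id spans the full range of ages seen in the data (0 to 9 here)
tsfill, full
* Youngest age at which each id has a recorded procedure (hypothetical helper variable)
egen first_age = min(cond(proc != "", age, .)), by(id)
* Work from oldest to youngest so each gap takes the next recorded procedure;
* ages before the first record and after the last record stay empty
gen negage = -age
bysort id (negage): replace proc = proc[_n-1] if proc == "" & age > first_age & _n > 1
* Restore the usual ordering and drop the helpers
sort id age
drop first_age negage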