Dear Listers
I have a dataset of 4500 participants, grouped into a case cohort and a reference population. I will call this dataset population_set.
In a second dataset I have all hospital admissions for a disease. These data are in long format and contain 800 unique IDs covering 3500 contacts. I will call this dataset hospital_admissions_set.
Now I want to merge the two, stset the data for multiple failures, and stsplit it by age into 1-year age bands.
I have done the following:
Code:
use hospital_admissions_set.dta, clear
merge m:1 id using population_set.dta
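To make the merge step concrete, here is a minimal sketch on toy stand-ins for the two files. The values are made up, the tempfile replaces population_set.dta, and deriving failure from _merge is only an assumption about how that variable could be built. It needs to be run from a do-file because of the tempfile macro.

```stata
* Toy stand-in for population_set (hypothetical values)
clear
input byte(id age group)
1 10 1
2 21 1
3  5 0
4 24 0
end
tempfile population
save `population'

* Toy stand-in for hospital_admissions_set: one row per admission
clear
input byte(id age_at_failure)
1 7
1 8
3 3
3 4
end

* many admissions per id in memory, one population row per id on disk
merge m:1 id using `population'
tabulate _merge

* assumption: matched rows are admissions, using-only rows are people never admitted
generate byte failure = (_merge == 3)
bysort id: generate n_obs = _N
list id age failure age_at_failure group n_obs, sepby(id)
```

People who were never admitted come in from the using file with a single row and a missing age_at_failure, which matches the layout of the data example.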
Now I have a dataset where 3700 IDs have one observation and 800 IDs have between 2 and N observations.
Then I stset my data:
Code:
* the scale of age is in integer years of age
stset age, fail(failure) id(id) exit(time .)
Now I want to stsplit the data and have written the following:
Code:
stsplit year_bands, every(1)
Now, in Mario Cleves's brilliant book An Introduction to Survival Analysis Using Stata, he states that:
"The original [dataset] might already have multiple records per subject, some subjects might already have records split at t = 5, and some might not. None of this causes stsplit any difficulty."
In my dataset, I read the above as follows:
If an ID has had more than one contact with the hospital, the merged dataset will have more than one line for that ID. The age variable, which the data are stset on, is the age at the end of the observation time. Each line in the dataset has failure==1 or failure==0, and an age_at_failure. This is a data example:
Code:
input byte(id age failure age_at_failure group)
1 10 1 7 1
1 10 1 8 1
2 21 0 . 1
3 5 1 3 0
3 5 1 4 0
4 24 0 . 0
end
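In the example, every record of an ID carries the same age, so records within an ID all end at the same time. One possible restructuring, sketched below, lets each admission record end at its own age_at_failure and adds a closing censored record that runs to the end of follow-up. This assumes that age_at_failure is the age at admission and age is the age at end of follow-up. The names t_end, event, last and closing are hypothetical helpers, not part of the original data. With integer ages, two admissions at the same age, or an admission at the exit age, would give records ending at the same instant, which stset may flag as probable errors.

```stata
clear
input byte(id age failure age_at_failure group)
1 10 1 7 1
1 10 1 8 1
2 21 0 . 1
3  5 1 3 0
3  5 1 4 0
4 24 0 . 0
end

* each admission record ends at the age of that admission;
* records without an admission end at the age at end of follow-up
generate double t_end = cond(failure == 1, age_at_failure, age)
generate byte event = failure

* add one closing, censored record per admitted id, ending at age
bysort id (t_end): generate byte last = (_n == _N) & (failure == 1)
expand 2 if last, generate(closing)
replace t_end = age if closing
replace event = 0 if closing
replace age_at_failure = . if closing

* exit(time .) keeps subjects at risk after their first failure
stset t_end, failure(event) id(id) exit(time .)
list id _t0 _t _d, sepby(id)
```

The listing of _t0, _t and _d shows the interval each record covers and whether it ends in a failure, which is a quick way to check that the records of an ID follow one another without gaps or overlaps.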
Now two questions:
When I stset the multiple-failure data, do I somehow have to incorporate the failure time and the fact that many of the individuals in the set will have more than one observation?
When I then stsplit, will Stata split every observation into years from birth to age? That is, if a 20-year-old person has had 5 failures, will that create 100 lines or 20 lines in the split set? How does Stata know the age_at_failure values, and where do I put them in my code?
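The record count can be checked on a toy case. The sketch below assumes one person followed from birth to age 10 with admissions at ages 7 and 8, already arranged so that each record ends at its own time (t_end and event are hypothetical names). Because stsplit only cuts each record within its own interval from _t0 to _t, and these intervals do not overlap, the split data should hold 10 one-year records here rather than 30.

```stata
clear
input byte(id t_end event)
1  7 1
1  8 1
1 10 0
end

* multiple-failure setup: the subject stays at risk after each admission
stset t_end, failure(event) id(id) exit(time .)

* split every record at integer ages; year_band holds the lower bound of each band
stsplit year_band, every(1)

count
list id year_band _t0 _t _d, sepby(id)
```

In the listing, _d should be 1 only on the records ending at ages 7 and 8, so the failures stay attached to the age bands in which they occurred.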
I hope my questions are clear.
Thank you
Lars