Hi,
I am a beginner with Stata and still learning. I am working on a project with a retrospective cohort study design, trying to find out whether exposure to a medication has a protective effect in the exposed vs. the non-exposed population. The data I received contain one record for every time a diagnosis was made. For each person, I want to keep only the first time each diagnosis was entered. I tried duplicates drop person dg, but I am not sure whether that is correct or sufficient. The same person (persno) can also have multiple diagnoses (dg takes the values 0, 1, 2, ... 7).
Let me give you an example:
id persno dg date
1 2345 4 1/14/2011
2 2345 4 3/15/2017
3 2345 7 2/3/1988
4 2001 5 etc
5 2001 6
6 2017 0
7 2001 0
What I want it to look like:
id persno dg(0) dg(4) dg(7) dg(5) date
1 2345 0 1 1 0 1/14/2011 but also 2/3/1988 for dg(7)
2 2001 1 0 0 1
3 2017 1 0 0 0
I know that I could use tab dg, gen(dg_), which would give me dummy variables, but what should I do when a person appears more than once and the same dg is also repeated for that person? I would like to keep the first time each diagnosis was made. And how do I connect a second dg of interest to its own date?
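To make the question concrete, here is a rough sketch of what I think the steps might be. It is untested on my real data and rests on assumptions: that date is a string in month/day/year form, and that the names ddate and has_dg (which I made up) are not already in use. I sort by date first because, as far as I understand, duplicates drop keeps whichever record happens to come first in the current order, which is not necessarily the earliest one.

```stata
* Assumption: date is a string such as "1/14/2011" (month/day/year)
gen ddate = date(date, "MDY")
format ddate %td

* Within each person and diagnosis, sort by date and keep the earliest record
* (missing dates sort last, so a dated record is preferred over an undated one)
bysort persno dg (ddate): keep if _n == 1

* Flag that the diagnosis occurred, then go to one row per person
gen byte has_dg = 1
keep persno dg has_dg ddate
reshape wide has_dg ddate, i(persno) j(dg)

* After reshape, a missing has_dg# means the person never had diagnosis #
foreach v of varlist has_dg* {
    replace `v' = 0 if missing(`v')
}
```

If this is right, each person would end up with one row, an indicator has_dg0, has_dg4, ... for each diagnosis, and a matching ddate0, ddate4, ... holding the first date of that diagnosis.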
Thank you!