Hello everyone,
Apologies in advance as I am not an experienced Stata user.
I am working with a long-format dataset in which each patient is identified by a unique patient ID.
It's a case-control study, and cases are matched 1:1 to controls on age, sex and GP practice.
After removing patients with less than 1 year of follow-up, I have:
95,046 unique patient IDs
47,541 cases
47,256 controls
28,067,037 observations
1) Is there a way to identify which cases are now left without a control, so that I can delete them?
I have created a variable called "pairs" that links each case to its matched control: a case and its control have different patient IDs but share the same pairs value.
My initial idea was to store the levelsof pairs for cases and controls separately and then drop the pairs values that do not appear among the controls.
That didn't work, because Stata returned the error "macro substitution results in line that is too long".
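One way to avoid the macro length limit entirely is to work within each pair using `bysort pairs: egen`, so no list of pair values is ever stored in a macro. The sketch below assumes the patient ID variable is called `patid` and that a 0/1 variable `case` (1 = case, 0 = control) exists; both names are hypothetical, and only `pairs` comes from the post. A tiny toy dataset stands in for the real data.

```stata
* Toy long-format data (hypothetical): several rows per patient
clear
input long patid byte case long pairs
1 1 10
1 1 10
2 0 10
2 0 10
3 1 11
3 1 11
4 0 12
end

* For every row, flag whether its pair still contains a case and a control
bysort pairs: egen byte pair_has_case    = max(case == 1)
bysort pairs: egen byte pair_has_control = max(case == 0)

* Inspect the orphaned cases, one row per patient via egen tag()
egen byte first_row = tag(patid)
list patid pairs if case == 1 & !pair_has_control & first_row

* Drop every row belonging to a pair that has lost either member
drop if !(pair_has_case & pair_has_control)
drop pair_has_case pair_has_control first_row
```

Because the flags are computed across all rows of a pair, this works the same in long format as it would with one row per patient. It also removes controls whose case was excluded; change the `drop` condition to `case == 1 & !pair_has_control` if only orphaned cases should go.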
2) What is good practice for tracking the numbers of cases and controls through the cleaning process when the data are in long format?
I have been using egen tag().
I need to create a flowchart showing how many cases and controls I have excluded at each step.
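A possible pattern for the flowchart counts is a small helper program that uses `egen tag()` to count distinct cases and controls, called after each cleaning step. This is a sketch under assumptions: the program name `count_cc`, the variables `patid` and `case` (1 = case, 0 = control), and the exclusion step shown are all hypothetical.

```stata
* Hypothetical helper: report distinct cases and controls in the current data
capture program drop count_cc
program define count_cc, rclass
    args label
    tempvar first
    * Tag one row per patient so long-format rows are not double-counted
    egen byte `first' = tag(patid)
    quietly count if `first' & case == 1
    local n_cases = r(N)
    quietly count if `first' & case == 0
    local n_controls = r(N)
    display as text "`label': " as result `n_cases' " cases, " `n_controls' " controls"
    return scalar cases = `n_cases'
    return scalar controls = `n_controls'
end

* Toy long-format data (hypothetical)
clear
input long patid byte case long pairs
1 1 10
1 1 10
2 0 10
3 1 11
4 0 11
4 0 11
end

count_cc "Step 0: start"
local cases0 = r(cases)
drop if patid == 3          // stands in for a real exclusion step
count_cc "Step 1: after exclusion"
display as text "Cases excluded at step 1: " as result `cases0' - r(cases)
```

Calling the helper after every `drop` and keeping the returned counts in locals (or posting them to a results file) gives the numbers needed for each box of the flowchart.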
Thank you in advance for your help.
Louisa