  • identifying matched pairs in dataset

    Hello everyone,

    Apologies in advance as I am not an experienced Stata user.

    I am working with a long format dataset with unique patient ID as identifiers.
    It's a case-control study and cases are matched to controls 1:1 on age, sex and GP practice.

    After removing patients with a follow-up of less than 1 year I get:

    95,046 unique patient IDs
    47,541 are cases
    47,256 are controls
    28,067,037 observations

    1) Is there a way to identify which cases are now left without controls in order to delete them?

    I have created a variable called "pairs" which matches the cases and the controls.
    Cases and controls therefore have different patient ID numbers but share the same pairs number.

    My initial idea was to use levelsof to store the values of pairs for cases and controls separately, and then drop the values of pairs that do not appear in the controls group.
    That didn't work because Stata returned "macro substitution results in line that is too long".

    2) What is good practice for tracking changes in the numbers of cases and controls as I go through the cleaning process when the data are in long format?
    I have been using egen tag.
    I need to create a flowchart with how many cases/controls I have excluded in each step.

    Thank you in advance for your help.

    Louisa





  • #2
    Hi,
    First of all, your cases and controls do not add up to the total number of patient IDs (47,541 + 47,256 = 94,797, not 95,046).
    Second, how did you create the pairs variable? If a case has no control, what value does pairs take?



    • #3
      Louisa:
      are you looking for something along the following lines?:
      Code:
      . set obs 3
      number of observations (_N) was 0, now 3
      
      . g case_id=_n
      
      . g controls_id=-_n
      
      . egen counter=group( case_id controls_id) in 1/2
      (1 missing value generated)
      
      . g well_coupled=1 if case_id!=. & controls_id!=. & counter!=.
      (1 missing value generated)
      
      . list
      
           +-----------------------------------------+
           | case_id   contro~d   counter   well_c~d |
           |-----------------------------------------|
        1. |       1         -1         1          1 |
        2. |       2         -2         2          1 |
        3. |       3         -3         .          . |
           +-----------------------------------------+
      
      . keep if well_coupled==1
      (1 observation deleted)
      
      . list
      
           +-----------------------------------------+
           | case_id   contro~d   counter   well_c~d |
           |-----------------------------------------|
        1. |       1         -1         1          1 |
        2. |       2         -2         2          1 |
           +-----------------------------------------+
      
      .
      Kind regards,
      Carlo
      (Stata 16.0 SE)
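
      For the long-format data described in #1, a minimal sketch of the same idea might look like the following. It assumes an indicator named case (1 = case, 0 = control), which is a hypothetical name and not something given in the thread; pairs is the variable from the original post. Because it works within groups and never calls levelsof, no long macro is built.
      Code:
      * hypothetical variable: case (1 = case, 0 = control)
      * flag every pair that still has at least one case and one control
      bysort pairs: egen byte has_case    = max(case == 1)
      bysort pairs: egen byte has_control = max(case == 0)
      
      * keep complete pairs only
      * note: rows with missing pairs form one group and need a separate check
      drop if !(has_case & has_control)
      drop has_case has_control
      
      This drops every row of a pair that has lost either member, so it removes orphaned controls as well as orphaned cases.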

      Comment


      • #4
        Yes, Carlo!
        That's a great idea.
        Thank you both.
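
        On question 2 in #1, one possible way to get the flowchart counts in long format is to tag one row per patient and tabulate before and after each exclusion. This is only a sketch: patid and case are hypothetical variable names, and it assumes case status is constant within a patient.
        Code:
        * hypothetical variables: patid (patient ID), case (1 = case, 0 = control)
        * tag one row per patient so each person is counted once
        egen byte one_per_patient = tag(patid)
        
        * patients per group before the exclusion step
        tabulate case if one_per_patient
        
        * ... apply an exclusion step here (drop/keep) ...
        
        * the tagged row may have been dropped, so rebuild the tag and count again
        drop one_per_patient
        egen byte one_per_patient = tag(patid)
        tabulate case if one_per_patient
        
        The difference between the two tables gives the number of cases and controls excluded at that step.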
