Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Creating variables with a sequence of numbers based on whether other variables stay the same (for case:control matching)

    Hello everyone.

    This is my first time posting here, and I do so because I haven't been able to find the answer to this question through googling. I apologize in advance if it exists somewhere in the forum.

    I have a patient dataset for controls, and one for cases. I need to create a 3:1 match, controls:cases. Without replacement.
    I need to create a dummy variable, "i", like the one in the examples below.

    The dummy has a sequence of 1 2 3 1 2 3 etc if sex, birthyear and caseyear is the same, and breaks when it changes.
    For the controls, I need a sequence of 1 1 1 2 2 2 3 3 3 1 1 1 2 2 2 3 3 3 etc, with the same conditions as for cases.
    - And also a variable, count, in the controls, that matches the controls to the cases.

    Cases example
    sex birthyear caseyear i
    K 1930 1990 1
    K 1930 1990 2
    K 1930 1990 3
    K 1930 1991 1
    K 1930 1991 2

    Controls example
    sex birthyear caseyear i
    K 1930 1990 1
    K 1930 1990 1
    K 1930 1990 1
    K 1930 1990 2
    K 1930 1990 2
    K 1930 1990 2
    K 1930 1990 3


    If it is any help, here is the SAS code that does what I need. It also has

    Code:
    data cases_match;
                    set cases_final;
                    by sex birthyear caseyear;
                    i+1:
                    if first.caseyear then i = 1;
    run;
    data controls_match;
                    retain count I;   
                    set controls;
                    by sex birthyear caseyear;
                    if _n_ = 1 or first.caseyear then do;
                                    count = 1;
                                    i = 1;
                    end;
                    else do;
                                    if count = 3 then do;
                                                    count = 1;
                                                    i = i+1;
                                    end;
                                    else
                                                    count = count + 1;
                                    end;
                    run;
    Thank you in advance for any help.

    Best regards,
    Henrik

  • #2
    Is it possible to "bump" this post in these forums? I would hope someone has an idea of how to do this. Am I not supplying the right information?

    Comment


    • #3
      Welcome to the Stata Forum / Statalist

      With regards to your question in #2:

      a) Yes, it is possible. Preferrably not in the same day. Therefore, concerning this aspect, you did it just right.

      b) No, you are not supplying the right information. You were expected to read the FAQ and, on account of this, provide data under code delimiters or use the - dataex - command.


      When a thread fails to follow this simple aspect, all the work of typing and preparing the data is left to those (of goodwill) who wish to provide an insightful reply.

      That being said, I decided to do this extra work for you.

      Code:
      input str1 sex birthyear caseyear i
      K 1930 1990 1
      K 1930 1990 2
      K 1930 1990 3
      K 1930 1991 1
      K 1930 1991 2
      K 1930 1990 1
      K 1930 1990 1
      K 1930 1990 1
      K 1930 1990 2
      K 1930 1990 2
      K 1930 1990 2
      K 1930 1990 3
      end
      After that, I tried to provide a solution for your problem.


      Code:
      gen case = 1 in 1/5
      replace case = 0 if missing(case)
      by case, sort : egen float mycase = seq() if case ==1, from(1) to(3)
      by case, sort : egen float mycontrol = seq() if case ==0, from(1) to(3) block(3)
      gen casecontrol = mycase
      replace casecontrol = mycontrol if missing(casecontrol)
      . list
      
           +---------------------------------------------------------------------+
           | sex   birthy~r   caseyear   i   case   mycase   mycont~l   caseco~l |
           |---------------------------------------------------------------------|
        1. |   K       1930       1990   3      0        .          1          1 |
        2. |   K       1930       1990   2      0        .          1          1 |
        3. |   K       1930       1990   2      0        .          1          1 |
        4. |   K       1930       1990   2      0        .          2          2 |
        5. |   K       1930       1990   1      0        .          2          2 |
           |---------------------------------------------------------------------|
        6. |   K       1930       1990   1      0        .          2          2 |
        7. |   K       1930       1990   1      0        .          3          3 |
        8. |   K       1930       1991   2      1        1          .          1 |
        9. |   K       1930       1991   1      1        2          .          2 |
       10. |   K       1930       1990   3      1        3          .          3 |
           |---------------------------------------------------------------------|
       11. |   K       1930       1990   2      1        1          .          1 |
       12. |   K       1930       1990   1      1        2          .          2 |
           +---------------------------------------------------------------------+
      Hopefully it helps!
      Last edited by Marcos Almeida; 25 Feb 2019, 04:53.
      Best regards,

      Marcos

      Comment


      • #4
        Thank you Marcos.

        I have read the FAQ now and wished I had used the dataex before.

        Your suggestion worked. I didn't know that "by varlist:" did exactly what I needed. I suspected the seq() function could do the "block" but I seem to have overlooked a part of the manual description of "block".

        A tiny suggestion: "blocks" should be written in bold in the manual, as all the other options are written in bold.

        I needed a sequence that restarted if caseyear changed, since it is the first to change when sorted by sex, birthyear and caseyear.

        I used the following:
        Code:
        sort sex birthyear caseyear
        by sex birthyear caseyear: egen float j = seq(), from(1) to(999999) block(3)
        And then I could use the joinby command to join controls to cases in my dataset. I wonder how the seq() code above can be done "manually" with loops.

        Anyway - thank you for the help, I got an answer to my question.
        Last edited by Henrik Vitus Bering; 28 Feb 2019, 03:38.

        Comment


        • #5
          Thank you for your feedback. Surely, using - dataex - is the best approach to entail an insightful reply in most cases.
          Best regards,

          Marcos

          Comment

          Working...
          X