Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Selection one variable among the multiple variables


    Hello, I am beginner of STATA, and look forward to your guidance.

    I want to make a new variable: baseline_age of my dataset.

    I have to assemble baseline age of the dataset.
    However, some of the ID does not have the data of age of visit_number 1.
    The individual who has no age data at visit_number1 has the record of age the next year, therefore, I want to make a new variable: baseline age by substration -1 from the next year in this ID.
    For example, ID number 1, and 2 have age data in visit_number 1, also has the next year data of age, visit_number 2.
    However, ID number 3, and 4 doesn't have age data in visit_number 1, only have the data of age in next year, visit_number 2.

    My dataset is like as below:
    Code:
    ID visit_number age 
    1 1 48
    2 2 49
    2 1 25
    2 2 26
    3 1 .
    3 2 29
    4 1 .
    4 2 35
    Therefore, I made a code like as below:
    Code:
    generate age1=age if visit_number==1
    generate age2=age if visit_number==2
    generate age2_1=age2-1
    generate baseline_age=age1
    replace baseline_age=age2_1 if age1 == .
    The dataset has been changed like as below: Sorry to not upload as dataex. I don't know why dataex is not working in my STATA.

    Code:
    ID visit_number age age1 age2 age2_1 baseline_age
    1 1 48 48 . . 48
    1 2 49 . 49 48 48
    2 1 25 25 . . 25
    2 2 26 . 26 25 25
    3 1 . . . . .
    3 2 29 . 29 28 28
    4 1 . . . . .
    4 2 35 . 35 34 34
    I want to make baseline_age variable from ID 1,2 from visit_number 1 row, and ID 3, 4 from visit_number 2.
    However, this modified dataset has same structure in ID 1,2, (visit number 2) and ID 3,4, (visit number2).
    I want to know the methods to selection of age variable from different row.

    Thank you for your understanding.

  • #2
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte(id visit_number age)
    1 1 48
    2 2 49
    2 1 25
    2 2 26
    3 1  .
    3 2 29
    4 1  .
    4 2 35
    end
    
    by id (visit_number), sort: replace age = age[2] - 1 if _n == 1 & missing(age[1])

    Comment


    • #3
      Thank you for your replying!

      That command makes me the dataset which has ID has 2 rows of age1 and age2.
      I want to keep the row of ID which has visit_number 1 and age1, despite they have the visit_number2 and age2, because age1 is their baseline_age. (the row ID #1_age1, ID #2_age1)
      Also, I want to keep the only row of ID which has visit_number_2 and age2 who doesn't have visit_number1 and age 1. (the row of ID #3_age2, ID#4_age2)

      Code:
      id visit_number age
      1 1 48
      2 2 49
      2 1 25
      2 2 26
      3 1 . 
      3 2 29
      4 1 .
      4 2 35

      I want to keep baseline_age to unique ID.

      If I choose the command drop, the whole row has to be dropped.
      I want to know the appropriate command to selection and drop.
      Please understand if I wrote something confusing.

      Thank you very much!!

      Comment


      • #4
        OK, I'm not entirely sure I understand what you want, but perhaps it is this. Try it and see if what it produces is what you need.
        Code:
        * Example generated by -dataex-. For more info, type help dataex
        clear
        input byte(id visit_number age)
        1 1 48
        2 2 49
        2 1 25
        2 2 26
        3 1  .
        3 2 29
        4 1  .
        4 2 35
        end
        
        by id (visit_number), sort: gen baseline_age = min(age[1], age[2]-1)
        by id (visit_number): keep if _n == cond(missing(age[1]), 2, 1)

        Comment


        • #5
          Thank you so much of your guidance.

          I did as you recommended, and I can omit the row of ID who has no visit_number1_age1.

          Code:
          id visit_number age
          1 1 48
          2 2 49
          2 1 25
          2 2 26
          3 2 29
          4 2 35

          However, I couldn't the delete the row of visit_number2_age2 of the ID who has both of visit_number_age1 and visit_number_age2.
          I want to make a dataset like as below:

          Code:
          id visit_number age
          1 1 48
          2 1 25
          3 2 29
          4 2 35
          Thank you for your understanding!

          Comment


          • #6
            Your new input is very different from the original input. Starting from the new input, you can do this:
            Code:
            * Example generated by -dataex-. For more info, type help dataex
            clear
            input byte(id visit_number age)
            1 1 48
            1 2 49
            2 1 25
            2 2 26
            3 2 29
            4 2 35
            end
            
            by id (visit_number), sort: keep if _n == 1
            Note: In your input in #5, you give the second observation as 2 2 49. I believe you meant 1 2 49, and I changed it accordingly in the -dataex- output for this post.

            Comment


            • #7
              Oh, Thank you so much!

              I can make a dataset which I wanted!!!
              Thank you very much for your guidance.
              I will study thoroughly the command you taught me and use them in the future when I organize the dataset.

              Code:
              generate age1=age if visit_number==1
              generate age2=age if visit_number==2
              generate age2_1=age2-1
              
              by id (visit_number), sort : gen baseline_age = min (age1, age2_1)
              by id (visit_number): keep if _n == cond(missing(age1), 2, 1)
              by id (visit_number), sort : keep if _n==1


              Comment

              Working...
              X