
  • How can I create new variables within each ID when variable(s) satisfy some conditions in Stata?

    Hi, I have a small demonstration dataset below:
    clear
    input str10 id byte fail byte year
    001 0 1
    001 0 2
    001 0 3
    001 1 4
    002 0 1
    002 0 2
    002 0 3
    002 0 4
    002 0 5
    002 0 6
    002 0 7
    002 1 8
    003 0 1
    003 0 2
    003 0 3
    003 0 4
    003 0 5
    003 0 6
    003 0 7
    003 0 8
    003 0 9
    003 0 10
    003 0 11
    003 1 12
    004 0 1
    004 0 2
    004 0 3
    end
    I want to create three new variables, "primary", "middle", and "high", based on the following rules in Stata:
    1) Within each id, if fail==1 occurs in a year<=5, then primary==1 for every observation of that id; otherwise primary==0.
    2) Within each id, if fail==1 occurs in a year from 6 to 8, then middle==1 for every observation of that id; otherwise middle==0.
    3) Within each id, if fail==1 occurs in a year from 9 to 12, then high==1 for every observation of that id; otherwise high==0.

    Thank you for your code!
    Last edited by smith Jason; 17 Jul 2022, 14:13.

  • #2
    Code:
    . by id (year), sort: egen primary = max(fail==1 & inrange(year,1,5))
    
    . by id (year), sort: egen middle  = max(fail==1 & inrange(year,6,8))
    
    . by id (year), sort: egen high    = max(fail==1 & inrange(year,9,12))
    
    . 
    . list, sepby(id) noobs
    
      +---------------------------------------------+
      |  id   fail   year   primary   middle   high |
      |---------------------------------------------|
      | 001      0      1         1        0      0 |
      | 001      0      2         1        0      0 |
      | 001      0      3         1        0      0 |
      | 001      1      4         1        0      0 |
      |---------------------------------------------|
      | 002      0      1         0        1      0 |
      | 002      0      2         0        1      0 |
      | 002      0      3         0        1      0 |
      | 002      0      4         0        1      0 |
      | 002      0      5         0        1      0 |
      | 002      0      6         0        1      0 |
      | 002      0      7         0        1      0 |
      | 002      1      8         0        1      0 |
      |---------------------------------------------|
      | 003      0      1         0        0      1 |
      | 003      0      2         0        0      1 |
      | 003      0      3         0        0      1 |
      | 003      0      4         0        0      1 |
      | 003      0      5         0        0      1 |
      | 003      0      6         0        0      1 |
      | 003      0      7         0        0      1 |
      | 003      0      8         0        0      1 |
      | 003      0      9         0        0      1 |
      | 003      0     10         0        0      1 |
      | 003      0     11         0        0      1 |
      | 003      1     12         0        0      1 |
      |---------------------------------------------|
      | 004      0      1         0        0      0 |
      | 004      0      2         0        0      0 |
      | 004      0      3         0        0      0 |
      +---------------------------------------------+
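A note on why the one-liners in #2 work, as a minimal sketch: the logical expression inside max() is 1 only on the row where the failure falls in the year window and 0 everywhere else, so its maximum within an id is 1 exactly when that id has such a row. The two-step version below makes the intermediate flag visible. The names hit_primary and primary2 are hypothetical and used only for illustration, and the data are a small subset of the example in #1.

```stata
clear
input str10 id byte fail byte year
001 0 3
001 1 4
002 0 7
002 1 8
004 0 1
004 0 2
end

* Step 1: row-level flag, 1 only on a failing row in years 1-5
generate byte hit_primary = fail == 1 & inrange(year, 1, 5)

* Step 2: copy the largest flag value to every row of the same id
bysort id: egen primary2 = max(hit_primary)

list, sepby(id) noobs
```

Here only id 001 should end up with primary2 equal to 1 on all of its rows, since id 002 fails in year 8 and id 004 never fails.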



    • #3
      Were I doing this for my work, however, I would not create three indicator variables - I would create a single categorical variable, and then use Stata's factor variable notation to include indicator variables in my models.
      Code:
      help factor variables
      Code:
      . generate range = 0
      
      . replace  range = fail + (year>=6) + (year>=9) if fail==1
      (3 real changes made)
      
      . by id (year), sort: egen when = max(range)
      
      . drop range
      
      . label define WHEN 0 "Did not fail" 1 "Primary" 2 "Middle"  3 "High"
      
      . label values when WHEN
      
      . 
      . 
      . list, sepby(id) noobs
      
        +----------------------------------+
        |  id   fail   year           when |
        |----------------------------------|
        | 001      0      1        Primary |
        | 001      0      2        Primary |
        | 001      0      3        Primary |
        | 001      1      4        Primary |
        |----------------------------------|
        | 002      0      1         Middle |
        | 002      0      2         Middle |
        | 002      0      3         Middle |
        | 002      0      4         Middle |
        | 002      0      5         Middle |
        | 002      0      6         Middle |
        | 002      0      7         Middle |
        | 002      1      8         Middle |
        |----------------------------------|
        | 003      0      1           High |
        | 003      0      2           High |
        | 003      0      3           High |
        | 003      0      4           High |
        | 003      0      5           High |
        | 003      0      6           High |
        | 003      0      7           High |
        | 003      0      8           High |
        | 003      0      9           High |
        | 003      0     10           High |
        | 003      0     11           High |
        | 003      1     12           High |
        |----------------------------------|
        | 004      0      1   Did not fail |
        | 004      0      2   Did not fail |
        | 004      0      3   Did not fail |
        +----------------------------------+
      
      .
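To illustrate the suggestion in #3, here is a minimal sketch of how the single categorical variable stands in for the three indicators. It assumes one row per id for brevity, with values of when matching the listing above. The outcome y in the commented regress line is hypothetical and does not exist in this thread.

```stata
clear
input str10 id byte when
001 1
002 2
003 3
004 0
end

label define WHEN 0 "Did not fail" 1 "Primary" 2 "Middle" 3 "High"
label values when WHEN

* One-way table of the categories, shown with their value labels
tabulate when

* If separate indicators are ever needed, they can be recovered on demand
generate byte primary = (when == 1)
generate byte middle  = (when == 2)
generate byte high    = (when == 3)

* In a model, factor-variable notation would create the indicators
* on the fly, with "Did not fail" (0) as the base level:
* regress y i.when    // y is a hypothetical outcome, not in this thread

list, noobs
```

The design choice is that one labelled variable cannot hold contradictory values, whereas three separate indicators have to be kept consistent by hand.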



      • #4
        Thank you very much!



        • #5
          Originally posted by William Lisowski
          Code:
          . by id (year), sort: egen primary = max(fail==1 & inrange(year,1,5))
          
          . by id (year), sort: egen middle = max(fail==1 & inrange(year,6,8))
          
          . by id (year), sort: egen high = max(fail==1 & inrange(year,9,12))
          
          .
          I still want to achieve the same goal as above, but this time the data contain missing values, and I don't know how to handle them.
          I think the rules are the same as above; the only difference is that the missing values need to be taken into account.


          clear
          input str10 id byte state byte year byte gr
          001 0 1 0
          001 0 2 1
          001 0 3 3
          001 1 4 2
          002 0 1 0
          002 0 2 1
          002 0 3 2
          002 0 4 4
          002 1 5 3
          002 0 6 6
          002 1 7 5
          002 1 8 6
          003 0 1 0
          003 0 2 1
          003 0 3 2
          003 0 4 3
          003 0 5 4
          003 . 6 .
          003 0 7 7
          003 1 8 6
          003 0 9 8
          003 . 10 .
          003 0 11 10
          003 0 12 11
          004 0 1 0
          004 . 2 .
          004 0 3 2
          end
          Thank you for your Stata code!
          By the way, gr is grade_year.
          Last edited by smith Jason; 28 Jul 2022, 22:55.



          • #6
            Post #5 was later reposted as a new topic with a better explanation of what is wanted at

            https://www.statalist.org/forums/for...rades-in-stata



            • #7
              Thank you! However, the answer in that topic is still not what I want, because of the missing values.
              Could you please help me?
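Since the thread does not say how missing values should be treated, the following is only a sketch under stated assumptions, not a definitive answer. It assumes that a missing state means "no failure observed in that year", that year (not gr) defines the windows as in the original rules, and that it is useful to flag ids with any missing state; the name anymiss is hypothetical. It relies on the fact that in Stata state==1 is false when state is missing, so the approach from #2 runs unchanged on the data from #5. Note that a test such as state>=1 would behave differently, because missing values compare as larger than any number.

```stata
clear
input str10 id byte state byte year byte gr
001 0 1 0
001 0 2 1
001 0 3 3
001 1 4 2
002 0 1 0
002 0 2 1
002 0 3 2
002 0 4 4
002 1 5 3
002 0 6 6
002 1 7 5
002 1 8 6
003 0 1 0
003 0 2 1
003 0 3 2
003 0 4 3
003 0 5 4
003 . 6 .
003 0 7 7
003 1 8 6
003 0 9 8
003 . 10 .
003 0 11 10
003 0 12 11
004 0 1 0
004 . 2 .
004 0 3 2
end

* state==1 is 0 on rows where state is missing, so those rows never count as failures
by id (year), sort: egen primary = max(state == 1 & inrange(year, 1, 5))
by id (year), sort: egen middle  = max(state == 1 & inrange(year, 6, 8))
by id (year), sort: egen high    = max(state == 1 & inrange(year, 9, 12))

* Assumption: flag ids that have at least one missing state, for later review
bysort id: egen anymiss = max(missing(state))

list, sepby(id) noobs
```

With these data, id 002 has state==1 in years 5, 7, and 8, so both primary and middle are 1 for that id; if only the first failure should count, a different rule would be needed.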

