How can I Create New Variables within Each ID When Variable(s) Satisfy Some Conditions in Stata?

smith Jason

Join Date: Sep 2020

Posts: 380
#1


17 Jul 2022, 14:09

Hi, here is a small dataset for demonstration:
clear
input str10 id byte fail byte year
001 0 1
001 0 2
001 0 3
001 1 4
002 0 1
002 0 2
002 0 3
002 0 4
002 0 5
002 0 6
002 0 7
002 1 8
003 0 1
003 0 2
003 0 3
003 0 4
003 0 5
003 0 6
003 0 7
003 0 8
003 0 9
003 0 10
003 0 11
003 1 12
004 0 1
004 0 2
004 0 3
end
I want to create three new variables in Stata, "primary", "middle", and "high", based on the following rules:
1) Within each id, if fail==1 in a year with year<=5, then primary==1 for all observations of that id; otherwise primary==0.
2) Within each id, if fail==1 in a year from 6 to 8, then middle==1 for all observations of that id; otherwise middle==0.
3) Within each id, if fail==1 in a year from 9 to 12, then high==1 for all observations of that id; otherwise high==0.

Thank you for your code!

Last edited by smith Jason; 17 Jul 2022, 14:13.

William Lisowski

Join Date: Dec 2014
Posts: 10150

17 Jul 2022, 15:00

Code:

. by id (year), sort: egen primary = max(fail==1 & inrange(year,1,5))

. by id (year), sort: egen middle  = max(fail==1 & inrange(year,6,8))

. by id (year), sort: egen high    = max(fail==1 & inrange(year,9,12))

. 
. list, sepby(id) noobs

  +---------------------------------------------+
  |  id   fail   year   primary   middle   high |
  |---------------------------------------------|
  | 001      0      1         1        0      0 |
  | 001      0      2         1        0      0 |
  | 001      0      3         1        0      0 |
  | 001      1      4         1        0      0 |
  |---------------------------------------------|
  | 002      0      1         0        1      0 |
  | 002      0      2         0        1      0 |
  | 002      0      3         0        1      0 |
  | 002      0      4         0        1      0 |
  | 002      0      5         0        1      0 |
  | 002      0      6         0        1      0 |
  | 002      0      7         0        1      0 |
  | 002      1      8         0        1      0 |
  |---------------------------------------------|
  | 003      0      1         0        0      1 |
  | 003      0      2         0        0      1 |
  | 003      0      3         0        0      1 |
  | 003      0      4         0        0      1 |
  | 003      0      5         0        0      1 |
  | 003      0      6         0        0      1 |
  | 003      0      7         0        0      1 |
  | 003      0      8         0        0      1 |
  | 003      0      9         0        0      1 |
  | 003      0     10         0        0      1 |
  | 003      0     11         0        0      1 |
  | 003      1     12         0        0      1 |
  |---------------------------------------------|
  | 004      0      1         0        0      0 |
  | 004      0      2         0        0      0 |
  | 004      0      3         0        0      0 |
  +---------------------------------------------+
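Because `egen`'s `max()` is computed within each `by id` group, every observation of an id receives the same 0/1 value, which is what makes a flag apply to all of that id's years. A minimal check of that property is sketched here: it rebuilds the data from #1, reruns the three commands from the answer, and adds `assert` statements (the checks are an illustrative addition, not part of the answer).

```stata
* Demonstration data from #1
clear
input str10 id byte fail byte year
001 0 1
001 0 2
001 0 3
001 1 4
002 0 1
002 0 2
002 0 3
002 0 4
002 0 5
002 0 6
002 0 7
002 1 8
003 0 1
003 0 2
003 0 3
003 0 4
003 0 5
003 0 6
003 0 7
003 0 8
003 0 9
003 0 10
003 0 11
003 1 12
004 0 1
004 0 2
004 0 3
end

* The three indicators, as in the answer above
by id (year), sort: egen primary = max(fail==1 & inrange(year,1,5))
by id (year), sort: egen middle  = max(fail==1 & inrange(year,6,8))
by id (year), sort: egen high    = max(fail==1 & inrange(year,9,12))

* Check: each indicator takes a single value across all observations of an id
* (under by, `v'[1] is the first observation of the current id)
foreach v of varlist primary middle high {
    by id: assert `v' == `v'[1]
}
```

If any check fails, Stata stops with an "assertion is false" error, so the do-file cannot silently continue with inconsistent flags.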


William Lisowski


17 Jul 2022, 15:19

Were I doing this for my work, however, I would not create three indicator variables. Instead, I would create a single categorical variable and then use Stata's factor-variable notation to include the indicator variables in my models.

Code:

help factor variables

Code:

. generate range = 0

. replace  range = fail + (year>=6) + (year>=9) if fail==1
(3 real changes made)

. by id (year), sort: egen when = max(range)

. drop range

. label define WHEN 0 "Did not fail" 1 "Primary" 2 "Middle"  3 "High"

. label values when WHEN

. 
. 
. list, sepby(id) noobs

  +----------------------------------+
  |  id   fail   year           when |
  |----------------------------------|
  | 001      0      1        Primary |
  | 001      0      2        Primary |
  | 001      0      3        Primary |
  | 001      1      4        Primary |
  |----------------------------------|
  | 002      0      1         Middle |
  | 002      0      2         Middle |
  | 002      0      3         Middle |
  | 002      0      4         Middle |
  | 002      0      5         Middle |
  | 002      0      6         Middle |
  | 002      0      7         Middle |
  | 002      1      8         Middle |
  |----------------------------------|
  | 003      0      1           High |
  | 003      0      2           High |
  | 003      0      3           High |
  | 003      0      4           High |
  | 003      0      5           High |
  | 003      0      6           High |
  | 003      0      7           High |
  | 003      0      8           High |
  | 003      0      9           High |
  | 003      0     10           High |
  | 003      0     11           High |
  | 003      1     12           High |
  |----------------------------------|
  | 004      0      1   Did not fail |
  | 004      0      2   Did not fail |
  | 004      0      3   Did not fail |
  +----------------------------------+

.
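The `generate`/`replace` pair can also be collapsed into one `generate` with Stata's `cond()` function. The sketch that follows is a variant, not the code from the answer: it builds `when` that way and then confirms that each of the three indicators from the first answer equals one level of `when`, which is why `i.when` in a model carries the same information. `cond(x,a,b)` returns `a` when `x` is true and `b` when it is false; here `x` is `fail==1`, which is never missing. The outcome `y` in the closing comment is hypothetical.

```stata
* Demonstration data from #1
clear
input str10 id byte fail byte year
001 0 1
001 0 2
001 0 3
001 1 4
002 0 1
002 0 2
002 0 3
002 0 4
002 0 5
002 0 6
002 0 7
002 1 8
003 0 1
003 0 2
003 0 3
003 0 4
003 0 5
003 0 6
003 0 7
003 0 8
003 0 9
003 0 10
003 0 11
003 1 12
004 0 1
004 0 2
004 0 3
end

* One-step version: 1/2/3 for the range of the failure year, 0 otherwise
generate byte range = cond(fail==1, 1 + (year>=6) + (year>=9), 0)
by id (year), sort: egen when = max(range)
drop range
label define WHEN 0 "Did not fail" 1 "Primary" 2 "Middle" 3 "High"
label values when WHEN

* The indicators from the first answer, for comparison
by id: egen primary = max(fail==1 & inrange(year,1,5))
by id: egen middle  = max(fail==1 & inrange(year,6,8))
by id: egen high    = max(fail==1 & inrange(year,9,12))

* Each indicator is recoverable from the categorical variable
assert primary == (when==1)
assert middle  == (when==2)
assert high    == (when==3)

* In a model, factor-variable notation creates the indicators on the fly, e.g.
*     regress y i.when      (y is a hypothetical outcome variable)
tabulate when
```

By default `i.when` uses the lowest level ("Did not fail") as the base category, so the three estimated coefficients correspond to the three indicator variables.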


smith Jason

#4

17 Jul 2022, 16:07

Thank you very much!

smith Jason


28 Jul 2022, 22:40

Originally posted by William Lisowski

Code:

. by id (year), sort: egen primary = max(fail==1 & inrange(year,1,5))

. by id (year), sort: egen middle = max(fail==1 & inrange(year,6,8))

. by id (year), sort: egen high = max(fail==1 & inrange(year,9,12))


I still want to achieve the same goal as above. This time, however, the data contain missing values, and I don't know how to handle them.
I think the rules are the same as above; the only difference is that the missing values need to be taken into account.

clear
input str10 id byte state byte year byte gr
001 0 1 0
001 0 2 1
001 0 3 3
001 1 4 2
002 0 1 0
002 0 2 1
002 0 3 2
002 0 4 4
002 1 5 3
002 0 6 6
002 1 7 5
002 1 8 6
003 0 1 0
003 0 2 1
003 0 3 2
003 0 4 3
003 0 5 4
003 . 6 .
003 0 7 7
003 1 8 6
003 0 9 8
003 . 10 .
003 0 11 10
003 0 12 11
004 0 1 0
004 . 2 .
004 0 3 2
end
Thank you for your Stata code!
By the way, gr is grade_year.

Last edited by smith Jason; 28 Jul 2022, 22:55.
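One point that may matter for the missing-value question: in Stata a missing numeric value is larger than every nonmissing number, so `state==1` evaluates to 0 (not missing) when `state` is missing, whereas comparisons such as `state>0` or `state!=0` evaluate to 1. The sketch below is only an illustration under an assumption the thread does not confirm, namely that an observation with missing `state` should simply count as "did not fail"; it applies the earlier rules to `year` with `state` in place of `fail`. The helper variables `naive_pos`, `safe_pos`, and `n_miss` are hypothetical names added for illustration.

```stata
* Data from the post above
clear
input str10 id byte state byte year byte gr
001 0 1 0
001 0 2 1
001 0 3 3
001 1 4 2
002 0 1 0
002 0 2 1
002 0 3 2
002 0 4 4
002 1 5 3
002 0 6 6
002 1 7 5
002 1 8 6
003 0 1 0
003 0 2 1
003 0 3 2
003 0 4 3
003 0 5 4
003 . 6 .
003 0 7 7
003 1 8 6
003 0 9 8
003 . 10 .
003 0 11 10
003 0 12 11
004 0 1 0
004 . 2 .
004 0 3 2
end

* Assumption: a missing state counts as "did not fail".
* state==1 is 0 when state is missing, so the earlier rules need no change.
by id (year), sort: egen primary = max(state==1 & inrange(year,1,5))
by id: egen middle = max(state==1 & inrange(year,6,8))
by id: egen high   = max(state==1 & inrange(year,9,12))

* Pitfall: state>0 is TRUE for missing values; exclude them explicitly
generate byte naive_pos = state > 0
generate byte safe_pos  = state > 0 & !missing(state)

* Number of missing state values per id, in case they should be treated differently
by id: egen n_miss = total(missing(state))

* If the ranges should apply to gr rather than year, substitute gr in inrange();
* inrange() returns 0 when its first argument is missing.
list id year state gr primary middle high n_miss, sepby(id) noobs
```

Note that with these data id 002 fails in both the 1 to 5 and the 6 to 8 ranges, so it gets primary==1 and middle==1; whether that is the intended result depends on rules the thread has not yet spelled out.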


William Lisowski

#6

29 Jul 2022, 06:17

Post #5 was later reposted as a new topic, with a clearer explanation of what is wanted, at

https://www.statalist.org/forums/for...rades-in-stata
smith Jason

#7

29 Jul 2022, 10:32

Originally posted by William Lisowski

Post #5 was later reposted as a new topic with a better explanation of what is wanted at

https://www.statalist.org/forums/for...rades-in-stata

Thank you! However, his answer there still does not give me what I want, because of the missing values.
Could you please help me?
