Generating a dummy variable for if a variable changes over time periods

Chris Meier

Join Date: Jun 2016
Posts: 9

#16

17 Jun 2016, 13:33

Thanks for the hint to create a simplified version of the dataset. This made me realise that my intention of not making the example too complicated has backfired on me.
There are 4 observations for every ID (every household): 1) industry in year1 for male, 2) industry in year2 for male, 3) industry in year1 for female, 4) industry in year2 for female

Code:

clear
input ID year industry sex
1 1 3 1
1 2 10 1
1 1 3 2
1 2 3 2
2 1 10 1
2 2 3 1
2 1 10 2
2 2 3 2
3 1 42 1
3 2 42 1
3 1 7 2
3 2 8 2
end
bysort ID (year): gen var=industry[1]==3 & industry[2]==10 if sex==1
list, sepby(ID)

     +----------------------------------+
     | ID   year   industry   sex   var |
     |----------------------------------|
  1. |  1      1          3     2     . |
  2. |  1      1          3     1     0 |
  3. |  1      2         10     1     0 |
  4. |  1      2          3     2     . |
     |----------------------------------|
  5. |  2      1         10     2     . |
  6. |  2      1         10     1     0 |
  7. |  2      2          3     1     0 |
  8. |  2      2          3     2     . |
     |----------------------------------|
  9. |  3      1         42     1     0 |
 10. |  3      1          7     2     . |
 11. |  3      2          8     2     . |
 12. |  3      2         42     1     0 |
     +----------------------------------+

I assume that I would need to sort the observations differently...? Or does this problem require a completely different approach?
Thanks a lot for your patience, I am new to Stata and barely have any coding experience (as you might have noticed).
Any suggestion on how to solve this is much appreciated! Thanks

Comment

Clyde Schechter

Join Date: Apr 2014

Posts: 30356
#17

17 Jun 2016, 15:14

I think you need:

Code:

isid ID sex year, sort
by ID sex (year): gen var = industry[1] == 3 & industry[2] == 10 if sex == 1

Lest we get into further difficulties from other complications in the data, you can only depend on this sort order if ID, sex, and year uniquely identify the observations. That is why I put the -isid- command in there. If that is not true, then you cannot be sure which observations will sort into the first two positions of a given ID sex group, so the operation could give wrong results. If ID, sex, and year do not uniquely identify observations in your data, then the -isid- command will halt execution, and you will have to figure out whether

1. ID sex and year should uniquely identify observations, so you have a data error that you need to fix, or,
2. Some additional variable(s) need to be specified to uniquely identify observations and therefore determine a unique sort order.
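
To help decide between those two cases, it can be useful to look at the offending observations directly. The following is only a sketch on made-up data (the rows and the variable name dup are assumptions for illustration, not taken from the real dataset); it uses the official -duplicates- command to report and tag observations that share the same ID, sex, and year.

```stata
clear
input ID year industry sex
1 1 3 1
1 2 10 1
1 1 23 2
1 1 7 2
2 1 10 1
2 2 3 1
2 1 4 2
2 2 4 2
end

* Summarize how many observations share each ID-sex-year combination
duplicates report ID sex year

* Tag surplus copies: dup == 0 means the combination is unique,
* dup == 1 means there is one other observation with the same values
duplicates tag ID sex year, generate(dup)

* Inspect only the problem observations
list if dup > 0, sepby(ID)
```

In this illustrative data only household 1 is listed, because it has two observations for sex 2 in year 1.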
Comment
Chris Meier

Join Date: Jun 2016

Posts: 9
#18

20 Jun 2016, 11:23

Thanks, Clyde!
Indeed, when I tried to run the code you suggested in #17, I got an error message saying that the variables do not uniquely identify the observations.
Some observations in my dataset did not follow the desired pattern but displayed two different observations for the same year (ID1 and ID3 have incorrect observations):

Code:

clear
input ID year industry sex
1 1 3 1
1 2 10 1
1 1 23 2
1 1 7 2
2 1 10 1
2 2 3 1
2 1 4 2
2 2 4 2
3 1 42 1
3 2 42 1
3 2 42 2
3 2 42 2
end

I dropped the incorrect observations using

Code:

bysort ID year: drop if _N!=2

Then the -isid- command worked perfectly.
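
One thing to keep in mind: dropping by ID and year removes only the affected years, so in other data a household could be left with a single year, and two observations in a year need not be one per sex. A possible alternative, sketched here under the assumption that a valid household has exactly 4 observations and no repeated ID-sex-year combination, is to drop whole households instead. The helper variables cell_n, max_cell, and hh_n are hypothetical names introduced for this example.

```stata
clear
input ID year industry sex
1 1 3 1
1 2 10 1
1 1 23 2
1 1 7 2
2 1 10 1
2 2 3 1
2 1 4 2
2 2 4 2
3 1 42 1
3 2 42 1
3 2 42 2
3 2 42 2
end

* Number of observations in each ID-sex-year cell (should be 1)
bysort ID sex year: gen cell_n = _N

* Largest cell size and total number of observations per household
bysort ID: egen max_cell = max(cell_n)
bysort ID: gen hh_n = _N

* Keep only households with 4 observations and no duplicated cell
keep if hh_n == 4 & max_cell == 1

* Confirm uniqueness, then apply the code from #17
isid ID sex year, sort
by ID sex (year): gen var = industry[1] == 3 & industry[2] == 10 if sex == 1
list, sepby(ID)
```

On this example only household 2 survives, which is the same result as the drop by ID and year; the two approaches can differ on other data.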

Thanks a lot for your help, Nick and Clyde!
Comment
Molly OBrien

Join Date: Mar 2019

Posts: 1
#19

22 Mar 2019, 05:01

I have a similar problem: I would like to see whether a variable changes twice over a time period (ignoring missing values). I have a variable, gender, that takes the value 1 or 2. I would like to create a variable that equals 1 if the value of gender changes twice over the time period for an individual.
Comment

Andrew Musau

Join Date: Oct 2014
Posts: 10482

#20

22 Mar 2019, 08:14

I have a variable, gender, that takes value of 1 or 2. I would like to create a variable that = 1 if the value of gender changes twice over the time period for an individual.

Is this really possible?

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float(ID gender time)
1 1 1
1 2 2
1 . 3
1 2 4
1 1 5
1 2 6
1 1 7
2 2 1
2 1 2
2 2 3
3 2 1
3 2 2
4 . 1
4 . 2
4 2 3
4 2 4
end


gen time2= cond(missing(gender), ., time)
bys ID (time2): gen change= sum(gender!= gender[_n-1]) if _n>1 & !missing(gender)
bys ID (time): egen wanted = max(change)
replace wanted= wanted>1

Result:

Code:

  
. l, sepby(ID)

     +----------------------------------------------+
     | ID   gender   time   time2   change   wanted |
     |----------------------------------------------|
  1. |  1        1      1       1        .        1 |
  2. |  1        2      2       2        1        1 |
  3. |  1        .      3       .        .        1 |
  4. |  1        2      4       4        1        1 |
  5. |  1        1      5       5        2        1 |
  6. |  1        2      6       6        3        1 |
  7. |  1        1      7       7        4        1 |
     |----------------------------------------------|
  8. |  2        2      1       1        .        1 |
  9. |  2        1      2       2        1        1 |
 10. |  2        2      3       3        2        1 |
     |----------------------------------------------|
 11. |  3        2      1       1        .        0 |
 12. |  3        2      2       2        0        0 |
     |----------------------------------------------|
 13. |  4        .      1       .        .        0 |
 14. |  4        .      2       .        .        0 |
 15. |  4        2      3       3        .        0 |
 16. |  4        2      4       4        0        0 |
     +----------------------------------------------+
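
A caveat that may be worth checking in real data: if an individual has fewer than two nonmissing values of gender, change is missing in every observation, so the maximum is missing too, and because Stata treats missing as larger than any number, the comparison wanted>1 would then evaluate to 1. The following is a sketch of one way to guard against that, using made-up data; the variable name maxchange is an assumption introduced for this example.

```stata
clear
input float(ID gender time)
1 1 1
1 2 2
1 1 3
2 2 1
2 . 2
3 . 1
3 . 2
end

* Same running count of changes as above, skipping missing values
gen time2 = cond(missing(gender), ., time)
bysort ID (time2): gen change = sum(gender != gender[_n-1]) if _n > 1 & !missing(gender)
bysort ID (time): egen maxchange = max(change)

* Guard: a missing maximum (fewer than two nonmissing values) counts as 0
gen wanted = maxchange > 1 & !missing(maxchange)
list, sepby(ID)
```

Here ID 1 changes twice and gets wanted equal to 1, while IDs 2 and 3 have at most one nonmissing value and get 0.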

Last edited by Andrew Musau; 22 Mar 2019, 08:21.

Comment

Clyde Schechter

Join Date: Apr 2014

Posts: 30356
#21

22 Mar 2019, 12:13

Andrew Musau

I agree with you that in reality, a person's actual gender changing twice over the course of any study period is going to be very rare (although nowadays children and adolescents sometimes declare themselves trans-gender and then desist only a short time later).

But if Ms. O'Brien's data come from an electronic medical records database, my experience is that it would not be terribly uncommon for the recorded gender to change twice or more within a year. Such is the poor quality of electronic health data in many settings.
1 like
Comment
Andrew Musau

Join Date: Oct 2014

Posts: 10482
#22

22 Mar 2019, 12:48

Thanks for the context, Clyde.
Comment
