Cumulative sums with duplicates

Neha Narayan

Join Date: Oct 2018

Posts: 3
#1

Cumulative sums with duplicates

21 Mar 2019, 16:46

I am trying to create a variable based on employment data taken from a survey done over several years. Currently, I'm working with data that takes on a value of 0 if a person is unemployed and if they are employed, the data point represents what year of the survey the person is in and what number job they are in that year. To illustrate, if a person is in the 3rd year of the survey and is on their 2nd job that year, that would be represented as a value of 302 in the dataset. I'd like to create the variable such that it indicates what number job the person is in over the entire length of the survey. Ideally, it'd look like this:

Person employment job number
1 0 0
1 101 1
1 101 1
1 102 2
1 102 2
1 0 0
1 103 3
1 201 4
1 201 4
1 202 5
1 203 6

and so on.

I've tried using several forms of cumulative addition but can't seem to figure out how to manipulate the numbers that are bigger than 200 or get them to take on the values I want. Any advice is appreciated!
Tags: data

Andrew Musau

Join Date: Oct 2014
Posts: 10213

22 Mar 2019, 05:28

Code:

input float(Person employment job_number)
1 0 0 .
1 101 1
1 101 1
1 102 2
1 102 2
1 0 0 .
1 103 3
1 201 4
1 201 4
1 202 5
1 203 6
end

gen employment2= cond(employment==0, ., employment)

bys Person (employment2): gen wanted=1 if _n==1
by Person:replace wanted= cond(employment==employment[_n-1], wanted[_n-1], wanted[_n-1]+1) if _n>1 &!missing(employment2)
replace wanted=0 if missing(wanted)
assert job_number==wanted

Result:

Code:

. l, sep(11)

     +--------------------------------------------------+
     | Person   employ~t   job_nu~r   employ~2   wanted |
     |--------------------------------------------------|
  1. |      1        101          1        101        1 |
  2. |      1        101          1        101        1 |
  3. |      1        102          2        102        2 |
  4. |      1        102          2        102        2 |
  5. |      1        103          3        103        3 |
  6. |      1        201          4        201        4 |
  7. |      1        201          4        201        4 |
  8. |      1        202          5        202        5 |
  9. |      1        203          6        203        6 |
 10. |      1          0          0          .        0 |
 11. |      1          0          0          .        0 |
     +--------------------------------------------------+

Comment

Neha Narayan

Join Date: Oct 2018

Posts: 3
#3

23 Mar 2019, 19:09

Andrew,
This was very helpful, thank you! However, I need to leave the 0s from the "employment" row in their place (as in, not after all of the numbers from 100 and up) because this is a weekly status array and I'm trying to look at the change from week to week. Is there any way to implement this while leaving the "employment" data in its original arrangement?

Best,
Neha
Comment
Andrew Musau

Join Date: Oct 2014

Posts: 10213
#4

24 Mar 2019, 04:37

You will usually have a time (sorting) variable in your data. If not, create one.

Code:

gen order=_n <CODE HERE> sort Person order
Comment

Announcement

Cumulative sums with duplicates

Comment

Comment

Comment