Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Cumulative sums with duplicates

    I am trying to create a variable based on employment data taken from a survey done over several years. Currently, I'm working with data that takes on a value of 0 if a person is unemployed and if they are employed, the data point represents what year of the survey the person is in and what number job they are in that year. To illustrate, if a person is in the 3rd year of the survey and is on their 2nd job that year, that would be represented as a value of 302 in the dataset. I'd like to create the variable such that it indicates what number job the person is in over the entire length of the survey. Ideally, it'd look like this:

    Person employment job number
    1 0 0
    1 101 1
    1 101 1
    1 102 2
    1 102 2
    1 0 0
    1 103 3
    1 201 4
    1 201 4
    1 202 5
    1 203 6

    and so on.

    I've tried using several forms of cumulative addition but can't seem to figure out how to manipulate the numbers that are bigger than 200 or get them to take on the values I want. Any advice is appreciated!


  • #2
    Code:
    input float(Person employment job_number)
    1 0 0 .
    1 101 1
    1 101 1
    1 102 2
    1 102 2
    1 0 0 .
    1 103 3
    1 201 4
    1 201 4
    1 202 5
    1 203 6
    end
    
    gen employment2= cond(employment==0, ., employment)
    
    bys Person (employment2): gen wanted=1 if _n==1
    by Person:replace wanted= cond(employment==employment[_n-1], wanted[_n-1], wanted[_n-1]+1) if _n>1 &!missing(employment2)
    replace wanted=0 if missing(wanted)
    assert job_number==wanted

    Result:

    Code:
    . l, sep(11)
    
         +--------------------------------------------------+
         | Person   employ~t   job_nu~r   employ~2   wanted |
         |--------------------------------------------------|
      1. |      1        101          1        101        1 |
      2. |      1        101          1        101        1 |
      3. |      1        102          2        102        2 |
      4. |      1        102          2        102        2 |
      5. |      1        103          3        103        3 |
      6. |      1        201          4        201        4 |
      7. |      1        201          4        201        4 |
      8. |      1        202          5        202        5 |
      9. |      1        203          6        203        6 |
     10. |      1          0          0          .        0 |
     11. |      1          0          0          .        0 |
         +--------------------------------------------------+

    Comment


    • #3
      Andrew,
      This was very helpful, thank you! However, I need to leave the 0s from the "employment" row in their place (as in, not after all of the numbers from 100 and up) because this is a weekly status array and I'm trying to look at the change from week to week. Is there any way to implement this while leaving the "employment" data in its original arrangement?

      Best,
      Neha

      Comment


      • #4
        You will usually have a time (sorting) variable in your data. If not, create one.

        Code:
        gen order=_n
        <CODE HERE>
        sort Person order

        Comment

        Working...
        X