  • Creating a variable that identifies and calculates overlapping time periods

    Hi everyone,

    I am currently working on survey data that show whether respondents worked part time while they were in school between the ages of 15 and 19. With some help I was able to create a variable called overlap, which shows whether a person did both part-time work and schooling:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int ID str10 occupation byte(agebegin ageend overlap)
    1546 "school"     16 19 1
    1546 "part time " 16 16 0
    1546 "part time " 18 18 0
    1672 "school "    15 19 1
    1672 "part time " 19 19 0
    1733 "school"     16 19 1
    1733 "part time " 16 16 0
    1733 "part time " 18 19 0
    1989 "school"     15 17 1
    1989 "school"     19 19 1
    1989 "part time " 17 17 0
    1989 "part time " 19 19 0
    1368 "school"     15 19 1
    1368 "part time " 16 16 0
    1368 "part time " 18 19 0
    1121 "school"     15 16 1
    1121 "school"     18 18 1
    1121 "part time " 16 18 0
    end
    This is just an extract of many cases. You can see that overlap==1 if a period of part-time work overlaps with a person's schooling. Now I want to create a variable that gives the length of the overlap in years. 90% of my data consists of cases like ID 1672, where people answered that they worked once alongside schooling, so the new variable "overlapy" should be 1 here. In the case of ID 1733, the person worked at the ages of 16, 18 and 19 while in school, so "overlapy" should be 3. These two examples, together with IDs 1546 and 1368, represent nearly all of my data. Cases 1121 and 1989 are exceptions that each occur only once.

    Is it possible to create overlapy so that it is filled in only where overlap==1? Later I would like to keep one line per ID, only where overlap==1, so that overlapy>=1 shows how many years of part-time work a person did at schooling age.

    Any idea how to approach this?

    Thanks a lot!





  • #2
    You can do this while creating the "overlap" variable.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int ID str10 occupation byte(agebegin ageend overlap)
    1546 "school"     16 19 1
    1546 "part time " 16 16 0
    1546 "part time " 18 18 0
    1672 "school "    15 19 1
    1672 "part time " 19 19 0
    1733 "school"     16 19 1
    1733 "part time " 16 16 0
    1733 "part time " 18 19 0
    1989 "school"     15 17 1
    1989 "school"     19 19 1
    1989 "part time " 17 17 0
    1989 "part time " 19 19 0
    1368 "school"     15 19 1
    1368 "part time " 16 16 0
    1368 "part time " 18 19 0
    1121 "school"     15 16 1
    1121 "school"     18 18 1
    1121 "part time " 16 18 0
    end
    
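    * Length of each spell in years (both endpoints inclusive)
    gen duration= ageend- agebegin + 1
    qui sum duration
    * f1, f2, ...: the ages covered by each spell (missing beyond its length)
    forval i= 1/`r(max)'{
        gen f`i'=cond(duration>= `i', agebegin-1+`i', .)
    }
    * Remove all spaces to fix inconsistent spacing, then restore "part time"
    replace occupation = subinstr(occupation," ","",.)
    replace occupation="part time" if occupation=="parttime"
    * One row per spell and age
    reshape long f, i(ID occupation agebegin ageend overlap duration)
    drop if missing(f)
    * Save the part-time ages, relabelled "school" so they can match school ages
    preserve
    keep ID occupation f
    drop if occupation=="school"
    replace occupation= "school"
    tempfile tomerge
    save `tomerge'
    restore
    * A school age that also appears among the part-time ages is an overlap year
    merge 1:1 ID occupation f using `tomerge'
    gen match= _merge==3
    drop _merge
    * Number of overlapping years per spell
    bys ID occupation agebegin ageend overlap: egen count=total(match)
    * Go back to one row per spell
    duplicates tag ID occupation agebegin ageend overlap duration, gen(dup)
    bys ID occupation agebegin ageend overlap duration: egen overlap2= max(overlap)
    bys ID occupation agebegin ageend overlap duration: drop if dup& _n>1
    * Drop rows that came only from the tempfile
    drop if missing(_j)
    drop _j f match dup duration overlap
    rename overlap2 overlap
    list, sepby(ID)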
    Result:

    Code:
    . list, sepby(ID)
    
         +--------------------------------------------------------+
         |   ID   occupat~n   agebegin   ageend   count   overlap |
         |--------------------------------------------------------|
      1. | 1121   part time         16       18       0         0 |
      2. | 1121      school         15       16       1         1 |
      3. | 1121      school         18       18       1         1 |
         |--------------------------------------------------------|
      4. | 1368   part time         16       16       0         0 |
      5. | 1368   part time         18       19       0         0 |
      6. | 1368      school         15       19       3         1 |
         |--------------------------------------------------------|
      7. | 1546   part time         16       16       0         0 |
      8. | 1546   part time         18       18       0         0 |
      9. | 1546      school         16       19       2         1 |
         |--------------------------------------------------------|
     10. | 1672   part time         19       19       0         0 |
     11. | 1672      school         15       19       1         1 |
         |--------------------------------------------------------|
     12. | 1733   part time         16       16       0         0 |
     13. | 1733   part time         18       19       0         0 |
     14. | 1733      school         16       19       3         1 |
         |--------------------------------------------------------|
     15. | 1989   part time         17       17       0         0 |
     16. | 1989   part time         19       19       0         0 |
     17. | 1989      school         15       17       1         1 |
     18. | 1989      school         19       19       1         1 |
         +--------------------------------------------------------+
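
    If the end goal is one line per ID with the number of overlapping years, as asked in #1, an alternative to the reshape and merge route above is to expand every spell to one row per year of age and count the ages at which both activities occur. The following is a minimal sketch under assumptions, not the code from this post: it uses three IDs from the example with the stray spaces already removed, and the names spell, age, inschool, inwork and bothyear are hypothetical.

    ```stata
    * Sketch only: spell, age, inschool, inwork and bothyear are hypothetical names
    clear
    input int ID str10 occupation byte(agebegin ageend)
    1672 "school"    15 19
    1672 "part time" 19 19
    1733 "school"    16 19
    1733 "part time" 16 16
    1733 "part time" 18 19
    1121 "school"    15 16
    1121 "school"    18 18
    1121 "part time" 16 18
    end

    * One row per spell and year of age
    gen byte duration = ageend - agebegin + 1
    gen long spell = _n
    expand duration
    bysort spell: gen byte age = agebegin + _n - 1

    * For each person and age, flag whether any school or part-time spell covers it
    bysort ID age: egen byte inschool = max(occupation == "school")
    bysort ID age: egen byte inwork   = max(occupation == "part time")

    * Keep one row per person and age, then count the ages with both activities
    bysort ID age: keep if _n == 1
    gen byte bothyear = inschool & inwork
    collapse (sum) overlapy = bothyear, by(ID)
    keep if overlapy >= 1
    list, noobs
    ```

    On this extract the sketch should give overlapy of 2 for ID 1121, 1 for ID 1672 and 3 for ID 1733; the last two are the values requested in #1. It assumes that ages are recorded in whole years.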



    • #3
      See also https://www.stata-journal.com/sjpdf....iclenum=dm0068

      dm0068 is thus revealed as an otherwise unpredictable search term for mentions in several threads on Statalist.



      • #4
        Hi,

        The code worked out well after modifying a few things.
        Thanks a lot!

        I did not really understand the sense of these two commands in the code though:
        replace occupation = subinstr(occupation," ","",.)
        replace occupation="part time" if occupation=="parttime"
        Also, the paper does help to understand general problems like these. Great advice!




        • #5
          I cited the paper because I thought it might be helpful. Sorry, I don't have time to try things on your data, but it's a short paper.



          • #6
            I did not really understand the sense of these two commands in the code though:
            replace occupation = subinstr(occupation," ","",.)
            replace occupation="part time" if occupation=="parttime"
            The first command eliminates all spaces in the entries of your variable occupation. You have inconsistent spacing, for example:

            Code:
            1546 "school" 16 19 1
            1672 "school " 15 19 1
            The second command restores the space between "part" and "time". With inconsistent spacing, the following line would not produce the desired result:

            Code:
            drop if occupation=="school"
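
            To illustrate the point, here is a small sketch on made-up rows showing how a trailing space defeats an exact string comparison. It also shows strtrim() (trim() in older Stata releases) as one possible alternative to the subinstr() approach, since it strips only leading and trailing blanks and leaves the space inside "part time" alone. The variable name clean is hypothetical.

            ```stata
            * Sketch only: clean is a hypothetical variable name
            clear
            input str10 occupation
            "school"
            "school "
            "part time "
            end

            * The entry with a trailing space is not matched: 1 observation
            count if occupation == "school"

            * Strip leading and trailing blanks only
            gen clean = strtrim(occupation)

            * Both school entries now match: 2 observations
            count if clean == "school"

            * The internal space is kept: 1 observation
            count if clean == "part time"
            ```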



            • #7
              Thanks, Andrew Musau! I see! I created my example manually with the help of dataex and accidentally included these spaces. Luckily, my original data do not have them. But great, so I had understood the meaning of the code correctly.
