Creating a variable that identifies and calculates overlapping time periods

Jessica Smith

#1

24 Feb 2019, 14:14

Hi everyone,

I am currently working on data that shows whether respondents to a survey worked part time while they were in school between the ages of 15 and 19. With some help I was able to create a variable, called overlap, that shows whether a person was doing both part-time work and school:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input int ID str10 occupation byte(agebegin ageend overlap)
1546 "school"     16 19 1
1546 "part time " 16 16 0
1546 "part time " 18 18 0
1672 "school "    15 19 1
1672 "part time " 19 19 0
1733 "school"     16 19 1
1733 "part time " 16 16 0
1733 "part time " 18 19 0
1989 "school"     15 17 1
1989 "school"     19 19 1
1989 "part time " 17 17 0
1989 "part time " 19 19 0
1368 "school"     15 19 1
1368 "part time " 16 16 0
1368 "part time " 18 19 0
1121 "school"     15 16 1
1121 "school"     18 18 1
1121 "part time " 16 18 0
end

This is just an extract of many cases. You can see that overlap==1 marks the school spells during which the person also worked part time. Now I want to create a variable that counts the overlapping period in years. 90% of my data consists of cases like ID 1672, where people answered that they worked once alongside school, so the new variable "overlapy" should be 1 here. In the case of ID 1733, the person worked at ages 16, 18 and 19 while in school, so "overlapy" should be 3. These two examples, together with IDs 1546 and 1368, represent nearly all of my data. IDs 1121 and 1989 are exceptional patterns that occur only once.

Is it somehow possible to create overlapy so that it is defined only if overlap==1? Later I would like to keep one line per ID, only for those with overlap==1, so that overlapy >= 1 shows how many years of part-time work a person did at school age.

Any idea how to approach this?

Thanks a lot!

Andrew Musau

24 Feb 2019, 15:37

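You can do this while creating the "overlap" variable. The code below splits each spell into single years of age, matches part-time years against school years, and stores the number of matching years for each school spell in count (count is 0 for part-time spells).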

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input int ID str10 occupation byte(agebegin ageend overlap)
1546 "school"     16 19 1
1546 "part time " 16 16 0
1546 "part time " 18 18 0
1672 "school "    15 19 1
1672 "part time " 19 19 0
1733 "school"     16 19 1
1733 "part time " 16 16 0
1733 "part time " 18 19 0
1989 "school"     15 17 1
1989 "school"     19 19 1
1989 "part time " 17 17 0
1989 "part time " 19 19 0
1368 "school"     15 19 1
1368 "part time " 16 16 0
1368 "part time " 18 19 0
1121 "school"     15 16 1
1121 "school"     18 18 1
1121 "part time " 16 18 0
end

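* number of years covered by each spell
gen duration= ageend- agebegin + 1
qui sum duration
* f1, f2, ... hold each year of age covered by the spell
forval i= 1/`r(max)'{
gen f`i'=cond(duration>= `i', agebegin-1+`i', .)
}
* standardise the spacing in occupation
replace occupation = subinstr(occupation," ","",.)
replace occupation="part time" if occupation=="parttime"
* one row per spell and year of age
reshape long f, i(ID occupation agebegin ageend overlap duration)
drop if missing(f)
* save the part-time years, relabelled as school, to match against school years
preserve
keep ID occupation f
drop if occupation=="school"
replace occupation= "school"
tempfile tomerge
save `tomerge'
restore
* match==1 for school years that are also part-time years
merge 1:1 ID occupation f using `tomerge'
gen match= _merge==3
drop _merge
* count = number of overlapping years in each spell
bys ID occupation agebegin ageend overlap: egen count=total(match)
* keep one row per spell (overlap2 is a copy of overlap within each spell)
duplicates tag ID occupation agebegin ageend overlap duration, gen(dup)
bys ID occupation agebegin ageend overlap duration: egen overlap2= max(overlap)
bys ID occupation agebegin ageend overlap duration: drop if dup& _n>1
* drop part-time years from the using file that matched no school year
drop if missing(_j)
drop _j f match dup duration overlap
rename overlap2 overlap
list, sepby(ID)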

Result:

Code:

. list, sepby(ID)

     +--------------------------------------------------------+
     |   ID   occupat~n   agebegin   ageend   count   overlap |
     |--------------------------------------------------------|
  1. | 1121   part time         16       18       0         0 |
  2. | 1121      school         15       16       1         1 |
  3. | 1121      school         18       18       1         1 |
     |--------------------------------------------------------|
  4. | 1368   part time         16       16       0         0 |
  5. | 1368   part time         18       19       0         0 |
  6. | 1368      school         15       19       3         1 |
     |--------------------------------------------------------|
  7. | 1546   part time         16       16       0         0 |
  8. | 1546   part time         18       18       0         0 |
  9. | 1546      school         16       19       2         1 |
     |--------------------------------------------------------|
 10. | 1672   part time         19       19       0         0 |
 11. | 1672      school         15       19       1         1 |
     |--------------------------------------------------------|
 12. | 1733   part time         16       16       0         0 |
 13. | 1733   part time         18       19       0         0 |
 14. | 1733      school         16       19       3         1 |
     |--------------------------------------------------------|
 15. | 1989   part time         17       17       0         0 |
 16. | 1989   part time         19       19       0         0 |
 17. | 1989      school         15       17       1         1 |
 18. | 1989      school         19       19       1         1 |
     +--------------------------------------------------------+
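For comparison, here is a shorter sketch of a different technique (not the code above and not taken from the thread): expand every spell to one row per year of age, collapse to one row per ID and age with flags for school and part-time work, and then sum the years in which both flags are set. It assumes the variables ID, occupation, agebegin and ageend from the -dataex- example, uses strtrim() (Stata 14 or later) to remove the stray spaces, and introduces the hypothetical helper variables spell, age, school, parttime and both. Treat it as a sketch and check it against your full data.

```stata
* Sketch: count years in which a person was both in school and working part time
* Assumes ID, occupation, agebegin, ageend as in the -dataex- example
clear
input int ID str10 occupation byte(agebegin ageend)
1546 "school"     16 19
1546 "part time " 16 16
1546 "part time " 18 18
1672 "school "    15 19
1672 "part time " 19 19
1733 "school"     16 19
1733 "part time " 16 16
1733 "part time " 18 19
1989 "school"     15 17
1989 "school"     19 19
1989 "part time " 17 17
1989 "part time " 19 19
1368 "school"     15 19
1368 "part time " 16 16
1368 "part time " 18 19
1121 "school"     15 16
1121 "school"     18 18
1121 "part time " 16 18
end

* remove leading and trailing blanks so "school " matches "school"
replace occupation = strtrim(occupation)

* hypothetical helper: identify each spell before expanding
gen spell = _n
* one row per spell and year of age
expand ageend - agebegin + 1
bysort spell: gen age = agebegin + _n - 1

* flag the activity in each year
gen byte school = occupation == "school"
gen byte parttime = occupation == "part time"

* one row per person and year of age; max() avoids double-counting
collapse (max) school parttime, by(ID age)

* years with both activities, summed per person
gen byte both = school & parttime
collapse (sum) overlapy = both, by(ID)

* keep only IDs with at least one overlapping year
keep if overlapy >= 1
list
```

With the example data this should give one line per ID, with overlapy equal to the total of count over that ID's school spells in the result above (for example 3 for ID 1733 and 2 for ID 1121).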

Nick Cox

#3

25 Feb 2019, 02:06

See also https://www.stata-journal.com/sjpdf....iclenum=dm0068

dm0068 is thus revealed as an otherwise unpredictable search term for mentions in several threads on Statalist.
Jessica Smith
#4

27 Feb 2019, 02:30

Hi,

The code worked out well after modifying a few things.
Thanks a lot!

I did not really understand the purpose of these two commands in the code, though:
replace occupation = subinstr(occupation," ","",.)
replace occupation="part time" if occupation=="parttime"
The paper also helps with understanding general problems like these. Great advice!
Nick Cox
#5

27 Feb 2019, 03:10

I cited the paper because I thought it might be helpful. Sorry, I don't have time to try things on your data, but it's a short paper.
Andrew Musau
#6

27 Feb 2019, 03:28

I did not really understand the sense of these two commands in the code though:
replace occupation = subinstr(occupation," ","",.)
replace occupation="part time" if occupation=="parttime"

The first command removes all spaces from the entries of your variable occupation. Your spacing is inconsistent, for example:

Code:

1546 "school"  16 19 1
1672 "school " 15 19 1

The second command then puts the space back between "part" and "time". Without this cleaning, the following line would not produce the desired result, because "school " (with a trailing space) is not the same string as "school":

Code:

drop if occupation=="school"
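A minimal sketch of the spacing point, assuming Stata 14 or later for strtrim() (older versions have trim()). It also shows a possible alternative to the two-step subinstr() approach, offered as a suggestion rather than what the thread used: strtrim() removes only leading and trailing blanks, so "part time " becomes "part time" without a second replace.

```stata
* Sketch: trailing spaces make string comparisons fail
clear
input str10 occupation
"school"
"school "
"part time "
end

* "school " (trailing space) is a different string from "school": counts 1
count if occupation == "school"

* strtrim() removes leading and trailing blanks but keeps the internal one
replace occupation = strtrim(occupation)

* now both school entries match: counts 2
count if occupation == "school"
list
```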
Jessica Smith
#7

27 Feb 2019, 07:07

Thanks, Andrew Musau! I see. I created my example manually with the help of dataex and accidentally included these spaces; luckily they are not in my original data. But great, then I understood the meaning of the code correctly.