Problems combining multiple responses according to age ranges

Jessica Smith

#1


14 Feb 2019, 15:28

Hi everyone,
I am having difficulty finding the right command, with the right conditions, to combine multiple responses from the same people in a dataset.

The respondents were interviewed to find out at which ages between 15 and 19* they attended school and had a part-time job at the same time. Therefore, only observations with overlapping age ranges are relevant. This is the case for respondents 1359 and 1834.

I think I need a new variable that detects whether a part-time work spell falls within a school age range. People report up to 4 different episodes of occupation, so some values of persnr appear up to 4 times in the dataset.

These are my variables:

persnr   spelltyp         agebegin   ageend
1359     school           15         17
1359     part time work   16         16
1679     part time work   15         16
1679     school           17         19
1834     school           15         15
1834     part time work   16         18
1834     school           18         19

*I have already excluded all observations outside the 15 to 19 age range.

Do you have an idea for the right command?

Thanks a lot!

Last edited by Jessica Smith; 14 Feb 2019, 15:33.
Jean-Claude Arbaut

#2

15 Feb 2019, 01:32

The following identifies rows that overlap with a spell of a different type. The variable overlap takes the value 1 if the row overlaps with the preceding row (ordered by agebegin within persnr) and the two rows have different spelltyp.

Code:

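* overlap = 1 if the preceding spell (by agebegin) is of the other type and ends no earlier than this one begins
bysort persnr (agebegin): gen overlap=_n>1 & ageend[_n-1]>=agebegin & spelltyp[_n-1]~=spelltyp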

If you want only the person id, you may then use -keep- and -duplicates drop-.
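For example, a minimal sketch of that step, run on the sample rows from the first post (typed in here with -input-, assuming spelltyp is stored as a string): it creates overlap with the command above, then keeps one row per respondent with at least one flagged spell.

```stata
clear
input float persnr str14 spelltyp float(agebegin ageend)
1359 "school"         15 17
1359 "part time work" 16 16
1679 "part time work" 15 16
1679 "school"         17 19
1834 "school"         15 15
1834 "part time work" 16 18
1834 "school"         18 19
end

* flag a spell that starts before the preceding, differently typed spell ends
bysort persnr (agebegin): gen overlap = _n>1 & ageend[_n-1]>=agebegin & spelltyp[_n-1]!=spelltyp

* keep only the IDs of respondents with at least one flagged spell
keep if overlap == 1
keep persnr
duplicates drop
list
```

With these rows, only persnr 1359 and 1834 should remain. Because -keep- discards all other variables, you may prefer to wrap these lines in -preserve-/-restore- or run them on a copy of the data.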

This fails if there is a pattern school1-school2-work in which school1 (but not school2) overlaps with work, or likewise work1-work2-school in which work1 (but not work2) overlaps with school. If this happens, the code can be adapted by first creating a fake ageend, the running maximum of ageend over consecutive spells of the same spelltyp:

Code:

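* fake = running maximum of ageend over consecutive spells of the same type
gen fake=.
bysort persnr (agebegin): replace fake=cond(_n==1 | ageend>fake[_n-1] | spelltyp~=spelltyp[_n-1],ageend,fake[_n-1])
* compare each spell with that running maximum instead of the preceding ageend
bysort persnr (agebegin): gen overlap=_n>1 & fake[_n-1]>=agebegin & spelltyp[_n-1]~=spelltyp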

However, the overlap variable then no longer identifies two consecutive overlapping rows (consecutive rows need not overlap); it identifies only one of the overlapping rows. To identify both, simply drop the rows with fake~=ageend.

Hope this helps

Last edited by Jean-Claude Arbaut; 15 Feb 2019, 01:41.
Jessica Smith

#3

16 Feb 2019, 13:18

Hi, thanks a lot for the advice! It has already helped me a lot.

I think the code is correct; however, I found one case where the variable overlap should equal 1 but is 0.

Most of my data consists of people who reported 2 periods between ages 15 and 19, as is the case for respondent 1359, who attended school and did part-time work within this period.

However, there are three cases in my remaining data in which people reported as many as 4 different periods:

persnr   spelltyp         begin   end   overlap
3409     school           15      15    0
3409     school           17      19    0
3409     part time work   17      17    1
3409     part time work   19      19    0

The last (bottom) value of overlap should be 1, as that part-time spell overlaps with the second school period.

However, the other two of the three cases are correct with respect to overlap.

Is this one case wrong because of the order in which school and part-time work appear in the data?

Many thanks!
Jessica Smith

#4

20 Feb 2019, 01:17

Hello Jean-Claude,

I looked through the data, and the problem actually appeared more than once:

persnr   spelltyp         begin   end   overlap
7893     school           16      19    0
7893     part time work   16      16    1
7893     part time work   18      19    0

And here again is the case mentioned in my last post:

persnr   spelltyp         begin   end   overlap
3409     school           15      15    0
3409     school           17      19    0
3409     part time work   17      17    1
3409     part time work   19      19    0

Respondent 7893 did part-time work while in school at ages 16, 18 and 19. However, the overlap variable does not recognize this for ages 18 to 19: the 0 in the last row for 7893 should be 1.

So the command you suggested needs some extension, because it does not recognize all overlapping periods.

Do you know how to modify the command accordingly?

Thanks a lot!
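One possible extension, sketched here as an illustration and not as the approach adopted later in this thread: sort each person's spells by agebegin and compare every spell with the spells 1, 2 and 3 rows before and after it, instead of only the immediately preceding row. After sorting, an earlier spell and a later spell overlap exactly when the earlier one's ageend is at least the later one's agebegin. The sketch assumes spelltyp is a string and, as stated in the first post, that nobody has more than 4 spells (hence lags and leads up to 3); the loop bound would need raising otherwise.

```stata
clear
input float persnr str14 spelltyp float(agebegin ageend)
3409 "school"         15 15
3409 "school"         17 19
3409 "part time work" 17 17
3409 "part time work" 19 19
7893 "school"         16 19
7893 "part time work" 16 16
7893 "part time work" 18 19
end

* after sorting by agebegin, an earlier spell i and a later spell j overlap
* exactly when ageend of i >= agebegin of j
sort persnr agebegin
gen byte overlap = 0
* assumption: at most 4 spells per person, so offsets 1 to 3 cover every pair
forvalues k = 1/3 {
    * compare with the spell k rows earlier (same person, different type)
    by persnr: replace overlap = 1 if _n > `k' & spelltyp[_n-`k'] != spelltyp & ageend[_n-`k'] >= agebegin
    * compare with the spell k rows later (same person, different type)
    by persnr: replace overlap = 1 if _n + `k' <= _N & spelltyp[_n+`k'] != spelltyp & ageend >= agebegin[_n+`k']
}
list, sepby(persnr)
```

Because both directions are checked, both spells of every overlapping pair get overlap = 1, and the result does not depend on how ties in agebegin (such as the two spells of 3409 starting at 17) happen to be ordered. With these rows, every spell except the school spell of 3409 at age 15 should be flagged.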

Andrew Musau


20 Feb 2019, 05:35

Please read the FAQs and make sure that you use dataex to present data examples in the future.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float persnr str29 spelltyp float(agebegin ageend)
1359 "school"         15 17
1359 "part time work" 16 16
1679 "part time work" 15 16
1679 "school"         17 19
1834 "school"         15 15
1834 "part time work" 16 18
1834 "school"         18 19
3409 "school"         15 15
3409 "school"         17 19
3409 "part time work" 17 17
3409 "part time work" 19 19
end

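* expand each spell into the years of age it covers: f1, f2, ... = agebegin, agebegin+1, ...
gen duration= ageend- agebegin + 1
qui sum duration
forval i= 1/`r(max)'{
gen f`i'=cond(duration>= `i', agebegin-1+`i', .)
}

* long form: one row per spell and year of age (_j numbers the years)
reshape long f, i(persnr spelltyp agebegin ageend duration)
* flag a row when the next row is the same person and year but a different type;
* since "part time work" sorts before "school", flags land on work rows
sort persnr f spelltyp
gen o= f[_n]==f[_n+1] & spelltyp[_n] != spelltyp[_n+1] & persnr[_n]==persnr[_n+1] & !missing(f)
drop if missing(f)
* back to wide; a spell with both flagged and unflagged years becomes two rows
reshape wide f,  i(persnr spelltyp agebegin ageend duration o) j(_j)
duplicates tag persnr spelltyp agebegin ageend duration, gen(dup)
* overlap = 1 if any year of the spell was flagged; then keep one row per spell
bys persnr spelltyp agebegin ageend duration: egen overlap= max(o)
bys persnr spelltyp agebegin ageend duration: drop if dup & _n>1
drop duration dup o f*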

Result:

Code:

. l, sepby(persnr)

     +-------------------------------------------------------+
     | persnr         spelltyp   agebegin   ageend   overlap |
     |-------------------------------------------------------|
  1. |   1359   part time work         16       16         1 |
  2. |   1359           school         15       17         0 |
     |-------------------------------------------------------|
  3. |   1679   part time work         15       16         0 |
  4. |   1679           school         17       19         0 |
     |-------------------------------------------------------|
  5. |   1834   part time work         16       18         1 |
  6. |   1834           school         15       15         0 |
  7. |   1834           school         18       19         0 |
     |-------------------------------------------------------|
  8. |   3409   part time work         17       17         1 |
  9. |   3409   part time work         19       19         1 |
 10. |   3409           school         15       15         0 |
 11. |   3409           school         17       19         0 |
     +-------------------------------------------------------+

.

Last edited by Andrew Musau; 20 Feb 2019, 05:52.


Jessica Smith

#6

21 Feb 2019, 12:14

Hi Andrew,

Thanks for the help! This was exactly what I was looking for.
Jean-Claude Arbaut

#7

21 Feb 2019, 16:24

Jessica Smith

I'm glad you finally got a good answer. My approach was not working very well: it can detect when a person has at least two overlapping periods, but it is not designed to detect all overlapping pairs, and amending it did not look straightforward.
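To list every overlapping pair explicitly, a self-join is another option. The sketch below is an illustration, not code from this thread: it pairs each part-time spell with every school spell of the same person using -joinby- and keeps the pairs whose age ranges intersect. The names school_begin and school_end are hypothetical, chosen only to avoid a variable-name clash during the join, and spelltyp is assumed to be a string.

```stata
clear
input float persnr str14 spelltyp float(agebegin ageend)
1359 "school"         15 17
1359 "part time work" 16 16
1834 "school"         15 15
1834 "part time work" 16 18
1834 "school"         18 19
7893 "school"         16 19
7893 "part time work" 16 16
7893 "part time work" 18 19
end

* save the school spells under new names so they do not clash in the join
preserve
keep if spelltyp == "school"
rename (agebegin ageend) (school_begin school_end)
drop spelltyp
tempfile school
save `school'
restore

* pair every part-time spell with every school spell of the same person
keep if spelltyp == "part time work"
drop spelltyp
joinby persnr using `school'

* two closed age ranges overlap when each starts no later than the other ends
keep if agebegin <= school_end & school_begin <= ageend
list, sepby(persnr)
```

Each remaining row is one overlapping pair; respondent 7893 contributes two, one for the part-time spell at 16 and one for the spell from 18 to 19. The number of pairs grows with the number of spells per person, but with at most 4 spells each the joined data stay small.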

Best regards,

Jean-Claude Arbaut
Jessica Smith

#8

22 Feb 2019, 03:39

Hi again, and thanks to both of you!
The only thing missing now is code that identifies and deletes observations like those of person 1679, who gave 2 responses:

persnr   spelltyp         agebegin   ageend   overlap
1679     part time work   15         16       0
1679     school           17         19       0

I need to drop these observations because the respondent never worked while in school. I thought this would be easy, but my code doesn't work.

My problem is that I don't know how to tell Stata that if a person has two different spell types in their answers and both show overlap=0, that person's observations need to be dropped. The difficulty is that Stata doesn't seem to recognize that the two consecutive answers belong to one and the same respondent.

Any advice?

Thanks again!

Last edited by Jessica Smith; 22 Feb 2019, 03:57.

Andrew Musau


22 Feb 2019, 05:32

Code:

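* worked / schooled = 1 if the person has any spell of that type
bys persnr: egen worked= max(spelltyp=="part time work")
bys persnr: egen schooled= max(spelltyp=="school")
* todrop = highest value of overlap over the person's spells (0 = no overlap at all)
bys persnr: egen todrop= max(overlap)
* drop people who report both types but no overlapping spell
drop if todrop==0 & worked & schooled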


Jessica Smith

#10

22 Feb 2019, 05:53

Thanks for your code! I got it working; the mistake was mine. "school" and "part time work" actually have the values 5 and 1 in my data, so I entered these values in place of the text.
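With numeric codes, the code above only needs its string literals swapped for the codes. A sketch assuming 1 = part time work and 5 = school, as described in this post, with overlap already created; the four sample rows are typed in only for illustration.

```stata
clear
input float persnr byte spelltyp float(agebegin ageend overlap)
1359 5 15 17 0
1359 1 16 16 1
1679 1 15 16 0
1679 5 17 19 0
end

* assumed coding: 1 = part time work, 5 = school
bys persnr: egen worked= max(spelltyp==1)
bys persnr: egen schooled= max(spelltyp==5)
bys persnr: egen todrop= max(overlap)
* drop people who report both types but no overlapping spell
drop if todrop==0 & worked & schooled
list, sepby(persnr)
```

With these rows, respondent 1679 (both types, no overlap) is dropped and 1359 is kept.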

Last edited by Jessica Smith; 22 Feb 2019, 06:00.