Change of occupation (job) of respondents across waves using Understanding Society (UKHLS)

Ourega-Zoe Ejebu

Join Date: May 2014

Posts: 38
#1


26 Apr 2018, 01:42

Dear all,

I am using Understanding Society (UKHLS) from 2009 (wave 1) to 2016/7 (wave 7).

I am interested in following nurses (jbsoc00==3211) and midwives (jbsoc00==3212) across waves, and especially in identifying entries and exits of nurses and midwives across waves.
My numerous searches and attempts at writing syntax have not yielded any results thus far.

Your help would be greatly appreciated. Thank you

Zoé

Andrew Musau

Join Date: Oct 2014

Posts: 10190
#2

26 Apr 2018, 06:06

I know nothing about UKHLS and it is unlikely that you will get a response from anyone who does not have direct knowledge of the dataset. Therefore, you need to generalize your question, preferably including a data example. On one point, if individuals have a unique ID over successive waves of the survey, you can identify entry by the presence of observations of a particular ID in the earliest year and exit by the absence of observations in future years.
Ourega-Zoe Ejebu

Join Date: May 2014

Posts: 38
#3

30 Apr 2018, 07:20

Dear Andrew,
thank you for your response. Much appreciated.

Indeed, I want to identify entries and exits of respondents, more particularly nurses, across waves.
However, the syntax I have used thus far did not yield the expected results.
There are many observations, so I cannot identify entries and exits manually.
I was hoping to get some advice on the appropriate syntax to use in Stata.
Thank you.

Zoe

Andrew Musau

Join Date: Oct 2014
Posts: 10190
#4

30 Apr 2018, 09:39

Here is a basic example. It will not capture situations where there are multiple entries and exits for the same entity (individual/firm), but you can make it more elaborate by checking whether an entity is present across all waves. Using the Grunfeld dataset, which has been manipulated to allow multiple start years and end years, here is one approach:

Code:

webuse grunfeld
drop if inlist(company, 1, 4, 7, 9) & year>1940
drop if inlist(company, 2, 5, 8, 10) & year<1950
*company IS THE VARIABLE IDENTIFYING FIRMS. SO SPECIFY YOUR ID VARIABLE
*time IS THE VARIABLE IDENTIFYING  WAVES 
bys company(time): gen entry=_n==1
bys company(time): gen exit=_n==_N
*FIRMS PRESENT AT THE FINAL WAVE DID NOT EXIT
qui sum time
local end= r(max)
replace exit=0 if time==`end'
list company year entry exit if entry==1| exit==1, clean noobs

All firms will have an entry date, the first observation year. Of course, this variable will be left-censored because we can only see the first time when a firm is observed but it is likely that it was in existence before we started observing. Firms with 1 observation listed below never exited (i.e., were present at the final wave). The ones with two observations have an entry and exit year.

Code:

 list company year entry exit if entry==1| exit==1, clean noobs

    company   year   entry   exit  
          1   1935       1      0  
          1   1940       0      1  
          2   1950       1      0  
          3   1935       1      0  
          4   1935       1      0  
          4   1940       0      1  
          5   1950       1      0  
          6   1935       1      0  
          7   1935       1      0  
          7   1940       0      1  
          8   1950       1      0  
          9   1935       1      0  
          9   1940       0      1  
         10   1950       1      0
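
The limitation mentioned at the start of this post (multiple entries and exits for the same entity) can be relaxed by treating each run of consecutive waves as a separate spell. What follows is only a minimal sketch, assuming that time increases by exactly 1 between consecutive waves, so that a larger jump marks a gap. The extra drop that creates a gap for company 3 and the variable names spell_start, spell and spell_end are illustrative and not part of the example above.

```stata
webuse grunfeld, clear
* ARTIFICIAL GAP FOR ILLUSTRATION ONLY: COMPANY 3 IS UNOBSERVED 1941-1945
drop if company==3 & inrange(year, 1941, 1945)
* A SPELL STARTS AT THE FIRST OBSERVATION OR RIGHT AFTER A GAP IN time
bys company (time): gen byte spell_start= (_n==1) | (time-time[_n-1]>1)
* RUNNING COUNT OF SPELLS WITHIN EACH COMPANY
by company: gen spell= sum(spell_start)
* A SPELL ENDS AT THE LAST OBSERVATION OR RIGHT BEFORE A GAP
by company: gen byte spell_end= (_n==_N) | (time[_n+1]-time>1)
list company year spell spell_start spell_end if spell_start | spell_end, clean noobs
```

Each spell_start is then an entry (or re-entry) and each spell_end an exit. A spell that ends at the final wave of the panel should still be read as "present at the end" rather than as an exit, as in the code above.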


Ourega-Zoe Ejebu

Join Date: May 2014

Posts: 38
#5

09 May 2018, 02:25

Dear Andrew,
many thanks for providing the syntax. Much appreciated.
I was able to identify the nurses who stayed in the 7 waves (only a small number).
I suppose the syntax can be amended if I want to identify nurses who enter in wave 2 and leave wave 6, etc?
Many thanks for your help. Much appreciated.
Zoé
Andrew Musau

Join Date: Oct 2014

Posts: 10190
#6

09 May 2018, 03:42

I suppose the syntax can be amended if I want to identify nurses who enter in wave 2 and leave wave 6, etc?

If you replicated the procedure in #4, then you should be able to identify all individuals who enter and exit, regardless of year. What you cannot identify are those who enter in, say, wave 1, exit in wave 3, re-enter in wave 5 and stay on to the end. Such individuals would appear not to have exited across the entire period. Something like

Code:

browse if wave==2 & entry| wave==6 & exit

should show individuals who enter in wave 2 and leave in wave 6, if they exist. Similarly, only those who meet both conditions in

Code:

browse if wave==1 & entry| wave==7 & exit==0

stayed all 7 waves (but not accounting for multiple entries). Again, if you present a data example (it does not have to be your real data, but one that depicts your data), it would be easier to provide a more direct solution.
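
Because the browse conditions above select observations rather than individuals, someone who only enters in wave 2, or only exits in wave 6, is also shown. A sketch of one way to flag the individual instead, assuming entry and exit are built as in #4 with pidp as the ID and wave as the time variable. The toy data and the names enter_w2 and exit_w6 are made up for illustration.

```stata
clear
input long pidp byte wave
1 2
1 3
1 6
2 1
2 2
3 2
3 7
end
* ENTRY AND EXIT AS IN #4, WITH pidp AS THE ID AND wave AS THE TIME VARIABLE
bysort pidp (wave): gen byte entry= (_n==1)
by pidp: gen byte exit= (_n==_N)
quietly summarize wave
replace exit=0 if wave==r(max)
* SPREAD EACH CONDITION TO ALL OBSERVATIONS OF THE SAME INDIVIDUAL
by pidp: egen byte enter_w2= max(entry & wave==2)
by pidp: egen byte exit_w6= max(exit & wave==6)
list if enter_w2 & exit_w6, sepby(pidp)
```

In this toy data only pidp 1 both enters in wave 2 and exits in wave 6, so only that individual's observations are listed.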

Ourega-Zoe Ejebu

Join Date: May 2014
Posts: 38
#7

09 May 2018, 05:23

Dear Andrew,
thank you for the syntax. Much appreciated.

Here is an example of my data.
It shows the identifier (pidp), the survey year (wave), the occupation (jbsoc00), the age (dvage) and the economic status (jbstat).
As a first step, I want to identify the respondents who start the survey as a nurse but finish the survey (the respondent's final year) not as a nurse.
The main problem is that such respondents will, in most cases, have a missing value (dot) for jbsoc00. Yet they are still in the survey.
I hope this clarifies my initial question. Many thanks.

pidp	wave	jbsoc00	dvage	jbstat
612618811	1	nurses	38	in paid
612618811	2	nurses	39	in paid
612618811	3	nurses	40	in paid
612618811	4	nurses	41	in paid
612618811	5	nurses	42	in paid
612618811	7	.	44	looking

Last edited by Ourega-Zoe Ejebu; 09 May 2018, 05:27.


Andrew Musau

Join Date: Oct 2014

Posts: 10190
#8

09 May 2018, 06:06

jbsoc00 and jbstat are numeric variables, so "nurses", "in paid" and "looking" are all value labels. Can you give me the value (number) of each of these three? You can do this by typing

Code:

tab jbsoc00
tab jbstat
label list

Last edited by Andrew Musau; 09 May 2018, 06:09.
Ourega-Zoe Ejebu

Join Date: May 2014

Posts: 38
#9

09 May 2018, 06:22

Hi Andrew,
sorry for the omission.
The numerical value for nurses is (jbsoc00==)3212
The numerical values for jbstat (current economic activity) are as follows:

1 self employed
2 in paid employment (full or part-time)
3 unemployed
4 retired
5 on maternity leave
6 looking after family or home
7 full-time student
8 long-term sick or disabled
9 on a government training scheme
10 unpaid worker in family business
11 11
12 doing something else

Thank you

Andrew Musau

Join Date: Oct 2014
Posts: 10190

#10

09 May 2018, 07:45

Thanks for the data example and extra details. I will expand the example just a bit to cover some scenarios

Code:

input double (pidp wave jbsoc00 dvage jbstat)
612618811 1 3212 38 2
612618811 2 3212 39 2
612618811 3 3212 40 2
612618811 4 3212 41 2
612618811 5 3212 42 2
612618811 7 . 44 6
612618812 1 1111 38 2
612618812 2 1111 39 2
612618812 3 3212 40 2
612618812 4 3212 41 2
612618812 5 . 42 .
612618812 7 . 44 .
612618813 1 3212 38 2
612618813 2 3212 39 2
612618813 3 3212 40 2
612618813 4 3212 41 2
612618813 5 3212 42 2
612618813 7 3212 44 2
end

label define jbsoc00 1111 "other" 3212 "nurse"
label define jbstat 2 "in paid employment (full or part-time)" 6 "looking after family or home"
label values jbsoc00 jbsoc00
label values jbstat jbstat

Code:


. list, sepby(pidp)

     +-----------------------------------------------------------------------------+
     |      pidp   wave   jbsoc00   dvage                                   jbstat |
     |-----------------------------------------------------------------------------|
  1. | 6.126e+08      1     nurse      38   in paid employment (full or part-time) |
  2. | 6.126e+08      2     nurse      39   in paid employment (full or part-time) |
  3. | 6.126e+08      3     nurse      40   in paid employment (full or part-time) |
  4. | 6.126e+08      4     nurse      41   in paid employment (full or part-time) |
  5. | 6.126e+08      5     nurse      42   in paid employment (full or part-time) |
  6. | 6.126e+08      7         .      44             looking after family or home |
     |-----------------------------------------------------------------------------|
  7. | 6.126e+08      1     other      38   in paid employment (full or part-time) |
  8. | 6.126e+08      2     other      39   in paid employment (full or part-time) |
  9. | 6.126e+08      3     nurse      40   in paid employment (full or part-time) |
 10. | 6.126e+08      4     nurse      41   in paid employment (full or part-time) |
 11. | 6.126e+08      5         .      42                                        . |
 12. | 6.126e+08      7         .      44                                        . |
     |-----------------------------------------------------------------------------|
 13. | 6.126e+08      1     nurse      38   in paid employment (full or part-time) |
 14. | 6.126e+08      2     nurse      39   in paid employment (full or part-time) |
 15. | 6.126e+08      3     nurse      40   in paid employment (full or part-time) |
 16. | 6.126e+08      4     nurse      41   in paid employment (full or part-time) |
 17. | 6.126e+08      5     nurse      42   in paid employment (full or part-time) |
 18. | 6.126e+08      7     nurse      44   in paid employment (full or part-time) |
     +-----------------------------------------------------------------------------+

In a first step, I want to identify the respondents who start the survey as nurse; but finish the survey (the respondent's final year) not as nurses.

Code:

bysort pidp (wave): gen start_as_nurse= cond(jbsoc00==3212 & _n==1, 1, 0)
by pidp: egen tag1= max(start_as_nurse)
bysort pidp (wave): gen end_as_nurse= cond(jbsoc00==3212 & _n==_N, 1, 0)
by pidp: egen tag2= max(end_as_nurse)
*START AS NURSE, END AS NURSE
browse if tag1& tag2

*START AS NURSE, DO NOT END AS NURSE
browse if tag1& !tag2

Some notes on the code:

1. If a respondent starts the survey in an occupation other than nursing, e.g., pidp=612618812 in the example, they are not tagged by the variable start_as_nurse, even though they get into nursing later on. Please clarify if this is what you had in mind.

2. To check that an individual was a nurse in all years, we verify that he or she is a nurse in their first and last survey years. However, we can be stricter and require that this also holds in all intermediate years. Here is one way to check: if one is a nurse across all periods, then the mean of the jbsoc00 code (across time) should equal the nursing code, 3212. Note that egen's mean() ignores missing values, so waves with missing jbsoc00 do not affect it. You can therefore add the condition

Code:

bysort pidp (wave): gen start_as_nurse= cond(jbsoc00==3212 & _n==1, 1, 0)
by pidp: egen tag1= max(start_as_nurse)
bysort pidp (wave): gen end_as_nurse= cond(jbsoc00==3212 & _n==_N, 1, 0)
by pidp: egen tag2= max(end_as_nurse)
by pidp: egen tag3= mean(jbsoc00)
*START AS NURSE, END AS NURSE VERIFYING ALL YEARS
browse if tag1& tag2 & tag3==3212

Make sure that you do not run into precision issues here. My variables are stored as double which is fine. If yours are floats, you can have

Code:

browse if tag1& tag2 & tag3==float(3212)

3. To check that a respondent is still active in the survey, you need to check whether they are providing responses to other questions (variables), even if your variable of interest is missing. This matters because a respondent with missing values for all variables is indistinguishable from one who is not in the survey. In your case, we can rely on one variable, jbstat: if jbsoc00 is missing in a given year but jbstat is non-missing in that year, then the respondent is active in the survey. Therefore, to check whether a respondent started out as a nurse but later stopped being recorded as a nurse while still being active in the survey, using this job status variable, you can add

Code:

bysort pidp (wave): gen start_as_nurse= cond(jbsoc00==3212 & _n==1, 1, 0)
by pidp: egen tag1= max(start_as_nurse)
bysort pidp (wave): gen not_nurse_but_active= cond(jbsoc00[_n-1]==3212 &jbsoc00[_n]!=3212& jbstat!=., 1, 0)
by pidp: egen tag4= max(not_nurse_but_active)
*START AS NURSE, END NOT AS NURSE BUT ACTIVE IN SURVEY
browse if tag1& tag4

Note that this is true for respondent with pidp=612618811 in my example.
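
Two of the checks in these notes can be tightened. The mean of jbsoc00 can equal 3212 by coincidence if other codes average out to it, and not_nurse_but_active flags a move out of nursing at any wave, not only at the respondent's final one. A minimal sketch of stricter versions, assuming missing values are stored as Stata system missing (.) as in the example. The toy data and the names all_nurse, first_nurse and last_active_not_nurse are illustrative only.

```stata
clear
input long pidp byte wave int jbsoc00 byte jbstat
1 1 3212 2
1 2 3212 2
1 3 .    6
2 1 1111 2
2 2 3212 2
2 3 .    .
3 1 3212 2
3 2 3212 2
3 3 3212 2
end
* 1 ONLY IF jbsoc00==3212 IN EVERY OBSERVED WAVE (MISSING COUNTS AS NOT NURSE)
bysort pidp: egen byte all_nurse= min(jbsoc00==3212)
* OCCUPATION AND STATUS AT THE FIRST AND LAST OBSERVED WAVE ONLY
bysort pidp (wave): gen byte first_nurse= (jbsoc00[1]==3212)
by pidp: gen byte last_active_not_nurse= (jbsoc00[_N]!=3212) & !missing(jbstat[_N])
*START AS NURSE, LAST OBSERVATION NOT AS NURSE BUT ACTIVE IN SURVEY
list if first_nurse & last_active_not_nurse, sepby(pidp)
*NURSE IN EVERY OBSERVED WAVE
list if all_nurse, sepby(pidp)
```

In this toy data the first list shows pidp 1 and the second shows pidp 3. pidp 2 appears in neither, because they do not start as a nurse and have no jbstat response in their last wave.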

Last edited by Andrew Musau; 09 May 2018, 07:57.


Ourega-Zoe Ejebu

Join Date: May 2014

Posts: 38
#11

15 May 2018, 05:00

Dear Andrew,
thank you very much for the syntax provided.
I wouldn't have been able to find it myself.
This is exactly what I was looking for.
Very much appreciated. And sorry for such a late reply.
