
  • Change of occupation (job) of respondents across waves using Understanding Society (UKHLS)

    Dear all,

    I am using Understanding Society (UKHLS) from 2009 (wave 1) to 2016/7 (wave 7).

    I am interested in following nurses (jbsoc00==3211) and midwives (jbsoc00==3212) across waves and, most especially, in identifying the entries and exits of nurses and midwives across waves.
    My numerous searches and attempts at writing the syntax have not yielded any results so far.

    Your help would be greatly appreciated. Thank you.

    Zoé

  • #2
    I know nothing about UKHLS, and it is unlikely that you will get a response from anyone who does not have direct knowledge of the dataset. Therefore, you need to generalize your question, preferably including a data example. On one point: if individuals have a unique ID across successive waves of the survey, you can identify entry as the earliest wave in which a particular ID is observed, and exit as the last wave in which it is observed (i.e., the ID is absent from all later waves).
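A minimal Stata sketch of this idea, assuming a long-format panel with one row per ID and wave. The toy IDs 101 to 103 and the variable names first_wave, last_wave, entered and exited are hypothetical, chosen only for illustration. An ID "enters" in the first wave it is observed and "exits" in the last wave it is observed, unless that is the final wave of the survey, in which case it has not exited.

```stata
* Hypothetical toy panel: one row per ID (pidp) and wave
clear
input pidp wave
101 1
101 2
101 3
102 2
102 3
103 1
103 2
103 3
103 4
end

* First and last wave in which each ID is observed
bysort pidp: egen first_wave = min(wave)
bysort pidp: egen last_wave  = max(wave)

* Last wave fielded in this toy survey
quietly summarize wave
local final = r(max)

* Entry = first observed wave; exit = last observed wave, unless it is the final wave
gen byte entered = wave == first_wave
gen byte exited  = wave == last_wave & last_wave < `final'

list pidp wave entered exited, sepby(pidp) noobs
```

Entry in wave 1 is left-censored: the ID may well have "entered" before the survey started.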



    • #3
      Dear Andrew,
      thank you for your response. Much appreciated.

      Indeed, I want to identify the entries and exits of respondents, particularly nurses, across waves.
      The syntax I have used so far has not produced the expected results, however.
      There are many observations, so I cannot identify entries and exits manually.
      I was hoping to get some advice on the appropriate syntax to use in Stata.
      Thank you.

      Zoe



      • #4
        Here is a basic example. It will not capture situations where the same entity (individual/firm) has multiple entries and exits, but you can make it more elaborate by checking whether an entity is present in all waves. Using the Grunfeld dataset, modified so that firms start and end in different years, here is one approach:


        Code:
        webuse grunfeld
        drop if inlist(company, 1, 4, 7, 9) & year>1940
        drop if inlist(company, 2, 5, 8, 10) & year<1950
        *company IS THE VARIABLE IDENTIFYING FIRMS. SO SPECIFY YOUR ID VARIABLE
        *time IS THE VARIABLE IDENTIFYING  WAVES 
        bys company(time): gen entry=_n==1
        bys company(time): gen exit=_n==_N
        *FIRMS PRESENT AT THE FINAL WAVE DID NOT EXIT
        qui sum time
        local end= r(max)
        replace exit=0 if time==`end'
        list company year entry exit if entry==1| exit==1, clean noobs
        Every firm has an entry date: the first year in which it is observed. Of course, this variable is left-censored: we only see the first year a firm is observed, but it was likely in existence before we started observing it. Firms with a single row in the listing below never exited (i.e., they were present in the final wave); those with two rows have both an entry year and an exit year.


        Code:
         list company year entry exit if entry==1| exit==1, clean noobs
        
            company   year   entry   exit  
                  1   1935       1      0  
                  1   1940       0      1  
                  2   1950       1      0  
                  3   1935       1      0  
                  4   1935       1      0  
                  4   1940       0      1  
                  5   1950       1      0  
                  6   1935       1      0  
                  7   1935       1      0  
                  7   1940       0      1  
                  8   1950       1      0  
                  9   1935       1      0  
                  9   1940       0      1  
                 10   1950       1      0
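The more elaborate check mentioned at the start of this post, whether an entity is present in every wave, can be sketched by comparing each firm's number of observations with the number of distinct waves in the panel. This assumes at most one observation per firm and year; the variable names nobs, all_waves and firm_tag are illustrative, not part of the original code.

```stata
* Rebuild the modified Grunfeld panel used above
webuse grunfeld, clear
drop if inlist(company, 1, 4, 7, 9) & year > 1940
drop if inlist(company, 2, 5, 8, 10) & year < 1950

* Number of distinct waves (years) in the whole panel
quietly tabulate year
local nwaves = r(r)

* Number of waves in which each firm is observed
bysort company: gen nobs = _N
gen byte all_waves = nobs == `nwaves'

* One row per firm
egen byte firm_tag = tag(company)
list company nobs all_waves if firm_tag, clean noobs
```

With this modified data, only companies 3 and 6 should be flagged as present in all waves.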



        • #5
          Dear Andrew,
          many thanks for providing the syntax. Much appreciated.
          I was able to identify the nurses who stayed for all 7 waves (only a small number).
          I suppose the syntax can be amended if I want to identify nurses who enter in wave 2 and leave in wave 6, etc.?
          Many thanks for your help. Much appreciated.
          Zoé



          • #6
            I suppose the syntax can be amended if I want to identify nurses who enter in wave 2 and leave wave 6, etc?
            If you replicated the procedure in #4, you should be able to identify all individuals who enter and exit, regardless of year. What you cannot identify are those who enter in, say, wave 1, exit in wave 3, re-enter in wave 5 and stay on to the end. Such individuals would appear not to have exited at any point during the period. Something like

            Code:
            bysort pidp (wave): gen byte in2_out6 = wave[1]==2 & wave[_N]==6
            browse if in2_out6
            should show the individuals who enter in wave 2 and leave in wave 6, if any exist. Similarly, only those flagged by
            Code:
            bysort pidp (wave): gen byte in1_to7 = wave[1]==1 & wave[_N]==7
            browse if in1_to7
            stayed from wave 1 to wave 7 (but this does not account for multiple entries). Again, if you present a data example (it does not have to be your real data, just one that depicts its structure), it would be easier to provide a more direct solution.
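Individuals who exit and later re-enter, the case the procedure in #4 cannot capture, can be found by looking for gaps between consecutive observed waves. A minimal sketch, assuming one row per pidp and wave; the toy IDs 201 and 202 and the variable names gap and reentered are hypothetical. A gap may also just mean the person was not interviewed in that wave (in the data example posted later in #10, wave 6 is absent for every respondent), so whether a gap counts as exit and re-entry is a substantive decision.

```stata
* Hypothetical toy panel: 201 is observed in waves 1-2 and 5-7, 202 in waves 1-3
clear
input pidp wave
201 1
201 2
201 5
201 6
201 7
202 1
202 2
202 3
end

* A gap: the current wave is more than one wave after the previous observed wave.
* The _n > 1 guard matters because wave[0] is missing and missing > 1 is true in Stata.
bysort pidp (wave): gen byte gap = (_n > 1) & ((wave - wave[_n-1]) > 1)

* Flag every row of an individual with at least one gap (exit followed by re-entry)
bysort pidp: egen byte reentered = max(gap)

list pidp wave gap reentered, sepby(pidp) noobs
```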



            • #7
              Dear Andrew,
              thank you for the syntax. Much appreciated.

              Here is an example of my data.
              It shows the identifier (pidp), the survey wave (wave), the occupation (jbsoc00), the age (dvage) and the economic status (jbstat).
              As a first step, I want to identify the respondents who start the survey as nurses but finish it (in the respondent's final year) in a different occupation.
              The main problem is that such respondents will, in most cases, have a missing value (a dot) for occupation, yet they are still in the survey.
              I hope this clarifies my initial question. Many thanks.



              pidp wave jbsoc00 dvage jbstat
              612618811 1 nurses 38 in paid
              612618811 2 nurses 39 in paid
              612618811 3 nurses 40 in paid
              612618811 4 nurses 41 in paid
              612618811 5 nurses 42 in paid
              612618811 7 . 44 looking
              Last edited by Ourega-Zoe Ejebu; 09 May 2018, 05:27.



              • #8
                jbsoc00 and jbstat are numeric variables, so "nurses", "in paid" and "looking" are all value labels. Can you give me the value (number) of each of these three? You can find them by typing

                Code:
                tab jbsoc00
                tab jbstat
                label list
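Since tab displays only the labels, here is a small sketch of ways to see the numeric codes behind a value label directly. The toy variable and its two labels mimic jbstat as described later in the thread and are for illustration only.

```stata
* Toy variable with a value label, mimicking jbstat (illustration only)
clear
input jbstat
2
2
6
end
label define jbstat 2 "in paid employment (full or part-time)" 6 "looking after family or home"
label values jbstat jbstat

* Tabulate the underlying numeric codes instead of the labels
tabulate jbstat, nolabel

* Show the code-to-label mapping of the jbstat value label
label list jbstat

* Or prefix each label with its code, e.g. "2. in paid employment ..."
numlabel jbstat, add
tabulate jbstat
```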
                Last edited by Andrew Musau; 09 May 2018, 06:09.



                • #9
                  Hi Andrew,
                  sorry for the omission.
                  The numerical value for nurses is 3212 (i.e., jbsoc00==3212).
                  The numerical values for jbstat (current economic activity) are as follows:

                  1 self employed
                  2 in paid employment (full or part-time)
                  3 unemployed
                  4 retired
                  5 on maternity leave
                  6 looking after family or home
                  7 full-time student
                  8 long-term sick or disabled
                  9 on a government training scheme
                  10 unpaid worker in family business
                  11 11
                  12 doing something else

                  Thank you




                  • #10
                    Thanks for the data example and extra details. I will expand the example a bit to cover some more scenarios:

                    Code:
                    clear
                    input double(pidp wave jbsoc00 dvage jbstat)
                    612618811 1 3212 38 2
                    612618811 2 3212 39 2
                    612618811 3 3212 40 2
                    612618811 4 3212 41 2
                    612618811 5 3212 42 2
                    612618811 7 . 44 6
                    612618812 1 1111 38 2
                    612618812 2 1111 39 2
                    612618812 3 3212 40 2
                    612618812 4 3212 41 2
                    612618812 5 . 42 .
                    612618812 7 . 44 .
                    612618813 1 3212 38 2
                    612618813 2 3212 39 2
                    612618813 3 3212 40 2
                    612618813 4 3212 41 2
                    612618813 5 3212 42 2
                    612618813 7 3212 44 2
                    end
                    
                    label define jbsoc00 1111 "other" 3212 "nurse"
                    label define jbstat 2 "in paid employment (full or part-time)" 6 "looking after family or home"
                    label values jbsoc00 jbsoc00
                    label values jbstat jbstat

                    Code:
                    
                    . list, sepby(pidp)
                    
                         +-----------------------------------------------------------------------------+
                         |      pidp   wave   jbsoc00   dvage                                   jbstat |
                         |-----------------------------------------------------------------------------|
                      1. | 6.126e+08      1     nurse      38   in paid employment (full or part-time) |
                      2. | 6.126e+08      2     nurse      39   in paid employment (full or part-time) |
                      3. | 6.126e+08      3     nurse      40   in paid employment (full or part-time) |
                      4. | 6.126e+08      4     nurse      41   in paid employment (full or part-time) |
                      5. | 6.126e+08      5     nurse      42   in paid employment (full or part-time) |
                      6. | 6.126e+08      7         .      44             looking after family or home |
                         |-----------------------------------------------------------------------------|
                      7. | 6.126e+08      1     other      38   in paid employment (full or part-time) |
                      8. | 6.126e+08      2     other      39   in paid employment (full or part-time) |
                      9. | 6.126e+08      3     nurse      40   in paid employment (full or part-time) |
                     10. | 6.126e+08      4     nurse      41   in paid employment (full or part-time) |
                     11. | 6.126e+08      5         .      42                                        . |
                     12. | 6.126e+08      7         .      44                                        . |
                         |-----------------------------------------------------------------------------|
                     13. | 6.126e+08      1     nurse      38   in paid employment (full or part-time) |
                     14. | 6.126e+08      2     nurse      39   in paid employment (full or part-time) |
                     15. | 6.126e+08      3     nurse      40   in paid employment (full or part-time) |
                     16. | 6.126e+08      4     nurse      41   in paid employment (full or part-time) |
                     17. | 6.126e+08      5     nurse      42   in paid employment (full or part-time) |
                     18. | 6.126e+08      7     nurse      44   in paid employment (full or part-time) |
                         +-----------------------------------------------------------------------------+
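In the listing above, the nine-digit pidp values are displayed as 6.126e+08, which hides the differences between individuals. A minimal sketch of two related checks, assuming pidp as in the data example: keep the ID stored as double (as in the input line above) or long, never float, and give it a display format wide enough for all digits (%12.0f is my choice here; any sufficiently wide integer format should do). The variable pidp_float exists only to show what goes wrong with float storage.

```stata
* The three pidp values from the data example
clear
input double pidp
612618811
612618812
612618813
end

* Storage type check: double (or long) holds these nine-digit IDs exactly
describe pidp

* A float stores integers exactly only up to 16,777,216; copying the IDs
* to a float collapses all three onto the same value
generate float pidp_float = pidp
list pidp pidp_float, noobs

* Display the IDs in full rather than in exponential notation
format pidp %12.0f
list pidp, noobs
```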
                    In a first step, I want to identify the respondents who start the survey as nurse; but finish the survey (the respondent's final year) not as nurses.
                    Code:
                    bysort pidp (wave): gen start_as_nurse= cond(jbsoc00==3212 & _n==1, 1, 0)
                    by pidp: egen tag1= max(start_as_nurse)
                    bysort pidp (wave): gen end_as_nurse= cond(jbsoc00==3212 & _n==_N, 1, 0)
                    by pidp: egen tag2= max(end_as_nurse)
                    *START AS NURSE, END AS NURSE
                    browse if tag1& tag2
                    
                    *START AS NURSE, DO NOT END AS NURSE
                    browse if tag1& !tag2
                    Some notes on the code:

                    1. If a respondent starts the survey in an occupation other than nursing (e.g., pidp 612618812 in the example), they are not tagged by the variable start_as_nurse, even though they get into nursing later on. Please clarify whether this is what you had in mind.

                    2. To check that an individual was a nurse in all years, the code above only verifies that he or she is a nurse in their first and last survey years. We can be stricter and require that this also holds in all intermediate years. Here is one way to check: if a respondent is a nurse in every period, the mean of jbsoc00 across time equals 3212, so you can add this condition. Note that this is a necessary rather than a sufficient check: egen's mean() ignores missing values, and different codes could in principle average to 3212.

                    Code:
                    bysort pidp (wave): gen start_as_nurse= cond(jbsoc00==3212 & _n==1, 1, 0)
                    by pidp: egen tag1= max(start_as_nurse)
                    bysort pidp (wave): gen end_as_nurse= cond(jbsoc00==3212 & _n==_N, 1, 0)
                    by pidp: egen tag2= max(end_as_nurse)
                    by pidp: egen tag3= mean(jbsoc00)
                    *START AS NURSE, END AS NURSE VERIFYING ALL YEARS
                    browse if tag1& tag2 & tag3==3212
                    Make sure that you do not run into precision issues here. My variables are stored as double, which is fine. If yours are stored as float, you can use

                    Code:
                    browse if tag1& tag2 & tag3==float(3212)

                    3. To check that a respondent is still active in the survey, you need to check whether they answer other questions (variables), even when your variable of interest is missing. This matters because a respondent with missing values for all variables is indistinguishable from one who is not in the survey. In your case, we can rely on one variable, jbstat: if jbsoc00 is missing in a given year but jbstat is not, the respondent is still active in the survey that year. Therefore, to check whether a respondent started out as a nurse but later moved to a different occupation while still being active in the survey, you can add

                    Code:
                    bysort pidp (wave): gen start_as_nurse= cond(jbsoc00==3212 & _n==1, 1, 0)
                    by pidp: egen tag1= max(start_as_nurse)
                    bysort pidp (wave): gen not_nurse_but_active = jbsoc00[_n-1]==3212 & jbsoc00!=3212 & !missing(jbstat)
                    by pidp: egen tag4= max(not_nurse_but_active)
                    *START AS NURSE, LATER LEAVE NURSING WHILE STILL ACTIVE IN SURVEY
                    browse if tag1& tag4
                    Note that this holds for the respondent with pidp 612618811 in my example.
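Two stricter variants of the checks above, sketched under the same assumptions (3212 = nurse; a non-missing jbstat means the respondent answered in that wave); the variable names nurse_all_waves and left_at_end are hypothetical. The first requires jbsoc00 to equal 3212 in every observed wave, treating a missing occupation as not being a nurse. The second looks only at each respondent's last observed wave, matching the quoted question (start as a nurse, finish not as a nurse), whereas tag4 flags any nurse-to-non-nurse transition while the respondent is active.

```stata
* Data example from above (jbsoc00 3212 = nurse; missing jbstat = no response)
clear
input double(pidp wave jbsoc00 dvage jbstat)
612618811 1 3212 38 2
612618811 2 3212 39 2
612618811 3 3212 40 2
612618811 4 3212 41 2
612618811 5 3212 42 2
612618811 7 . 44 6
612618812 1 1111 38 2
612618812 2 1111 39 2
612618812 3 3212 40 2
612618812 4 3212 41 2
612618812 5 . 42 .
612618812 7 . 44 .
612618813 1 3212 38 2
612618813 2 3212 39 2
612618813 3 3212 40 2
612618813 4 3212 41 2
612618813 5 3212 42 2
612618813 7 3212 44 2
end

* Variant 1: nurse in EVERY observed wave (missing jbsoc00 == 3212 evaluates to 0)
bysort pidp: egen byte nurse_all_waves = min(jbsoc00 == 3212)

* Variant 2: nurse in the first observed wave, not a nurse in the LAST observed wave,
* and still answering the survey in that last wave (non-missing jbstat)
bysort pidp (wave): gen byte left_at_end = jbsoc00[1] == 3212 & jbsoc00[_N] != 3212 & !missing(jbstat[_N])

list pidp wave jbsoc00 jbstat nurse_all_waves left_at_end, sepby(pidp) noobs
```

With the example data, the first variant flags only pidp 612618813 and the second only pidp 612618811.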
                    Last edited by Andrew Musau; 09 May 2018, 07:57.



                    • #11
                      Dear Andrew,
                      thank you very much for the syntax provided.
                      I wouldn't have been able to find it myself.
                      This is exactly what I was looking for.
                      Very much appreciated. And sorry for such a late reply.
