Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Reorder employment history in cross-sectional survey from oldest to most recent job

    I have a cross-sectional survey with 12 sets of variables, e.g. job1_*, job2_*, ..., job12_*, that cover the entire employment history starting with job1 being the first/oldest job and if people have had multiple jobs job2 to job12 recoding the most recent job. Variables job*_01 record the type of employment, e.g. temporary, permanent, seasonal, while job*_02 covers the occupation. I would now like to create new variables jobmin1_*, jobmin2_*, jobmin3_* that reverse the order of the employment history, so that the variables jobmin1_* is the most recent job and jobmin12* is the first/oldest job. The problem is that individuals have different number of jobs so I need to dynamically select the oldest job.

    MRE Starting point:
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte(job1_01 job2_01 job3_01) long(job1_02_nh job2_02_nh job3_02_nh)
    4 1 . 9211 5212    .
    1 1 . 9510 5419    .
    1 1 . 5329 5329    .
    1 1 1 7112 3142 5221
    1 3 . 2269 6130    .
    3 1 1 9520 8142 5211
    1 1 . 5212 2341    .
    1 1 1 9629 9510 5221
    1 1 . 5230 5246    .
    3 1 1 9212 9212 9211
    end
    MRE Expected end result:
    Code:
    clear
    input byte(job1_01 job2_01 job3_01) long(job1_02_nh job2_02_nh job3_02_nh) byte(jobmin1_01 jobmin2_01 jobmin3_01) long(jobmin1_02_nh jobmin2_02_nh jobmin3_02_nh)
    4 1 . 9211 5212 . 1 4 . 5212 9211 .
    1 1 . 9510 5419 . 1 1 . 5419 9510 .
    1 1 . 5329 5329 . 1 1 . 5329 5329 .
    1 1 1 7112 3142 5221 1 1 1 5221 3142 7112
    1 3 . 2269 6130 . 3 1 . 6130 2269 .
    3 1 1 9520 8142 5211 1 1 3 5211 8142 9520
    1 1 . 5212 2341 . 1 1 . 2341 5212 .
    1 1 1 9629 9510 5221 1 1 1 5221 9510 9629
    1 1 . 5230 5246 . 1 1 . 5246 5230 .
    3 1 1 9212 9212 9211 1 1 3 9211 9212 9212
    end
    My solution so far is to create a variable that captures the number of jobs people had, and then create the variables for each suffix and replace the variables with the value of the respective job variable by using the information from the n_jobs variable. However, in my solution jobmin2_01 is wrong for observation 6 (3 instead of 1) and observation 10 (3 instead of 1). My best guess is that the evaluation of `=jobcount-1' in the last two replace lines does not work properly and I have been playing around with different forms, e.g. making jobcount-2 in the last replace line, but I these just throw error messages and I can not get them to work.

    My attempt:
    Code:
    * Step 1: Capture the number of jobs people have had
    * Create the number of jobs an individual has had
    gen n_jobs = 0
    
    forval i = 1/3 {
        replace n_jobs = n_jobs + !missing(job`i'_01)
    }
    
    clonevar jobcount = n_jobs
    // Step 2: Loop Over Suffixes and Assign Values for Up to Three Jobs
    local suffixes = "01 02_nh"
    
    foreach suffix in `suffixes' {
        // Assign jobmin1_* with the most recent job
        gen jobmin1_`suffix' = .
        gen jobmin2_`suffix' = .
        gen jobmin3_`suffix' = .
            
        replace jobmin1_`suffix' = job`=jobcount'_`suffix'
        replace jobmin2_`suffix' = job`=jobcount-1'_`suffix' if n_jobs >= 2
        replace jobmin3_`suffix' = job`=jobcount-1'_`suffix' if n_jobs >= 3
    }
    
    list job1_01 job2_01 job3_01 jobmin1_01 jobmin2_01 jobmin3_01 n_jobs
    Last edited by Felix Kaysers; 15 Aug 2024, 13:49. Reason: added information about potential cause of problem
    Cheers,
    Felix
    Stata Version: MP 18.0
    OS: Windows 11

  • #2
    For your example data, the following works:
    Code:
    local njobs 3
    forvalues i = 1/3 {
        gen jobmin`i'_01 = job`=`njobs'+1-`i''_01
        gen jobmin`i'_02_nh  = job`=`njobs'+1 -`i''_02_nh
    }
    order jobmin*_01, after(job`njobs'_02_nh)
    You say you have 12 such variables instead of 3, so all you need to do is change the definition of local macro njobs to 12.

    Comment


    • #3
      Dear Clyde,

      thank you for your solution. Unfortunately running it in Stata does not return what I want. See here:
      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input byte(job1_01 job2_01 job3_01) float(jobmin1_01 jobmin2_01 jobmin3_01 n_jobs jobcount)
      4 1 . . 1 4 2 2
      1 1 . . 1 1 2 2
      1 1 . . 1 1 2 2
      1 1 1 1 1 1 3 3
      1 3 . . 3 1 2 2
      3 1 1 1 1 3 3 3
      1 1 . . 1 1 2 2
      1 1 1 1 1 1 3 3
      1 1 . . 1 1 2 2
      3 1 1 1 1 3 3 3
      end
      I want that for observation 1, jobmin1_01 is not missing but set to 1 (== job02_01), jobmin2_01 is set to 4 instead of 1 (== job1_01), and jobmin3_01 is set to missing instead of 4. Your code seems to just reverse the variables, but I need to reverse them so the latest job becomes the first job variable. Because people have different number of jobs - in my example data, some have 2, some have 3 jobs, I can not just reverse them. Let me know if I have left anything unclear!

      Cheers,
      Felix
      Stata Version: MP 18.0
      OS: Windows 11

      Comment


      • #4
        I'd check out rowsort from the Stata Journal.


        SJ-9-1 pr0046 . . . . . . . . . . . . . . . . . . . Speaking Stata: Rowwise
        (help rowsort, rowranks if installed) . . . . . . . . . . . N. J. Cox
        Q1/09 SJ 9(1):137--157
        shows how to exploit functions, egen functions, and Mata
        for working rowwise; rowsort and rowranks are introduced

        Comment


        • #5
          Thank you Nick, I will check it out!
          Cheers,
          Felix
          Stata Version: MP 18.0
          OS: Windows 11

          Comment


          • #6
            Re #3. Thanks for making that clear. In the expected output you showed in #1, you did not do what you say in #3. In fact, it made me wonder why, in your code there, you went through such contortions to skip over missing values in the job* variables when there was no need to do that to get the results you said you wanted. So I modeled my code on #2 on the expected output from #1.

            So, here's a simple way to do what you actually want. Note that it requires no loops, and there is no need to create variables that count the number of non-missing observations per person. In fact, you don't even have to know anything about the number of job variables. This code will work whether there are 3 or 12 or 3000.

            Code:
            gen `c(obs_t)' id = _n
            
            reshape long job@_01 job@_02_nh, i(id)
            drop if missing(job_01, job_02_nh)
            gsort id -_j
            by id: replace _j = _n
            reshape wide
            order job*_01, after(id)Re #3
            Note: the first line of code creates a unique identifying variable, which is necessary for the -reshape- commands to work properly. My guess is that in your full data set you already have such a variable. If so, there is no need to create a new one: just substitute that variable for id everywhere in the rest of the code.

            I also suggest you consider ending the process before the -reshape wide- command. The long layout of the data at that point is actually more useful for most Stata data management and analysis commands. So give some thought to where you are going with this data. There is a good chance that leaving it long will make things easier for you.

            Comment


            • #7
              Thank you for your suggested solutions everyone!
              Clyde Schechter, your code works flawlessly with my example data. I failed to realize that some of my variable sets, i.e. all variables for job 1, have missing values (MRE below). Thus running your code on my data does, unfortunately for me, not work.
              Nick Cox, I looked into rowsort. I should have made clear that some of the data is categorical, so sorting the data using rowsort does not work, but it seems that a loop over observations described in the mentioned Stata journal article is the way to go, so thank you!
              I will give looping over the observations a shot right now.

              MRE:
              Code:
              clear
              input byte(job1_01 job2_01 job3_01) long(job1_02_nh job2_02_nh job3_02_nh)
              4 1 . . 5212    .
              1 1 . . 5419    .
              1 1 . . 5329    .
              1 1 1 . 3142 5221
              1 3 . . 6130    .
              3 1 1 . 8142 5211
              1 1 . . 2341    .
              1 1 1 . 9510 5221
              1 1 . . 5246    .
              3 1 1 . 9212 9211
              end
              Cheers,
              Felix
              Stata Version: MP 18.0
              OS: Windows 11

              Comment


              • #8
                rowsort can sort numeric and string variables alike, so being categorical is in itself not material. If a sort order is defined at all, rowsort should apply.

                Comment


                • #9
                  My apologies, I have probably been imprecise again. Based on reading the help file and trying out rowsort, rowsort did not work for me because it ordered the results by the numeric value of the column. However, using rowsort does not return the correct result for observation 5 of my MRE in #7:


                  Code:
                  * Example generated by -dataex-. For more info, type help dataex
                  clear
                  input byte(job1_01 job2_01 job3_01) long(job1_02_nh job2_02_nh job3_02_nh) float(jobmin1 jobmin2 jobmin3)
                  4 1 . . 5212    . 1 4 .
                  1 1 . . 5419    . 1 1 .
                  1 1 . . 5329    . 1 1 .
                  1 1 1 . 3142 5221 1 1 1
                  1 3 . . 6130    . 1 3 .
                  3 1 1 . 8142 5211 1 1 3
                  1 1 . . 2341    . 1 1 .
                  1 1 1 . 9510 5221 1 1 1
                  1 1 . . 5246    . 1 1 .
                  3 1 1 . 9212 9211 1 1 3
                  end
                  
                  
                  rowsort job1_01 job2_01 job3_01, gen(jobmin1 jobmin2 jobmin3)
                  For observation 5, jobmin1 should be 3 instead of 1 and jobmin2 should be 1 instead of 3. Basically people can transition from 1 in job1_01 to 3 in job2_01 to 2 in job3_01.
                  Last edited by Felix Kaysers; 19 Aug 2024, 08:22.
                  Cheers,
                  Felix
                  Stata Version: MP 18.0
                  OS: Windows 11

                  Comment


                  • #10
                    If 1 3 2 is an order, then indeed rowsort can't return that as a result.

                    Comment


                    • #11
                      Clyde Schechter, your code works flawlessly with my example data. I failed to realize that some of my variable sets, i.e. all variables for job 1, have missing values (MRE below). Thus running your code on my data does, unfortunately for me, not work.
                      That can be easily fixed:
                      Code:
                      gen `c(obs_t)' id = _n
                      
                      reshape long job@_01 job@_02_nh, i(id)
                      drop if missing(job_01)
                      gsort id -_j
                      by id: replace _j = _n
                      reshape wide
                      order job*_01, after(id)


                      Comment


                      • #12
                        I have settled on the following solution, since the patterns of missigness are more complex than anticipated. For example, for some variables of job1, some but not all rows are missing.

                        Code:
                        egen num_jobs = rownonmiss(job?_01)
                        global suffices 01 02_nh
                        foreach suffix in $suffices {
                            forval k = 1/12 {
                                gen jobmin`k'_`suffix' = .
                            }
                        }
                        
                        foreach suffix in $suffices {
                            forval j = 12(-1)1 {
                                forval k = 1/`j' {
                                    local min = `j' - `k' + 1
                        
                                    // Replace jobmin variables dynamically for each suffix
                                    replace jobmin`k'_`suffix' = job`min'_`suffix' if numjobs == `j'
                                }
                            }
                        }
                        Cheers,
                        Felix
                        Stata Version: MP 18.0
                        OS: Windows 11

                        Comment


                        • #13
                          I don't think you've correctly transcribed the code you are using in #12. For one thing, you create a variable called num_jobs. But then at the end of the second loop you try to use a (non-existent) variable numjobs. Presumably that's a typo you made transcribing here. (It's best not to retype code: use copy/paste, as it's much less error prone.)

                          But looking at your code I don't see how it would produce different results from the code I gave in #11. And with your example data, they do in fact give the same results (after fixing that typo and also changing 12 to 3 to match the example data). So I'm wondering if you encountered a situation where they produce different results. If so, I'd appreciate it if you would show it so I can figure out where my reasoning is going wrong.

                          Thanks.

                          Comment


                          • #14
                            You are correct about the typo and the need to change 12 to 3 in the forval loops in #12. I revisited your code using my data, and it works perfectly in cases where the job_01 variable is filled out, and the corresponding job_02_nh variables are also completed. However, if this isn’t the case—as in observation 3 in my MRE below—I believe your code may inadvertently delete information.

                            I can, of course, use misstable summarize for each set of job variables to manually confirm whether there is a variable that indicates whether the other related variables are filled out, and then run your code accordingly. In fact, I did this with my data and discovered that there is indeed one set of job variables where job?_01 being missing does not guarantee that the other job?_?? variables are also missing.

                            While your code is certainly the more elegant solution, my approach might be more foolproof in cases where data completeness is inconsistent. What are your thoughts?

                            MRE:
                            Code:
                            * Example generated by -dataex-. For more info, type help dataex
                            clear
                            input byte(job1_01 job2_01 job3_01) long(job1_02_nh job2_02_nh job3_02_nh)
                            4 1 . . 5212    .
                            1 1 . . 5419    .
                            1 1 . . 5329 5221
                            1 1 1 . 3142 5221
                            1 3 . . 6130    .
                            3 1 1 . 8142 5211
                            1 1 . . 2341    .
                            1 1 1 . 9510 5221
                            1 1 . . 5246    .
                            3 1 1 . 9212 9211
                            end
                            Last edited by Felix Kaysers; 20 Aug 2024, 03:11.
                            Cheers,
                            Felix
                            Stata Version: MP 18.0
                            OS: Windows 11

                            Comment


                            • #15
                              I see what you're saying now. Thanks for pointing that out. But, interestingly, your code makes the same error! When I run your code and mine on this new example data, both codes produce the same results, and both of them wrongly lose the value of jobmin1_02_nh, 5221, in the third observation. Here's a fix to my code that gets this data right:
                              Code:
                              gen `c(obs_t)' id = _n
                              
                              reshape long job@_01 job@_02_nh, i(id)
                              drop if missing(job_01) & missing(job_02_nh)
                              gsort id -_j
                              by id: replace _j = _n
                              reshape wide
                              order job*_01, after(id)




                              Comment

                              Working...
                              X