Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Creating a cumulative employment experience variable for panel data

    Hi!

    I am trying to create a variable for cumulative employment experience for my panel data out of employment spells data.

    I have two data sets:

    -One has start years and end years of employment spells for each person.

    e.g.
    Person No. Start Year End Year
    1 1971 1976
    1 1979 1980
    1 1985 1993
    1 1993 2005
    2 1974 1979
    2 1993 1997
    2 1999 2009
    3 1976 2009
    4 1973 1981
    4 1985 2007

    etc

    -One has all the person numbers for each year of the survey, which covers 1991 to 2009. I would like to make the EmploymentExperience variable.

    Person No. Year Variable1 Variable2 Variable3 EmploymentExperience
    1 1991
    1 1992
    1 1993
    1 1994
    ... ....
    1 2009
    2 1991
    2 1992
    2 1993

    etc


    With the first data set, using end year - start year, summing them all up and sorting by person number, I can see what each persons TOTAL EMPLOYMENT EXPERIENCE UP TO 2009 is.

    E.g.

    Person No. Start Year End Year TotEmp
    1 1971 1976 26
    1 1979 1980 26
    1 1985 1993 26
    1 1993 2005 26
    2 1974 1979 19
    2 1993 1997 19
    2 1999 2009 19
    3 1976 2009 33
    4 1973 1981 30
    4 1985 2007 30

    However, I need this to be cumulative for each survey year.

    E.g. for person 2, their total employment experience up to 2009 is 19 years, but I need to calculate it for each survey year. i.e. I need to come up with this

    Person Number Year EmploymentExperience
    1 1991 5

    1 1992 5

    1 1993 5

    1 1994 6

    1 1995 7

    1 1996 8

    1 1997 9

    1 1998 9

    1 1999 9

    etc up to


    1 2009 19


    And then I can conduct my panel regression, with wage as the y variable and employment experience as one of the x variables for each year 1991-2009.

    I could do this manually however I have thousands of observations so this wouldn't be possible.

    I need to find some way to generate this command. Any suggestions would be much appreciated.

    Thanks!

  • #2
    There might be easier ways, but this should work:

    Code:
    * Load spell data (you only need to load your actual data)
    clear
    input person start end
    1 1971 1976
    1 1979 1980
    1 1985 1993
    1 1993 2005
    2 1974 1979
    2 1993 1997
    2 1999 2009
    3 1976 2009
    4 1973 1981
    4 1985 2007
    end
    .
    
    * Generate time span
    gen span = end - start
    
    * Generate obervation for each year of work
    sum person
    local obs = r(N)
    foreach n of numlist 1/`obs' {
    sum span in `n'
    local multiply = r(mean) + 1
    expand `multiply' in `n'
    }
    sort person start
    
    * Generate year variable
    bysort person start: gen n = _n -1
    gen year = start + n
    drop n
    
    * Cumulative experience
    gen help = end[_n-1]!=start | person[_n-1]!=person
    keep if help==1
    bysort person: gen exp = sum(help)
    drop help
    
    * Save modifed data
    save spell.dta, replace
    
    * Load person data     (you only need to load your actual data)
    clear
    input person
    1
    2
    3
    4
    end
    .
    local maxobs = (2009 - 1960 + 1)
    expand `maxobs'
    bysort person: gen help = _n - 1
    gen year = 1960 + help
    drop help
    
    * Merge
    merge 1:1 person year using "spell.dta", keepusing(exp)
    erase "spell.dta"
    drop _merge
    
    * Add information for not included data
    sort person year
    replace exp = exp[_n-1] if person[_n-1]==person & exp[_n-1]!=. & exp==. // Carry forward
    replace exp = 0 if exp==.    // Assuming that you have a full employment history (i.e. no record means no employment experience)

    Comment


    • #3
      I've provided real Stata variable names, created these data sets and then posted them here using -dataex- (which is how you should post data examples in the future, please). I've also thrown in some random values for your Variable 1, Variable2, and Variable 3, just to illustrate.

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input int(person year) float(var1 var2 var3)
      1 1991  3  2 40
      1 1992 11  4  1
      1 1993  5 13 13
      1 1994 48 42 49
      1 2009  7 12  0
      2 1991 30 19 36
      2 1992  8  5 41
      2 1993 48 42 34
      end
      
      tempfile dataset2
      save `dataset2'
      
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input int(person start_year end_year)
      1 1971 1976
      1 1979 1980
      1 1985 1993
      1 1993 2005
      2 1974 1979
      2 1993 1997
      2 1999 2009
      3 1976 2009
      4 1973 1981
      4 1985 2007
      end
      tempfile dataset1
      save `dataset1'
      
      //    CALCULATE RUNNING TOTAL EMPLOYMENT EXPERIENCE
      use `dataset1', clear
      expand end_year-start_year+1
      by person start_year, sort: gen year = start_year + _n - 1
      keep person year
      duplicates drop
      by person (year), sort: gen experience = _n
      //    NOW FILL IN THE GAP YEARS
      fillin person year
      by person (year), sort: replace experience = experience[_n-1] if missing(experience)
      drop _fillin
      
      merge 1:1 person year using `dataset2', keep(match using) nogenerate
      Added: Crossed with Sebastian Geiger's response.

      Comment


      • #4
        Originally posted by Sebastian Geiger View Post
        There might be easier ways, but this should work:

        Code:
        * Load spell data (you only need to load your actual data)
        clear
        input person start end
        1 1971 1976
        1 1979 1980
        1 1985 1993
        1 1993 2005
        2 1974 1979
        2 1993 1997
        2 1999 2009
        3 1976 2009
        4 1973 1981
        4 1985 2007
        end
        .
        
        * Generate time span
        gen span = end - start
        
        * Generate obervation for each year of work
        sum person
        local obs = r(N)
        foreach n of numlist 1/`obs' {
        sum span in `n'
        local multiply = r(mean) + 1
        expand `multiply' in `n'
        }
        sort person start
        
        * Generate year variable
        bysort person start: gen n = _n -1
        gen year = start + n
        drop n
        
        * Cumulative experience
        gen help = end[_n-1]!=start | person[_n-1]!=person
        keep if help==1
        bysort person: gen exp = sum(help)
        drop help
        
        * Save modifed data
        save spell.dta, replace
        
        * Load person data (you only need to load your actual data)
        clear
        input person
        1
        2
        3
        4
        end
        .
        local maxobs = (2009 - 1960 + 1)
        expand `maxobs'
        bysort person: gen help = _n - 1
        gen year = 1960 + help
        drop help
        
        * Merge
        merge 1:1 person year using "spell.dta", keepusing(exp)
        erase "spell.dta"
        drop _merge
        
        * Add information for not included data
        sort person year
        replace exp = exp[_n-1] if person[_n-1]==person & exp[_n-1]!=. & exp==. // Carry forward
        replace exp = 0 if exp==. // Assuming that you have a full employment history (i.e. no record means no employment experience)
        Thanks a lot, this is really useful.

        My only problem now is that the list of person ID's isn't actually in the order 111111, 2222, 33, 4444, 5555 it is more like 10004521, 10004521, 10004521, 10004521, 10006356, 10006356, 10006356, 10007356, 10007356, 10007356... etc.

        So I think that's why using N and n from your commands you have very kindly created won't work unfortunately.

        After the commands

        . foreach n of numlist 1/`obs' {
        2. sum span in `n'
        3. local multiply = r(mean) + 1
        4. expand `multiply' in `n'
        5. }

        It showed the error:
        invalid numlist has too many elements.

        Is this why or is it something I am doing wrong?

        I guess that is why Clyde Schechter said it was very important to include my actually data in the example, so I apologise to everyone for that.

        Is there any other possibility?
        Last edited by Ella Ki; 25 Mar 2017, 09:17.

        Comment


        • #5
          Have you tried Clyde's approach? As I was unaware of the command -fillin- and I don't know the full structure of your dataset, my approach is computation intensive.

          Nevertheless, replacing -foreach n of numlist 1/`obs'- with -forvalues n = 1/`obs'- should solve the problem you described above.

          Comment


          • #6
            Originally posted by Sebastian Geiger View Post
            Have you tried Clyde's approach? As I was unaware of the command -fillin- and I don't know the full structure of your dataset, my approach is computation intensive.

            Nevertheless, replacing -foreach n of numlist 1/`obs'- with -forvalues n = 1/`obs'- should solve the problem you described above.
            I tried it (Clyde's method) and ended up with strange results.

            Your method seemed to work, although when I looked closer it had come up with years of more than 4 digits and the same year for the same person had two different values. E.g. person 1 in 1996 34, and also another obvs. of person 1 in 1996 32....

            Not sure why as I'm an undergrad student and very unfamiliar with these commands. No staff at my university seem to know how to do this either, so recommended I used the forum.

            If it helps, this is an example of my data using data ex.

            Example generated by -dataex-. To install: ssc install dataex
            clear
            input long pid int(syear eyear)
            10007851 1955 1956
            10007851 1956 1958
            10007851 1958 1958
            10007851 1958 1966
            10007851 1990 1991
            10007851 1990 1992
            10007851 1990 1993
            10014574 1954 1956
            10014574 1956 1959
            10014603 1985 1991
            10014603 1985 1992
            10014603 1992 1992
            10014603 1984 1993
            10017934 1999 2000
            10017934 1993 2008
            10017964 1968 1969
            10017964 1970 1972
            10017964 1987 1991
            10017964 1990 1992
            10017964 1989 1993
            10017964 1977 1994
            10017993 2002 2002
            10019056 1948 1956
            10019056 1956 1958
            10019056 1958 1966
            end


            Also I need the "total cumulative experience" variable data from only years 1991 to 2009.

            Comment


            • #7
              Originally posted by Clyde Schechter View Post
              I've provided real Stata variable names, created these data sets and then posted them here using -dataex- (which is how you should post data examples in the future, please). I've also thrown in some random values for your Variable 1, Variable2, and Variable 3, just to illustrate.

              Code:
              * Example generated by -dataex-. To install: ssc install dataex
              clear
              input int(person year) float(var1 var2 var3)
              1 1991 3 2 40
              1 1992 11 4 1
              1 1993 5 13 13
              1 1994 48 42 49
              1 2009 7 12 0
              2 1991 30 19 36
              2 1992 8 5 41
              2 1993 48 42 34
              end
              
              tempfile dataset2
              save `dataset2'
              
              * Example generated by -dataex-. To install: ssc install dataex
              clear
              input int(person start_year end_year)
              1 1971 1976
              1 1979 1980
              1 1985 1993
              1 1993 2005
              2 1974 1979
              2 1993 1997
              2 1999 2009
              3 1976 2009
              4 1973 1981
              4 1985 2007
              end
              tempfile dataset1
              save `dataset1'
              
              // CALCULATE RUNNING TOTAL EMPLOYMENT EXPERIENCE
              use `dataset1', clear
              expand end_year-start_year+1
              by person start_year, sort: gen year = start_year + _n - 1
              keep person year
              duplicates drop
              by person (year), sort: gen experience = _n
              // NOW FILL IN THE GAP YEARS
              fillin person year
              by person (year), sort: replace experience = experience[_n-1] if missing(experience)
              drop _fillin
              
              merge 1:1 person year using `dataset2', keep(match using) nogenerate
              Added: Crossed with Sebastian Geiger's response.
              I apologise for using the wrong format - I have figured out the data ex command now.

              I ran your comments and got this:

              Code:
              * Example generated by -dataex-. To install: ssc install dataex
              clear
              input long pid float(year experience)
              10004521 -9 .
              10004521 -8 .
              10004521 -7 .
              10004521 -6 .
              10004521 -5 .
              10004521 -4 .
              10004521 -3 .
              10004521 -2 .
              10004521 -1 .
              10004521  0 .
              10004521  1 .
              10004521  2 .
              10004521  3 .
              10004521  4 .
              10004521  5 .
              10004521  6 .
              10004521  7 .
              10004521  8 .
              10004521  9 .
              10004521 10 .
              10004521 11 .
              10004521 12 .
              10004521 13 .
              10004521 14 .
              10004521 15 .
              end
              And for the much later observations:

              Code:
              * Example generated by -dataex-. To install: ssc install dataex
              clear
              input long pid float(year experience)
              18828132 19674 30
              18828132 19675 30
              18828132 19676 30
              18828132 19677 30
              18828132 19678 30
              18828132 19679 30
              18828132 19680 30
              18828132 19681 30
              18828132 19682 30
              18828132 19683 30
              18828132 19684 30
              18828132 19685 30
              18828132 19686 30
              18828132 19687 30
              18828132 19688 30
              18828132 19689 30
              18828132 19690 30
              18828132 19691 30
              18828132 19692 30
              18828132 19693 30
              18828132 19694 30
              end
              So I'm not too sure what's going on...

              Comment


              • #8
                I know this question was asked several years ago. But I have similar data with spell and used Clyde Schechter's method which expanded the data - and now that it is expanded I know how to do the rest. Thanks a million. I AM SO APPRECIATIVE OF ALL THOSE WHO TAKE THE TIME TO ANSWER QUESTIONS.

                P.S.:The other method did not work for me but probably user error.

                My do:

                expand YearEnd-YearStart+1
                by id YearStart, sort: gen year = YearStart + _n - 1

                Comment

                Working...
                X