Creating a cumulative employment experience variable for panel data

Ella Ki

Join Date: Mar 2017

Posts: 39
#1

Creating a cumulative employment experience variable for panel data

24 Mar 2017, 14:36

Hi!

I am trying to create a variable for cumulative employment experience for my panel data out of employment spells data.

I have two data sets:

-One has start years and end years of employment spells for each person.

e.g.
Person No. Start Year End Year
1 1971 1976
1 1979 1980
1 1985 1993
1 1993 2005
2 1974 1979
2 1993 1997
2 1999 2009
3 1976 2009
4 1973 1981
4 1985 2007

etc

-One has all the person numbers for each year of the survey, which covers 1991 to 2009. I would like to make the EmploymentExperience variable.

Person No. Year Variable1 Variable2 Variable3 EmploymentExperience
1 1991
1 1992
1 1993
1 1994
... ....
1 2009
2 1991
2 1992
2 1993

etc

With the first data set, using end year - start year, summing them all up and sorting by person number, I can see what each persons TOTAL EMPLOYMENT EXPERIENCE UP TO 2009 is.

E.g.

Person No. Start Year End Year TotEmp
1 1971 1976 26
1 1979 1980 26
1 1985 1993 26
1 1993 2005 26
2 1974 1979 19
2 1993 1997 19
2 1999 2009 19
3 1976 2009 33
4 1973 1981 30
4 1985 2007 30

However, I need this to be cumulative for each survey year.

E.g. for person 2, their total employment experience up to 2009 is 19 years, but I need to calculate it for each survey year. i.e. I need to come up with this

Person Number Year EmploymentExperience
1 1991 5

1 1992 5

1 1993 5

1 1994 6

1 1995 7

1 1996 8

1 1997 9

1 1998 9

1 1999 9

etc up to

1 2009 19

And then I can conduct my panel regression, with wage as the y variable and employment experience as one of the x variables for each year 1991-2009.

I could do this manually however I have thousands of observations so this wouldn't be possible.

I need to find some way to generate this command. Any suggestions would be much appreciated.

Thanks!
Tags: None

Sebastian Geiger

Join Date: Oct 2015
Posts: 124

24 Mar 2017, 15:41

There might be easier ways, but this should work:

Code:

* Load spell data (you only need to load your actual data)
clear
input person start end
1 1971 1976
1 1979 1980
1 1985 1993
1 1993 2005
2 1974 1979
2 1993 1997
2 1999 2009
3 1976 2009
4 1973 1981
4 1985 2007
end
.

* Generate time span
gen span = end - start

* Generate obervation for each year of work
sum person
local obs = r(N)
foreach n of numlist 1/`obs' {
sum span in `n'
local multiply = r(mean) + 1
expand `multiply' in `n'
}
sort person start

* Generate year variable
bysort person start: gen n = _n -1
gen year = start + n
drop n

* Cumulative experience
gen help = end[_n-1]!=start | person[_n-1]!=person
keep if help==1
bysort person: gen exp = sum(help)
drop help

* Save modifed data
save spell.dta, replace

* Load person data     (you only need to load your actual data)
clear
input person
1
2
3
4
end
.
local maxobs = (2009 - 1960 + 1)
expand `maxobs'
bysort person: gen help = _n - 1
gen year = 1960 + help
drop help

* Merge
merge 1:1 person year using "spell.dta", keepusing(exp)
erase "spell.dta"
drop _merge

* Add information for not included data
sort person year
replace exp = exp[_n-1] if person[_n-1]==person & exp[_n-1]!=. & exp==. // Carry forward
replace exp = 0 if exp==.    // Assuming that you have a full employment history (i.e. no record means no employment experience)

Comment

Clyde Schechter

Join Date: Apr 2014
Posts: 30357

24 Mar 2017, 15:48

I've provided real Stata variable names, created these data sets and then posted them here using -dataex- (which is how you should post data examples in the future, please). I've also thrown in some random values for your Variable 1, Variable2, and Variable 3, just to illustrate.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input int(person year) float(var1 var2 var3)
1 1991  3  2 40
1 1992 11  4  1
1 1993  5 13 13
1 1994 48 42 49
1 2009  7 12  0
2 1991 30 19 36
2 1992  8  5 41
2 1993 48 42 34
end

tempfile dataset2
save `dataset2'

* Example generated by -dataex-. To install: ssc install dataex
clear
input int(person start_year end_year)
1 1971 1976
1 1979 1980
1 1985 1993
1 1993 2005
2 1974 1979
2 1993 1997
2 1999 2009
3 1976 2009
4 1973 1981
4 1985 2007
end
tempfile dataset1
save `dataset1'

//    CALCULATE RUNNING TOTAL EMPLOYMENT EXPERIENCE
use `dataset1', clear
expand end_year-start_year+1
by person start_year, sort: gen year = start_year + _n - 1
keep person year
duplicates drop
by person (year), sort: gen experience = _n
//    NOW FILL IN THE GAP YEARS
fillin person year
by person (year), sort: replace experience = experience[_n-1] if missing(experience)
drop _fillin

merge 1:1 person year using `dataset2', keep(match using) nogenerate

Added: Crossed with Sebastian Geiger's response.

Comment

Ella Ki

Join Date: Mar 2017
Posts: 39

25 Mar 2017, 09:13

Originally posted by Sebastian Geiger View Post

There might be easier ways, but this should work:

Code:

* Load spell data (you only need to load your actual data)
clear
input person start end
1 1971 1976
1 1979 1980
1 1985 1993
1 1993 2005
2 1974 1979
2 1993 1997
2 1999 2009
3 1976 2009
4 1973 1981
4 1985 2007
end
.

* Generate time span
gen span = end - start

* Generate obervation for each year of work
sum person
local obs = r(N)
foreach n of numlist 1/`obs' {
sum span in `n'
local multiply = r(mean) + 1
expand `multiply' in `n'
}
sort person start

* Generate year variable
bysort person start: gen n = _n -1
gen year = start + n
drop n

* Cumulative experience
gen help = end[_n-1]!=start | person[_n-1]!=person
keep if help==1
bysort person: gen exp = sum(help)
drop help

* Save modifed data
save spell.dta, replace

* Load person data (you only need to load your actual data)
clear
input person
1
2
3
4
end
.
local maxobs = (2009 - 1960 + 1)
expand `maxobs'
bysort person: gen help = _n - 1
gen year = 1960 + help
drop help

* Merge
merge 1:1 person year using "spell.dta", keepusing(exp)
erase "spell.dta"
drop _merge

* Add information for not included data
sort person year
replace exp = exp[_n-1] if person[_n-1]==person & exp[_n-1]!=. & exp==. // Carry forward
replace exp = 0 if exp==. // Assuming that you have a full employment history (i.e. no record means no employment experience)

Thanks a lot, this is really useful.

My only problem now is that the list of person ID's isn't actually in the order 111111, 2222, 33, 4444, 5555 it is more like 10004521, 10004521, 10004521, 10004521, 10006356, 10006356, 10006356, 10007356, 10007356, 10007356... etc.

So I think that's why using N and n from your commands you have very kindly created won't work unfortunately.

After the commands

. foreach n of numlist 1/`obs' {
2. sum span in `n'
3. local multiply = r(mean) + 1
4. expand `multiply' in `n'
5. }

It showed the error:
invalid numlist has too many elements.

Is this why or is it something I am doing wrong?

I guess that is why Clyde Schechter said it was very important to include my actually data in the example, so I apologise to everyone for that.

Is there any other possibility?

Last edited by Ella Ki; 25 Mar 2017, 09:17.

Comment

Sebastian Geiger

Join Date: Oct 2015

Posts: 124
#5

25 Mar 2017, 10:32

Have you tried Clyde's approach? As I was unaware of the command -fillin- and I don't know the full structure of your dataset, my approach is computation intensive.

Nevertheless, replacing -foreach n of numlist 1/`obs'- with -forvalues n = 1/`obs'- should solve the problem you described above.
Comment
Ella Ki

Join Date: Mar 2017

Posts: 39
#6

25 Mar 2017, 13:06

Originally posted by Sebastian Geiger View Post

Have you tried Clyde's approach? As I was unaware of the command -fillin- and I don't know the full structure of your dataset, my approach is computation intensive.

Nevertheless, replacing -foreach n of numlist 1/`obs'- with -forvalues n = 1/`obs'- should solve the problem you described above.

I tried it (Clyde's method) and ended up with strange results.

Your method seemed to work, although when I looked closer it had come up with years of more than 4 digits and the same year for the same person had two different values. E.g. person 1 in 1996 34, and also another obvs. of person 1 in 1996 32....

Not sure why as I'm an undergrad student and very unfamiliar with these commands. No staff at my university seem to know how to do this either, so recommended I used the forum.

If it helps, this is an example of my data using data ex.

Example generated by -dataex-. To install: ssc install dataex
clear
input long pid int(syear eyear)
10007851 1955 1956
10007851 1956 1958
10007851 1958 1958
10007851 1958 1966
10007851 1990 1991
10007851 1990 1992
10007851 1990 1993
10014574 1954 1956
10014574 1956 1959
10014603 1985 1991
10014603 1985 1992
10014603 1992 1992
10014603 1984 1993
10017934 1999 2000
10017934 1993 2008
10017964 1968 1969
10017964 1970 1972
10017964 1987 1991
10017964 1990 1992
10017964 1989 1993
10017964 1977 1994
10017993 2002 2002
10019056 1948 1956
10019056 1956 1958
10019056 1958 1966
end

Also I need the "total cumulative experience" variable data from only years 1991 to 2009.
Comment

Ella Ki

Join Date: Mar 2017
Posts: 39

25 Mar 2017, 13:23

Originally posted by Clyde Schechter View Post

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input int(person year) float(var1 var2 var3)
1 1991 3 2 40
1 1992 11 4 1
1 1993 5 13 13
1 1994 48 42 49
1 2009 7 12 0
2 1991 30 19 36
2 1992 8 5 41
2 1993 48 42 34
end

tempfile dataset2
save `dataset2'

* Example generated by -dataex-. To install: ssc install dataex
clear
input int(person start_year end_year)
1 1971 1976
1 1979 1980
1 1985 1993
1 1993 2005
2 1974 1979
2 1993 1997
2 1999 2009
3 1976 2009
4 1973 1981
4 1985 2007
end
tempfile dataset1
save `dataset1'

// CALCULATE RUNNING TOTAL EMPLOYMENT EXPERIENCE
use `dataset1', clear
expand end_year-start_year+1
by person start_year, sort: gen year = start_year + _n - 1
keep person year
duplicates drop
by person (year), sort: gen experience = _n
// NOW FILL IN THE GAP YEARS
fillin person year
by person (year), sort: replace experience = experience[_n-1] if missing(experience)
drop _fillin

merge 1:1 person year using `dataset2', keep(match using) nogenerate

Added: Crossed with Sebastian Geiger's response.

I apologise for using the wrong format - I have figured out the data ex command now.

I ran your comments and got this:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input long pid float(year experience)
10004521 -9 .
10004521 -8 .
10004521 -7 .
10004521 -6 .
10004521 -5 .
10004521 -4 .
10004521 -3 .
10004521 -2 .
10004521 -1 .
10004521  0 .
10004521  1 .
10004521  2 .
10004521  3 .
10004521  4 .
10004521  5 .
10004521  6 .
10004521  7 .
10004521  8 .
10004521  9 .
10004521 10 .
10004521 11 .
10004521 12 .
10004521 13 .
10004521 14 .
10004521 15 .
end

And for the much later observations:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input long pid float(year experience)
18828132 19674 30
18828132 19675 30
18828132 19676 30
18828132 19677 30
18828132 19678 30
18828132 19679 30
18828132 19680 30
18828132 19681 30
18828132 19682 30
18828132 19683 30
18828132 19684 30
18828132 19685 30
18828132 19686 30
18828132 19687 30
18828132 19688 30
18828132 19689 30
18828132 19690 30
18828132 19691 30
18828132 19692 30
18828132 19693 30
18828132 19694 30
end

So I'm not too sure what's going on...

Comment

Alex Cortez

Join Date: Sep 2018

Posts: 8
#8

15 Oct 2018, 18:10

I know this question was asked several years ago. But I have similar data with spell and used Clyde Schechter's method which expanded the data - and now that it is expanded I know how to do the rest. Thanks a million. I AM SO APPRECIATIVE OF ALL THOSE WHO TAKE THE TIME TO ANSWER QUESTIONS.

P.S.:The other method did not work for me but probably user error.

My do:

expand YearEnd-YearStart+1
by id YearStart, sort: gen year = YearStart + _n - 1
Comment

Announcement