Finding the number of firms for which a given worker has worked - Matched employer-employee dataset

Otavio Conceicao

Join Date: Feb 2017

Posts: 65
#1

27 Oct 2020, 18:11

Dear all,

I am working with a matched employer-employee dataset from Brazil in which each observation represents a worker-firm pair in a given period, and I would like to know how many different firms a given worker has worked for.

In particular, the dataset looks like this:

year  week  id_worker  id_firm
2017  1     17         25
2017  2     17         41
2017  3     17         19
2017  3     17         25
2017  4     17         53
2017  5     17         19

I would like to create a variable like 'number_of_firms_week':

year  week  id_worker  id_firm  number_of_firms_week
2017  1     17         25       4
2017  2     17         41       4
2017  3     17         19       4
2017  3     17         25       4
2017  4     17         53       4
2017  5     17         19       4

where 'number_of_firms_week' is the variable for the number of different firms for which a given worker has worked (considering all periods).

Can you help me to find a solution for that?

Thank you very much!

Below I provide the code for importing the example dataset into Stata:

clear
input year week id_worker id_firm
2017 1 17 25
2017 2 17 41
2017 3 17 19
2017 3 17 25
2017 4 17 53
2017 5 17 19
end

Note: I tried to use 'dataex' but I found it easier, in this case, to provide the 'importing code'.
Clyde Schechter

Join Date: Apr 2014

Posts: 30169
#2

27 Oct 2020, 19:33

Code:

by id_worker id_firm, sort: gen number_of_firms_week = _n == 1
by id_worker: replace number_of_firms_week = sum(number_of_firms_week)
by id_worker: replace number_of_firms_week = number_of_firms_week[_N]

The name of this new variable, number_of_firms_week, bothers me. It makes me think that it should be the number of distinct firms the worker was at in a single week, which is not the case. Did you perhaps not explain the problem fully?
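To make the distinction concrete, here is a minimal sketch on the example data from #1 that builds both counts side by side. The names n_firms_total and n_firms_in_week are hypothetical, chosen only for illustration, and the per-week variant is an assumption about what the original name might suggest, not something that was asked for in #1.

```stata
clear
input year week id_worker id_firm
2017 1 17 25
2017 2 17 41
2017 3 17 19
2017 3 17 25
2017 4 17 53
2017 5 17 19
end

* Distinct firms per worker over all periods (same logic as the code above)
by id_worker id_firm, sort: gen n_firms_total = _n == 1
by id_worker: replace n_firms_total = sum(n_firms_total)
by id_worker: replace n_firms_total = n_firms_total[_N]

* Distinct firms per worker within a single week (hypothetical variant)
by id_worker year week id_firm, sort: gen n_firms_in_week = _n == 1
by id_worker year week: replace n_firms_in_week = sum(n_firms_in_week)
by id_worker year week: replace n_firms_in_week = n_firms_in_week[_N]

sort id_worker year week id_firm
list, sepby(week) noobs
```

On this example the all-periods count is 4 in every row, while the per-week count is 2 in week 3 (firms 19 and 25) and 1 in every other week.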

It may have been simpler for you to create this code for importing the example data, but once you start using string variables, or date variables, or value labeled integer variables, it will be much easier to use -dataex-. In fact, nothing could be simpler. Load the data into memory and just type

Code:

dataex

in the command line. -dataex- will then make an example data set out of the first 100 observations in your data (which is usually good enough for the purpose). Then copy the -dataex- output from the Results window into the forum here and you are done.

If you want to show specific observations, rather than the first 100, you can use -if- and -in- with the -dataex- command just as you would with almost all Stata commands.
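As a rough illustration of restricting the example, the sketch below loads the data from #1 and calls -dataex- with a variable list and with -if- and -in- qualifiers. It assumes -dataex- is available (it ships with Stata 15.1 and later; on older versions it can be installed with ssc install dataex). The particular variables and conditions are only examples.

```stata
* Example dataset from #1, loaded so that -dataex- has data in memory
clear
input year week id_worker id_firm
2017 1 17 25
2017 2 17 41
2017 3 17 19
2017 3 17 25
2017 4 17 53
2017 5 17 19
end

* Selected variables, first four observations only
dataex year week id_worker id_firm in 1/4

* Only the observations that satisfy a condition
dataex id_worker id_firm if week == 3
```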

Look, what you did here is fine--it looks almost exactly like -dataex- output and it works for those who are trying to recreate your data example in Stata. But eventually you will have some data for which this is just tiresome to do, and -dataex- is just so easy to use.
Nick Cox

Join Date: Mar 2014

Posts: 35782
#3

27 Oct 2020, 19:38

@Clyde gives excellent advice. It seems to me that the variable wanted is the number of distinct values of firm for each worker, which is

Code:

egen tag = tag(id_worker id_firm)
egen wanted = total(tag), by(id_worker)

as discussed in https://www.stata-journal.com/articl...article=dm0042

@Clyde's code is exactly equivalent and asks Stata to do less....
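For anyone who wants to check the equivalence, here is a minimal sketch that runs both approaches on the example data from #1 and compares the results. The variable name n_firms is hypothetical, used here only to keep the two results apart.

```stata
clear
input year week id_worker id_firm
2017 1 17 25
2017 2 17 41
2017 3 17 19
2017 3 17 25
2017 4 17 53
2017 5 17 19
end

* egen approach: tag() marks exactly one observation per worker-firm pair,
* and total() adds up those marks within each worker
egen tag = tag(id_worker id_firm)
egen wanted = total(tag), by(id_worker)

* by-group approach from #2, stored under a hypothetical name
by id_worker id_firm, sort: gen n_firms = _n == 1
by id_worker: replace n_firms = sum(n_firms)
by id_worker: replace n_firms = n_firms[_N]

* Stops with an error if the two results differ in any observation
assert wanted == n_firms
list, noobs
```

On this example both variables equal 4 in every observation, since worker 17 appears with firms 19, 25, 41 and 53.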
Otavio Conceicao

Join Date: Feb 2017

Posts: 65
#4

29 Oct 2020, 08:35

Thank you very much @Clyde
Otavio Conceicao

Join Date: Feb 2017

Posts: 65
#5

29 Oct 2020, 08:52

Thank you very much, Clyde Schechter and Nick Cox! (Please disregard the previous post!)

Yes, Clyde, I agree with you as to the name of the variable!

I normally do not use 'dataex' because both the names of the string variables and their values are in Portuguese, so it would be more difficult to convey the idea using the original format.

Last edited by Otavio Conceicao; 29 Oct 2020, 09:21.