Finding the number of different pairs from different variables that satisfies some criteria - panel data

Otavio Conceicao

Join Date: Feb 2017
Posts: 65

Finding the number of different pairs from different variables that satisfies some criteria - panel data

29 Oct 2020, 09:21

Dear all,

I am working with a matched employer-employee dataset from Brazil in which each observation represents a pair worker-firm for some period and I would like to know for how many different firms from the same sector a given worker has worked.

In particular, the dataset is like

year	week	id_worker	id_firm	firm_sector
2017	1	17	25	Services
2017	2	17	41	Manufacturing
2017	3	17	19	Manufacturing
2017	3	17	25	Services
2017	4	17	53	Services
2017	5	17	19	Manufacturing

I would like to have other three new variables such as

year	week	id_worker	id_firm	firm_sector	number_of_firms_services	number_of_firms_manufacturing	number_of_firms_commerce
2017	1	17	25	Services	2	1	1
2017	2	17	41	Manufacturing	2	1	1
2017	3	17	19	Commerce	2	1	1
2017	3	17	25	Services	2	1	1
2017	4	17	53	Services	2	1	1
2017	5	17	19	Commerce	2	1	1

where 'number_of_firms_sector' counts the number of different firms for which the worker has worked in a given 'sector', where sector \in {Manufacturing, Services, Commerce}.

Can you help to find a solution for that?

Thank you!

Code for importing:

clear
input year week id_worker id_firm str15 firm_sector
2017 1 17 25 "Services"
2017 2 17 41 "Manufacturing"
2017 3 17 19 "Commerce"
2017 3 17 25 "Services"
2017 4 17 53 "Services"
2017 5 17 19 "Commerce"
end

Tags: None

Clyde Schechter

Join Date: Apr 2014

Posts: 30063
#2

29 Oct 2020, 11:07

Code:

clear input year week id_worker id_firm str15 firm_sector 2017 1 17 25 "Services" 2017 2 17 41 "Manufacturing" 2017 3 17 19 "Commerce" 2017 3 17 25 "Services" 2017 4 17 53 "Services" 2017 5 17 19 "Commerce" end by id_worker firm_sector id_firm, sort: gen number_of_firms_ = (_n == 1) by id_worker firm_sector: replace number_of_firms_ = sum(number_of_firms_) by id_worker firm_sector: replace number_of_firms_ = number_of_firms_[_N] gen long obs_no = _n reshape wide number_of_firms_, i(obs_no) j(firm_sector) string drop obs_no

Notes: You might want to pick a shorter stub than "number of firms." At 15 characters, plus an additional _ to separate it from the sector, if you have any long sector names, you will exceed Stata's 32 character limit on variable names, and the code will break. Also, this code will only work if all the sector names are suitable for inclusion in a variable name: that is, no embedded blanks, no special characters, only letters, digits, and underscore. If you have sector names that violate those constraints, you will want to perhaps apply the -strtoname()- function to them first.

The final layout created by this code is pretty awkward. On the one hand, the number of firms per sector is set out side-by-side in separate variables. Yet you still have multiple observations per worker showing the individual details of each job. That is likely to prove confusing or create problems as you move ahead in this project. Probably the most sensible thing to do is to stop before the -gen long obs_no = _n- command and keep the data in fully long layout. That works best for nearly everything in Stata.
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35616
#3

29 Oct 2020, 11:08

This is I think a variant on your previous thread.

Code:

egen tag = tag(id_worker id_firm firm_sector) egen wanted = total(tag), by(id_worker firm_sector)
Comment
Otavio Conceicao

Join Date: Feb 2017

Posts: 65
#4

31 Oct 2020, 14:04

Thank you very much Clyde Schechter and Nick Cox !!!
Comment

Announcement

Finding the number of different pairs from different variables that satisfies some criteria - panel data

Comment

Comment

Comment