It is common to have panels with two or more group dimensions, for example companies and workers.
How can I efficiently reshape this to wide format, e.g. a set of variables for all workers in a company? Ideally, I would put both identifiers into the j() option, but this is not allowed.
Here is how I usually do it, and I suspect it is very inefficient: egen group() and merge both take forever on large datasets. How can I accomplish such a task more efficiently?
Code:
clear
input pid t eid
1 1 1
1 1 2
1 2 2
1 3 1
end
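* gegen and gduplicates come from the user-written gtools package (ssc install gtools)
* one id per worker-period cell
gegen i = group(pid t)
preserve
* build the wide employer list for each cell
keep i eid
gduplicates drop
bysort i : gen j = _n
reshape wide eid, i(i) j(j)
save temp.dta, replace
restore
* merge the wide employer variables back onto one row per cell
keep pid t i
gduplicates drop
merge 1:1 i using temp.dta
drop _merge i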
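A minimal sketch of a simpler route, assuming the data contain only pid, t and eid as in the example above. reshape accepts a varlist in i(), so the pid-t pair can serve directly as the identifier. This removes the need for both the group() variable and the preserve/save/merge round trip. The counter j (a name chosen here for illustration) numbers the distinct employers within each pid-t cell.

```stata
clear
input pid t eid
1 1 1
1 1 2
1 2 2
1 3 1
end

* keep one row per distinct employer within each worker-period cell
bysort pid t eid : keep if _n == 1

* number the employers within each pid-t cell (data are already sorted by pid t eid)
by pid t : gen j = _n

* i() takes both identifiers, so no group() id and no merge are needed
reshape wide eid, i(pid t) j(j)
list
```

The result has one row per pid-t pair with eid1, eid2, ... and missing values where a cell has fewer employers. Whether this beats the gtools version on a particular dataset is worth timing, since the sort in bysort remains the main cost.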
