Reshape wide efficiently in multi-level panel

Henry Strawforrd

Join Date: Sep 2021

Posts: 228
#1


14 Jun 2023, 05:18

It is common to have panels with two or more grouping dimensions, for example companies and workers.

How can I efficiently reshape such data to wide format, e.g. one set of variables covering all workers in a company? Ideally, I would put both identifiers in the j() option, but this is not allowed.

Here is how I usually do it, which I suspect is very inefficient: egen group() and merge both take a long time on large datasets. How can I accomplish this task more efficiently?

Code:

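clear
input pid t eid
1 1 1
1 1 2
1 2 2
1 3 1
end

* Build a single id for each pid-t combination
gegen i = group(pid t)

* Reshape the eid values wide by that id and save them to a temp file
preserve
keep i eid
gduplicates drop
bysort i : gen j = _n
reshape wide eid, i(i) j(j)
save temp.dta, replace
restore

* Merge the wide eid variables back onto the pid-t rows
keep pid t i
gduplicates drop
merge 1:1 i using temp.dta
drop _merge i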
Leonardo Guizzetti

Join Date: Jul 2016

Posts: 2406
#2

14 Jun 2023, 07:36

What exactly isn’t efficient for you? Is it that you have a large dataset and it takes more time than you would like?
Henry Strawforrd

Join Date: Sep 2021

Posts: 228
#3

14 Jun 2023, 07:50

Exactly

Hemanshu Kumar

Join Date: Mar 2015

Posts: 1478
#4

14 Jun 2023, 09:36

Your code produces:

Code:

. li, noobs sep(0)
  +-----------------------+
  | pid   t   eid1   eid2 |
  |-----------------------|
  |   1   1      1      2 |
  |   1   2      2      . |
  |   1   3      1      . |
  +-----------------------+

which I can also get by simply doing:

Code:

bysort pid t (eid): gen j = _n
reshape wide eid, i(pid t) j(j)

. li, noobs sep(0)

  +-----------------------+
  | pid   t   eid1   eid2 |
  |-----------------------|
  |   1   1      1      2 |
  |   1   2      2      . |
  |   1   3      1      . |
  +-----------------------+

Last edited by Hemanshu Kumar; 14 Jun 2023, 09:43.
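A minimal sketch of the same idea run end to end, with a couple of checks added. The key point is that reshape's i() accepts a varlist, so pid and t can both go there directly and the gegen group(), preserve/restore and merge steps drop out. The isid check and the round trip back to long format are additions for illustration only, not part of the post above:

```stata
* Toy data from the question
clear
input pid t eid
1 1 1
1 1 2
1 2 2
1 3 1
end

* Number the eid values within each pid-t cell, sorted by eid
bysort pid t (eid): gen j = _n

* reshape's i() takes a varlist, so no composite group id is needed
reshape wide eid, i(pid t) j(j)

* Sanity check: each pid-t pair should now be exactly one row
isid pid t

* Round trip: reshape back to long, then drop the padding cells
* that were missing in wide form (eid2 for t = 2 and t = 3)
reshape long eid, i(pid t) j(j)
drop if missing(eid)
list, noobs sep(0)
```

On the toy data the final list should show the four original pid-t-eid observations, now with the j counter alongside.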


Leonardo Guizzetti

Join Date: Jul 2016

Posts: 2406
#5

14 Jun 2023, 09:59

In addition to the more efficient solution from Hemanshu, I see that you have made liberal use of -gtools- (SSC) for many standard commands in Stata. Why not also use -greshape- from the same set of tools? That will also speed up your code.
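A sketch of the greshape variant, assuming gtools is installed from SSC (ssc install gtools) and that greshape accepts the same i() and j() options as reshape, which is how gtools presents it as a drop-in replacement. It keeps the bysort numbering step and only swaps the reshape call; whether it is actually faster on a given dataset is worth timing (e.g. with timer on/off) rather than assuming:

```stata
* Assumes gtools (SSC) is installed: ssc install gtools
clear
input pid t eid
1 1 1
1 1 2
1 2 2
1 3 1
end

* Same numbering step as in the plain reshape approach
bysort pid t (eid): gen j = _n

* greshape from gtools, used here in place of reshape wide
greshape wide eid, i(pid t) j(j)

list, noobs sep(0)
```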
Henry Strawforrd

Join Date: Sep 2021

Posts: 228
#6

20 Jun 2023, 01:38

Thanks a lot for the helpful responses!