  • Reshape wide efficiently in multi-level panel

    It is common to have panels with two or more group dimensions, for example companies and workers.

    How can I efficiently reshape such data to a wide format, e.g. one set of variables covering all workers in a company? Ideally, I would like to put both identifiers in the j() option, but this is not allowed.

    Here is how I usually do it, which I presume is very inefficient: egen with group() and merge both take a very long time on large datasets. How can I accomplish this task more efficiently?


    Code:
    clear
    
    input pid t eid
    1 1 1
    1 1 2
    1 2 2
    1 3 1
    end
    
    gegen i = group(pid t)
    
    preserve
        keep i eid
        gduplicates drop
        bysort i : gen j = _n
        reshape wide eid, i(i) j(j)
        save temp.dta, replace
    restore
    
    keep pid t i
    gduplicates drop
    merge 1:1 i using temp.dta
    drop _merge i

  • #2
    What exactly isn’t efficient for you? Is it that you have a large dataset and it takes more time than you would like?

    • #3
      Exactly

      • #4
        Your code produces:

        Code:
        . li, noobs sep(0)
          +-----------------------+
          | pid   t   eid1   eid2 |
          |-----------------------|
          |   1   1      1      2 |
          |   1   2      2      . |
          |   1   3      1      . |
          +-----------------------+
        which I can also get by simply doing:

        Code:
        bysort pid t (eid): gen j = _n
        reshape wide eid, i(pid t) j(j)
        
        . li, noobs sep(0)
        
          +-----------------------+
          | pid   t   eid1   eid2 |
          |-----------------------|
          |   1   1      1      2 |
          |   1   2      2      . |
          |   1   3      1      . |
          +-----------------------+
        Last edited by Hemanshu Kumar; 14 Jun 2023, 09:43.

        • #5
          In addition to the more efficient solution from Hemanshu, I see that you have made liberal use of -gtools- (SSC) for many standard commands in Stata. Why not also use -greshape- from the same set of tools? That will also speed up your code.
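
          A minimal sketch of how that suggestion might be combined with the two-line approach from #4. It assumes gtools is installed from SSC and that greshape accepts the same i() and j() options as reshape. It also assumes there are no repeated (pid, t, eid) rows, since the original code dropped duplicates and this version does not.

          ```stata
          * Sketch only: assumes gtools is installed (ssc install gtools)
          clear
          input pid t eid
          1 1 1
          1 1 2
          1 2 2
          1 3 1
          end

          * Number the eid values within each pid-t cell.
          * Sorting on eid within the cell makes the numbering reproducible.
          bysort pid t (eid): gen j = _n

          * greshape is the gtools counterpart of reshape;
          * i() and j() are assumed to behave as they do in reshape.
          greshape wide eid, i(pid t) j(j)

          list, noobs sep(0)
          ```

          If the data can contain repeated (pid, t, eid) rows, running gduplicates drop before generating j would keep the result in line with the original code.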

          • #6
            Thanks a lot for the helpful responses!
