Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • reshaping dataset as a panel dataset

    Hello, I'm trying to reshape the dataset with following variable names:

    year gdp_province1_sector1 gdp_province1_sector2....
    1997 1000 2000
    1998 3000 4000



    into a regular panel setting:
    year id gdp province sector
    1997 1 1000 1 1
    1998 1 3000 1 1
    ....
    1997 2 2000 1 2
    1998 2 4000 1 2

    any thought how to do it efficiently? Thanks, Farrukh
    Last edited by Farrukh Suvankulov; 14 Oct 2022, 12:48.

  • #2
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input int(year gdp_province1_sector1 gdp_province1_sector2)
    1997 1000 2000
    1998 3000 4000
    end
    
    reshape long gdp, i(year) j(ps) string
    sort ps year
    generate id = sum(ps!=ps[_n-1])
    list, clean
    split(ps), parse("_")
    list, clean
    generate int province = real(subinstr(ps2,"province","",.))
    generate int sector = real(subinstr(ps3,"sector","",.))
    drop ps*
    order year id
    sort id year
    list, clean
    Code:
    . reshape long gdp, i(year) j(ps) string
    (j = _province1_sector1 _province1_sector2)
    
    Data                               Wide   ->   Long
    -----------------------------------------------------------------------------
    Number of observations                2   ->   4           
    Number of variables                   3   ->   3           
    j variable (2 values)                     ->   ps
    xij variables:
    gdp_province1_sector1 gdp_province1_sector2->  gdp
    -----------------------------------------------------------------------------
    
    . sort ps year
    
    . generate id = sum(ps!=ps[_n-1])
    
    . list, clean
    
           year                   ps    gdp   id  
      1.   1997   _province1_sector1   1000    1  
      2.   1998   _province1_sector1   3000    1  
      3.   1997   _province1_sector2   2000    2  
      4.   1998   _province1_sector2   4000    2  
    
    . split(ps), parse("_")
    variables created as string: 
    ps1  ps2  ps3
    
    . list, clean
    
           year                   ps    gdp   id   ps1         ps2       ps3  
      1.   1997   _province1_sector1   1000    1         province1   sector1  
      2.   1998   _province1_sector1   3000    1         province1   sector1  
      3.   1997   _province1_sector2   2000    2         province1   sector2  
      4.   1998   _province1_sector2   4000    2         province1   sector2  
    
    . generate int province = real(subinstr(ps2,"province","",.))
    
    . generate int sector = real(subinstr(ps3,"sector","",.))
    
    . drop ps*
    
    . order year id
    
    . sort id year
    
    . list, clean
    
           year   id    gdp   province   sector  
      1.   1997    1   1000          1        1  
      2.   1998    1   3000          1        1  
      3.   1997    2   2000          1        2  
      4.   1998    2   4000          1        2  
    
    .

    Comment


    • #3
      Thanks a lot, William! Much appreciated.

      Comment


      • #4
        In retrospect, you probably don't need the variable "id", but if you do, the approach in post #2 probably won't give you what you want (because _province10 will sort between province1 and province2). You might do better using
        Code:
        egen id = group(province sector)
        after those two variables have been created.

        Comment

        Working...
        X