Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Data reshape from wide to long with a haphazard (sort of!!) varlist

    I have a dataset that looks like this.
    Code:
    id    a_1990    b_c_1990    d_e_1990    a_2019    b_c_2019    d_e_2019
    100    4639    3034.902    1604.098    5641    3152.705    2488.295
    200    3591    2532.941    1058.059    1549    961.7407    587.2593
    301    3427    2575.986    851.0145    1620    1039.885    580.1152
    302    5790    3420.109    2369.891    6344    3061.887    3282.113
    And the list goes on. There are SO MANY variables. I want to convert this to a format in which the 'year' at the end of the variables becomes a new column and thus looks like a time series dataset.
    One way could be listing all the variables and then simply writing-

    Code:
    reshape long varlist, i(id) j(year)
    But, as I said, the varlist is long, and there is no definitive format that its naming follows. I wonder if there is any more economical and efficient way of doing this.

    Any input is appreciated.
    Last edited by Gopal Bhaskar; 19 Oct 2022, 12:50.

  • #2
    I'm not sure if anything clever can be done through commands like ds? In any case, this method I'm proposing is rudimentary, but should get the task done without typing all the variables:

    Code:
    foreach x in 1990 2019{
        preserve
        keep id *_`x'
        gen year = `x'
        rename *_`x' *
        save temp_`x', replace
        restore
    }
    
    use temp_1990
    append using temp_2019
    It would save a file for each year in the working directory, and then you can use append to connect them into a long file. append can be a loop as well if you have a lot of years. E.g.:

    Code:
    use temp_1990
    forvales y = 1991(1)2019{
        appending using temp_`y'
    }
    Hope this helps.

    Comment


    • #3
      Here's my approach.
      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input int(id a_1990) float(b_c_1990 d_e_1990) int a_2019 float(b_c_2019 d_e_2019)
      100 4639 3034.902 1604.098 5641 3152.705 2488.295
      200 3591 2532.941 1058.059 1549 961.7407 587.2593
      301 3427 2575.986 851.0145 1620 1039.885 580.1152
      302 5790 3420.109 2369.891 6344 3061.887 3282.113
      end
      
      ds id, not
      local vars `r(varlist)'
      macro list _vars
      
      foreach v of local vars {
          local y = substr("`v'",-5,.)
          local years : list years | y
          local vl = length("`v'")-5
          local s = substr("`v'",1,`vl')
          local stubs : list stubs | s
      }
      macro list _years _stubs
      
      reshape long `stubs', i(id) j(year) string
      destring year, ignore(_) replace
      
      list, sepby(id)
      Code:
      . ds id, not
      a_1990    b_c_1990  d_e_1990  a_2019    b_c_2019  d_e_2019
      
      . local vars `r(varlist)'
      
      . macro list _vars
      _vars:          a_1990 b_c_1990 d_e_1990 a_2019 b_c_2019 d_e_2019
      
      . 
      . foreach v of local vars {
        2.     local y = substr("`v'",-5,.)
        3.     local years : list years | y
        4.     local vl = length("`v'")-5
        5.     local s = substr("`v'",1,`vl')
        6.     local stubs : list stubs | s
        7. }
      
      . macro list _years _stubs
      _years:         _1990 _2019
      _stubs:         a b_c d_e
      
      . 
      . reshape long `stubs', i(id) j(year) string
      (j = _1990 _2019)
      
      Data                               Wide   ->   Long
      -----------------------------------------------------------------------------
      Number of observations                4   ->   8           
      Number of variables                   7   ->   5           
      j variable (2 values)                     ->   year
      xij variables:
                                a_1990 a_2019   ->   a
                            b_c_1990 b_c_2019   ->   b_c
                            d_e_1990 d_e_2019   ->   d_e
      -----------------------------------------------------------------------------
      
      . destring year, ignore(_) replace
      year: character _ removed; replaced as int
      
      . 
      . list, sepby(id)
      
           +-----------------------------------------+
           |  id   year      a        b_c        d_e |
           |-----------------------------------------|
        1. | 100   1990   4639   3034.902   1604.098 |
        2. | 100   2019   5641   3152.705   2488.295 |
           |-----------------------------------------|
        3. | 200   1990   3591   2532.941   1058.059 |
        4. | 200   2019   1549   961.7407   587.2593 |
           |-----------------------------------------|
        5. | 301   1990   3427   2575.986   851.0145 |
        6. | 301   2019   1620   1039.885   580.1152 |
           |-----------------------------------------|
        7. | 302   1990   5790   3420.109   2369.891 |
        8. | 302   2019   6344   3061.887   3282.113 |
           +-----------------------------------------+

      Comment


      • #4
        Thanks a lot to both of you. Your solutions work well
        I ended up using Lisowski's solution as it saves some space.

        Comment

        Working...
        X