Data reshape from wide to long with a haphazard (sort of!!) varlist

Gopal Bhaskar

Join Date: May 2022

Posts: 5
#1

Data reshape from wide to long with a haphazard (sort of!!) varlist

19 Oct 2022, 12:48

I have a dataset that looks like this.

Code:

id a_1990 b_c_1990 d_e_1990 a_2019 b_c_2019 d_e_2019 100 4639 3034.902 1604.098 5641 3152.705 2488.295 200 3591 2532.941 1058.059 1549 961.7407 587.2593 301 3427 2575.986 851.0145 1620 1039.885 580.1152 302 5790 3420.109 2369.891 6344 3061.887 3282.113

And the list goes on. There are SO MANY variables. I want to convert this to a format in which the 'year' at the end of the variables becomes a new column and thus looks like a time series dataset.
One way could be listing all the variables and then simply writing-

Code:

reshape long varlist, i(id) j(year)

But, as I said, the varlist is long, and there is no definitive format that its naming follows. I wonder if there is any more economical and efficient way of doing this.

Any input is appreciated.

Last edited by Gopal Bhaskar; 19 Oct 2022, 12:50.
Tags: None
Ken Chui

Join Date: Aug 2014

Posts: 1063
#2

19 Oct 2022, 14:18

I'm not sure if anything clever can be done through commands like ds? In any case, this method I'm proposing is rudimentary, but should get the task done without typing all the variables:

Code:

foreach x in 1990 2019{ preserve keep id *_`x' gen year = `x' rename *_`x' * save temp_`x', replace restore } use temp_1990 append using temp_2019

It would save a file for each year in the working directory, and then you can use append to connect them into a long file. append can be a loop as well if you have a lot of years. E.g.:

Code:

use temp_1990 forvales y = 1991(1)2019{ appending using temp_`y' }

Hope this helps.
Comment

William Lisowski

Join Date: Dec 2014
Posts: 10150

19 Oct 2022, 14:39

Here's my approach.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input int(id a_1990) float(b_c_1990 d_e_1990) int a_2019 float(b_c_2019 d_e_2019)
100 4639 3034.902 1604.098 5641 3152.705 2488.295
200 3591 2532.941 1058.059 1549 961.7407 587.2593
301 3427 2575.986 851.0145 1620 1039.885 580.1152
302 5790 3420.109 2369.891 6344 3061.887 3282.113
end

ds id, not
local vars `r(varlist)'
macro list _vars

foreach v of local vars {
    local y = substr("`v'",-5,.)
    local years : list years | y
    local vl = length("`v'")-5
    local s = substr("`v'",1,`vl')
    local stubs : list stubs | s
}
macro list _years _stubs

reshape long `stubs', i(id) j(year) string
destring year, ignore(_) replace

list, sepby(id)

Code:

. ds id, not
a_1990    b_c_1990  d_e_1990  a_2019    b_c_2019  d_e_2019

. local vars `r(varlist)'

. macro list _vars
_vars:          a_1990 b_c_1990 d_e_1990 a_2019 b_c_2019 d_e_2019

. 
. foreach v of local vars {
  2.     local y = substr("`v'",-5,.)
  3.     local years : list years | y
  4.     local vl = length("`v'")-5
  5.     local s = substr("`v'",1,`vl')
  6.     local stubs : list stubs | s
  7. }

. macro list _years _stubs
_years:         _1990 _2019
_stubs:         a b_c d_e

. 
. reshape long `stubs', i(id) j(year) string
(j = _1990 _2019)

Data                               Wide   ->   Long
-----------------------------------------------------------------------------
Number of observations                4   ->   8           
Number of variables                   7   ->   5           
j variable (2 values)                     ->   year
xij variables:
                          a_1990 a_2019   ->   a
                      b_c_1990 b_c_2019   ->   b_c
                      d_e_1990 d_e_2019   ->   d_e
-----------------------------------------------------------------------------

. destring year, ignore(_) replace
year: character _ removed; replaced as int

. 
. list, sepby(id)

     +-----------------------------------------+
     |  id   year      a        b_c        d_e |
     |-----------------------------------------|
  1. | 100   1990   4639   3034.902   1604.098 |
  2. | 100   2019   5641   3152.705   2488.295 |
     |-----------------------------------------|
  3. | 200   1990   3591   2532.941   1058.059 |
  4. | 200   2019   1549   961.7407   587.2593 |
     |-----------------------------------------|
  5. | 301   1990   3427   2575.986   851.0145 |
  6. | 301   2019   1620   1039.885   580.1152 |
     |-----------------------------------------|
  7. | 302   1990   5790   3420.109   2369.891 |
  8. | 302   2019   6344   3061.887   3282.113 |
     +-----------------------------------------+

Comment

Gopal Bhaskar

Join Date: May 2022

Posts: 5
#4

20 Oct 2022, 02:10

Thanks a lot to both of you. Your solutions work well
I ended up using Lisowski's solution as it saves some space.
Comment

Announcement

Data reshape from wide to long with a haphazard (sort of!!) varlist

Comment

Comment

Comment