Reshape Long

Monica White

Join Date: Jan 2017
Posts: 98

27 May 2018, 21:21

Good Evening Statalist,

I'm trying to take the values of the following variables, and stack them into one column.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float(v1 v2 v3 v4 v5 v6 v7 v8 v9 v10)
 98583.65  2689.404  39363.97  956.384 198349.94  19176.32  41658.11 14610875  2956.389  31561.92
102967.47  1425.272 66995.945  1043.32 173146.16 12588.352  20920.59 15258652   397.702 12350.628
 83608.82   787.546 23464.066  972.375  166769.4 16800.342  30197.59 16721873   421.001  41698.17
106049.66   682.952 15559.797  527.214 191980.97  12614.84  14221.04 18962966  1292.407 36579.727
113240.53  1637.298  25359.62  232.069  273081.7 12200.224  74230.52 22733624   591.017 18066.707
 104162.4   572.742  88162.01  667.438  307729.8  72783.21 179674.44 26220594   484.496  34124.22
155877.78   633.475 16615.023 1107.778    377619 27026.953 253609.83 28031248   830.957  27532.68
113431.47   5120.94  24307.29  837.954 376820.75 26385.254  188838.2 27193410    44.141   43695.1
163516.67 32832.684 140704.28  463.761 526008.06  27973.33 219713.38 32072884  2121.059  11183.41
 108074.3   474.194 13613.933  219.972  522529.9  26076.73 132737.53 21017804   403.216  6264.399
130121.13   301.185  23073.35  406.481  694166.9  26864.92  170456.1 25009190  1477.915  19216.52
199976.23  2018.589  60304.96  543.454  915072.3  51573.54 294171.88 32838644  6083.098  6703.542
 161832.1   2673.58  83778.19  568.747 1682149.4   51579.9 221039.06 32123374 19452.854 13451.702
127129.43  3128.498   60814.6  538.582   1251391   57733.7 277714.44 27630990 17739.893  6361.995
 103531.1  5534.069 37125.605  306.482   1943178  86158.93  323786.1 31607114 14304.836  11046.25
 81461.02  5111.324 109290.13 3621.451   1014381  81000.57 220334.34 28517866 16154.084 16795.736
 58080.41  7272.705  25207.12 4272.858  983769.9  78399.47  444441.1 27057612  12222.92  31385.52
end

I actually have over two-hundred similar variables. I would like to stack them into one column. I tried to use the

Code:

reshape long

command, and tried to follow some of the posts here, but I keep getting invalid syntax. I'm not sure what I'm doing wrong. I want them into one long column, because the data is ordered by county and years. (For example, the first column are values for Country A between 2000-2016). Thank you in advance.

Tags: None

Clyde Schechter

Join Date: Apr 2014

Posts: 30063
#2

27 May 2018, 21:42

The obstacle to using -reshape long- here is that you have nothing to put into the -i()- option. You need a variable that uniquely identifies the observations. So you just create one, and then you can use -reshape long-.

Code:

gen original_obs_no = _n reshape long v, i(original_obs_no) j(_j)

At that point it will probably be important for you to in some way make the new variable _j, which ranges from 1 to 10 (or 1 to 200 in your real data) indicate which countries and years correspond to which values of _j, unless that information is actually not of importance for your later analyses.

Added: By the way, in Stata we do not speak of rows and columns we speak of observations and variables. If you think of a Stata data set as a spreadsheet and try to work with it as if it were one, you will get yourself into trouble, because the underlying data models and approaches are quite different. Your instincts and habits acquired from using spreadsheets are generally not helpful in Stata and sometimes even get in the way. In order to keep your mind focused on Stata when you are using Stata, it is better to avoid using spreadsheet terminology. The more you stick to Stata-appropriate terminology, the less likely you are to mess things up with spreadsheet-think.

Last edited by Clyde Schechter; 27 May 2018, 21:45.
Comment
Monica White

Join Date: Jan 2017

Posts: 98
#3

27 May 2018, 22:03

Originally posted by Clyde Schechter View Post

The obstacle to using -reshape long- here is that you have nothing to put into the -i()- option. You need a variable that uniquely identifies the observations. So you just create one, and then you can use -reshape long-.

Code:

gen original_obs_no = _n reshape long v, i(original_obs_no) j(_j)

At that point it will probably be important for you to in some way make the new variable _j, which ranges from 1 to 10 (or 1 to 200 in your real data) indicate which countries and years correspond to which values of _j, unless that information is actually not of importance for your later analyses.

Added: By the way, in Stata we do not speak of rows and columns we speak of observations and variables. If you think of a Stata data set as a spreadsheet and try to work with it as if it were one, you will get yourself into trouble, because the underlying data models and approaches are quite different. Your instincts and habits acquired from using spreadsheets are generally not helpful in Stata and sometimes even get in the way. In order to keep your mind focused on Stata when you are using Stata, it is better to avoid using spreadsheet terminology. The more you stick to Stata-appropriate terminology, the less likely you are to mess things up with spreadsheet-think.

Hello Clyde,
Thank you so much. I agree with a lot of things you said.
Thank you for helping me with the code, but what the code did was stack the horizontal lines. I don't think I was clear in my question. What I need is all the columns to be stacked into one column. So, for example, the first column ends with 58080.412. I need the next column (starting with 2689.404) right under it and so forth.

Example:

98583.647
102967.47
83608.817
106049.656
[...]
58080.412
2689.404
1425.272
787.546
682.952
[...]
Comment
Joseph Coveney

Join Date: Apr 2014

Posts: 4396
#4

27 May 2018, 23:04

Maybe stack is what you're looking for.

Code:

help stack

for more information.

.ÿclearÿ*

.ÿ
.ÿinputÿbyteÿaÿb

ÿÿÿÿÿÿÿÿÿÿÿÿaÿÿÿÿÿÿÿÿÿÿb
ÿÿ1.ÿ1ÿ3
ÿÿ2.ÿ2ÿ4
ÿÿ3.ÿend

.ÿ
.ÿstackÿaÿb,ÿinto(c)ÿclear

.ÿ
.ÿlist,ÿnoobs

ÿÿ+------------+
ÿÿ|ÿ_stackÿÿÿcÿ|
ÿÿ|------------|
ÿÿ|ÿÿÿÿÿÿ1ÿÿÿ1ÿ|
ÿÿ|ÿÿÿÿÿÿ1ÿÿÿ2ÿ|
ÿÿ|ÿÿÿÿÿÿ2ÿÿÿ3ÿ|
ÿÿ|ÿÿÿÿÿÿ2ÿÿÿ4ÿ|
ÿÿ+------------+

.ÿ
.ÿexit

endÿofÿdo-file

.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30063
#5

27 May 2018, 23:15

Or, building on the code I provided in #2, just follow that with

Code:

sort _j original_obs_no

to get the order you are looking for.
Comment
Monica White

Join Date: Jan 2017

Posts: 98
#6

28 May 2018, 05:17

Originally posted by Clyde Schechter View Post

Or, building on the code I provided in #2, just follow that with

Code:

sort _j original_obs_no

to get the order you are looking for.

Clyde,
Thank you!
I'm sorry for the silly questions, but I'm always trying to understand what the code does.

Does the "v" in

Code:

reshape long v, i(original_obs_no) j(_j)

mean all variables?

And why does

Code:

sort _j original_obs_no

get the job done by putting i and then j in that order?
Comment
Daniel Bela

Join Date: Apr 2014

Posts: 246
#7

28 May 2018, 05:48

Originally posted by Monica White View Post

Does the "v" in

Code:

reshape long v, i(original_obs_no) j(_j)

mean all variables?

It literally means "all variables with names starting with the letter 'v', immediately followed by a running number".
If your variable names would have read "var_1 var_2 var_3 ...", you would have had to execute

Code:

reshape long var_ , i(...) (j...)

There are more examples in -help reshape-.

Originally posted by Monica White View Post

And why does

Code:

sort _j original_obs_no

get the job done by putting i and then j in that order?

sort does not change the data's contents, just the ordering of observations in the dataset. See -help sort-.

Regards
Bela
1 like
Comment
Monica White

Join Date: Jan 2017

Posts: 98
#8

29 May 2018, 18:22

Originally posted by Daniel Bela View Post

It literally means "all variables with names starting with the letter 'v', immediately followed by a running number".
If your variable names would have read "var_1 var_2 var_3 ...", you would have had to execute

Code:

reshape long var_ , i(...) (j...)

There are more examples in -help reshape-.

sort does not change the data's contents, just the ordering of observations in the dataset. See -help sort-.

Regards
Bela

Thank you, Bela!
Comment
Monica White

Join Date: Jan 2017

Posts: 98
#9

01 Jun 2018, 17:38

Originally posted by Daniel Bela View Post

It literally means "all variables with names starting with the letter 'v', immediately followed by a running number".
If your variable names would have read "var_1 var_2 var_3 ...", you would have had to execute

Code:

reshape long var_ , i(...) (j...)

There are more examples in -help reshape-.

sort does not change the data's contents, just the ordering of observations in the dataset. See -help sort-.

Regards
Bela

This may be a bit of a dumb question, but what if the variables start with different letters. I have one where the variable names start from A through S.

I tried

Code:

reshape long A-S, i(original_obs_no) j(_j)

Obviously, not the right way to go.

Not sure if I missed something here: https://www.stata.com/manuals13/dreshape.pdf
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30063
#10

01 Jun 2018, 17:55

There is no general solution to this situation. You have to customize the code to the particular variables you are working with.

The general principle is to mentally gather the variables that you need to -reshape long- and then parse them into stubs followed by suffixes. Then you have to list all of the stubs before the comma, and then specify a -j()- option that will receive the suffixes. The stubs are not, themselves, variable names, so you cannot list them using wildcards like *, ?, or the - notation for a range of variables. You have to spell them out.

Now, if there is a large number of them, there is usually a way to get Stata to build a list of those variables in a local macro, and then you can feed that local macro into -reshape-. So, if your variables are abc1 abc2 abc3 def1 def2 def3 ... xyz1 xyz2 xyz3, and if no other variables end in 1, you could do something like this:

Code:

// PUT ALL THE VARIABLES ENDING IN 1 IN A LOCAL MACRO ds *1 local stubs `r(varlist)' // NOW STRIP THE 1s OFF TO GET THE ACTUAL STUBS local stubs: subinstr local stubs "1" "", all // READY FOR RESHAPE reshape long `stubs', i(identifier_variable) j(_j)

But clearly this approach is idiosyncratic and requires specific code to deal with it. You have to deal with each such instance on its own terms, I'm afraid.
Comment
Monica White

Join Date: Jan 2017

Posts: 98
#11

01 Jun 2018, 18:31

Originally posted by Clyde Schechter View Post

There is no general solution to this situation. You have to customize the code to the particular variables you are working with.

The general principle is to mentally gather the variables that you need to -reshape long- and then parse them into stubs followed by suffixes. Then you have to list all of the stubs before the comma, and then specify a -j()- option that will receive the suffixes. The stubs are not, themselves, variable names, so you cannot list them using wildcards like *, ?, or the - notation for a range of variables. You have to spell them out.

Now, if there is a large number of them, there is usually a way to get Stata to build a list of those variables in a local macro, and then you can feed that local macro into -reshape-. So, if your variables are abc1 abc2 abc3 def1 def2 def3 ... xyz1 xyz2 xyz3, and if no other variables end in 1, you could do something like this:

Code:

// PUT ALL THE VARIABLES ENDING IN 1 IN A LOCAL MACRO ds *1 local stubs `r(varlist)' // NOW STRIP THE 1s OFF TO GET THE ACTUAL STUBS local stubs: subinstr local stubs "1" "", all // READY FOR RESHAPE reshape long `stubs', i(identifier_variable) j(_j)

But clearly this approach is idiosyncratic and requires specific code to deal with it. You have to deal with each such instance on its own terms, I'm afraid.

Indeed. Thank you, Clyde. I think the quickest thing is just to rename the variables at the moment to v1, v2, etc.
Comment

Announcement

Reshape Long

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment