Reshape WHOLE dataset from long to wide

Julia Lopez

Join Date: Oct 2016

Posts: 11
#1

Reshape WHOLE dataset from long to wide

18 Jul 2019, 15:26

Hello,
I have a lot of variables to transpose from long to wide. Is there a way to transpose the whole dataset instead of picking specific variables? I have my ID and my repeated event time, but the variables I need reshaping are in the hundreds.

reshape wide X, id(id) j(event_time)

Thanks!
Tags: None
Clyde Schechter

Join Date: Apr 2014

Posts: 30104
#2

18 Jul 2019, 15:54

There is almost certainly some reasonably straightforward way to do it. But the details will depend on how your variables are named. Please use the -dataex- command and post an example of your data. You cannot give an example with hundreds of variables, but please show a representative example of perhaps half a dozen short series of variables indexed by the event time.

If you are running version 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30104
#3

18 Jul 2019, 16:17

I'm inferring from your example that the many variables you want to reshape long come in series of 3, each member of the series ending in var, early, or late.

You also need to have an id variable that identifies each observation uniquely. Your first post suggested that you have one, but your example does not. The code below creates one--skip that line if you already have it.

Code:

gen long id = _n ds *early local stubs `r(varlist)' local stubs: subinstr local stubs "early" "", all reshape long `stubs', i(id) j(event_time) string
1 like
Comment
Jorrit Gosens

Join Date: Jan 2015

Posts: 1019
#4

19 Jul 2019, 05:03

Did you try to run the code in post #4?

Reshape can do exactly what you ask for, but the precise code depends on your data.
The suggestion by Clyde works if your variables come in groups of, for example:

Code:

first60_decelvar first60_decelearly first60_decellate first61_decelvar first61_decelearly first61_decellate

And then it does not matter how many groups there are.

If there is a different logic to your variable names, you need to tell us more.
Provide a dataex example with at least 2 'groups' of variables, or something that will make it clear if there are any patterns in your variable names.
Comment

Jorrit Gosens

Join Date: Jan 2015
Posts: 1019

19 Jul 2019, 08:26

Still some questions remain.
With your new data example, I'm guessing there is not much of a clear pattern in variable names.
In your fist post, you say you have id and a repeated event time.

So I am guessing you have two variables, id and event_time, plus some other variables.
In that case, you probably do not want:

Code:

reshape long variable_stubs, i(id) j(event_time) string

But rather:

Code:

reshape long variable_stubs, i(id event_time) j(some_varname)string

With below example:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float(id event_time) byte(e2_cat e3_decel) float first60_totarea byte first60_recurrentprolong
1 2001  2  1      . 0
1 2002  2  1      . 0
1 2003  2  1      . 0
1 2004  2  1      . 0
1 2005  2  1      . 0
2 2001  2  1      . 0
2 2002  2  1      . 0
2 2003  2  1      . 0
2 2004  2  1      . 0
2 2005  2  1      . 0
3 2001  2  1      . 0
3 2002  2  1      . 0
3 2003  2  1      . 0
3 2004  2  1      . 0
3 2005  2  1      . 0
4 2001 99 99 6258.5 0
4 2002 99 99 6258.5 0
4 2003 99 99 6258.5 0
4 2004 99 99 6258.5 0
4 2005 99 99 6258.5 0
end

* Approximate code to reshape
ren * X*
ren Xid id
ren Xevent_time event_time
reshape long X, i(id event_time) j(var_name) string
ren X value

And again, the more details you give us about you data, the easier it is to answer you question. If I did not understood correctly, please explain better.

Comment

Jorrit Gosens

Join Date: Jan 2015

Posts: 1019
#6

19 Jul 2019, 09:04

Ah, you want to reshape wide, not long.
Then I have to say I doubt that is going to be a useful exercise. You say you currently have circa hundreds of variables. Lets say it's 100 exactly. If you then have 24 values of event_time, you will end up with 2400 variables.
That's probably not what you want?
Comment

Jorrit Gosens

Join Date: Jan 2015
Posts: 1019

19 Jul 2019, 10:52

Starting from the data example as in post #8:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float(id event_time) byte(e2_cat e3_decel) float first60_totarea byte first60_recurrentprolong
1 2001  2  1      . 0
1 2002  2  1      . 0
1 2003  2  1      . 0
1 2004  2  1      . 0
1 2005  2  1      . 0
2 2001  2  1      . 0
2 2002  2  1      . 0
2 2003  2  1      . 0
2 2004  2  1      . 0
2 2005  2  1      . 0
3 2001  2  1      . 0
3 2002  2  1      . 0
3 2003  2  1      . 0
3 2004  2  1      . 0
3 2005  2  1      . 0
4 2001 99 99 6258.5 0
4 2002 99 99 6258.5 0
4 2003 99 99 6258.5 0
4 2004 99 99 6258.5 0
4 2005 99 99 6258.5 0
end

tempfile merged

preserve
keep id event_time e2_cat
reshape wide e2_cat, i(id) j(event_time)
save `merged'
restore


foreach item of varlist e3_decel-first60_recurrentprolong{
preserve
keep id event_time `item'
reshape wide `item', i(id) j(event_time)
merge 1:1 id using `merged', nogen
save `merged', replace
restore
}
use `merged', clear

Run that example, with this example dataset, and see if this is really what you want.

Comment

Clyde Schechter

Join Date: Apr 2014

Posts: 30104
#8

19 Jul 2019, 12:02

You can't make it the prefix because all Stata variable names must begin with a letter or underscore (_) character; initial digits are not allowed.

If you would like to have the number appear before the name, but perhaps preceded and followed by an underscore the -reshape- command would look something like this:

Code:

reshape wide _@_`item', i(id) j(event_time)

As an aside, I will just remind you of Jorrit Gossens' good advice in #10 that doing this -reshape- to wide may not be a good idea. Most Stata commands work best, and some work only at all, with long data. You may well find this wide layout to be a hindrance in your subsequent work.
Comment
Jorrit Gosens

Join Date: Jan 2015

Posts: 1019
#9

20 Jul 2019, 02:18

Please dont delete your messages from future questions.
This forum is not only a way to get help on your specific question, its also a reference for others with similar questions.
1 like
Comment

Announcement