Confused about reshape wide

Kevin Gawora

Join Date: Jul 2021

Posts: 5
#1

31 Jan 2023, 16:01

Hello,
My dataset is shown in the attachment.

I am trying to reshape the data so that the years are the variables and the country names are the observations, so I think I need to use reshape wide. However, I am confused about the command's syntax, as I am new to Stata.

Here is what I have

reshape wide USA GreatBritain France Germany FormerUSSR Japan Italy Canada Australia, i(USA GreatBritain France Germany FormerUSSR Japan Italy Canada Australia) j(Year)

I think my problem is that I am unsure what to put in the i() part of the command in this situation.

How can I reshape the data correctly?

Thanks

Clyde Schechter

Join Date: Apr 2014

Posts: 30126
#2

31 Jan 2023, 16:19

This will do something similar to what you ask. As you posted data as a screenshot, which is not importable into Stata, I have made up a toy data set similar to it to illustrate the code.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float(Year Argentina Brazil Germany)
1800 1594 600 958
1801    .   .   .
1820 1710 600   .
end

ds Year, not
rename (`r(varlist)') v=
reshape long v, i(Year) j(Country) string
reshape wide v, i(Country) j(Year) // I RECOMMEND YOU SKIP THIS STEP

This departs from your request in that the variable names cannot be the years in Stata because legal variable names cannot begin with a digit. So the variable names here begin with the letter v, followed by the year.

That said, you probably should not do this anyway. All you are doing is exchanging one wide data layout for another. But wide data layouts are not very useful in Stata: most of Stata's data management and analysis commands work best, or only, with long data layouts. So what I urge you to do, instead, is omit the final -reshape wide- command. Leave the data in long layout, where both Country and Year are variables, and the variable v contains the corresponding value. This is the most usable way to organize this data for Stata. Only if you specifically know you will be working with those few Stata commands that require a wide data layout should you go all the way to the layout you asked for.

In the future, when showing data examples, please use the -dataex- command to do so, as I have done here. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
Kevin Gawora

Join Date: Jul 2021

Posts: 5
#3

31 Jan 2023, 16:33

Thanks for your prompt and detailed response. I will be sure to use the dataex command going forward. What you're saying definitely makes sense, especially for my data. The whole reason I was trying to reshape to wide was that I wanted to generate new observations for each country equal to its 1939 GDP minus its 1929 GDP, and I thought that would be easier if the years were variables.

I've tried something like this

bysort USA GreatBritain France Germany FormerUSSR Japan Italy Canada Australia:: gen GD_GDP_Change = Year[1939] - Year[1929]

as well as

bysort USA GreatBritain France Germany FormerUSSR Japan Italy Canada Australia:: gen GD_GDP_Change2 = Year[_n] - Year[_n-10]

but I'm only getting missing values. I am fairly sure that the by command would be correct here since I need to perform the command on all the country variables listed.

Thanks again for your help.
Clyde Schechter

Join Date: Apr 2014

Posts: 30126
#4

31 Jan 2023, 16:52

OK. For the purpose of calculating the difference between the 1939 and 1929 values, that is probably most easily done in the wide layout you asked for. So go ahead and do that. Then it's just gen gdp_change = v1939-v1929. But I would recommend that you then follow that with -reshape long- (you don't have to specify any variables, or -i()- or -j()- this time) to get back to the long layout I recommended.
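A minimal sketch of that wide route, assuming hypothetical toy data (made-up countries and values, not the data in #1) that is already in the long layout the code in #2 produces:

```stata
* Hypothetical toy data (made-up values) in the long layout produced by
* the code in #2: one observation per Country and Year
clear
input str6 Country float(Year v)
"USA"    1929 100
"USA"    1939 110
"France" 1929  50
"France" 1939  48
"Japan"  1929  30
"Japan"  1939  45
end

* Go wide: one variable per year (v1929, v1939), one observation per Country
reshape wide v, i(Country) j(Year)
gen gdp_change = v1939 - v1929

* Go back to long; with no arguments, -reshape long- reuses the i(), j()
* and stub settings remembered from the preceding -reshape wide-.
* gdp_change is constant within Country, so it is copied to each year.
reshape long
```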

Or, it can be done, a bit awkwardly, in the long layout I recommended:

Code:

by Country (Year), sort: egen gdp_1939 = max(cond(Year == 1939, v, .))
by Country (Year): egen gdp_1929 = max(cond(Year == 1929, v, .))
gen gdp_change = gdp_1939 - gdp_1929

and then you can drop gdp_1939 and gdp_1929 themselves if you have no further use for them.
Nick Cox

Join Date: Mar 2014

Posts: 35735
#5

01 Feb 2023, 02:53

If you were absolutely certain that there were always data for both 1929 and 1939, you could ask for the total within each country

Code:

by Country, sort: egen gdp_change = total((Year == 1939) * v - (Year == 1929) * v)

but this would produce the wrong answer if either or both values were missing. So the code of Clyde Schechter is the safe-and-not-sorry choice.
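To make that risk concrete, here is a sketch on hypothetical toy data in long layout (variables Country, Year and v, as in #2) where France's 1939 value is deliberately missing. It runs the -egen total()- shortcut and the max(cond()) approach from #4 side by side; the variable name change_total is made up for the comparison.

```stata
* Hypothetical toy data (made-up values); France's 1939 value is missing
clear
input str6 Country float(Year v)
"USA"    1929 100
"USA"    1939 110
"France" 1929  50
"France" 1939   .
end

* Shortcut: -egen total()- treats missing terms as zero
by Country, sort: egen change_total = total((Year == 1939) * v - (Year == 1929) * v)

* Safer approach from #4: a missing input year gives a missing result
by Country (Year), sort: egen gdp_1939 = max(cond(Year == 1939, v, .))
by Country (Year): egen gdp_1929 = max(cond(Year == 1929, v, .))
gen gdp_change = gdp_1939 - gdp_1929

list, sepby(Country)
```

Working it through by hand: for USA both give 10. For France the shortcut gives -50, just minus the 1929 value, because the missing 1939 term is summed as zero, while gdp_change is left missing.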