Reshape long panel data with variables in column

Firmin Clairant

Join Date: Feb 2017

Posts: 47
#1

15 Jan 2018, 22:41

Hello Members,

I would like to import a World Development Indicators database (currently in wide format) and reshape it to long. However, instead of having all the variables stacked in a single column, which is what the command reshape long with i(Unit_ID series_id) j(year) string produces, I would like each variable listed in the column "series" to become a column of its own, for each of the overpopulated countries and for each year (1980-2017). Attached are the format of the original data and the format I would like to get as output.

Best regards,

Last edited by Firmin Clairant; 15 Jan 2018, 22:49.
Clyde Schechter

Join Date: Apr 2014

Posts: 30093
#2

15 Jan 2018, 22:57

If you read the FAQ, as all are asked to do before starting a thread, you will observe that screenshots of data sets are not helpful. Yours are readable, but often they are not. Moreover, in a situation such as this one, it would be helpful to test out some code on the data, but there is no way to import data to Stata from a screenshot. The helpful way to show example data is with the -dataex- command, which is explained in FAQ #12. Please do take the time to read the FAQ so that your future posts will be more helpful to those who want to help you, and so your chances of getting a correct response the first time around will be higher.

Based on what I can see in your screenshots, I will offer some code. But because I cannot test it, it may be wrong.

The trick is to do this in two steps. Your data is neither truly long nor wide: it is long in variables and wide in years. What you want is the opposite: long in years and wide in variables. So the first step is to make the data fully long, and the second is to make that result wide in variables.

Code:

reshape long y, i(series country) j(year)
rename y y_
reshape wide y_, i(country year) j(series) string
rename y_* *

Now, apart from the fact that I may have made some errors in the above, there is a potential pitfall even if the code is basically correct. The code assumes that all of the values that occur in the variable series in the initial data will be legal variable names. If, however, some of those values have embedded blanks, or characters other than letters, digits and underscore (_), Stata will refuse to do the -reshape wide- because it can't make a legal variable with that name. Should you encounter that problem, you need to add one step: -replace series = strtoname(series)- before the -reshape wide-. This will replace characters that are not allowed in variable names with underscores. If you need to do this, however, you need to check over your data to make sure that originally distinct values of series remain distinct. So, for example, if there were two values of series that started out as B_534 and B#534, -strtoname()- will reduce them both to B_534, which will then cause Stata to again refuse the -reshape wide- as it cannot assign the same name to two different variables. Running -isid country year series- on the long data after the application of -strtoname()- will tell you whether you have this kind of problem or not.
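To make the -strtoname()- step concrete, here is an untested sketch on made-up toy data. The series values, country codes and numbers are invented for illustration, and the helper variable series_orig is my own naming; the year columns are assumed to be called y1980, y1981, and so on, as in the code above.

Code:

```stata
* Toy data: one row per series-country pair, years in wide layout.
* All values and names below are hypothetical.
clear
input str12 series str3 country y1980 y1981
"GDP growth"  "AAA"  1.0  1.1
"GDP growth"  "BBB"  2.0  2.1
"SP.POP#TOTL" "AAA" 10   11
"SP.POP#TOTL" "BBB" 20   21
end

* Step 1: make the data fully long (one row per series-country-year)
reshape long y, i(series country) j(year)

* Turn the series values into legal variable names,
* keeping a copy of the original value for checking
generate series_orig = series
replace series = strtoname(series)

* Every cleaned name must map back to exactly one original value;
* the assert fails if two distinct series collapsed to the same name
bysort series (series_orig): assert series_orig[1] == series_orig[_N]

* Step 2: make the result wide in series
drop series_orig
rename y y_
reshape wide y_, i(country year) j(series) string
rename y_* *
```

If the -assert- fails, the colliding values would have to be renamed by hand before the -reshape wide-. Note also that the temporary y_ prefix adds two characters, so cleaned names longer than 30 characters would exceed Stata's 32-character limit on variable names and would need shortening first.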
Firmin Clairant

Join Date: Feb 2017

Posts: 47
#3

15 Jan 2018, 23:22

Thanks, Mr Schechter, for your guidance on the FAQ.

Indeed, your code works very well. It's really fantastic; I had been searching for a solution for days.

Kind regards,
River Huang

Join Date: Mar 2016

Posts: 1908
#4

16 Jan 2018, 00:13

Hi Firmin, please check the community-contributed command wbopendata (ssc install wbopendata).

Ho-Chuan (River) Huang
Stata 19.0, MP(4)
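
For orientation, a call might look like the sketch below. It is untested here and needs an internet connection; the two indicator codes are only examples, and the option names are quoted from memory, so please confirm them with -help wbopendata- before relying on them.

Code:

```stata
* Install once from SSC
ssc install wbopendata

* Download two example indicators, asking for one row per country-year
* (option names as I recall them from the help file; verify before use)
wbopendata, indicator(sp.pop.totl; ny.gdp.pcap.kd) long clear

* Inspect what was downloaded
describe
```

If this works as I remember, each indicator arrives as its own variable, so no -reshape- is needed afterwards.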
Nick Cox

Join Date: Mar 2014

Posts: 35694
#5

16 Jan 2018, 01:42

See also http://www.stata-journal.com/sjpdf.h...iclenum=dm0031