Reshape long/wide problem

Lars Pete

Join Date: Nov 2020
Posts: 118

Reshape long/wide problem

12 Oct 2023, 10:48

Dear all,

This is the data.

----------------------- copy starting from the next line -----------------------

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input str21 SeriesID double(Jan2001 Feb2001 Mar2001)
"JTS000000000000000HIL" 5722 5303 5528
"JTS000000000000000HIR"  4.3    4  4.2
"JTS000000000000000JOL" 5234 5097 4762
"JTS000000000000000JOR"  3.8  3.7  3.5
"JTS000000000000000LDL" 2220 1855 2133
"JTS000000000000000LDR"  1.7  1.4  1.6
"JTS000000000000000OSL"  377  323  379
"JTS000000000000000OSR"   .3   .2   .3
"JTS000000000000000QUL" 3245 3053 3054
"JTS000000000000000QUR"  2.4  2.3  2.3
"JTS000000000000000TSL" 5842 5231 5566
"JTS000000000000000TSR"  4.4  3.9  4.2
"JTS000000000000000UOR"  1.2  1.2  1.3
end

------------------ copy up to and including the previous line ------------------

I want to transpose it. So that I have "Year" and all 13 series ID as variables.

I did:

Code:

rename (Jan2001-Aug2023) value=
reshape long value, string i(SeriesID) j(year)

And I get:

Code:

----------------------- copy starting from the next line -----------------------


	Code:
	* Example generated by -dataex-. For more info, type help dataex
clear
input str21 SeriesID str7 year double value
"JTS000000000000000HIL" "Apr2001" 5204
"JTS000000000000000HIL" "Apr2002" 4928
"JTS000000000000000HIL" "Apr2003" 4600
"JTS000000000000000HIL" "Apr2004" 5152
"JTS000000000000000HIL" "Apr2005" 5362
"JTS000000000000000HIL" "Apr2006" 5215
"JTS000000000000000HIL" "Apr2007" 5353
"JTS000000000000000HIL" "Apr2008" 4948
"JTS000000000000000HIL" "Apr2009" 3888
"JTS000000000000000HIL" "Apr2010" 4159
"JTS000000000000000HIL" "Apr2011" 4296
end
------------------ copy up to and including the previous line ------------------

However, I want SeriesIDs as variables.
Like below:

Year JTS000000000000000HIL "JTS000000000000000HIR JTS000000000000000JO
. . . .
.
. . . .

I appreciate any help. Thanks.

Tags: None

Clyde Schechter

Join Date: Apr 2014

Posts: 30163
#2

12 Oct 2023, 10:55

The basic approach is

Code:

rename (Jan2001-Mar2001) _= reshape long _, i(SeriesID) j(month_year) string reshape wide _, i(month_year) j(SeriesID) string rename _* *

You may need to tinker with your data in some ways to make this workable. In order for this code to run, the following must all be true of SeriesID:
It uniquely identifies observations in your data set

It never contains any characters that are forbidden in variable names (i.e. anything other than _, letters, and digits).

It never exceeds 31 characters in length.

If any of those three constraints are not met, you will have to modify the contents of SeriesID to make them true.
1 like
Comment
Lars Pete

Join Date: Nov 2020

Posts: 118
#3

12 Oct 2023, 11:01

Thanks a lot, Clyde. It worked like a charm!
Comment

Lars Pete

Join Date: Nov 2020
Posts: 118

13 Oct 2023, 18:40

Hi Clyde,

May I ask a follow up question? How to modify the contents of the SeriesID (or line_description is the variable this time) whose values look like this: "10-Personal income (thousands of dollars)$1000s". So, basically, none of the 3 conditions are met.

Code:

. reshape wide _, i(countyfips countyname region year) j(linecode_description_unit)
> string
string j variable linecode_description_unit not allowed to contain spaces;
    encode string variable linecode_description_unit into a numeric variable and
    use the new numeric variable as the j variable
r(111);

Code:

. reshape wide _, i(countyfips countyname region year) j(linecode_description_unit) 
(j = 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 3
> 0 31)

Data                               Long   ->   Wide
-----------------------------------------------------------------------------
Number of observations        5,158,400   ->   166,400     
Number of variables                   6   ->   35          
j variable (31 values)linecode_description_unit->(dropped)
xij variables:
                                      _   ->   _1 _2 ... _31
-----------------------------------------------------------------------------

. 
end of do-file

. do "/var/folders/_v/v4352p5n62v_39tc3lqq86j40000gn/T//SD02413.000000"

. rename _* *
31 new variable names invalid
    You attempted to rename _1 to 1, _2 to 2, ...  Those are invalid Stata
    variable names.
r(198);

When I convert it to numeric, and do reshape, I get this error:

Thanks.

Last edited by Lars Pete; 13 Oct 2023, 18:49. Reason: Adding codes that give error

Comment

Clyde Schechter

Join Date: Apr 2014

Posts: 30163
#5

14 Oct 2023, 11:07

In both of these instances you are coming up against the rules that govern Stata variable names. Only letters, digits and the underscore (_) character are allowed, and, in addition, the first character may not be a digit. And the length may not exceed 32 characters.

In your second approach, following the advice to -encode-, things were fine until you tried to get rid of the initial underscore characters at the end: that would have led to variable names beginning with digits, which is not allowed. If you just omit that last -rename- you will have something that might be workable for you. The main problem is that the variable names will be _1, _2, ... etc. And you will have the cognitive burden of having to always remember what those mean in terms of the original values of SeriesID.

A potentially better approach is, instead of numericalizing SeriesID, is to edit it so that it will be compatible with becoming variable names. You face two challenges in this regard: excessive length and disallowed characters. There is a Stata function to handle each of these. You can combine them into a single command:

Code:

replace SeriesID = substr(strtoname(SeriesID), 1, 31)

The -strtoname()- function replaces all disallowed characters by underscores. And the -substr()- function truncates that result to its first 31 characters (which leaves room for the _ character that -reshape- will prefix.

Now, this approach may not be entirely satisfactory. Particularly damaging would be a situation where two different values of SeriesID are the same in their first 31 characters. That would result in the confounding of those two series. And -reshape- will pick this up as an error because you will no longer have unique identification. If you have a situation like this, you will have to use some -replace- statements to edit the offending values of SeriesID into something that retains their similarity to the original values enough that you can recognize it but also preserves their distinctiveness. You will have to use your own judgment how best to do that.
Comment
Lars Pete

Join Date: Nov 2020

Posts: 118
#6

18 Oct 2023, 23:41

Hi Clyde,
Thanks a lot. I just thought that the best approach would be to rename the vars after reshaping wide since they were too long. So, I just did that.
Comment

Announcement

Reshape long/wide problem

Comment

Comment

Comment

Comment

Comment