Error on reshape "characteristic contents too long"

Jessie Kalbfeld

Join Date: Feb 2018
Posts: 2

21 Feb 2018, 18:09

Hi,

I am attempting to reshape a dataset from long to wide in Stata 15.1 on a Mac running High Sierra. I keep getting error code r(1004) with the message "characteristic contents too long. The maximum value of the contents is 67,784."

The data I am working with are from the IPUMS USA 5% samples from 1980-2010. I have created a category ID using the -egen group- command to categorize observations based on a number of characteristics. The data are currently in long form with one observation per year/state/category grouping where I have summed the number of individuals in the sample who fall into each category in every year and state.

Current data look like this, where catid_c_ is the raw count of individuals in that group and catid_w_ is the weighted count (statefip is a numeric code for each state, with the state name as its value label):

Year	Statefips	catid	catid_c_	catid_w_
1980	Alabama	1	20	400
1980	Alabama	5	5	100
1980	Alabama	9	7	140
1980	Mississippi	1	50	500
1980	Mississippi	5	13	130
1980	Mississippi	9	8	160
1980	Washington	9	10	200
1980	Washington	13	12	240
1980	Washington	25	20	200

catid values are labeled so as to keep track of what the values mean since there are so many of them. So the value label for catid == 1 would be something like "married hs nokids nhwht" and the value label for catid == 13 would be something like "unmarried ba_plus 2kids nhblk rural".

I want to reshape the data to wide, with one observation per year/state and two columns for each category: one with the raw count of individuals in that category and one with the weighted count, which is the sum of person weights for that category.

So I am looking for a dataset that looks like:

Year	Statefips	catid_c_1	catid_w_1	catid_c_5	catid_w_5	catid_c_9	catid_w_9	catid_c_13	catid_w_13	catid_c_25	catid_w_25
1980	Alabama	20	400	5	100	7	140	0	0	0	0
1980	Mississippi	50	500	13	130	8	160	0	0	0	0
1980	Washington	0	0	0	0	10	200	12	240	20	200

I am using the following code to do the reshape:

reshape wide catid_c_ catid_w_, j(catid) i(year statefip)

When I tested the same code on a 5% sample of my dataset, it worked great. The sample dataset had 623 categories (maximum value of catid was 623). Now that I am working with my full dataset (148,265 observations), which has 2,517 values for catid, I am getting the "characteristic contents too long" error. I assume the error has something to do with there being so many more values for catid than in the sample dataset but I am at a loss as to how to fix the problem.

Thanks!

Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#2

21 Feb 2018, 19:25

Yes, the problem is that you have more values of catid in the full data than -reshape- can handle.

I think what I would do is split the data set into subsets defined by ranges of catid. Since you know the code runs without difficulty when there are only 623 categories, you might create one data set with observations where catid ranges 1-625, another with catid ranging 626-1250, another with 1251-1875, and finally one with 1876-2517. You can -reshape wide- each of those separately. And then you can -merge- them all together.
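
A minimal sketch of that split, reshape, and merge sequence, assuming the long data are in memory with the variable names from the original post (year, statefip, catid, catid_c_, catid_w_). The chunk size of 625 is only a guess based on the 623-category test that worked, and the final step that turns missing values into zeros is an assumption about what the wide table is meant to show. Note also that about 5,000 variables is more than Stata/IC allows, so this presumes Stata/SE or Stata/MP.

```stata
* Sketch only: variable names come from the original post; the chunk size is an assumption.
local chunk 625
summarize catid, meanonly
local nchunks = ceil(r(max) / `chunk')

* Keep a copy of the long data to reload for each chunk.
tempfile long
save "`long'"

forvalues k = 1/`nchunks' {
    use "`long'", clear
    local lo = (`k' - 1) * `chunk' + 1
    local hi = `k' * `chunk'
    keep if inrange(catid, `lo', `hi')
    * Same reshape as in the original post, applied to one range of catid.
    reshape wide catid_c_ catid_w_, i(year statefip) j(catid)
    tempfile wide`k'
    save "`wide`k''"
}

* Merge the wide pieces back together on the year/state key.
use "`wide1'", clear
forvalues k = 2/`nchunks' {
    merge 1:1 year statefip using "`wide`k''", nogenerate
}

* Cells with no matching long-form row come out missing, not zero.
* Replace them only if a zero count is really what is meant.
foreach v of varlist catid_c_* catid_w_* {
    replace `v' = 0 if missing(`v')
}
```

If a chunk still triggers r(1004), a smaller value for the chunk local should give more, narrower pieces without any other change.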

But before you go ahead and do this, give some serious thought to why you are doing it in the first place. What on earth do you plan to do with a data set that has over 5,000 variables (two for each of 2,517 categories)? What analyses are you going to do? The vast majority of Stata data management and analysis commands work best (or only) with the data in the long layout you had in the first place. Unless you have a very specific plan in mind that requires the data to be in wide layout, you are probably just going to cripple yourself by doing this. Your next post is likely to be asking for help with some analysis, and the first part of the solution will be to -reshape- back to long! Think about it carefully. A lot of people are used to working with wide layouts because they are easier for human eyes to comprehend. But they are difficult for Stata to work with (and impossible for many tasks). So don't go there unless you have a really good reason to.
Jessie Kalbfeld

Join Date: Feb 2018

Posts: 2
#3

22 Feb 2018, 05:56

Hi, Thanks for your response. This is very helpful. There is a plan that requires the data to be in wide layout, but I hear and appreciate your concerns.