
  • Error on reshape "characteristic contents too long"

    Hi,

    I am attempting to reshape a dataset from long to wide in Stata 15.1 on a Mac running High Sierra. I keep getting error code r(1004) with the message "characteristic contents too long. The maximum value of the contents is 67,784."

    The data I am working with are from the IPUMS USA 5% samples from 1980-2010. I have created a category ID using the -egen group- command to categorize observations based on a number of characteristics. The data are currently in long form with one observation per year/state/category grouping where I have summed the number of individuals in the sample who fall into each category in every year and state.

    The current data look like this, where catid_c_ is the raw count of individuals in that group and catid_w_ is the weighted count (statefip is a numeric code for each state, with the state name as its value label):
    Year  Statefips    catid  catid_c_  catid_w_
    1980  Alabama      1      20        400
    1980  Alabama      5      5         100
    1980  Alabama      9      7         140
    1980  Mississippi  1      50        500
    1980  Mississippi  5      13        130
    1980  Mississippi  9      8         160
    1980  Washington   9      10        200
    1980  Washington   13     12        240
    1980  Washington   25     20        200

    catid values are labeled so as to keep track of what the values mean since there are so many of them. So the value label for catid == 1 would be something like "married hs nokids nhwht" and the value label for catid == 13 would be something like "unmarried ba_plus 2kids nhblk rural".

    I want to reshape the data to wide form, with one observation per year/state and two columns for each category: one with the raw count of individuals in that category and one with a weighted count based on the person weights I have summed for each category.

    So I am looking for a dataset that looks like:
    Year  Statefips    catid_c_1  catid_w_1  catid_c_5  catid_w_5  catid_c_9  catid_w_9  catid_c_13  catid_w_13  catid_c_25  catid_w_25
    1980  Alabama      20         400        5          100        7          140        0           0           0           0
    1980  Mississippi  50         500        13         130        8          160        0           0           0           0
    1980  Washington   0          0          0          0          10         200        12          240         20          200

    I am using the following code to do the reshape:

    reshape wide catid_c_ catid_w_, j(catid) i(year statefip)

    When I tested the same code on a 5% sample of my dataset, it worked great. The sample dataset had 623 categories (maximum value of catid was 623). Now that I am working with my full dataset (148,265 observations), which has 2,517 values for catid, I am getting the "characteristic contents too long" error. I assume the error has something to do with there being so many more values for catid than in the sample dataset but I am at a loss as to how to fix the problem.

    Thanks!

  • #2
    Yes, the problem is that you have more values of catid in the full data than -reshape- can handle.

    I think what I would do is split the data set into subsets defined by ranges of catid. Since you know the code runs without difficulty when there are only 623 categories, you might create one data set with observations where catid ranges 1-625, another with catid ranging 626-1250, another with 1251-1875, and finally one with 1876-2517. You can -reshape wide- each of those separately. And then you can -merge- them all together.
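
    A rough sketch of that split/reshape/merge approach, assuming the variable names from the original post (year, statefip, catid, catid_c_, catid_w_) and that every range of catid contains at least one observation. The tempfile names and the final zero-fill step are illustrative assumptions, and this has not been run against the IPUMS extract:

    ```stata
    * Sketch only: the long data are assumed to be in memory.
    tempfile long wide
    save `long'

    * Range boundaries follow the split suggested above.
    local lows  1   626  1251 1876
    local highs 625 1250 1875 2517

    forvalues k = 1/4 {
        local lo : word `k' of `lows'
        local hi : word `k' of `highs'

        * Load one range of categories and reshape it on its own.
        use `long', clear
        keep if inrange(catid, `lo', `hi')
        reshape wide catid_c_ catid_w_, i(year statefip) j(catid)

        * Attach the ranges already reshaped, then store the running result.
        if `k' > 1 {
            merge 1:1 year statefip using `wide', nogenerate
        }
        save `wide', replace
    }

    * A category absent from a year/state is missing after the reshape
    * and merge; recode to 0 to match the layout requested in the post.
    foreach v of varlist catid_c_* catid_w_* {
        quietly replace `v' = 0 if missing(`v')
    }
    ```

    The merged data are left in memory after the loop. If a range still triggers the error, narrower ranges can be listed in the two locals, with the loop bound adjusted to match.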

    But before you go ahead and do this, give some serious thought to why you are doing it in the first place. What on earth do you plan to do with a data set that has over 2,500 variables? What analyses are you going to do? The vast majority of Stata data management and analysis commands work best (or only) with the data in the long layout you had in the first place. Unless you have a very specific plan in mind that requires the data to be in wide layout, you are probably just going to cripple yourself by doing this. Your next post is likely to be asking for help with some analysis, and the first part of the solution will be to -reshape- back to long! Think about it carefully. A lot of people are used to working with wide layouts because they are easier for human eyes to comprehend. But they are difficult for Stata to work with (and impossible for many tasks). So don't go there unless you have a really good reason to.
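
    As one small illustration of working in the long layout, here is a sketch that computes each category's share of the weighted count within a year/state without any reshape. The names total_w and share_w are hypothetical, and the other variable names are assumed from the original post:

    ```stata
    * Sketch only: total_w and share_w are illustrative names.
    * Total weighted count within each year/state cell.
    bysort year statefip: egen total_w = total(catid_w_)

    * Each category's share of that total.
    generate share_w = catid_w_ / total_w
    ```

    The same calculation in the wide layout would need a loop over roughly 2,500 pairs of variables.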



    • #3
      Hi, Thanks for your response. This is very helpful. There is a plan that requires the data to be in wide layout, but I hear and appreciate your concerns.
