Hi,
I am attempting to reshape a dataset from long to wide in Stata 15.1 on a Mac running High Sierra. I keep getting error code r(1004) with the message "characteristic contents too long. The maximum value of the contents is 67,784."
The data I am working with are from the IPUMS USA 5% samples from 1980-2010. I have created a category ID using the -egen group- command to categorize observations based on a number of characteristics. The data are currently in long form with one observation per year/state/category grouping where I have summed the number of individuals in the sample who fall into each category in every year and state.
The current data look like this, where catid_c_ is the raw count of individuals in that group and catid_w_ is the weighted count (statefip is a numeric variable with the state name as its value label):
year | statefip | catid | catid_c_ | catid_w_ |
1980 | Alabama | 1 | 20 | 400 |
1980 | Alabama | 5 | 5 | 100 |
1980 | Alabama | 9 | 7 | 140 |
1980 | Mississippi | 1 | 50 | 500 |
1980 | Mississippi | 5 | 13 | 130 |
1980 | Mississippi | 9 | 8 | 160 |
1980 | Washington | 9 | 10 | 200 |
1980 | Washington | 13 | 12 | 240 |
1980 | Washington | 25 | 20 | 200 |
catid values are labeled so as to keep track of what the values mean since there are so many of them. So the value label for catid == 1 would be something like "married hs nokids nhwht" and the value label for catid == 13 would be something like "unmarried ba_plus 2kids nhblk rural".
I want to reshape the data to wide form, with one observation per year/state and two columns for each category: one with the raw count of individuals in that category and one with the weighted count (the sum of person weights for that category).
So I am looking for a dataset that looks like:
year | statefip | catid_c_1 | catid_w_1 | catid_c_5 | catid_w_5 | catid_c_9 | catid_w_9 | catid_c_13 | catid_w_13 | catid_c_25 | catid_w_25 |
1980 | Alabama | 20 | 400 | 5 | 100 | 7 | 140 | 0 | 0 | 0 | 0 |
1980 | Mississippi | 50 | 500 | 13 | 130 | 8 | 160 | 0 | 0 | 0 | 0 |
1980 | Washington | 0 | 0 | 0 | 0 | 10 | 200 | 12 | 240 | 20 | 200 |
I am using the following code to do the reshape:
reshape wide catid_c_ catid_w_, j(catid) i(year statefip)
When I tested the same code on a 5% sample of my dataset, it worked great. The sample dataset had 623 categories (maximum value of catid was 623). Now that I am working with my full dataset (148,265 observations), which has 2,517 values for catid, I am getting the "characteristic contents too long" error. I assume the error has something to do with there being so many more values for catid than in the sample dataset but I am at a loss as to how to fix the problem.
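For anyone who wants to reproduce the setup, here is a minimal sketch that rebuilds the nine example rows shown above and runs the same reshape. It only illustrates the data layout: with five categories it will not trigger the r(1004) error. The state codes (1, 28, 53) and the label name statefip_lbl are my assumptions for the toy data. Note that reshape leaves missing values, not zeros, for categories that do not occur in a given year/state, so the sketch fills those in afterwards to match the desired table.

```stata
clear
* Toy version of the long data: one row per year/state/category
input int year byte statefip int catid int catid_c_ int catid_w_
1980  1  1 20 400
1980  1  5  5 100
1980  1  9  7 140
1980 28  1 50 500
1980 28  5 13 130
1980 28  9  8 160
1980 53  9 10 200
1980 53 13 12 240
1980 53 25 20 200
end

* Assumed state codes and label name, for illustration only
label define statefip_lbl 1 "Alabama" 28 "Mississippi" 53 "Washington"
label values statefip statefip_lbl

* Same reshape as on the full dataset
reshape wide catid_c_ catid_w_, i(year statefip) j(catid)

* Categories absent from a year/state come out as missing; set them to 0
foreach v of varlist catid_c_* catid_w_* {
    replace `v' = 0 if missing(`v')
}

list, noobs
```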
Thanks!