Data from long to wide using multiple variables

May Blake

Join Date: Feb 2020
Posts: 137

Data from long to wide using multiple variables

04 Oct 2023, 13:07

Hi, I have a dataset with multiple variables entered in the long format. I would like to reshape to wide format. I am using the code below but getting an error that "values of variable grade not unique within id, error code r9". Can you help me solve the issue? My data example is attached with the code.

----------------------- copy starting from the next line -----------------------

Code:

reshape wide Class Type classification gradeType, i(id)  j(grade)

------------------ copy up to and including the previous line ------------------

----------------------- copy starting from the next line -----------------------

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input int id str3 Type str6 Class str2(grade gradeType) str1 classification
1001 "In"  "Day"    "1"  "XM" "0"
1001 "In"  "Day"    "1"  "XM" "0"
1001 "In"  "Midday" "1"  "XM" "0"
1001 "In"  "Midday" "1"  "XM" "1"
1001 "In"  "Midday" "1"  "XM" "0"
1001 "In"  "Midday" "1"  "XM" "0"
1001 "In"  "Night"  "1"  "XM" "0"
1001 "In"  "Night"  "1"  "XM" "0"
1001 "In"  "Night"  "1"  "XM" "0"
1001 "In"  "Night"  "1"  "XM" "1"
1001 "In"  "Night"  "1"  "XM" "0"
1001 "In"  "Mixed"  "1"  "XM" "0"
1001 "In"  "Mixed"  "1"  "XM" "1"
1001 "In"  "Mixed"  "1"  "XM" "0"
1001 "Out" "Mixed"  "1"  "XM" "0"
1001 "Out" "Mixed"  "1"  "XM" "1"
1001 "Out" "Mixed"  "1"  "XM" "0"
1001 "Out" "Mixed"  "1"  "XM" "0"
1001 "Out" "Mixed"  "1"  "XM" "0"
1001 "Out" "Mixed"  "1"  "XM" "0"
1001 "Out" "Mixed"  "1"  "XM" "0"
1001 "Out" "Mixed"  "1"  "XM" "1"
1001 "Out" "Mixed"  "1"  "XM" "0"
1001 "Out" "Mixed"  "1"  "XM" "0"
1001 "Out" "Mixed"  "1"  "XM" "0"
1001 "Out" "Mixed"  "2"  "YR" "1"
1001 "Out" "Mixed"  "3"  "YR" "1"
1001 "Out" "Mixed"  "4"  "YR" "1"
1001 "Out" "Mixed"  "5"  "YR" "1"
1001 "Out" "Mixed"  "6"  "YR" "1"
1001 "Out" "Mixed"  "7"  "YR" "1"
1001 "Out" "Mixed"  "8"  "YR" "1"
1001 "Out" "Mixed"  "9"  "YR" "1"
1001 "Out" "Mixed"  "10" "YR" "1"
 202 "In"  "Day"    "1"  "XM" "0"
 202 "In"  "Day"    "1"  "XM" "1"
 202 "In"  "Midday" "1"  "XM" "0"
 202 "In"  "Midday" "1"  "XM" "0"
 202 "In"  "Midday" "1"  "XM" "0"
 202 "In"  "Midday" "1"  "XM" "0"
 202 "In"  "Night"  "1"  "XM" "0"
 202 "In"  "Night"  "1"  "XM" "0"
 202 "In"  "Night"  "1"  "XM" "1"
 202 "In"  "Night"  "1"  "XM" "0"
 202 "In"  "Night"  "1"  "XM" "0"
 202 "In"  "Mixed"  "1"  "XM" "0"
 202 "In"  "Mixed"  "1"  "XM" "0"
 202 "In"  "Mixed"  "1"  "XM" "0"
 202 "Out" "Mixed"  "1"  "XM" "1"
 202 "Out" "Mixed"  "1"  "XM" "1"
 202 "Out" "Mixed"  "1"  "XM" "0"
 202 "Out" "Mixed"  "1"  "XM" "0"
 202 "Out" "Mixed"  "1"  "XM" "0"
 202 "Out" "Mixed"  "1"  "XM" "0"
 202 "Out" "Mixed"  "1"  "XM" "0"
 202 "Out" "Mixed"  "1"  "XM" "0"
 202 "Out" "Mixed"  "1"  "XM" "1"
 202 "Out" "Mixed"  "1"  "XM" "0"
 202 "Out" "Mixed"  "1"  "XM" "0"
 202 "Out" "Mixed"  "2"  "YR" "1"
 202 "Out" "Mixed"  "3"  "YR" "1"
 202 "Out" "Mixed"  "4"  "YR" "1"
 202 "Out" "Mixed"  "5"  "YR" "1"
 202 "Out" "Mixed"  "6"  "YR" "1"
 202 "Out" "Mixed"  "7"  "YR" "1"
 202 "Out" "Mixed"  "8"  "YR" "1"
 202 "Out" "Mixed"  "9"  "YR" "1"
 202 "Out" "Mixed"  "10" "YR" "1"
 307 "In"  "Day"    "1"  "XM" "0"
 307 "In"  "Day"    "1"  "XM" "0"
 307 "In"  "Midday" "1"  "XM" "1"
 307 "In"  "Midday" "1"  "XM" "0"
 307 "In"  "Midday" "1"  "XM" "0"
 307 "In"  "Midday" "1"  "XM" "1"
 307 "In"  "Night"  "1"  "XM" "0"
 307 "In"  "Night"  "1"  "XM" "0"
 307 "In"  "Night"  "1"  "XM" "0"
 307 "In"  "Night"  "1"  "XM" "0"
 307 "In"  "Night"  "1"  "XM" "0"
 307 "In"  "Mixed"  "1"  "XM" "0"
 307 "In"  "Mixed"  "1"  "XM" "0"
 307 "In"  "Mixed"  "1"  "XM" "1"
 307 "In"  "Mixed"  "1"  "XM" "0"
 307 "Out" "Mixed"  "1"  "XM" "0"
 307 "Out" "Mixed"  "1"  "XM" "1"
 307 "Out" "Mixed"  "1"  "XM" "0"
 307 "Out" "Mixed"  "1"  "XM" "0"
 307 "Out" "Mixed"  "1"  "XM" "0"
 307 "Out" "Mixed"  "1"  "XM" "0"
 307 "Out" "Mixed"  "1"  "XM" "0"
 307 "Out" "Mixed"  "1"  "XM" "0"
 307 "Out" "Mixed"  "1"  "XM" "0"
 307 "Out" "Mixed"  "1"  "XM" "1"
 307 "Out" "Mixed"  "2"  "YR" "1"
 307 "Out" "Mixed"  "3"  "YR" "1"
 307 "Out" "Mixed"  "4"  "YR" "1"
 307 "Out" "Mixed"  "5"  "YR" "1"
 307 "Out" "Mixed"  "6"  "YR" "1"
 307 "Out" "Mixed"  "7"  "YR" "1"
 307 "Out" "Mixed"  "8"  "YR" "1"
end

------------------ copy up to and including the previous line ------------------

Tags: None

Clyde Schechter

Join Date: Apr 2014

Posts: 30164
#2

04 Oct 2023, 14:41

If the example data is truly representative of your full data set, then it cannot be -reshape-d to wide. Normally when you get this error message, the problem is that id and grade together fail to identify unique observations, and the solution is that the combination of id, some other variable(s) and grade do identify unique observations. Then adding the "other variable(s)" to the -i()- option of -reshape- allows things to proceed. But in your data example there are no such other variable(s).

Part of this problem arises because you have a large number of observations that are exact duplicates of each other. That will not work with -reshape-. You will need to figure out why you have these surplus copies of observations and either correct them (so they are no longer exact duplicates) or eliminate them (if they are correct, but are just surplus). But even after you do that, you will still not be able to -reshape- this example data into wide because you still need extra variables to distinguish among the many, many observations having the same id and grade values. The variables in -i()- and -j()- must, taken together, identify unique observations in your data set in order to use -reshape-. But there are no candidates that will work: Type, Class and gradeType are the only other variables, but as they are the variables you are looking to widen, they cannot be used in -i()- or -j()-.

Finally, let me point out that wide data layouts are not very useful in Stata. Most Stata commands work best, or only, with long data. There are exceptions to that, and if you know for sure you will be using them, then -reshape-ing to wide may indeed be necessary. But I do wonder what your purpose is in trying to -reshape- this data to wide, especially since, if it is the correct data, it is totally unsuitable for that. So perhaps what is really needed is to explain why you were hoping to do this in the first place--probably there is some better way to proceed.
1 like
Comment
May Blake

Join Date: Feb 2020

Posts: 137
#3

05 Oct 2023, 10:00

Thank you Clyde for the in-depth explanation and clarification. My reasoning for wanting to reshape this data is that some of the grade variable in my actual dataset represents subject demographics in long format. I am not sure how to use tabulate for chi-square or Fisher's when the data is in that format. I could drop the duplicate observations for demographics and the layout is still strange because the questions are not their own variables to tabulate. On another note, I am not sure why I have the exact same duplicates in the data, I will need to extract the data from the source once again to figure out this problem.
Comment

Announcement

Data from long to wide using multiple variables

Comment

Comment