Hi there, I am using Stata 13 on Windows 10 and am trying to convert my data from long to wide format.
Any help and suggestions would be much appreciated. Thank you in advance.
I am working with a panel dataset that contains racial (black, indian, white, etc.) demographics for schools over the period 1996-2011 (year). I need to look at the racial configurations of schools over this period to track possible racial integration. The main variables I am working with, and trying to reshape, are school_id (unique school identifier); one variable for each race group (b c i w), whose numeric value is the total number of learners of that racial group in the given year; and lastly year (categorical variable).
I ran the following command:
reshape wide b c i w, i(school_id) j(year)
and got this error:
values of variable year not unique within school_id
Your data are currently long. You are performing a reshape wide. You specified i(school_id) and j(year). There are observations within i(school_id) with the same value of j(year). In the long data, variables i() and j() together must uniquely identify the observations.

      long                                  wide
+---------------+                   +------------------+
| i   j   a   b |                   | i   a1 a2  b1 b2 |
|---------------| <--- reshape ---> |------------------|
| 1   1   1   2 |                   | 1    1  3   2  4 |
| 1   2   3   4 |                   | 2    5  7   6  8 |
| 2   1   5   6 |                   +------------------+
| 2   2   7   8 |
+---------------+

Type reshape error for a list of the problem variables.
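As a minimal sketch (the four rows below are invented illustration data, not the real dataset), the school_id/year pairs that trigger this error can be found before reshaping with duplicates report, duplicates tag and isid. Here school 1 has two rows for 2000, which is exactly the situation the error describes:

```stata
* Minimal sketch: find the school_id/year pairs that break -reshape wide-.
* The rows are made-up illustration data.
clear
input school_id b c year
1 33  . 2000
1  . 56 2000
1  . 23 2001
2 81  . 1998
end

* Summarise how many rows share each school_id/year combination
duplicates report school_id year

* dup = number of other rows with the same school_id/year (0 = unique)
duplicates tag school_id year, generate(dup)
list if dup > 0, sepby(school_id year)

* -isid- exits with an error unless the variables uniquely identify rows;
* -capture- keeps the do-file running so the return code can be shown
capture isid school_id year
display "isid return code: " _rc
```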
However, the repeated years within school_id are valid: for each year a school has several grades, with a count of learners for each race group in each grade. I ran the following code to total each race group across grades for a given school_id and year:
by school_id race year, sort: egen totalsum = total(quantity)
Hence the data are structured as follows, e.g.:
school_id |  b |  c |  i |  w | year |
        1 | 33 |  . |  . |  . | 2000 |
        1 |  . | 56 |  . |  . | 2000 |
        1 |  . |  . |  4 |  . | 2000 |
        1 |  . |  . |  . | 70 | 2000 |
        1 |  . | 23 |  . |  . | 2001 |
        1 |  . |  . |  6 |  . | 2001 |
        1 |  . |  . |  . | 63 | 2001 |
        2 | 81 |  . |  . |  . | 1998 |
        2 |  . | 47 |  . |  . | 1998 |
        2 |  . |  . | 12 |  . | 1998 |
        2 |  . |  . |  . | 44 | 1998 |
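Note that egen total() adds the total to every grade row but keeps all of those rows, so school_id and year still repeat afterwards. A minimal sketch of an alternative, assuming grade-level data with the hypothetical variables grade, race (a one-letter string code) and quantity, is to collapse to one row per school/year/race and then reshape on race. The values are made up for illustration:

```stata
* Minimal sketch, assuming grade-level long data with the hypothetical
* variables school_id, year, grade, race (string "b","c","i","w")
* and quantity (number of learners). Values are invented.
clear
input school_id year grade str1 race quantity
1 2000 1 "b" 20
1 2000 2 "b" 13
1 2000 1 "w" 40
1 2000 2 "w" 30
1 2001 1 "w" 63
end

* Unlike -egen total()-, -collapse (sum)- reduces the data to one row
* per school_id/year/race combination
collapse (sum) quantity, by(school_id year race)

* Spread the race codes into columns (quantityb, quantityw, ...);
* the -string- option is needed because race is a string variable
reshape wide quantity, i(school_id year) j(race) string

* Drop the "quantity" prefix so the columns are named b, w, ...
rename quantity* *
list
```

Race groups that never appear for a school and year (b for school 1 in 2001 here) come out as missing after the reshape.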
How could I restructure my data (what code should I use) so that the racial information for a given school_id and year is on a single line, as below?
school_id |  b |  c |  i |  w | year |
        1 | 33 | 56 |  4 | 70 | 2000 |
        1 |  . | 23 |  6 | 63 | 2001 |
        2 | 81 | 47 | 12 | 44 | 1998 |
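A minimal sketch using the example rows from the question: because each school_id/year/race cell holds at most one nonmissing value, collapse with the (firstnm) statistic can merge the rows into one line per school_id and year, while a race group with no value at all (b for school 1 in 2001) stays missing:

```stata
* Minimal sketch: merge the race-by-row example data into one row
* per school_id/year. Data are the example rows from the question.
clear
input school_id b c i w year
1 33  .  .  . 2000
1  . 56  .  . 2000
1  .  .  4  . 2000
1  .  .  . 70 2000
1  . 23  .  . 2001
1  .  .  6  . 2001
1  .  .  . 63 2001
2 81  .  .  . 1998
2  . 47  .  . 1998
2  .  . 12  . 1998
2  .  .  . 44 1998
end

* (firstnm) keeps the first nonmissing value of each variable within a
* school_id/year group; groups with no nonmissing value stay missing
collapse (firstnm) b c i w, by(school_id year)

* Confirm that school_id and year now identify the rows uniquely
isid school_id year
list, noobs
```

Once isid passes, reshape wide b c i w, i(school_id) j(year) should no longer report non-unique values, since school_id and year now identify each row.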