Hello,
I would like to reshape data that I would characterize as "triply" long. The primary variable, 'measure', is a count of how many people in each organization fall into some specific category. I would like to separate the categories nested within the variable data_category into variables in and of themselves and list them in wide form. Employee type breaks each respective data category down into two additional subsets. What I want to do is this: I would like to combine the three variables: 'data_category', 'data_subcategory', and 'employee_type' to create a single variable. For example, I want to create a variable that combines 'certificate level' with the counts for the subcategories '4 yr bachelors', '5 yr masters', etc. and then also includes the employee type category. I have aggregate data; the data follows this general pattern seen below, but it is important to note that there are a longer list of sub-categories.
So, in its most basic form, I would expect the data would go from what is above to:
Where 'cl_4ba_ad' represents certificate level 4 year bachelors for administrators, etc.
In all, these subcategories account for 81 rows of data for each organization. So rather than having 81 rows, in its most basic (though not most compact) form, I would like to have 1 row and 81 columns for each organization. Also, because there are so many rows, I don't see it feasible for me to write syntax listing all the variable names in the reshape command. Is there a general way for me to write the reshape syntax?
Thank you. I am not clear on where to start with such a dense array of data.
I would like to reshape data that I would characterize as "triply" long. The primary variable, 'measure', is a count of how many people in each organization fall into some specific category. I would like to separate the categories nested within the variable data_category into variables in and of themselves and list them in wide form. Employee type breaks each respective data category down into two additional subsets. What I want to do is this: I would like to combine the three variables: 'data_category', 'data_subcategory', and 'employee_type' to create a single variable. For example, I want to create a variable that combines 'certificate level' with the counts for the subcategories '4 yr bachelors', '5 yr masters', etc. and then also includes the employee type category. I have aggregate data; the data follows this general pattern seen below, but it is important to note that there are a longer list of sub-categories.
school_year | dstrct | instn | instn_name | data_category | data_sub_category | employee_type | measure | |
1 | 2010-11 | 601 | 0103 | A High School | Certificate Level | 4 Yr Bachelor's | Administrators | 1 |
2 | 2010-11 | 601 | 0103 | A High School | Certificate Level | 5 Yr Master's | Administrators | 1 |
3 | 2010-11 | 601 | 0103 | A High School | Certificate Level | 6 Yr Specialist's | Administrators | 3 |
4 | 2010-11 | 601 | 0103 | A High School | Certificate Level | 7 Yr Doctoral | Administrators | 1 |
5 | 2010-11 | 601 | 0103 | A High School | Certificate Level | Other * | Administrators | 0 |
6 | 2010-11 | 601 | 0103 | A High School | Certificate Level | 4 Yr Bachelor's | PK-12 Teachers | 21 |
7 | 2010-11 | 601 | 0103 | A High School | Certificate Level | 5 Yr Master's | PK-12 Teachers | 22 |
8 | 2010-11 | 601 | 0103 | A High School | Certificate Level | 6 Yr Specialist's | PK-12 Teachers | 15 |
9 | 2010-11 | 601 | 0103 | A High School | Certificate Level | 7 Yr Doctoral | PK-12 Teachers | 2 |
10 | 2010-11 | 601 | 0103 | A High School | Certificate Level | Other * | PK-12 Teachers | 1 |
11 | 2010-11 | 601 | 0103 | A High School | Certified Personnel | Professional | Administrators | 6 |
12 | 2010-11 | 601 | 0103 | A High School | Certified Personnel | Provisional | Administrators | 0 |
13 | 2010-11 | 601 | 0103 | A High School | Certified Personnel | Professional | PK-12 Teachers | 61 |
14 | 2010-11 | 601 | 0103 | A High School | Certified Personnel | Provisional | PK-12 Teachers | 0 |
So, in its most basic form, I would expect the data would go from what is above to:
school_year | dstrct | instn | instn_name | cl_4ba_ad | cl_5ma_ad | cl_6sp_ad | cl_7doc_ad | cl_o_ad | |
1 | 2010-11 | 601 | 103 | A High School | 1 | 1 | 3 | 1 | 0 |
2 | 2010-11 | 601 | 104 | B Middle School | … | … | … | … | … |
3 | 2010-11 | 601 | 105 | C Lower School | … | … | … | … | … |
4 | 2010-11 | 602 | 106 | X High School | … | … | … | … | … |
5 | 2010-11 | 602 | 107 | Y Middle School | … | … | … | … | … |
6 | 2010-11 | 602 | 108 | Z Lower School | … | … | … | … | … |
Where 'cl_4ba_ad' represents certificate level 4 year bachelors for administrators, etc.
In all, these subcategories account for 81 rows of data for each organization. So rather than having 81 rows, in its most basic (though not most compact) form, I would like to have 1 row and 81 columns for each organization. Also, because there are so many rows, I don't see it feasible for me to write syntax listing all the variable names in the reshape command. Is there a general way for me to write the reshape syntax?
Thank you. I am not clear on where to start with such a dense array of data.
Comment