Hi,
I'm using STATA 14.2 (apologies in advance as I am using stata on a university server and cannot use dataex on the university version)
Below is an example of the data I have:
I want to collapse the data, so each row provides demographics for each school. Example of what I am trying to do:
What I have tried:
egen NOREPORT = sum(enr_total) if ethnic==0, by(cds_code)
egen WHITE = sum(enr_total) if ethnic==1, by(cds_code)
egen ASIAN = sum(enr_total) if ethnic==2, by(cds_code)
egen PAC = sum(enr_total) if ethnic==3, by(cds_code)
egen AFR = sum(enr_total) if ethnic==4, by(cds_code)
egen HIS = sum(enr_total) if ethnic==5, by(cds_code)
egen TWOMORE = sum(enr_total) if ethnic==9, by(cds_code)
egen MALE = sum(enr_total) if gender=="M", by(cds_code)
egen FEMALE = sum(enr_total) if gender=="F", by(cds_code)
egen ENRTOT = sum(enr_tota), by(cds_code)
collapse (first) county district school (mean) NOREPORT ASIAN PAC HIS AFR WHITE TWOMORE MALE FEMALE ENRTOT, by(cds_code)
Problem:
After running the above code, the enrollment total does not equal to the actual total for each school. The numbers are much larger (no sure why).
Can anyone tell me if I am doing something wrong in collapsing the data or if there is another way I should collapse the data?
Thanks
I'm using STATA 14.2 (apologies in advance as I am using stata on a university server and cannot use dataex on the university version)
Below is an example of the data I have:
Unique Code (CDS) | District | School | Ethnicity | Gender | Total Enrollment | ||
100001 | ABC | XYZ | 1 | 1 | 4 | ||
100001 | ABC | XYZ | 2 | 1 | 5 | ||
100001 | ABC | XYZ | 4 | 1 | 3 | ||
100001 | ABC | XYZ | 4 | 2 | 4 | ||
100002 | ABC | PQR | 1 | 1 | 5 | ||
100002 | ABC | PQR | 1 | 2 | 3 | ||
100002 | ABC | PQR | 3 | 1 | 3 | ||
100002 | ABC | PQR | 3 | 2 | 8 | ||
100002 | ABC | PQR | 5 | 1 | 1 | ||
100002 | ABC | PQR | 5 | 2 | 5 |
CDS Code | Dist | Sch | Eth1 | Eth2 | Eth3 | Eth4 | Eth5 | Male | Fem | EnrTotal |
100001 | ABC | XYZ | 4 | 5 | 0 | 7 | 0 | 12 | 4 | 16 |
10002 | ABC | PQR | 8 | 0 | 11 | 0 | 6 | 9 | 16 | 25 |
What I have tried:
egen NOREPORT = sum(enr_total) if ethnic==0, by(cds_code)
egen WHITE = sum(enr_total) if ethnic==1, by(cds_code)
egen ASIAN = sum(enr_total) if ethnic==2, by(cds_code)
egen PAC = sum(enr_total) if ethnic==3, by(cds_code)
egen AFR = sum(enr_total) if ethnic==4, by(cds_code)
egen HIS = sum(enr_total) if ethnic==5, by(cds_code)
egen TWOMORE = sum(enr_total) if ethnic==9, by(cds_code)
egen MALE = sum(enr_total) if gender=="M", by(cds_code)
egen FEMALE = sum(enr_total) if gender=="F", by(cds_code)
egen ENRTOT = sum(enr_tota), by(cds_code)
collapse (first) county district school (mean) NOREPORT ASIAN PAC HIS AFR WHITE TWOMORE MALE FEMALE ENRTOT, by(cds_code)
Problem:
After running the above code, the enrollment total does not equal to the actual total for each school. The numbers are much larger (no sure why).
Can anyone tell me if I am doing something wrong in collapsing the data or if there is another way I should collapse the data?
Thanks
Comment