Hello, it is first time using Statalist. I am hoping to get advice on how to collapse data, which I have made some progress on but need to figure out a few last bits. I fear this may be easier than I am making it out to be, but I am in quite a mental pickle.
Currently my data looks something like this:
This is, as you can see, all for one teacher (same teacher ID) and each subject they have taught, by semester by school year. I have 4 years of data (2014, 2015, 2016, 2017). Essentially, the goal is to collapse this so that there is one row per teacher per year (four rows per teacher if they taught all 4 years we have data for).
The first step taken by the person who handed this off to me was to create dummy variables to capture all the varieties of classes within each subject (algebra, geometry, and stats all classify as "math", for example).
There is also the issue of semester: we want to collapse the two semesters into one, and also create some type of flag if they are in a different school in one semester than the other (you can see this happens in 2014 Spring in my example, where the School ID is 777).
So what I would ultimately want the data to look like is this, and I have just made up some results for 2016 and 2017 which I didn't include in the above table:
Here is what I have done so far, in the pursuit of this:
I am getting an error message for the middle code, however, where I am trying to collapse. And I don't think this will collapse quite enough anyway--I'm relatively new to the collapse command. I essentially wish to get it to be year by school (where they started the year) by teacher. The other significant problem I have yet to be able to solve is how to collapse the semesters into one year, while creating some dummy variable flag for if the School ID is different for the two semesters within each unique year.
Sorry for this massively long entry. Thank you in advance for your help.
Currently my data looks something like this:
Teacher ID | School ID | Year | Semester | Subject Taught | |
333 | 123 | 2014 | Fall | Geometry 1 | |
333 | 123 | 2014 | Fall | English Literature | |
333 | 123 | 2014 | Fall | Modern History | |
333 | 777 | 2014 | Spring | Modern History | |
333 | 123 | 2015 | Fall | Geometry | |
333 | 123 | 2015 | Fall | Homeroom | |
333 | 123 | 2015 | Spring | AP English | |
333 | 123 | 2015 | Spring | Homeroom |
The first step taken by the person who handed this off to me was to create dummy variables to capture all the varieties of classes within each subject (algebra, geometry, and stats all classify as "math", for example).
There is also the issue of semester: we want to collapse the two semesters into one, and also create some type of flag if they are in a different school in one semester than the other (you can see this happens in 2014 Spring in my example, where the School ID is 777).
So what I would ultimately want the data to look like is this, and I have just made up some results for 2016 and 2017 which I didn't include in the above table:
Teacher ID | School Year | Math | English | Science | History | Homeroom | School ID | Change schools? |
333 | 2014 | 1 | 1 | 0 | 1 | 0 | 123 | 1 |
333 | 2015 | 1 | 1 | 0 | 0 | 1 | 123 | 0 |
333 | 2016 | 0 | 0 | 0 | 1 | 1 | 123 | 0 |
333 | 2017 | 0 | 1 | 0 | 0 | 1 | 123 | 0 |
Code:
// local list of vars to use local varlist math english science history art homeroom // collapse to get teacher x schoolyear x school records collapse (sum) `varlist', by(teacherid schoolyear schoolid) // these are just flags, so anything greater than 0 should be 1 foreach var in `varlist' { replace `var' = 1 if `var' > 0 }
Sorry for this massively long entry. Thank you in advance for your help.
Comment