Hello,
I have panel data showing which schools students attend, and I would like to run a conditional logistic regression to see which school characteristics are related to enrollment decisions. My data are currently in long format, similar to the hypothetical selection below, which shows a student transferring schools in 2014. Each student is identified by student_id, schl_code identifies the school where the student enrolled, and year is my time variable. I also have a number of school-characteristic variables, a mix of dummies (e.g., magnet, an indicator for attending a magnet school) and continuous measures (e.g., distance_schl, the distance in miles between the student's home and the enrolled school).
First, I have identified the criteria I would like to use to construct students' choice sets. They are largely based on identifying schools within a certain distance of the student (stud_lat and stud_lon give the centroid of each student's residence ZIP code; schl_lat and schl_lon give school locations) and on the student's home school district (identified by the district variable). I will limit my sample to students who transfer and will use only data from the transfer year (some of it lagged), so in this example, data from 2014.
However, I do not know how to restructure my data so that each alternative (alts) is listed as a separate observation within the case (student_id). See below for the desired structure, in which this student hypothetically had 4 schools to choose from. Note that choice-set sizes will vary: students living in cities will have many schools in their choice sets, whereas others may have only a few.
Any help on how to create this dataset would be greatly appreciated.
Current structure (hypothetical selection):
student_id | year | schl_code | district | distance_schl | magnet | schl_lat | schl_lon | stud_lat | stud_lon
256534     | 2012 | 8576      | 6475839  | 3.22          | 0      | 39.9     | -72.2    | 39.8     | -76.6
256534     | 2013 | 8576      | 6475839  | 3.22          | 0      | 39.9     | -72.2    | 39.8     | -76.6
256534     | 2014 | 4040      | 6475839  | 2.16          | 1      | 40.7     | -75.3    | 39.8     | -76.6
256534     | 2015 | 4040      | 6475839  | 2.16          | 1      | 40.7     | -75.3    | 39.8     | -76.6
Desired structure:
student_id | alts | chose | distance_schl | magnet
256534     | 9765 | 0     | 0.75          | 0
256534     | 8576 | 0     | 3.22          | 0
256534     | 4040 | 1     | 2.16          | 1
256534     | 3795 | 0     | 8.53          | 0
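To make the target concrete, here is a minimal sketch of the restructuring logic, written in Python rather than Stata purely for illustration. It assumes things the data above do not confirm: that a separate list of all schools (with coordinates, district, and magnet status) is available, that a school is eligible only if it is in the student's home district AND within a chosen radius, and that great-circle (haversine) distance is an acceptable distance measure. The function names, the 10-mile radius, and every coordinate below are hypothetical.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def build_choice_set(student, schools, max_miles):
    """Return one row per alternative for a single transfer-year student record.

    Assumed eligibility rule: same district AND within max_miles.
    The school actually chosen is always kept so every case has chose == 1 once.
    """
    rows = []
    for school in schools:
        dist = haversine_miles(student["stud_lat"], student["stud_lon"],
                               school["schl_lat"], school["schl_lon"])
        chose = int(school["schl_code"] == student["schl_code"])
        eligible = school["district"] == student["district"] and dist <= max_miles
        if eligible or chose:
            rows.append({
                "student_id": student["student_id"],
                "alts": school["schl_code"],
                "chose": chose,
                "distance_schl": round(dist, 2),
                "magnet": school["magnet"],
            })
    return rows


# Hypothetical transfer-year (2014) record; coordinates are invented.
student = {"student_id": 256534, "schl_code": 4040, "district": 6475839,
           "stud_lat": 39.80, "stud_lon": -76.60}

# Hypothetical school list; the last two should be excluded by the rule above.
schools = [
    {"schl_code": 9765, "district": 6475839, "magnet": 0, "schl_lat": 39.80, "schl_lon": -76.61},
    {"schl_code": 8576, "district": 6475839, "magnet": 0, "schl_lat": 39.84, "schl_lon": -76.58},
    {"schl_code": 4040, "district": 6475839, "magnet": 1, "schl_lat": 39.82, "schl_lon": -76.62},
    {"schl_code": 3795, "district": 6475839, "magnet": 0, "schl_lat": 39.90, "schl_lon": -76.50},
    {"schl_code": 1111, "district": 6475839, "magnet": 0, "schl_lat": 41.00, "schl_lon": -75.00},  # too far
    {"schl_code": 2222, "district": 9999999, "magnet": 1, "schl_lat": 39.81, "schl_lon": -76.60},  # other district
]

rows = build_choice_set(student, schools, max_miles=10)
for row in rows:
    print(row)
```

The same idea (pair each transfer-year student record with every candidate school, compute the distance, keep the eligible pairs, and flag the enrolled school) should carry over to other tools; the number of rows per student varies naturally with how many schools pass the filter.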
Thank you!
-Sophia