  • Choice sets and data structure for conditional logistic regression?

    Hello,

    I have panel data showing which schools students choose to attend, and I would like to run a conditional logistic regression to see which school characteristics are related to enrollment decisions. My data are currently in long format, similar to the hypothetical excerpt below, which shows a student transferring schools in 2014. Each student is identified by student_id, schl_code identifies the school where that student enrolled, year is my time variable, and I have a number of school-characteristic variables that are a mix of dummies (e.g., magnet, an indicator for attending a magnet school) and continuous variables (e.g., distance_schl, the distance in miles between the student's home and their enrolled school).
    student_id Year schl_code district distance_schl magnet schl_lat schl_lon stud_lat stud_lon
    256534 2012 8576 6475839 3.22 0 39.9 -72.2 39.8 -76.6
    256534 2013 8576 6475839 3.22 0 39.9 -72.2 39.8 -76.6
    256534 2014 4040 6475839 2.16 1 40.7 -75.3 39.8 -76.6
    256534 2015 4040 6475839 2.16 1 40.7 -75.3 39.8 -76.6
    First, I have identified the criteria I would like to use for constructing students' choice sets. They are based largely on which schools lie within a certain distance of the student's home (stud_lat and stud_lon give the location of the centroid of each student's residence zip code, and schl_lat and schl_lon give school locations) and on the student's home school district (identified by the district variable). I will limit my sample to students who transfer and will use only data from the transfer year (with some variables lagged), so in this example, data from 2014.
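A minimal sketch of restricting the panel to transfer years and attaching lagged values, assuming the data are keyed by student_id and year as in the excerpt above. Only a subset of the example's variables is used, and the names prev_schl, prev_magnet and transfer are hypothetical.

```stata
* Toy data: the hypothetical student from the excerpt above
clear
input long student_id int year int schl_code double distance_schl byte magnet
256534 2012 8576 3.22 0
256534 2013 8576 3.22 0
256534 2014 4040 2.16 1
256534 2015 4040 2.16 1
end

* Declare the panel so the L. (lag) operator is available
xtset student_id year

* Prior-year values (hypothetical names); L. is missing in a student's
* first observed year and after any gap in years
generate int prev_schl = L.schl_code
generate byte prev_magnet = L.magnet

* A transfer year is one whose school differs from the prior year's school
generate byte transfer = !missing(prev_schl) & schl_code != prev_schl
keep if transfer
list student_id year schl_code prev_schl magnet prev_magnet
```

Using the time-series lag rather than schl_code[_n-1] means a gap in a student's years yields a missing lag instead of silently comparing non-adjacent years.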

    However, I do not know how to restructure my data so that each alternative (alts) is a separate observation within the case (student_id). The desired structure is shown below for a case in which this student hypothetically had 4 schools to choose from. Note that choice sets vary in size: students living in cities, for example, will have many schools in their choice sets, whereas others may have only a few.
    student_id alts chose distance_schl magnet
    256534 9765 0 0.75 0
    256534 8576 0 3.22 0
    256534 4040 1 2.16 1
    256534 3795 0 8.53 0
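A sketch of sanity checks on the target layout, entering the desired rows above as toy data. It assumes Stata 16 or later for -cmset-; the estimation commands appear only as comments because four rows for one student are not enough to fit a model.

```stata
* Toy data in the desired one-row-per-student-alternative layout
clear
input long student_id int alts byte chose double distance_schl byte magnet
256534 9765 0 0.75 0
256534 8576 0 3.22 0
256534 4040 1 2.16 1
256534 3795 0 8.53 0
end

* Each case (student) must have exactly one chosen alternative
bysort student_id: egen byte n_chosen = total(chose)
assert n_chosen == 1

* Choice-set size per student; it may differ across students
bysort student_id: generate int n_alts = _N

* Stata 16+: declare the case and alternative identifiers
cmset student_id alts

* With the full data, the model could then be fit with, e.g.:
*   cmclogit chose distance_schl magnet
* or, in any version:
*   clogit chose distance_schl magnet, group(student_id)
```

Because conditional logit conditions on each case, choice sets of different sizes across students are allowed.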
    Any help on how to create this dataset would be greatly appreciated.
    Thank you!
    -Sophia



    Last edited by Sophia Seifert; 21 Jun 2021, 11:39.

  • #2
    Either you're not describing your data correctly or you are seeking the impossible. In your desired results, schools 9765 and 3795 magically appear from nowhere. Unless somewhere in your data set there is other information about these schools that identifies them as possible choices for this student, there is no way to do what you ask. If there is such information, you need to show example data that includes that information so somebody can figure out how to bring it to bear on the solution to your problem.

    When posting back, please use the -dataex- program to show example data. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
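A minimal illustration of -dataex- on toy data standing in for the student panel; the variable selection and the if condition are only an example.

```stata
* Toy data standing in for the real student panel
clear
input long student_id int year int schl_code byte magnet
256534 2012 8576 0
256534 2013 8576 0
256534 2014 4040 1
end

* Needed only on Stata versions without a built-in -dataex-
* ssc install dataex

* Print a paste-ready -input- listing of selected variables for one student
dataex student_id year schl_code magnet if student_id == 256534
```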



    • #3
      Clyde, thanks for the tip about dataex; I am new to these forums and had not heard of it before. As for the other school codes appearing, that is part of where I am stuck: I have these codes (as explained below) but am not sure how to link them to the student_id variable as a choice set.

      My original data include all the students in the state along with the school codes for all the schools in the state. I have created a school-level dataset for each year by collapsing by schl_code. Because schools open and close from year to year, the list of schools, and hence students' choice sets, will also vary by year.
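A sketch of the collapse-and-save-per-year step, assuming a hypothetical student-level dummy ell_stud (1 if the student receives ELL services); the file names school_dataset_for_year_<year> are also hypothetical. Coordinates are borrowed from the dataex example.

```stata
* Toy student-level data, one row per student-year
clear
input long student_id int year int schl_code double(aun_loc lat_s longit_s) byte ell_stud
1 2016 7607 101260303 39.884972 -79.871932 0
2 2016 7607 101260303 39.884972 -79.871932 1
3 2016 6002 101260303 39.848234 -79.902919 0
1 2017 7607 101260303 39.884972 -79.871932 0
3 2017 6002 101260303 39.848234 -79.902919 0
end

* One row per school-year: enrollment count and ELL share (do-file syntax)
collapse (count) schl_pop = student_id (mean) ell = ell_stud ///
    (first) aun_loc lat_s longit_s, by(year schl_code)

* Save one school file per year; year is dropped so it will not clash with
* the student data's year when the files are later paired with students
levelsof year, local(years)
foreach y of local years {
    preserve
    keep if year == `y'
    drop year
    save school_dataset_for_year_`y', replace
    restore
}
```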

      The dataex example below is from the 2017 school-level data and shows schl_code, latitude, longitude, and a few example school variables: the number of students enrolled in the school (schl_pop) and the share of students in the school who receive English Language Learner services (ell). Distance from home and school district ID (aun_loc) are the primary criteria for whether a school is an option for a particular student. Because distances are calculated between a school's lat/long and the lat/long of the centroid of the student's residence zip code, all students who live in the same zip code, in the same school district, and at the same grade level (indicated by dummies such as elem below) should have exactly the same choice set. A few other variables also influence inclusion in the choice set, such as a dummy for a type of online school available to all students in the state.


      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input int schl_code float elem double(aun_loc lat_s longit_s) float(schl_pop ell)
      7607 0 101260303  39.88497199999986 -79.87193199999913  415           0
      6002 1 101260303   39.8482339999999  -79.9029190000001  244 .0040983604
      4921 1 101260303  39.80259599999996 -79.80469499999975  335 .0029850747
      7608 0 101260303 39.778341999999725  -79.9185099999997  404           0
      2129 1 101260303 39.884786000000105 -79.86332899999982  333  .003003003
      8364 1 101260303 39.832198000000055 -79.74447000000059  405           0
      4922 1 101260303  39.77978199999983 -79.91804699999963  213           0
      6001 0 101260303 39.827298999999556 -79.78491599999985 1057           0
      2154 0 101260803  40.00699300000041 -79.89329100000026  413  .004842615
      8384 1 101260803   40.0086889999994 -79.89454500000087  716  .002793296
      end
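Since students in the same zip code, district and grade level share a choice set, one option is to build choice sets once per such cell rather than once per student. A sketch, assuming a hypothetical stud_zip variable and made-up coordinates:

```stata
* Toy student data; stud_zip and all values are hypothetical
clear
input long student_id long stud_zip double aun_loc byte elem double(stud_lat stud_lon)
11 15401 101260303 1 39.90 -79.72
12 15401 101260303 1 39.90 -79.72
13 15401 101260303 0 39.90 -79.72
14 15425 101260303 1 40.02 -79.59
end

* Students with the same zip, district and grade level share a choice set
egen long cell = group(stud_zip aun_loc elem)

* Keep one row per cell; choice sets can then be built once per cell
preserve
bysort cell: keep if _n == 1
keep cell stud_zip aun_loc elem stud_lat stud_lon
save choice_cells, replace
restore
```

After the cell-level file has been paired with schools and filtered, -joinby cell using- that file would attach each cell's choice set to every student in the cell, which can be much smaller than pairing every student with every school.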



      • #4
        I can point you in a direction, but as I have no experience working with spatial data in Stata, I can't bring you home.

        The command you will need to combine the school and student data sets is -cross-. Since the school data set is actually a series of data sets, one for each year, you will have to do that one at a time and then put all the results together. So it will look something like this:

        Code:
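        use student_dataset, clear
        // KEEP ONLY YEARS WHEN STUDENT CHANGES SCHOOLS (_n > 1 EXCLUDES EACH STUDENT'S FIRST YEAR)
        by student_id (year), sort: keep if _n > 1 & schl_code != schl_code[_n-1]
        // RENAME THE ENROLLED SCHOOL SO IT DOES NOT CLASH WITH schl_code IN THE SCHOOL FILES
        rename schl_code chosen_schl
        
        capture program drop one_year
        program define one_year
            local y = year[1]
            cross using school_dataset_for_year_`y'
            // HERE INSERT CODE TO REMOVE ALL OBSERVATIONS THAT PAIR A STUDENT
            // WITH A SCHOOL THAT IS NOT IN THE CHOICE SET
            gen byte chose = (schl_code == chosen_schl)
            exit
        end
        
        runby one_year, by(year)    // ssc install runby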
        Notes:

        -runby- is written by Robert Picard and me, and is available from SSC. It is similar to using a -foreach- loop to iterate over the values of the variable year, but it is faster and also enables you to simplify the code by omitting -if- conditions on the year variable.
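For comparison, a sketch of the same per-year pairing written as an explicit -foreach- loop. The toy school files and their names are hypothetical, and the enrolled school is stored as chosen_schl so that it does not collide with schl_code in the school files.

```stata
* Toy school files for two years (hypothetical names and contents)
clear
input int schl_code
7607
6002
end
save school_dataset_for_year_2016, replace
save school_dataset_for_year_2017, replace

* Toy student data: one transfer-year row per student
clear
input long student_id int year int chosen_schl
1 2016 7607
2 2017 6002
end

* Start from an empty results file
tempfile results
preserve
drop _all
save `results', emptyok
restore

* Pair each year's students with that year's schools and accumulate
levelsof year, local(years)
foreach y of local years {
    preserve
    keep if year == `y'
    cross using school_dataset_for_year_`y'
    * (choice-set restrictions would go here)
    append using `results'
    save `results', replace
    restore
}
use `results', clear
sort student_id schl_code
list
```

The loop needs the preserve/keep/append bookkeeping that -runby- handles internally.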

        I do not include any code that removes observations that are outside the student's choice set because 1) I don't grasp what all the specific criteria are, and 2) I don't know how to work with the longitude and latitude variables to calculate distance. Concerning the latter, if you are running version 16 or 17, there are a bunch of commands specifically for working with spatial data, and I imagine that therein you will find ways to resolve this part of your task. I have no experience with those commands, however, so I can't advise you more specifically. If you are using an older version of Stata, or if nothing there will suit your needs, I suggest you use Stata's -search- command to find user-written programs that might be what you need.
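One way to compute distance from the latitude/longitude variables is the haversine (great-circle) formula, which treats the Earth as a sphere. A sketch on made-up student-school pairs as they might look after -cross-; the rule "same district and within 10 miles" is only an assumed stand-in for the real choice-set criteria.

```stata
* Toy student-school pairs as they might look after -cross- (made-up data)
clear
input long student_id double(stud_lat stud_lon district) int(schl_code chosen_schl) double(lat_s longit_s aun_loc)
1 39.80 -79.90 101260303 7607 6002 40.80 -79.90 101260303
1 39.80 -79.90 101260303 6002 6002 39.80 -79.90 101260303
1 39.80 -79.90 101260303 4921 6002 39.85 -79.90 101260303
1 39.80 -79.90 101260303 2154 6002 39.81 -79.90 101260803
end

* Haversine distance in miles; 3958.8 is the mean Earth radius in miles
* and _pi/180 converts degrees to radians
generate double dlat = (lat_s - stud_lat) * _pi/180
generate double dlon = (longit_s - stud_lon) * _pi/180
generate double hav = sin(dlat/2)^2 + ///
    cos(stud_lat*_pi/180) * cos(lat_s*_pi/180) * sin(dlon/2)^2
generate double dist_miles = 2 * 3958.8 * asin(min(1, sqrt(hav)))

* Hypothetical rule: same district AND within 10 miles; the school actually
* attended is always kept so each student retains one chose == 1 row
generate byte chose = (schl_code == chosen_schl)
keep if (aun_loc == district & dist_miles <= 10) | chose
drop dlat dlon hav
list student_id schl_code dist_miles chose
```

The user-written -geodist- (Robert Picard, SSC) is an alternative that computes geodetic distances without hand-coding the formula.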
