  • generating a variable based on values of another variable

    I am trying to generate a variable that gives an instructor ID for a student if that student took a particular course that term with that instructor, but a few aspects of the problem make it more complicated than I had originally realized, and it is now beyond my knowledge of variable generation in Stata.

    This would be easy if each student only ever enrolled in one section of a course in a given term and never switched sections. The problem arises when a student was enrolled in more than one section of the course at some point in the term: I then have to generate the new instructor variable using a set of conditions more complicated than I have been able to implement so far.

    Here is a simplified version of the problem. Here are the variables I have:
    --studentID
    --term
    --coursenumber
    --instructor
    --sortvariable (missing in some cases where I don't want an instructor name assigned to the new variable)
    --maxsortvariable (this is the maximum value of sortvariable for this studentID for this term for this coursenumber)

    I want to generate a new variable for each of five courses (let's say course numbers 100, 101, 102, etc.) called instructor100, instructor101, etc. This process will be the same for each course, so let's just look at one example:

    We want to generate a new variable called instructor100 so that:
    For each case, if this student did not take coursenumber 100 in this term; or if sortvariable (and therefore maxsortvariable) is empty for all cases in which coursenumber==100 for this student in this term, instructor100 should be empty.
    Otherwise, instructor100 should be set to the value of instructor for which sortvariable has the maximum possible value when coursenumber==100 in this term for this student.
    If there are multiple cases where sortvariable has the maximum possible value when coursenumber==100 in this term for this student, then we just want one of the instructor values assigned at random to instructor100 (but instructor100 should have the same value for all cases with the same studentID and term). There are not a lot of cases of this, but if it happens, we need a way to deal with it.

    So, for example, something like this:
    studentID  term  coursenumber  instructor  sortvariable  maxsortvariable  instructor100
    1          1199  100           10          1             1.5              11
    1          1199  100           11          1.5           1.5              11
    1          1199  200           12          2             2                .
    1          1202  300           13          2.5           2.5              .
    1          1206  100           14          3             3                14
    1          1206  100           15          3             3                14
    1          1209  100           12          .             .                .
    For term 1206, instructor=14 was chosen at random as the value of instructor100, but the same random choice must apply to ALL cases with the same studentID and term (i.e., both must be 14 or both must be 15, not one of each). If necessary, or if it makes the variable generation easier to write out elegantly, I can handle this random choice by brute force at the end, by dropping duplicates by studentID term coursenumber and telling Stata to force the drop even though data will be lost.

    Thanks in advance for any advice!


  • #2
    I think something like this should work:

    Code:
    clear
    input int(studentID    term    coursenumber    instructor)    float(sortvariable    maxsortvariable)    int(instructor100example)
    1    1199    100    10    1    1.5    11
    1    1199    100    11    1.5    1.5    11
    1    1199    200    12    2    2    .
    1    1202    300    13    2.5    2.5    .
    1    1206    100    14    3    3    14
    1    1206    100    15    3    3    14
    1    1209    100    12    .    .    .
    end
    
    replace sortvariable = sortvariable + runiform(0, 0.09) // randomly break ties between matching sort values, assuming values only go to the tenths place. Note: this overwrites sortvariable.
    replace sortvariable = -1 if sortvariable == . // flag missing values, assuming all sortvariable values are positive.
    levelsof coursenumber, local(courselevels)
    foreach course in `courselevels'{
        bysort studentID term coursenumber (sortvariable): gen instructor`course' = instructor[_N] if coursenumber == `course' & sortvariable[_N] != -1
    }
    
    list instructor100 instructor200 instructor300
    Code:
    . list instructor100 instructor200 instructor300
    
         +--------------------------------+
         | inst~100   inst~200   inst~300 |
         |--------------------------------|
      1. |       11          .          . |
      2. |       11          .          . |
      3. |        .         12          . |
      4. |        .          .         13 |
      5. |       14          .          . |
         |--------------------------------|
      6. |       14          .          . |
      7. |        .          .          . |
         +--------------------------------+
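
    If overwriting sortvariable is a concern, the same idea can be sketched without touching it. This is only a minimal sketch under assumptions: tiebreak and hassort are hypothetical helper variables that do not appear in the original post, the seed value is arbitrary, and the `///` line continuations mean it should be run from a do-file rather than typed interactively.

    ```stata
    clear
    input int(studentID term coursenumber instructor) float(sortvariable)
    1 1199 100 10 1
    1 1199 100 11 1.5
    1 1199 200 12 2
    1 1202 300 13 2.5
    1 1206 100 14 3
    1 1206 100 15 3
    1 1209 100 12 .
    end

    set seed 12345                             // arbitrary seed, only so the random tie-break is reproducible
    gen double tiebreak = runiform()           // hypothetical helper: random order among tied rows
    gen byte hassort = !missing(sortvariable)  // hypothetical helper: 0 makes missing rows sort first

    levelsof coursenumber, local(courselevels)
    foreach course of local courselevels {
        // Within each student-term-course group the last row is a nonmissing
        // row if one exists, with the largest sortvariable, ties broken at random.
        bysort studentID term coursenumber (hassort sortvariable tiebreak): ///
            gen instructor`course' = instructor[_N] ///
            if coursenumber == `course' & hassort[_N]
    }

    drop tiebreak hassort
    sort studentID term coursenumber instructor
    list studentID term coursenumber instructor sortvariable instructor100, sepby(studentID term)
    ```

    Because the tie-break lives in its own variable, sortvariable keeps its original values and no assumption about its sign or number of decimal places is needed. For term 1206, both rows receive the same instructor, either 14 or 15 depending on the seed.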



    • #3
      Thanks so much, Daniel Schaefer, this works beautifully!
