I am trying to generate a variable that gives the instructor ID for a student if that student took a particular course with that instructor in a given term. A few aspects of the problem make it more complicated than I had originally realized, and it is now beyond my Stata variable generation knowledge.
It would be easy if each student only ever enrolled in one section of a course in a given term and never switched sections. The problem arises when a student was enrolled in more than one section of the course at some point in the term, so I have to generate the new instructor variable using a set of conditions more complicated than I have been able to implement so far.
Here is a simplified version of the problem. Here are the variables I have:
--studentID
--term
--coursenumber
--instructor
--sortvariable (missing in some cases where I don't want an instructor name assigned to the new variable)
--maxsortvariable (this is the maximum value of sortvariable for this studentID for this term for this coursenumber)
I want to generate a new variable for each of five courses (let's say course numbers 100, 101, 102, etc.) called instructor100, instructor101, etc. The process will be the same for each course, so let's just look at one example:
We want to generate a new variable called instructor100 so that:
For each case, if this student did not take coursenumber 100 in this term, or if sortvariable (and therefore maxsortvariable) is empty for all cases in which coursenumber==100 for this student in this term, instructor100 should be empty.
Otherwise, instructor100 should be set to the value of instructor for which sortvariable has the maximum possible value when coursenumber==100 in this term for this student.
If there are multiple cases where sortvariable has the maximum possible value when coursenumber==100 in this term for this student, then we just want one of the instructor values assigned at random to instructor100 (but instructor100 should have the same value for all cases with the same studentID and term). There are not a lot of cases of this, but if it happens, we need a way to deal with it.
So, for example, something like this:
studentID | term | coursenumber | instructor | sortvariable | maxsortvariable | instructor100 |
1         | 1199 | 100          | 10         | 1            | 1.5             | 11            |
1         | 1199 | 100          | 11         | 1.5          | 1.5             | 11            |
1         | 1199 | 200          | 12         | 2            | 2               | .             |
1         | 1202 | 300          | 13         | 2.5          | 2.5             | .             |
1         | 1206 | 100          | 14         | 3            | 3               | 14            |
1         | 1206 | 100          | 15         | 3            | 3               | 14            |
1         | 1209 | 100          | 12         | .            | .               | .             |
For term 1206, instructor=14 has been chosen at random as the value of instructor100, but it is the same random choice for ALL cases with the same studentID and term (i.e., these either both need to be 14 or both need to be 15, but not one 14 and one 15). If necessary, I can handle this random choice by brute force at the end (by dropping duplicates by studentID, term, and coursenumber and telling Stata to force the drop even though data will be lost), so I can do that if it makes the variable generation easier to write out elegantly.
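One way the rules above might be expressed in Stata is sketched below. It is untested against the real data and rests on several assumptions: instructor is numeric, the variable names are exactly those listed, and u and key are hypothetical helper variables introduced only for this illustration (the seed value is arbitrary). It sorts on sortvariable directly rather than using maxsortvariable, and, following the example table, it fills instructor100 only on rows where coursenumber==100.

```stata
* Sketch only: assumes instructor is numeric and the variable names in the post.
* u and key are hypothetical helper variables; the seed value is arbitrary.
set seed 12345
generate double u = runiform()

foreach c in 100 101 102 {
    * key is nonmissing only on rows of this course that have a sortvariable.
    * Negating it makes the largest sortvariable sort first; missing sorts last.
    generate double key = -sortvariable if coursenumber == `c'

    * Within student and term, row 1 now holds the winning instructor;
    * u breaks ties at random, identically for every row in the group.
    bysort studentID term (key u): generate double instructor`c' = ///
        instructor[1] if !missing(key[1]) & coursenumber == `c'

    drop key
}
drop u
```

Dropping the final coursenumber condition would instead copy the chosen instructor to every row for that student and term, including rows for other courses.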
Thanks in advance for any advice!