  • Identifying order-specific occurrences in panel data

    Hello all,

    I am trying to solve the following problem in Stata, using data on student enrollments. The data look like this:
    course id   student id   term   success   flag
    math        1            47     1         1
    physics     1            48     1         1
    physics     2            48     1         0
    math        2            49     0         0
    physics     3            51     1         0
    math        4            52     1         0
    I am trying to identify which students took the math course before they took the physics course. Student 1, for example, took math in term 47 and then physics in term 48. Student 2, on the other hand, took physics before math, so I want to exclude them from the identifying flag.

    In short, I am trying to create the flag variable in the table above. The ultimate goal is to do a t-test to see if taking math before physics has an effect on success.

    Any help would be greatly appreciated.

    Kevin

  • #2
    In your example, every student has taken math or physics (and some both), and there are no other courses. I will not assume this is true in your full data set, so this code will still work with other courses thrown in, or with students who took neither course.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str7 courseid byte(studentid term)
    "math"    1 47
    "physics" 1 48
    "physics" 2 48
    "math"    2 49
    "physics" 3 51
    "math"    4 52
    end
    
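    * Term in which each student took math (missing if never taken)
    by studentid (term), sort: egen took_math = max(cond(courseid == "math", term, .))
    * Term in which each student took physics (missing if never taken)
    by studentid (term): egen took_physics = max(cond(courseid == "physics", term, .))
    * Flag math-before-physics; -missing()- excludes students lacking either course
    gen byte flag = took_math < took_physics & !missing(took_physics, took_math)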
    Note that in my demonstration data I have eliminated the variable success (because it plays no role in this calculation) and also the variable flag, which I generate in the code.
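The -missing()- check in the last line matters because Stata treats a numeric missing value as larger than any number, so took_math < took_physics on its own is true for a student who took math but never physics. Here is a minimal, self-contained sketch of that point on the same demonstration data. The variable naive_flag is hypothetical and exists only for the comparison:

```stata
clear
input str7 courseid byte(studentid term)
"math"    1 47
"physics" 1 48
"physics" 2 48
"math"    2 49
"physics" 3 51
"math"    4 52
end

by studentid (term), sort: egen took_math = max(cond(courseid == "math", term, .))
by studentid (term): egen took_physics = max(cond(courseid == "physics", term, .))

* Guarded flag, as in the code above
gen byte flag = took_math < took_physics & !missing(took_physics, took_math)

* Hypothetical unguarded version, for illustration only
gen byte naive_flag = took_math < took_physics

* Student 4 took only math: took_physics is missing, and 52 < . is true
list studentid courseid term took_math took_physics flag naive_flag, sepby(studentid)
```

In the listing, student 4 should show naive_flag equal to 1 but flag equal to 0, which is the case the guard is there to catch.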

    In the future, when showing data examples, please use the -dataex- command to do so, as I have done here. If you are running version 15.1 or a fully updated version 14.2, it is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    When asking for help with code, always show example data. When showing example data, always use -dataex-.
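As a rough illustration of the -dataex- usage described above, here is a sketch that loads the toy data and then asks -dataex- for the three variables. The variable names are the ones from this thread; with your own data already in memory you would run only the last line, with your own variable list and observation range:

```stata
clear
input str7 courseid byte(studentid term)
"math"    1 47
"physics" 1 48
"physics" 2 48
"math"    2 49
end

* Print a copy-and-paste block for the listed variables, first 4 observations
dataex courseid studentid term in 1/4
```

The output is a block of the same form as the -input- ... -end- section at the top of my code, which can be pasted straight into a post.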




    • #3
      Well, the analogous code in #2 was tested on our demonstration data and worked properly. I can think of a few possibilities:

      1. Conditioning on cb01 == "MATH-65A" or == "PHYS-30A" will fail if the variable cb01 is inconsistently coded with respect to upper/lower case, spacing, inclusion or omission of hyphen, etc. So make sure that is all handled.

      2. You are -by-ing on a variable newterm, but calculating took_math and took_physics using a different variable, term. The actual sorting on newterm/term is not important in this code, but I wonder if there is a confusion between two different variables and you meant to calculate took_math and took_physics from newterm instead of term.

      3. There is one circumstance where I realize my code is not robust: if a person takes a course more than once, the took_math or took_physics (as the case may be) will be set to the last time the person took the course. So that may be messing things up here. If that occurs in your data set and is causing problems, you need to specify how you want to handle the situation where a person takes the same course more than once.
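The three points above can be sketched together as follows. This is only one possible approach, and it rests on assumptions that are not confirmed in this thread: that newterm is the term variable you intend to use, that differences in case and surrounding blanks are the only coding inconsistencies in cb01, and that the first attempt at each course is the one that should count. The toy data and the names course, first_math and first_physics are hypothetical:

```stata
clear
input str9 cb01 byte(studentid newterm)
"math-65a"  1 47
"PHYS-30A"  1 48
"MATH-65A"  1 49
" phys-30a" 2 48
"MATH-65A"  2 49
end

* Point 1: normalize case and strip leading/trailing blanks before comparing
gen course = upper(strtrim(cb01))

* Points 2 and 3: use newterm throughout, and take the FIRST term (min)
* rather than the last (max) when a course was taken more than once
by studentid (newterm), sort: egen first_math = min(cond(course == "MATH-65A", newterm, .))
by studentid (newterm): egen first_physics = min(cond(course == "PHYS-30A", newterm, .))

gen byte flag = first_math < first_physics & !missing(first_math, first_physics)

list, sepby(studentid)
```

Student 1 here repeats math in term 49 after physics in term 48. With -max()- that student would not be flagged; with -min()- the first math attempt in term 47 is used, so the student is flagged. Which of those is right depends on how you want to treat repeats.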
