
  • New ID var based on combining two ID vars

    Hi all,

    I am working with panel data, students nested in classes. I want to cluster on class.
    Since I am interested in certain effects measured at the class level, I want to take into account the students who stayed together in the same class in both year 1 and year 2.
    There are two ID variables, classyear_1 and classyear_2. The problem is that the values in year 1 and year 2 have completely different ranges. See the example below. As you can see, quite a few students switched class between year 1 and year 2. That's okay, but I want at least the students who were in the same class in both years to have the same value in a new variable.
    I think I'll have to do something with foreach, and then get the minimum/maximum for each classyear_1, but I don't know exactly what I should do.
    I hope maybe someone could help me. Thanks in advance!

    id classyear_1 classyear_2
    1 1 434
    2 1 436
    3 1 436
    4 1 437
    5 1 437
    6 2 436
    7 2 437
    8 2 437
    9 6 332
    10 6 332
    11 7 332
    12 8 331
    13 8 331
    14 8 334

  • #2

    Code:
    egen group = group(classyear_1 classyear_2), label
    This puts people who stayed together in the same class in both years into the same group.

    See also https://journals.sagepub.com/doi/pdf...867X0800700407
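    As a sketch of what the command above does, here it is applied to the example data from post #1. Each distinct (classyear_1, classyear_2) pair gets its own integer, so students 2 and 3, 4 and 5, 7 and 8, 9 and 10, and 12 and 13 end up sharing a group.

```stata
* Example data from post #1
clear
input id classyear_1 classyear_2
1 1 434
2 1 436
3 1 436
4 1 437
5 1 437
6 2 436
7 2 437
8 2 437
9 6 332
10 6 332
11 7 332
12 8 331
13 8 331
14 8 334
end

* One integer (1, 2, 3, ...) per distinct pair of year-1 and year-2 classes;
* the label option attaches the original pair, e.g. "1 436", as a value label
egen group = group(classyear_1 classyear_2), label

* Sort back by student and show who shares a group
sort id
list id classyear_1 classyear_2 group, sepby(group)
```

    The resulting variable can then be passed to a clustering option such as vce(cluster group).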



    • #3
      Thank you for your quick response Nick.

      This works, and I can use the grouped variable as a cluster variable. However, for respondents who are only included in one year it (logically) produces a missing value.
      So for those respondents, I tried to take the information from either year 1 or year 2 instead, as follows:

      Code:
      egen class_full = group(classyear_1 classyear_2), label
      replace class_full = classyear_1 if class_full==.
      However, class_full now takes seemingly random values for those respondents, as in the listing below. For id 138 it works fine, but for id 148 it is incorrect:
      id classyear_1 classyear_2 class_full
      138 8 331 8 331
      148 9 . 3 435

      Is there any way to solve this?
      Last edited by Ellen vdBerk; 18 May 2022, 04:21.
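      A sketch of where the odd value comes from, using the post #1 data plus one hypothetical student (id 15, invented for illustration) who has a year-1 class of 9 but no year-2 class. group() leaves that student missing, replace copies in the number 9, and 9 is then displayed with the value label that group() created for its own group 9, which here is the pair "8 334".

```stata
* Post #1 data plus a hypothetical student 15 with no year-2 class
clear
input id classyear_1 classyear_2
1 1 434
2 1 436
3 1 436
4 1 437
5 1 437
6 2 436
7 2 437
8 2 437
9 6 332
10 6 332
11 7 332
12 8 331
13 8 331
14 8 334
15 9 .
end

* Student 15 is excluded from the grouping, so class_full is missing there
egen class_full = group(classyear_1 classyear_2), label

* Copying in the year-1 class gives the number 9, but the value label
* "class_full" already maps 9 to the unrelated pair "8 334"
replace class_full = classyear_1 if class_full == .

sort id
list id classyear_1 classyear_2 class_full if inlist(id, 14, 15)
```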



      • #4
        That's not going to work well, because the point of group() is to create integer identifiers 1 and up without regard to the values of the variables that are input.

        The help for egen documents a missing option, which treats missing values like any other value when forming groups. So I would drop that variable and try again with the missing option also specified.

        But note that, other details aside, people with a missing class in one year will be lumped together with people who are also missing in that year and share their class in the other year, with naturally no guarantee that their unknown classes were really identical. Or even that they were in the school at all?
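        A sketch of the missing option, using the post #1 data plus three hypothetical students (ids 15 to 17, invented for illustration) who each lack one year. Every student now gets a group, but ids 15 and 16 share one purely because both are in class 9 in year 1 and missing in year 2, which is exactly the caveat above.

```stata
* Post #1 data plus hypothetical students with one missing year
clear
input id classyear_1 classyear_2
1 1 434
2 1 436
3 1 436
4 1 437
5 1 437
6 2 436
7 2 437
8 2 437
9 6 332
10 6 332
11 7 332
12 8 331
13 8 331
14 8 334
15 9 .
16 9 .
17 . 334
end

* With missing, "." counts as a class value of its own, so nobody is dropped
egen class_full = group(classyear_1 classyear_2), missing label

sort id
list id classyear_1 classyear_2 class_full
```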
