Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Many to many merge

    Everything I come across/ everyone I ask says NEVER do a m:m merge, but I'm not quite convinced for my particular case. I have student by course level data set--i.e., in a given year, a student has multiple courses. The data set is unique at the student ID plus 5 course-level variables (note: no teacher ID is included). The data set I want to merge to is the teacher-level course file, which includes the five course-levels variables that are also in the student-level file. Now, this would be a fairly clean m:1 merge, BUT there are often multiple teachers for a single course--e.g., course co-teachers--so I did many to many.

    I am still having trouble understanding why m:m merges are problematic because when I ran the merge, everything appears reasonable . . . could some explain why these merges generally cause problems/ what I should look out for to check to make sure the merge went successfully?

  • #2
    From [D] merge

    If the master and using have an unequal number of observations within the group, then the last observation of the shorter group is used repeatedly to match with subsequent observations of the longer group. Thus m:m merges are dependent on the current sort order—something which should never happen.
    From your (verbal) description, it is not entirely clear how your datasets are set up. For example, in the student dataset, I am having difficulties to see why the course-level variables would qualify as identifiers within a student (ID). I would intuitively think that student ID and year should uniquely identify observations.

    For better advice, post a small(!) subset of your datasets.

    Best
    Daniel

    Comment


    • #3
      When working with merges you really want to think very carefully through what you want the final data set to look like. For complicated situations it can be helpful to actually draw out how you want the records to link for a few existing cases in your data.

      Since you don't describe what you want your end data to look like we're kind of forced to guess but my guess is that if you have a student taking a course that has multiple teachers you want your end data to contain records for both of those teachers linked to that student. If that is the case then you should look at joinby.

      I suspect that what you're getting now is probably not really what you want. You are probably linking on courseid and have multiples in both files but with a m:m merge Stata has no idea how to match up those multiples. If you start with the student data and use a m:m merge to the course data on courseid you will end up with a situation where some students in a course with multiple teachers will match up to one teacher and some students will match up to another. This won't be random (so you can't claim you randomly selected a teacher) and you won't get an equal distribution of the teachers represented across students. It will happen in a way that is determined by sort order of the two datasets (and frequency of the observations in each group, etc.) but it will be very difficult to control which teacher links with which student for a given course.

      If you want to actually randomly select how teachers match to students when there are multiple teachers per course there are ways to do that systematically. What you are doing with a m:m merge is haphazard and sloppy.

      Look very closely at your matching data for a students in a course that has more than one teacher. How many students match to teacher A for that course? How many match to teacher B? Can you explain why they matched that way? If you resort your two data sets in a different way and then merge again do you get the same results? Do you know why? If you cannot very clearly answer those last 3 questions and explain exactly how the merge worked and produced what you want then a m:m merge is not for you and will give you results that are potentially not reproducible.

      Comment

      Working...
      X