  • How to keep only Two way Matched data

    Hi Everyone,

    I am having trouble keeping only the two-way matched data. In the data below, each hhid_1 has a link to an hhid_2, but I want to keep only the observations with a two-way link, that is, those where the household in hhid_2 also appears in hhid_1 linked back to the same partner.
    I would really appreciate some help with ways to keep only the data that have a two-way relationship.

    hhid_1 hhid_2
    10208019 10202001
    10208025 10202001
    10208047 10202001
    10208039 10202001
    10208006 10202001
    10208024 10202001
    10208029 10202001
    10208018 10202001
    10208004 10202004
    10208019 10202004
    10208010 10202004
    10208043 10202004
    10208017 10202004

  • #2
    I'm not sure I understand what you mean by a two-way link. Do you mean that if a particular pair appears as hhid_1 and hhid_2 in the data, then they also appear as hhid_2 and hhid_1, respectively, somewhere else in the data? If that's what you mean, and if hhid_1 and hhid_2 together uniquely identify observations in your data (as they do here), you can do this:

    Code:
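    * Verify that each (hhid_1, hhid_2) pair appears only once and that nobody is paired with themselves
    isid hhid_1 hhid_2
    assert hhid_1 != hhid_2
    * Store each pair with the smaller id first, so that A-B and B-A become the same ordered pair
    gen long ordered_1 = min(hhid_1, hhid_2)
    gen long ordered_2 = max(hhid_1, hhid_2)
    * An ordered pair that occurs twice was reported in both directions; keep only those
    by ordered_1 ordered_2, sort: keep if _N == 2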
    I should point out that if that's what you want, the example you gave includes no examples of a two-way link.

    In the future, when showing data examples, please use the -dataex- command to do so, as I have here. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
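As a minimal sketch of what that looks like in practice, assuming only the first three rows of the table in #1 have been read into memory, -dataex- can be pointed at the two id variables. The small -input- block here is just a stand-in for your real data set.

```stata
* Stand-in data: the first three rows of the table posted in #1
clear
input long(hhid_1 hhid_2)
10208019 10202001
10208025 10202001
10208047 10202001
end

* Write out a copy-and-paste-ready version of the example
dataex hhid_1 hhid_2
```

The output can be pasted into a post as is; anyone who copies it back into Stata gets the same variables with the same storage types.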



    When asking for help with code, always show example data. When showing example data, always use -dataex-.



    • #3
      Hello Dr. Schechter,

      Thank you for your response

      hhid_1 hhid_2 relation
      10208019 10202001 Acquaintance
      10208025 10202001 Acquaintance
      10208047 10202001 Acquaintance
      10208039 10202001 Acquaintance
      10208006 10202001 Familiar Face
      10208024 10202001 Acquaintance
      10208029 10202001 Acquaintance
      10208018 10202001 Neighbor
      10208004 10202004 Acquaintance
      10208019 10202004 Neighbor
      10202001 10208025 Acquaintance
      10202004 10208004 Acquaintance


      The situation is that I have this data, where hhid_1 knows hhid_2; but, as I have highlighted, in some cases hhid_2 also knows hhid_1, as the last two observations show. So I want to keep only the data for people who know each other (i.e. Sam knows Harry, and Harry knows Sam). There are lots of cases where Sam knows Harry as a familiar face he has seen around, but Harry does not know Sam. These observations do not show a two-way relationship, and I don't want to keep them.

      I hope I am explaining it better this time, and apologies for the bad example.

      It would be a great help if you could explain this to me. Thank you so much again for helping out a fellow researcher.

      From next time I will use -dataex-; thank you for the suggestion.

      Best



      • #4
        The code shown in #2 will do that and it produces precisely the two pairs you exhibited in bold face.

        The logic of the code is simple. The first two lines verify that the exact same pair hhid_1 and hhid_2 never appears twice in the data set (at least, not in that order), and that no hhid_1 is ever paired with itself.

        Then new variables ordered_1 and ordered_2 are created: they hold the same values as hhid_1 and hhid_2, except that they are ordered so that ordered_1 is always the one with the lower numeric value. Notice that in the case of a "two way relationship" these two variables, ordered_1 and ordered_2, will be identical in the two rows reflecting that relationship. Finally we group all the data by the paired values ordered_1 and ordered_2. Where there is a two way relationship, there are two such observations, but in a one-way relationship there is only one. We keep those with two. Done.
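To make those steps concrete, here is a sketch that runs the code from #2 on the twelve pairs posted in #3. The relation column is left out because it plays no part in the matching, and the -input- block is typed in by hand from that table rather than produced by -dataex-.

```stata
* The twelve (hhid_1, hhid_2) pairs from #3, relation omitted
clear
input long(hhid_1 hhid_2)
10208019 10202001
10208025 10202001
10208047 10202001
10208039 10202001
10208006 10202001
10208024 10202001
10208029 10202001
10208018 10202001
10208004 10202004
10208019 10202004
10202001 10208025
10202004 10208004
end

* Checks: no exact duplicate pairs, nobody paired with themselves
isid hhid_1 hhid_2
assert hhid_1 != hhid_2

* Smaller id first, so both directions of a link share one ordered pair
gen long ordered_1 = min(hhid_1, hhid_2)
gen long ordered_2 = max(hhid_1, hhid_2)

* Keep only ordered pairs that were reported from both sides
by ordered_1 ordered_2, sort: keep if _N == 2

list, sepby(ordered_1 ordered_2) noobs
```

Four observations remain: both directions of the 10202001/10208025 link and both directions of the 10202004/10208004 link.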



        • #5
          Thank you for explaining it so thoroughly; that really helped a lot. After the two-way match I got the data shown below. rel_12 shows the relationship each household reported, and every pair here knows each other. But is there any way I can remove the observations I have highlighted? Although they know each other, they described each other differently: one said the other was a friend, while the other said he/she was just a familiar face. It is not reliable to keep this data, so is there any way I can remove those specific pairs whose answers do not match?

          Thank you again for your help. I really appreciate it.


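          Code:
          clear
          * storage types of hhid_1, hhid_2 and rel_12 are assumed here; the posted header did not show them
          input long(hhid_1 hhid_2) byte rel_12 long(ordered_1 ordered_2)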
          10208003 10208001 3 10208001 10208003
          10208001 10208003 3 10208001 10208003
          10208004 10208001 3 10208001 10208004
          10208001 10208004 3 10208001 10208004
          10208001 10208005 4 10208001 10208005
          10208005 10208001 2 10208001 10208005
          10208001 10208006 5 10208001 10208006
          10208006 10208001 3 10208001 10208006
          10208007 10208001 5 10208001 10208007
          10208001 10208007 5 10208001 10208007
          10208008 10208001 3 10208001 10208008
          10208001 10208008 6 10208001 10208008
          10208001 10208009 6 10208001 10208009
          10208009 10208001 5 10208001 10208009
          10208001 10208010 6 10208001 10208010
          10208010 10208001 5 10208001 10208010
          10208001 10208012 6 10208001 10208012
          10208012 10208001 2 10208001 10208012
          10208016 10208001 3 10208001 10208016
          10208001 10208016 3 10208001 10208016
          10208001 10208017 2 10208001 10208017
          10208017 10208001 2 10208001 10208017
          10208018 10208001 5 10208001 10208018
          10208001 10208018 5 10208001 10208018
          10208019 10208001 5 10208001 10208019
          10208001 10208019 5 10208001 10208019
          10208001 10208021 5 10208001 10208021
          10208021 10208001 5 10208001 10208021
          10208022 10208001 5 10208001 10208022
          10208001 10208022 5 10208001 10208022
          10208001 10208023 5 10208001 10208023
          10208023 10208001 5 10208001 10208023
          10208029 10208001 5 10208001 10208029
          10208001 10208029 5 10208001 10208029
          10208031 10208001 5 10208001 10208031
          10208001 10208031 3 10208001 10208031
          10208034 10208001 6 10208001 10208034
          10208001 10208034 2 10208001 10208034
          10208001 10208037 5 10208001 10208037
          10208037 10208001 4 10208001 10208037
          10208039 10208001 5 10208001 10208039
          10208001 10208039 5 10208001 10208039
          10208040 10208001 5 10208001 10208040
          10208001 10208040 6 10208001 10208040
          10208042 10208001 5 10208001 10208042
          10208001 10208042 6 10208001 10208042
          10208001 10208047 5 10208001 10208047
          10208047 10208001 5 10208001 10208047
          10208001 10208050 5 10208001 10208050
          10208050 10208001 5 10208001 10208050
          10208003 10208002 5 10208002 10208003
          10208002 10208003 5 10208002 10208003
          10208002 10208004 3 10208002 10208004
          10208004 10208002 5 10208002 10208004
          10208005 10208002 5 10208002 10208005
          10208002 10208005 5 10208002 10208005
          10208002 10208006 5 10208002 10208006
          10208006 10208002 5 10208002 10208006
          10208007 10208002 4 10208002 10208007
          10208002 10208007 4 10208002 10208007
          10208002 10208008 2 10208002 10208008
          10208008 10208002 5 10208002 10208008
          10208009 10208002 4 10208002 10208009
          10208002 10208009 5 10208002 10208009
          10208010 10208002 3 10208002 10208010
          10208002 10208010 2 10208002 10208010
          10208011 10208002 5 10208002 10208011
          10208002 10208011 5 10208002 10208011
          10208002 10208012 5 10208002 10208012
          10208012 10208002 5 10208002 10208012
          10208002 10208013 5 10208002 10208013
          10208013 10208002 4 10208002 10208013
          10208016 10208002 5 10208002 10208016
          10208002 10208016 4 10208002 10208016
          10208017 10208002 4 10208002 10208017
          10208002 10208017 5 10208002 10208017
          10208002 10208020 5 10208002 10208020
          10208020 10208002 4 10208002 10208020
          10208002 10208021 5 10208002 10208021
          10208021 10208002 5 10208002 10208021
          10208002 10208022 5 10208002 10208022
          10208022 10208002 5 10208002 10208022
          10208002 10208023 5 10208002 10208023
          10208023 10208002 5 10208002 10208023
          10208002 10208024 3 10208002 10208024
          10208024 10208002 5 10208002 10208024
          10208025 10208002 3 10208002 10208025
          10208002 10208025 5 10208002 10208025
          10208027 10208002 5 10208002 10208027
          10208002 10208027 5 10208002 10208027
          10208030 10208002 6 10208002 10208030
          10208002 10208030 5 10208002 10208030
          10208002 10208031 5 10208002 10208031
          10208031 10208002 5 10208002 10208031
          10208034 10208002 6 10208002 10208034
          10208002 10208034 5 10208002 10208034
          10208002 10208037 5 10208002 10208037
          10208037 10208002 4 10208002 10208037
          10208039 10208002 5 10208002 10208039
          10208002 10208039 5 10208002 10208039
          end
          label values rel_12 relation
          label def relation 2 "Relative", modify
          label def relation 3 "Friend", modify
          label def relation 4 "Neighbor", modify
          label def relation 5 "Acquaintance", modify
          label def relation 6 "Familiar Face", modify



          • #6
            I don't understand what you want to do here. The observations you show in bold face are by no means the only ones where the stated relationships disagree. It is easy to drop all pairs where the reported relationships disagree:
            Code:
            by ordered_1 ordered_2, sort: drop if rel_12[1] != rel_12[2]
            but that will remove far more than just the four bolded observations. What is it specifically about those four that distinguishes them from the other inconsistent relationship pairs?
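To see how much the drop would remove before committing to it, one option is to flag agreement within each pair first and count pairs rather than observations. This is only a sketch on the first few pairs from #5; the variable names agree and one_per_pair are made up for the illustration.

```stata
* Four of the two-way pairs from #5 (rel_12 kept numeric, labels omitted)
clear
input long(hhid_1 hhid_2) byte rel_12
10208003 10208001 3
10208001 10208003 3
10208001 10208005 4
10208005 10208001 2
10208007 10208001 5
10208001 10208007 5
10208008 10208001 3
10208001 10208008 6
end
gen long ordered_1 = min(hhid_1, hhid_2)
gen long ordered_2 = max(hhid_1, hhid_2)

* agree is 1 when both members of a pair report the same relationship
by ordered_1 ordered_2, sort: gen byte agree = rel_12[1] == rel_12[2]

* Tag one observation per pair so the table counts pairs, not rows
egen one_per_pair = tag(ordered_1 ordered_2)
tab agree if one_per_pair

* Same effect as the one-line drop above
drop if !agree
```

In this small extract two of the four pairs disagree, so four observations (two consistent pairs) are left after the drop.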



            • #7
              I was not clear above. I meant that I want to remove all pairs whose stated relationships are inconsistent.
              I just ran the code and it works perfectly, Dr. Clyde; thank you for your help. I was struggling a lot with this step, and I can now move forward.

              Thank you
