
  • Same value for duplicates

    Hello, Statalist!

    I am struggling to replace values for the same person.
    In the dataset, there are two observations that are actually the same person.
    I do not want to drop the duplicates; I just want to replace the values shown as "SAME AS~" with the actual values.
    Since ID=1 and ID=2 are the same person, their name, mothername, fathername, and gender are the same.
    However, they have different values in birthyear and birthmonth.

    I have no idea how to do it.
    Do I have to detect all of the duplicates and replace them?

    So far, I have tried

    Code:
    duplicates list name mothername fathername gender

    but after that, I have no good idea how to deal with this.
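As a possible next step, -duplicates tag- marks the repeated people in a new variable instead of only listing them, so nothing is dropped. A minimal sketch, with the example data typed in via -input- and the reference written as "SAME AS ID=1"; the new variable name dup is an illustrative choice:

```stata
* Sketch: flag repeated people without dropping any observations
clear
input byte ID str3(name mothername fathername) str6 gender str12(birthyear birthmonth)
1 "XXX" "ABC" "DEF" "female" "1991"         "9"
2 "XXX" "ABC" "DEF" "female" "SAME AS ID=1" "SAME AS ID=1"
3 "AAA" "EFG" "GED" "male"   "1980"         "5"
4 "BBB" "FGH" "HBE" "female" "1960"         "3"
end

* dup = number of OTHER observations with the same name, parents, and gender
duplicates tag name mothername fathername gender, generate(dup)

* Show only the people who appear more than once
list ID name mothername fathername gender birthyear birthmonth if dup > 0
```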

    I want ID=2 to have the same values as ID=1 in birthyear and birthmonth, because their name, mothername, fathername, and gender are the same (which means they are the same person).

    So, if two observations have the same values of name, mothername, fathername, and gender, then birthyear and birthmonth should be replaced with the values from the other observation.

    Would it be faster to replace "SAME AS~" with a blank?
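Replacing the "SAME AS~" text with a blank (missing) is a reasonable first step, but the blanks then still have to be filled from the other observation for the same person. A minimal sketch of the rule described above, assuming that name, mothername, fathername, and gender together identify a person and that all non-missing values for a person agree (if they do not, this copies the smallest one):

```stata
* Sketch: blank out "SAME AS ..." text, then fill it within each person
clear
input byte ID str3(name mothername fathername) str6 gender str12(birthyear birthmonth)
1 "XXX" "ABC" "DEF" "female" "1991"         "9"
2 "XXX" "ABC" "DEF" "female" "SAME AS ID=1" "SAME AS ID=1"
3 "AAA" "EFG" "GED" "male"   "1980"         "5"
4 "BBB" "FGH" "HBE" "female" "1960"         "3"
end

foreach v of varlist birthyear birthmonth {
    * force: non-numeric text such as "SAME AS ID=1" becomes missing (.)
    destring `v', replace force
    * Within each person, missing values sort last, so the first observation
    * holds a non-missing value (if any); copy it into the missing ones
    bysort name mothername fathername gender (`v'): replace `v' = `v'[1] if missing(`v')
}
sort ID
list
```

Because this matches on the four identifying variables rather than on the ID written in the text, it does not depend on the "SAME AS ID=..." reference being correct.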

    ===================== data =====================

    ID name mothername fathername gender birthyear birthmonth
    1 XXX ABC DEF female "1991" "9"
    2 XXX ABC DEF female "SAME AS ID=2" "SAME AS ID=2"
    3 AAA EFG GED male "1980" "5"
    4 BBB FGH HBE female "1960" "3"
    ...
    ================================================


    I would greatly appreciate it if you could give me any solution for this!
    Thank you in advance.




  • #2
    Well, let's look at ID1 and ID2 in your example. What is the reason to replace the values of birthyear and birthmonth in ID2 with the values in ID1, as opposed to replacing the values in ID1 with those in ID2? How do you know which birthyear and birthmonth are correct? The code to do what you want is easy enough to write, but based on what you have explained so far, it seems like it is not a good thing to do.
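One way to address the question of which values are correct, at least for the easy cases, is to check whether the non-missing values for the same person ever disagree. A minimal sketch, assuming the four name/gender variables identify a person; person, birthyear_conflict, and birthmonth_conflict are hypothetical variable names introduced here:

```stata
* Sketch: check that the non-missing copies for each person agree
clear
input byte ID str3(name mothername fathername) str6 gender str12(birthyear birthmonth)
1 "XXX" "ABC" "DEF" "female" "1991"         "9"
2 "XXX" "ABC" "DEF" "female" "SAME AS ID=1" "SAME AS ID=1"
3 "AAA" "EFG" "GED" "male"   "1980"         "5"
4 "BBB" "FGH" "HBE" "female" "1960" "3"
end

* One numeric identifier per distinct person
egen person = group(name mothername fathername gender)

foreach v of varlist birthyear birthmonth {
    * "SAME AS ..." entries become missing; only real numbers are compared
    destring `v', replace force
    * Missing values sort last, so the first observation of each person holds
    * the smallest non-missing value; flag any other value that differs from it
    bysort person (`v'): generate byte `v'_conflict = !missing(`v') & `v' != `v'[1]
}
sort ID

* Any rows listed here are people whose copies disagree
list ID person birthyear birthmonth if birthyear_conflict | birthmonth_conflict
```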



    • #3
      Thank you, Clyde, for your response!
      The reason I want to replace the values instead of dropping the duplicates is that ID=1 and ID=2 belong to different groups (that is, the same person belongs to two groups).
      There is actually one more variable, group, that makes them different observations.
      So, I would like to replace "same as~" with the actual number (year or month).

      If ID1 belongs to group1 and ID2 belongs to group2, then I want ID1 to appear when I list group1 and ID2 to appear when I list group2.
      But when I list group2, ID2 shows "Same as~" in birthyear instead of a number.
      I would like to fix this problem.

      Thank you so much.



      • #4
        OK. In the example you show, ID2 says "SAME AS ID=2", which of course is self-referential and does not identify an actual year. From your description, I assume you meant to say "SAME AS ID=1".
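Self-references like this can be caught before any replacement is done. A minimal sketch, using only the ID and birth variables from the data as originally posted (with "SAME AS ID=2" on ID 2); ref_* and selfref_* are hypothetical variable names:

```stata
* Sketch: flag "SAME AS ID=k" entries where k is the observation's own ID
clear
input byte ID str12(birthyear birthmonth)
1 "1991"         "9"
2 "SAME AS ID=2" "SAME AS ID=2"
3 "1980"         "5"
4 "1960"         "3"
end

foreach v of varlist birthyear birthmonth {
    * Referenced ID: the number after "=", or missing for ordinary values
    generate ref_`v' = real(substr(`v', strpos(`v', "=") + 1, .)) if strpos(`v', "SAME AS ID=")
    * 1 if the entry points at itself (missing never equals ID)
    generate byte selfref_`v' = ref_`v' == ID
}
list ID birthyear birthmonth if selfref_birthyear | selfref_birthmonth
```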

        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input byte ID str3(name mothername fathername) str6 gender str12(birthyear birthmonth)
        1 "XXX" "ABC" "DEF" "female" "1991"         "9"           
        2 "XXX" "ABC" "DEF" "female" "SAME AS ID=1" "SAME AS ID=1"
        3 "AAA" "EFG" "GED" "male"   "1980"         "5"           
        4 "BBB" "FGH" "HBE" "female" "1960"         "3"           
        end
        
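        foreach v of varlist birth* {
            // Split at "=": "SAME AS ID=1" gives "SAME AS ID" in the _1 part and "1" in the _2 part;
            // ordinary values such as "1991" stay whole in the _1 part
            split `v', parse("=") gen(`v'_)
            replace `v'_1 = trim(itrim(`v'_1))
            // The "SAME AS ID" text is not a value, so blank it out
            replace `v'_1 = "" if `v'_1 == "SAME AS ID"
            destring `v'_1, replace
            destring `v'_2, replace
            // For each observation, fetch the value from the observation whose ID equals the referenced ID
            rangestat (first) `v'_1, interval(ID `v'_2 `v'_2)
            // Fill in only the values that were "SAME AS ID=..."
            replace `v'_1 = `v'_1_first if missing(`v'_1)
            // Clean up and restore the original variable name
            drop `v'_1_first `v'_2 `v'
            rename `v'_1 `v'
        }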
        This code requires the -rangestat- command, by Robert Picard, Nick Cox, and Roberto Ferrer, available from SSC.
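If installing -rangestat- is not an option, the same lookup can be sketched with official commands only: parse the referenced ID, then -merge- in the value from that ID. This assumes ID uniquely identifies observations and that a referenced observation holds an actual number (chains of references are not followed); ref and the *_ref variables are hypothetical helper names:

```stata
* Sketch: resolve "SAME AS ID=k" by looking up observation k with -merge-
clear
input byte ID str3(name mothername fathername) str6 gender str12(birthyear birthmonth)
1 "XXX" "ABC" "DEF" "female" "1991"         "9"
2 "XXX" "ABC" "DEF" "female" "SAME AS ID=1" "SAME AS ID=1"
3 "AAA" "EFG" "GED" "male"   "1980"         "5"
4 "BBB" "FGH" "HBE" "female" "1960"         "3"
end

tempfile lookup
foreach v in birthyear birthmonth {
    * Referenced ID (1 for "SAME AS ID=1"), missing for ordinary values
    generate ref = real(substr(`v', strpos(`v', "=") + 1, .)) if strpos(`v', "SAME AS ID=")
    replace `v' = "" if !missing(ref)
    destring `v', replace
    * Build an ID -> value table, renaming ID to ref so it can be matched
    preserve
    keep ID `v'
    rename (ID `v') (ref `v'_ref)
    save "`lookup'", replace
    restore
    * Attach the referenced value; rows with missing ref find no match and are kept
    merge m:1 ref using "`lookup'", keep(master match) nogenerate
    replace `v' = `v'_ref if missing(`v')
    drop ref `v'_ref
}
sort ID
list
```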

        In the future, when showing data examples, please use the -dataex- command to do so. If you are running version 15.1 or a fully updated version 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
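For the data in this thread, a -dataex- call might look like the sketch below, assuming Stata 15.1 or later (or -dataex- installed from SSC); the data are first typed in with -input- so the example runs on its own. The output is an -input- program like the one at the top of the code above, ready to paste into a post:

```stata
* Sketch: produce a -dataex- listing of the example data for a post
clear
input byte ID str3(name mothername fathername) str6 gender str12(birthyear birthmonth)
1 "XXX" "ABC" "DEF" "female" "1991"         "9"
2 "XXX" "ABC" "DEF" "female" "SAME AS ID=1" "SAME AS ID=1"
3 "AAA" "EFG" "GED" "male"   "1980"         "5"
4 "BBB" "FGH" "HBE" "female" "1960"         "3"
end

* List the chosen variables; restrict with -in- to keep the post short
dataex ID name mothername fathername gender birthyear birthmonth in 1/4
```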
