  • #16
    I've finally found the cause of the problem. I removed duplicates in two places in my .do-file, but one of those removals should only be done after a certain merge.
    Last edited by LydiaSmit; 11 Jul 2014, 14:49.



    • #17
      have you tried the -duplicates- command, or looked at the help file?



      • #18
        Yes, I did, Rich. However, I used the four commands below to remove the duplicates in my datasets.

        sort id country gender + 8 other vars
        quietly by id country gender + 8 other vars: gen dup = cond(_N==1,0,_n)
        drop if dup>1
        drop dup

        I found these commands somewhere online. They do remove the duplicates, but they lead to a different total number of observations each time I run the same .do-file. I need to remove the duplicates without getting mismatches from the 'cf' command, which compares a saved dataset with the dataset currently in memory. That is why I use the save command immediately after the 'cf' command.
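One possible explanation, which this thread does not confirm: Stata's -sort- places observations that tie on the sort variables in random order unless the stable option is given. If the duplicated groups differ on variables outside the sort list, a different observation may end up as the "first" one on each run, and that can surface later in the .do-file. A minimal sketch on made-up data (the variable score and all values are hypothetical):

```stata
* Toy data: two observations share the key (id) but differ on score
clear
input id score
1 10
1 20
2 30
end

* The stable option keeps tied observations in their current relative order,
* so the same observation is kept on every run
sort id, stable
by id: gen dup = cond(_N==1, 0, _n)
drop if dup > 1
drop dup

list
```

Adding a tie-breaking variable to the sort list would be another way to make the order reproducible.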
        Last edited by LydiaSmit; 11 Jul 2014, 15:52.



        • #19
          Why not duplicates drop??

          Code:
          sysuse auto, clear
          tab foreign
          duplicates drop foreign, force
          tab foreign
          (Only the duplicates command is needed; the rest is data loading and display.)

          Best, Sergiy Radyakin



          • #20
            Lydia, you say "yes", but you don't show anything using the "duplicates" command so I think the answer is really "no", you did not; see -help duplicates-

            I see that in #19 above, Sergiy has given an example of the use of the command.
            Last edited by Rich Goldstein; 11 Jul 2014, 19:19.



            • #21
              Thank you Sergiy Radyakin.

              Unfortunately, I didn't think I would need a log file (even though this is my first Stata project); otherwise I could have shown that I used -help duplicates- a few weeks ago, when I first started using Stata. You asked: "have you tried the -duplicates- command, or looked at the help file?" I did look at the help file, so the answer would have been yes even if I had not tried the -duplicates- command. In fact, I did use that command a few weeks ago, but then I decided that I also wanted to drop the original observations underlying the duplicates (the first occurrences). Therefore, I stopped using -duplicates drop- and started to use the following:

              sort id country gender
              quietly by id country gender: gen dup = cond(_N==1,0,_n)
              drop if dup>0 // leads to a different result than duplicates drop (and than dup>1, of course)
              drop dup

              Later, I changed my mind and wanted to keep the original observations, so I used the four commands from my previous post, because at first nothing looked wrong after running them. On the contrary, they seemed to work perfectly: "duplicates drop varlistnames, force" dropped the same number of duplicates. I should have switched to -duplicates drop- at that time. However, the change in the total number of observations didn't show up immediately after running those four commands. They were the cause of it, but the change appeared about 15 commands later.

              -duplicates drop- was and is the answer to my question. However, I decided just before my post of 13:50 that I need to do that specific duplicates removal after a certain merge. After my previous post I tried -duplicates drop- again, to test whether it would work without later changing the total number of observations, and it did. I couldn't come online again until now.

              Deleted and reposted to prevent double posting:

              New problem, exact same topic

              sort id country gender
              quietly by id country gender: gen dup = cond(_N==1,0,_n)
              drop if dup>0 // leads to a different result than duplicates drop (and than dup>1, of course)
              drop dup

              Is it possible to drop the same observations as the code above does, but with different code?
              That is, code that also drops the first occurrence of each set of duplicates?
              -duplicates drop- doesn't drop these.







              • #22
                If I understand the question, -duplicates tag- may help you do what you want. A non-zero value on the generated variable indicates the record has a duplicate.
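A minimal sketch of how that might look, using made-up data; the variable names id, country and gender follow the earlier posts, while the values and the tag variable name dup are only illustrative:

```stata
* Toy data: ids 1 and 3 each appear twice on the key variables
clear
input id str2 country byte gender
1 "NL" 0
1 "NL" 0
2 "NL" 1
3 "BE" 1
3 "BE" 1
4 "BE" 0
end

* dup = number of OTHER observations with the same key values (0 = unique)
duplicates tag id country gender, generate(dup)

* Drop every member of a duplicated group, first occurrence included
drop if dup > 0
drop dup

list
```

On this toy data, that should leave only the observations whose key values occur exactly once, which is the same set the dup>0 code in #21 keeps.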
                -------------------------------------------
                Richard Williams, Notre Dame Dept of Sociology
                Stata Version: 17.0 MP (2 processor)

                WWW: https://www3.nd.edu/~rwilliam



                • #23
                  Thank you Richard. That was exactly what I needed.

