  • Removing duplicates using if command

    Hi all.

    I'm asking for your help on how to drop duplicates using an if command. I have a dataset consisting of a lot of people who were interviewed multiple times over a ten-year period. I want to remove all the respondents who at any given point in time marked themselves as "homemakers" (4) when asked for current job situation (CJS).

    I have tried to use the following code:

    duplicates drop mergeID if cjs==4

    Stata responds with the following:
    option if not allowed
    r(198);

    I hope someone is able to help me

    - Julia

  • #2
    So, what you're talking about is not an if "command," it is an if qualifier. Stata does have an -if- command, and it is different and does something different. See -help if- and -help ifcmd- for details.

    As Stata tells you, -duplicates drop- does not support -if- qualifiers. I'm interpreting what you wrote as meaning that you would like to drop observations that are duplicates on mergeID but only if the variable cjs takes the value 4. If cjs is different from 4, you want to retain duplicates. You can do this with a little extra work:

    Code:
    duplicates tag mergeID, gen(flag)
    drop if flag & (cjs == 4)
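
    To see what these two lines do, here is a minimal sketch on invented data (the values below are made up for illustration and are not from the original dataset). -duplicates tag- stores, for each observation, the number of other observations sharing the same mergeID, so the new variable is 0 for IDs that appear only once.

    ```stata
    * Hypothetical toy data: mergeID 1 appears twice, mergeIDs 2 and 3 once each
    clear
    input mergeID cjs
    1 1
    1 4
    2 4
    3 2
    end

    * flag = number of other observations with the same mergeID
    duplicates tag mergeID, gen(flag)
    list

    * Drops only the observation with mergeID 1 and cjs 4;
    * mergeID 2 has cjs 4 but no duplicate, so it is kept
    drop if flag & (cjs == 4)
    list
    ```

    Note that the other observation for mergeID 1 survives, since the condition is evaluated observation by observation.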


    • #3
      You can in general drop in terms of variables or in terms of observations, but wanting to drop in terms of both is like asking to tear a rectangular block out of your data, and Stata doesn't want to allow that. That is what you seem to be asking, and you have boggled the tiny mind of duplicates.

      As your word description is

      all the respondents who at any given point in time marked themselves as "homemakers" (4) when asked for current job situation (CJS)
      that sounds to me like

      Code:
      egen ever4 = max(cjs == 4), by(mergeID)
      drop if ever4 
      http://www.stata.com/support/faqs/da...ble-recording/ says much more.

      EDIT: This is close to Clyde (I'm Bonnie's alter ego) but picks up on the "at any given point in time" flavour.
      Last edited by Nick Cox; 12 Apr 2017, 08:43.


      • #4
        Thanks for your answer Clyde!
        Your interpretation is correct.
        When I run your code, it only drops the observation where cjs==4 for that mergeID. If any observation for a mergeID has cjs==4, I want to delete all of the observations with that mergeID, if that makes sense. Thanks for your time.

        -Julia


        • #5
          #3 is already an answer to #4.


          • #6
            OK. That's a tad more complicated, but quite doable:

            Code:
            duplicates tag mergeID, gen(flag)
            by mergeID, sort: egen has_cjs_4 = max(cjs == 4)
            drop if flag & has_cjs_4
            Do read the manual section on -egen-. It is bristling with helpful functions and is a key to efficient data management in Stata.

            I think this thread also illustrates another point about posting on this forum. It is often difficult to explain clearly in words what is wanted. It is usually a good idea, in addition to explaining, to post some example data, and then hand-calculate the results you expect for the example, and then show those results.
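
            As a rough sketch of what such an example could look like for this question, the block below uses invented values (not Julia's data), states the expected result in a comment, and then applies the -egen- approach from #3 to check it.

            ```stata
            * Hypothetical example data: three respondents, several interviews each
            clear
            input mergeID year cjs
            1 2001 1
            1 2002 4
            1 2003 2
            2 2001 3
            2 2002 3
            3 2001 4
            end

            * Expected by hand: respondents 1 and 3 reported cjs == 4 at least once,
            * so only the two observations for mergeID 2 should remain

            * ever4 is 1 for every observation of a respondent who ever had cjs == 4
            egen ever4 = max(cjs == 4), by(mergeID)
            list, sepby(mergeID)

            * Drop all observations for those respondents
            drop if ever4
            list
            ```

            Because (cjs == 4) evaluates to 0 when cjs is missing, missing values do not by themselves cause a respondent to be dropped in this sketch.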


            • #7
              A huge thanks to both of you for the very useful and quick answers!
              Your code made it possible to end up with the result I wanted! I'll try to make an example the next time I need either Bonnie or Clyde's help
              - Julia
