Removing duplicates using if command

Julia Elisabeth

Join Date: Apr 2017

Posts: 3
#1


12 Apr 2017, 08:29

Hi all.

I'm asking for your help on how to drop duplicates using an if command. I have a dataset consisting of a lot of people who were interviewed multiple times over a ten-year period. I want to remove all the respondents who at any given point in time marked themselves as "homemakers" (4) when asked for current job situation (CJS).

I have tried to use the following code:

duplicates drop mergeID if cjs==4

Stata responds with the following:
option if not allowed
r(198);

I hope someone is able to help me

- Julia
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#2

12 Apr 2017, 08:38

So, what you're talking about is not an if "command," it is an if qualifier. Stata does have an -if- command, and it is different and does something different. See -help if- and -help ifcmd- for details.

As Stata tells you, -duplicates drop- does not support -if- qualifiers. I'm interpreting what you wrote as meaning that you would like to drop observations that are duplicates on mergeID but only if the variable cjs takes the value 4. If cjs is different from 4, you want to retain duplicates. You can do this with a little extra work:

Code:

duplicates tag mergeID, gen(flag)
drop if flag & (cjs == 4)
Nick Cox

Join Date: Mar 2014

Posts: 35698
#3

12 Apr 2017, 08:40

You can in general drop in terms of variables or in terms of observations, but wanting to drop in terms of both is like asking to tear a rectangular block out of your data, and Stata doesn't want to allow that. That is what you seem to be asking, and you have boggled the tiny mind of duplicates.

As your word description is

all the respondents who at any given point in time marked themselves as "homemakers" (4) when asked for current job situation (CJS)

that sounds to me like

Code:

egen ever4 = max(cjs == 4), by(mergeID)
drop if ever4
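
As a minimal sketch of what that does, here is a made-up toy dataset (the values and the year variable are invented for illustration, not taken from the poster's data). The expression cjs == 4 evaluates to 1 or 0 in each observation, so its maximum within mergeID is 1 for anyone who ever reported 4.

```stata
clear
* Invented example data: three respondents, one of them (mergeID 2) never a homemaker
input mergeID year cjs
1 2005 1
1 2006 4
1 2007 1
2 2005 2
2 2006 2
3 2005 4
end

* ever4 is 1 in every observation of a respondent who ever reported cjs == 4
egen ever4 = max(cjs == 4), by(mergeID)

* Drop all observations of those respondents, not just the cjs == 4 ones
drop if ever4

list, sepby(mergeID)
assert _N == 2
assert mergeID == 2
```

Only mergeID 2 survives here, as it is the only respondent who never reported cjs == 4.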

http://www.stata.com/support/faqs/da...ble-recording/ says much more.

EDIT: This is close to Clyde (I'm Bonnie's alter ego) but picks up on the "at any given point in time" flavour.

Last edited by Nick Cox; 12 Apr 2017, 08:43.
Julia Elisabeth

Join Date: Apr 2017

Posts: 3
#4

12 Apr 2017, 09:03

Thanks for your answer Clyde!
Your interpretation is correct.
When I run your code, it only drops the observation where cjs==4 for that mergeID. If one of the mergeIDs has cjs==4, I want it to delete all of the observations with that mergeID, if that makes sense. Thanks for your time.

-Julia
Nick Cox

Join Date: Mar 2014

Posts: 35698
#5

12 Apr 2017, 09:08

#3 is already an answer to #4.
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#6

12 Apr 2017, 09:10

OK. That's a tad more complicated, but quite doable:

Code:

duplicates tag mergeID, gen(flag)
by mergeID, sort: egen has_cjs_4 = max(cjs == 4)
drop if flag & has_cjs_4

Do read the manual section on -egen-. It is bristling with helpful functions and is a key to efficient data management in Stata.

I think this thread also illustrates another point about posting on this forum. It is often difficult to explain clearly in words what is wanted. It is usually a good idea, in addition to explaining, to post some example data, and then hand-calculate the results you expect for the example, and then show those results.
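
For instance, a post along the following lines would probably have settled the question at once. This is only a sketch: the data are invented for illustration, and the year variable is hypothetical. On real data, -dataex- is the usual way to produce such a listing. The expected results are worked out by hand in the comments and then checked against the code above.

```stata
clear
* Invented example data
input mergeID year cjs
1 2005 1
1 2006 4
2 2005 2
2 2006 2
3 2005 4
end

* Worked out by hand for the code above:
*   mergeID 1: duplicated ID, has cjs == 4 once  -> both observations dropped
*   mergeID 2: duplicated ID, never cjs == 4     -> both observations kept
*   mergeID 3: single observation, so flag == 0  -> kept even though cjs == 4
duplicates tag mergeID, gen(flag)
by mergeID, sort: egen has_cjs_4 = max(cjs == 4)
drop if flag & has_cjs_4

list, sepby(mergeID)
assert _N == 3
assert mergeID != 1
```

If respondents with a single observation and cjs == 4 should go too, dropping on has_cjs_4 alone would do that, which is then the same as the code in #3.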
Julia Elisabeth

Join Date: Apr 2017

Posts: 3
#7

12 Apr 2017, 09:21

A huge thanks to both of you for the very useful and quick answers!
Your code made it possible to end up with the result I wanted! I'll try to make an example the next time I need either Bonnie or Clyde's help
- Julia