Drop observations corresponding to other variable

Zain Mian

Join Date: May 2019

Posts: 38
#1

18 May 2019, 12:41

I have two variables, Names1 and Names2, both consisting of certain names. A simplified example is pasted below:
Names1  Names2
AA      CC
BB      AA
CC      HH
DD      FF
EE      GG
FF
GG
HH

I would like to drop the names in Names1 that are not in Names2. In this example, I want to drop BB, DD and EE. I would appreciate it if someone could provide guidance on doing so.

Clyde Schechter

Join Date: Apr 2014
Posts: 30065

18 May 2019, 12:52

I'm not sure exactly what you want to do here. It is not possible to simply "drop" the name in Names1. You can replace the name with a missing value, or you can drop the entire observation. The code below assumes you want the latter.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str2(names1 names2)
"AA" "CC"
"BB" "AA"
"CC" "HH"
"DD" "FF"
"EE" "GG"
"FF" ""  
"GG" ""  
"HH" ""  
end

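* Build a lookup file holding the distinct non-missing values of names2
preserve
keep names2
drop if missing(names2)
rename names2 names1
duplicates drop
tempfile keepers
save `keepers'

* Back in the original data, keep only observations whose names1 is in the lookup
restore
merge m:1 names1 using `keepers', keep(match)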
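
To make the difference between the two options concrete, here is a minimal sketch on the same toy data. It is only an illustration, not part of the solution: the choice of "BB" and "DD" is arbitrary. Replacing a value leaves the observation (and its names2 value) in place, while -drop- removes the whole row.

```stata
clear
input str2(names1 names2)
"AA" "CC"
"BB" "AA"
"CC" "HH"
"DD" "FF"
"EE" "GG"
"FF" ""
"GG" ""
"HH" ""
end

* Option A: blank a single value; the observation and its names2 value remain
replace names1 = "" if names1 == "BB"
count    // 8 observations: nothing was removed

* Option B: drop the whole observation
drop if names1 == "DD"
count    // 7 observations: the names2 value "FF" paired with DD is gone too
```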


Zain Mian

Join Date: May 2019

Posts: 38
#3

20 May 2019, 04:27

Thank you for the code. However, it does not quite perform the required task (it drops observations from the second column as well). Let me rephrase the problem. I have two groups of companies; an example is given below:
Group1 Group2

Apple Microsoft

DELL HP

Microsoft Nike

Nike Shell

Addidas Google

Kellogs Addidas

HP

Shell

Google

I only want to retain the companies in Group1 that are also in Group2, i.e. I want to drop Apple, DELL and Kellogs from Group1 only, since they are not in Group2 (I don't want to change any companies in Group2). I would appreciate help or code for doing such filtering.

Clyde Schechter

Join Date: Apr 2014
Posts: 30065

20 May 2019, 09:49

OK, we had a miscommunication because you are using the word -drop- to mean something different from what -drop- means in Stata. In Stata, when you -drop- something you eliminate the entire observation. It sounds like what you want is simply to replace those names with missing values.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str2(names1 names2)
"AA" "CC"
"BB" "AA"
"CC" "HH"
"DD" "FF"
"EE" "GG"
"FF" ""  
"GG" ""  
"HH" ""  
end

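* Build a lookup file holding the distinct non-missing values of names2
preserve
keep names2
drop if missing(names2)
rename names2 names1
duplicates drop
tempfile keepers
save `keepers'

* Keep every original observation and blank names1 where no match was found.
* keep(master match) stops names found only in names2 from being appended as new observations.
restore
merge m:1 names1 using `keepers', keep(master match)
replace names1 = "" if _merge != 3
list, noobs clean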
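
If the list of names is short, the same result can probably be had without a tempfile or -merge-. The following is a minimal sketch under that assumption, using -levelsof- to collect the distinct values of names2 and a loop to flag matches. The names in_names2 and keepers are just illustrative. For a long list of names the -merge- approach is likely the better choice, since the -levelsof- result has to fit in a local macro.

```stata
clear
input str2(names1 names2)
"AA" "CC"
"BB" "AA"
"CC" "HH"
"DD" "FF"
"EE" "GG"
"FF" ""
"GG" ""
"HH" ""
end

* Collect the distinct non-missing values of names2 in a local macro
levelsof names2, local(keepers)

* Flag each names1 value that appears among them
generate byte in_names2 = 0
foreach k of local keepers {
    replace in_names2 = 1 if names1 == `"`k'"'
}

* Blank names1 where there is no match; names2 and the row order are untouched
replace names1 = "" if !in_names2
drop in_names2
list, noobs clean
```

Because nothing is merged or sorted here, each remaining names1 value stays in its original row next to the same names2 value.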
