Drop observations if value of one variable is found in another variable of any observation in the dataset

Steven Buck

Join Date: Oct 2018

Posts: 2
#1

Drop observations if value of one variable is found in another variable of any observation in the dataset

15 Oct 2018, 05:05

Dear Statalist,

I have a dataset containing the information about the consolidated accounts of different firms. Each company is assigned a unique Company-ID in a string variable. In a second string variable, the owner of the respective firm is listed with his Company-ID. I would like to drop all observations, whose Owner-ID is found as the Company ID in another observation. I tried to find a way to do that with the help of "if", however, I didn't find a way to compare two variables across observations. Any help on this is highly appreciated.

Example:
Company-ID Revenue Owner-ID

A1 10 D4

B2 11 E5

C3 12 A1

In the example, the third observation should be dropped, because the owner-ID equals the Company-ID of the first observation.

Many thanks in advance
Steven

Last edited by Steven Buck; 15 Oct 2018, 05:07.
Tags: None

Andrew Musau

Join Date: Oct 2014
Posts: 10191

15 Oct 2018, 05:43

Please use dataex in the future to present data examples. This is essentially the same problem as the one in a recent thread

https://www.statalist.org/forums/for...in-the-dataset

Here is how to solve the problem using Nick Cox's suggestion #2 from the thread. For other approaches, refer to the link.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str5(CompanyID Revenue OwnerID)
"A1" "10" "D4"
"B2" "11" "E5"
"C3" "12" "A1"
end

levelsof CompanyID, local(cid)
levelsof OwnerID, local(oid)
local diff: list cid - oid
local todrop: list cid- diff
local todrop2 : subinstr local todrop " " ",,, all
drop if inlist( OwnerID, `todrop2')

Result

Code:

. l

     +------------------------------+
     | Compan~D   Revenue   OwnerID |
     |------------------------------|
  1. |       A1        10        D4 |
  2. |       B2        11        E5 |
     +------------------------------+

Added in edit: Note that inlist() accepts up to eight arguments, so for a larger list, you will need the "|" separator.

Last edited by Andrew Musau; 15 Oct 2018, 05:55.

Comment

Steven Buck

Join Date: Oct 2018

Posts: 2
#3

15 Oct 2018, 07:48

Thanks for your swift response, Andrew. Unfortunately, when using the levelsof command, I receive the error that "macro substitution results in line that is too long". Your suggested method in the referenced thread, however, worked for me. Thank you very much for your help!
Comment
Andrew Musau

Join Date: Oct 2014

Posts: 10191
#4

15 Oct 2018, 10:16

Glad to hear that you were able to solve your problem. There is a correction to the code in #2. The only reason that the following does not throw in an error is because there is only one element in the macro

local todrop2 : subinstr local todrop " " ",,, all

I will expand the data example to include more than 1 element

Code:

* Example generated by -dataex-. To install: ssc install dataex clear input str5(CompanyID Revenue OwnerID) "A1" "10" "D4" "B2" "11" "E5" "C3" "12" "A1" "A1" "10" "B2" end levelsof CompanyID, local(cid) levelsof OwnerID, local(oid) local diff: list cid - oid local todrop: list cid- diff di `"`todrop'"'

Displaying contents of the local macro

Code:

. di `"`todrop'"' `"A1"' `"B2"'

reveals that I need the following to separate elements by a comma

Code:

local todrop: subinstr local todrop `" "' `", "', all di `"`todrop'"'

resulting in

Code:

. di `"`todrop'"' `"A1"', `"B2"'

So finally, the following should work

Code:

drop if inlist(OwnerID, `todrop')
2 likes
Comment
Romalpa Akzo

Join Date: Oct 2017

Posts: 369
#5

15 Oct 2018, 23:39

Utilizing -levelsof- as in Andrew’s solution is very interesting, but going along with the -inlist-'s limitation, which is quite vulnerable to be touched for this kind of issue. Below is a concise way, which is safe from such limitation.

Code:

expand 2, gen(ex) replace OwnerID = CompanyID if ex bys OwnerID (ex): drop if ex[_N] drop ex

Then, Steve, what is your solution? Could you please share it out here? (By the way, for the future post, please use - dataex- to give a small sample of your data. Read more on FAQ, section 12.2.)
2 likes
Comment

Company-ID	Revenue	Owner-ID
A1	10	D4
B2	11	E5
C3	12	A1

Announcement