Common elements between two lists of string variables

Wendy Lai

Join Date: May 2016

Posts: 6
#1

27 May 2016, 11:49

Hi everyone,

I am relatively new to Stata, so this is likely a basic question. I searched the Stata help files and the forum history but couldn't find anything specific to string variables.

I have two datasets, one for quantities and one for prices. Their countries overlap but are not identical. Within each country there are industry-level data, and the industries also differ between the two datasets. The good news is that the country variable uses the same codes in both datasets. For example, country in the quantities data contains ARG BRA CAN MEX USA, and country in the prices data contains ARG CAN JPN USA.

I would like to get a sense of how much the datasets overlap. My idea is to start with the quantity dataset and keep an observation only if its country also appears in the prices dataset. Something like this:

Code:

use quantity, clear
local price_country ARG CAN JPN USA
keep if "the country is in the local price_country"

What Stata command can achieve the last line, "keep if the country is in the local price_country"?

I'd appreciate your help.
Nick Cox

Join Date: Mar 2014

Posts: 35708
#2

27 May 2016, 12:14

Code:

search inlist
Clyde Schechter

Join Date: Apr 2014

Posts: 30111
#3

27 May 2016, 12:17

Try this:

Code:

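* Collect the distinct country codes in each dataset
use prices, clear
levelsof country, local(price_country)
use quantity, clear
levelsof country, local(quantity_country)

* Countries that appear in both lists
local common_country: list price_country & quantity_country

* strpos(haystack, needle): keep observations whose country occurs in the common list
keep if strpos(`"`common_country'"', country)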

Note, if you are eventually going to -merge- these two data sets, you don't need to go through this rigmarole. You can just specify the -keep(match)- option in your -merge country using...- command and only observations for countries common to both data sets will be retained.
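A minimal sketch of that -merge- route, using small made-up stand-ins for the two datasets (the industry codes and the tempfile name price_countries are hypothetical). Because each country appears once per industry in both files, country alone does not identify observations in either one, so this sketch first reduces the prices data to one observation per country and then uses merge m:1 with keep(match):

```stata
* Toy prices data (hypothetical values): several industries per country
clear
input str3 country str5 industry
ARG ind1
ARG ind2
CAN ind1
JPN ind3
USA ind1
end

* Reduce to one observation per country so that a many-to-one merge is valid
keep country
duplicates drop
tempfile price_countries
save `price_countries'

* Toy quantity data (hypothetical values)
clear
input str3 country str5 industry
ARG ind7
BRA ind7
CAN ind8
MEX ind7
USA ind9
USA ind8
end

* keep(match) retains only observations whose country is in both files
merge m:1 country using `price_countries', keep(match) nogenerate
list country industry
```

With these toy data, only the ARG, CAN, and USA observations from the quantity data remain, and their industry detail is untouched.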
Wendy Lai

Join Date: May 2016

Posts: 6
#4

27 May 2016, 14:31

Thanks Clyde! This does exactly what I want. -merge- is tricky at the moment because the industry sub-levels differ completely between the two datasets, and I need to think harder about how to combine them. But your code gave me the information I was looking for.

Nick's suggestion of -inlist- is very tempting, but I couldn't seem to get it working properly:

Code:

levelsof country, local(price_country)
gen testing = inlist(country, `price_country')

but the generated variable "testing" is all 0. What did I miss here?
Joseph Coveney

Join Date: Apr 2014

Posts: 4420
#5

27 May 2016, 19:39

Originally posted by Wendy Lai

. . . the generated variable "testing" is all 0. What did I miss here?

inlist() needs the elements of the list separated by commas, but the local macro returned by levelsof, local() separates them with spaces. Also, when used with strings, inlist() accepts at most 10 arguments, that is, the variable plus nine values to match. An alternative is

Code:

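* Gather the country codes from the prices dataset
use prices, clear
quietly levelsof country, local(countries)

* Flag observations in the quantity dataset whose country also appears in prices
use quantity, clear
generate byte testing = 0
foreach country of local countries {
    quietly replace testing = 1 if country == "`country'"
}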
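When the list is short enough, the comma problem can also be handled directly: levelsof has a separate() option that sets the punctuation placed between the returned values, and by default string values come back wrapped in compound double quotes, which inlist() accepts as string literals. A minimal sketch, assuming small made-up versions of the two datasets; it only works while the price list holds at most nine countries, beyond which the loop approach is needed:

```stata
* Toy prices data (hypothetical values)
clear
input str3 country
ARG
CAN
JPN
USA
end

* separate(,) puts commas between the quoted values
levelsof country, local(price_country) separate(,)

* Toy quantity data (hypothetical values)
clear
input str3 country
ARG
BRA
CAN
MEX
USA
end

* For strings, inlist() allows the variable plus at most nine values
generate byte testing = inlist(country, `price_country')
list country testing
```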
Wendy Lai

Join Date: May 2016

Posts: 6
#6

28 May 2016, 13:12

Thanks for the explanation, Joseph. That really helped me understand the -inlist()- function.