  • Common elements between two lists of string variables

    Hi everyone,

    I am relatively new to Stata, so this is likely a basic question. I searched the Stata help files and the forum history, but couldn't find anything specific to string variables.

    I have two datasets, one for quantities and one for prices. They cover overlapping, but not identical, sets of countries. Under each country there are industry data, and the industries also differ between the datasets. The good news is that the country variables in both datasets use the same codes. For example, country in the quantities data contains ARG BRA CAN MEX USA, and country in the prices data contains ARG CAN JPN USA.

    I would like to get a sense of how much overlap there is between the datasets. My thoughts are to start with the quantity dataset, and keep the values if a certain country is also in the prices dataset. Something like this:
    Code:
    use quantity, clear
    local price_country ARG CAN JPN USA 
    keep if "the country is in the local price_country"
    What is the Stata command that can achieve the last line "keep if the country is in the local price_country"?

    Appreciate your help.

  • #2
    Code:
    search inlist

    • #3
      Try this:

      Code:
      use prices, clear
      levelsof country, local(price_country)
      
      use quantity, clear
      levelsof country, local(quantity_country)
      local common_country: list price_country & quantity_country
      keep if strpos(`"`common_country'"', country)
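      To see what the -list- macro operators return, here is a minimal sketch with the country codes from the original question typed in by hand rather than read from the data. The macro names only_quantity, only_price and n_common are made up for illustration.

      ```stata
      * Country codes copied from the question (typed in, not read from data).
      local quantity_country ARG BRA CAN MEX USA
      local price_country ARG CAN JPN USA

      * & is intersection: elements present in both lists.
      local common_country : list quantity_country & price_country

      * - is set difference: elements of the first list not in the second.
      local only_quantity : list quantity_country - price_country
      local only_price : list price_country - quantity_country

      * sizeof counts the elements of a list.
      local n_common : list sizeof common_country

      display "in both:            `common_country'"
      display "only in quantities: `only_quantity'"
      display "only in prices:     `only_price'"
      display "number in common:   `n_common'"
      ```

      The sizes of the three lists give a quick sense of how much the two datasets overlap before any observations are dropped.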
      Note, if you are eventually going to -merge- these two data sets, you don't need to go through this rigmarole. You can just specify the -keep(match)- option in your -merge country using...- command and only observations for countries common to both data sets will be retained.
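
      A rough sketch of that -merge- route, assuming the two files are named prices.dta and quantity.dta as above and that country is the only key you want to match on. Because each country appears once per industry, the sketch first reduces the prices data to one row per country. The tempfile name price_countries is made up for illustration.

      ```stata
      * Reduce the prices data to one observation per country.
      use prices, clear
      keep country
      duplicates drop
      tempfile price_countries
      save `price_countries'

      * Keep only quantity observations whose country also appears in prices.
      use quantity, clear
      merge m:1 country using `price_countries', keep(match) nogenerate

      * Countries that survived, i.e. those common to both datasets.
      tabulate country
      ```

      Leaving out keep(match) and nogenerate and running -tabulate _merge- instead would show how many observations fall in each dataset only versus in both.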

      • #4
        Thanks Clyde! This does exactly what I want to achieve. -merge- is tricky at the moment because sub-levels of industries are all different between the two datasets and I need to think harder on how I want to combine them. But your code gave exactly the information I am looking for.

        Nick's suggestion of -inlist- is very tempting, but I couldn't seem to get it working properly:
        Code:
        levelsof country, local(price_country)
        gen testing = inlist(country, `price_country')
        but the generated variable "testing" is all 0. What did I miss here?

        • #5
          Originally posted by Wendy Lai
          . . . the generated variable "testing" is all 0. What did I miss here?
          inlist() needs the elements of the list separated by commas, and the macro returned by levelsof, local() separates them with spaces by default. Also, when used with strings, inlist() accepts at most 10 arguments in total, that is, the variable plus up to 9 values to match. An alternative that has no such limit is
          Code:
          generate byte testing = 0
          quietly levelsof country, local(countries)
          foreach country of local countries {
              quietly replace testing = 1 if country == "`country'"
          }
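
          For a short list, inlist() itself can be made to work by asking levelsof for comma-separated output through its separate() option. The sketch below uses toy data built from the country codes in the original question, so the data and the tempfile name quantity are illustrative only, and it assumes the prices data has no more than 9 distinct countries.

          ```stata
          * Toy quantities data with the codes from the question.
          clear
          input str3 country
          "ARG"
          "BRA"
          "CAN"
          "MEX"
          "USA"
          end
          tempfile quantity
          save `quantity'

          * Toy prices data.
          clear
          input str3 country
          "ARG"
          "CAN"
          "JPN"
          "USA"
          end

          * separate(,) puts commas between the levels stored in the macro.
          quietly levelsof country, local(price_country) separate(,)

          * Flag quantity observations whose country is in the prices list.
          use `quantity', clear
          generate byte testing = inlist(country, `price_country')
          list country testing
          ```

          With more distinct string values than inlist() accepts, the loop above is the safer choice.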

          • #6
            Thanks for the explanation, Joseph. That really helped me understand the -inlist()- function.
