Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Finding common values from a variable corresponding to two different values in another variable

    Hello,

    I'm interested in finding common values from a variable corresponding to two different values in another variable. For example, given two variables country_name and product_export_id as follows:
    country_name product_export_id
    USA 1001
    USA 1201
    China 1230
    China 1001
    I am interested in knowing the common product_export_id for USA and China and how many they are. Something like USA and China export only 1 product that is the same and what that product's product_export_id is.

    Can anyone please help me do this in stata? I later want to be able to create an index using this information.
    Any help will be much appreciated.

    Thank you!














    Last edited by Ale Asad; 11 Oct 2019, 08:44.

  • #2
    Welcome to Statalist.

    Perhaps this example will start you in a useful direction.
    Code:
    clear
    input str5 country_name int product_export_id
    "USA"   1001
    "USA"   1201
    "China" 1230
    "China" 1001
    end
    sort product_export_id country_name
    by product_export_id: generate common = _N==2
    list, clean abbreviate(20)
    Code:
    . list, clean abbreviate(20)
    
           country_name   product_export_id   common  
      1.          China                1001        1  
      2.            USA                1001        1  
      3.            USA                1201        0  
      4.          China                1230        0

    Comment


    • #3
      Thank you for your response William Lisowski . Unfortunately, the data I am working with has a lot of different countries and id's that I cannot really input. Is there any other way to get around it?

      Comment


      • #4
        I do not understand your question in post #3 about inputing your data.

        Perhaps you have yet to read the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post. If not, you should take a few mintues to review it. Note especially sections 9-12 on how to best pose your question. You'll learn that the dataex command is how we provide sample data, as described in section 12 of the FAQ. I needed to read into the Stata the sample data you gave in post #1, and that is how I presented it.

        With that said, it seems as if you are saying that you have the possibility of more than just two countries in your data. In that case, perhaps the following is will start you in a useful direction.
        Code:
        // create example data
        cls
        clear
        input str5 country_name int product_export_id
        "USA"   1001
        "USA"   1201
        "China" 1230
        "China" 1001
        "Italy" 1230
        "Italy" 1001
        end
        
        // do the work
        rename country_name country_name1
        save set1, replace
        
        rename country_name1 country_name2
        save set2, replace
        
        use set1, clear
        joinby product_export_id using set2
        drop if country_name1 == country_name2
        sort country_name1 country_name2 product_export_id
        list, sepby(country_name1 country_name2) abbreviate(20) noobs
        Code:
        . list, sepby(country_name1 country_name2) abbreviate(20) noobs
        
          +---------------------------------------------------+
          | country_name1   product_export_id   country_name2 |
          |---------------------------------------------------|
          |         China                1001           Italy |
          |         China                1230           Italy |
          |---------------------------------------------------|
          |         China                1001             USA |
          |---------------------------------------------------|
          |         Italy                1001           China |
          |         Italy                1230           China |
          |---------------------------------------------------|
          |         Italy                1001             USA |
          |---------------------------------------------------|
          |           USA                1001           China |
          |---------------------------------------------------|
          |           USA                1001           Italy |
          +---------------------------------------------------+

        Comment


        • #5
          If you read the Forum FAQ, as all Forum members are requested to do before posting, you would have recognized that the beginning part of the code in #2 that says
          Code:
          clear
          input str5 country_name int product_export_id
          "USA" 1001
          "USA" 1201
          "China" 1230
          "China" 1001
          end
          is an abbreviated version of -dataex- output and that -dataex- is the preferred way of showing example data here on the Forum. It is not intended that you will actually create your real data set in this way. It's just a highly efficient way of creating an example data set to illustrate some code.

          So you should disregard those lines in the code in #2 and instead -use- your actual data set. Then run the rest of the code from #2
          Code:
          sort product_export_id country_name
          by product_export_id: generate common = _N==2
          list, clean abbreviate(20)
          Now, if your real data set is very large, the -list- output will go on for a long time and may not be useful to you. Also, it may be that there are some products that are exported by more than just two countries, and these groupings would not be identified by the code from #2. So another approach is:
          Code:
          by product_export_id (country_name), sort: generate common = _N > 1
          browse product_export_id country_name if common

          Added: Crossed with #4.

          Comment

          Working...
          X