
  • egen cat = group(var1 var2) where the order in which values appear in var1 and var2 does not matter for the created categories

    Dear All

    I am looking to create a new categorical variable based on two existing ones. Typically, this would be egen ... group. However, in this case, an adjustment is needed.

    These are the data:
    HTML Code:
    cat    var1    var2
    1      ARG     CHL
    2      ARG     UKR
    3      ARG     USA
    4      CHL     ARG
    5      CHL     CHL
    6      CHL     USA
    7      UKR     ARG
    8      UKR     CHL
    9      UKR     USA
    10     USA     ARG
    11     USA     CHL
    12     USA     UKR
    and this is the code used
    Code:
    egen cat= group(var1 var2)
    cat yields the expected result. Now I am looking for a way to ensure that the order of values in var1 and var2 does not matter, i.e. cat should be 1 both for ARG & CHL in row 1 and for CHL & ARG in row 4.

    How can I accomplish this?

    Many thanks!

  • #2
    See https://journals.sagepub.com/doi/pdf...867X0800800414 for how to tackle this.



    • #3
      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input byte cat str3(var1 var2)
       1 "ARG" "CHL"
       2 "ARG" "UKR"
       3 "ARG" "USA"
       4 "CHL" "ARG"
       5 "CHL" "CHL"
       6 "CHL" "USA"
       7 "UKR" "ARG"
       8 "UKR" "CHL"
       9 "UKR" "USA"
      10 "USA" "ARG"
      11 "USA" "CHL"
      12 "USA" "UKR"
      end
      
      gen pair = cond(var1 < var2, var1+";"+var2, var2+";"+var1)
      egen wanted = group(pair)
      In the future, when showing data examples, please use the -dataex- command to do so, as I have here. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

      Added: Crossed with #2.
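
      A possible variation on the code above, offered only as a sketch: instead of pasting the two values into one string, keep the alphabetically smaller and larger value in two separate variables and group on both. This avoids relying on ";" never occurring in the data. The names name1, name2 and wanted2 are made up for illustration, and the sketch assumes var1 and var2 are string variables and that wanted from the code above already exists.

      ```stata
      * name1 holds the alphabetically smaller value of each pair, name2 the larger
      gen name1 = cond(var1 < var2, var1, var2)
      gen name2 = cond(var1 < var2, var2, var1)
      egen wanted2 = group(name1 name2)

      * Check that wanted2 defines the same groups as wanted from the code above
      bysort wanted (wanted2): assert wanted2[1] == wanted2[_N]
      bysort wanted2 (wanted): assert wanted[1] == wanted[_N]

      * Show the rows with a separator line between groups
      list var1 var2 wanted wanted2, sepby(wanted)
      ```

      If both assert commands pass silently, the two approaches agree. In the example data the twelve rows should fall into seven groups, because CHL/CHL and CHL/UKR each occur only once.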



      • #4
        Thank you so much - that's brilliant!

        I'll do that in the future, thanks for the pointer, Clyde.

        Many thanks again!
