  • How to create non-directional pair IDs

    Dear Statalists,
    I would like to create a non-directional pair ID from two variables, var1 and var2:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str1(var1 var2)
    "A" "A"
    "A" "B"
    "A" "C"
    "B" "A"
    "B" "B"
    "B" "C"
    "C" "A"
    "C" "B"
    "C" "C"
    end
    By non-directional I mean that the ID for AB is the same as the ID for BA. If I use
    Code:
    egen id = group(var1 var2)
    I get different IDs for AB and BA, which is not what I want.
    Could someone please help me with this?
    Thank you very much!
    A

  • #2
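    You could represent the data differently: create one 0/1 indicator per code and group on the indicators, which do not depend on the order of var1 and var2.
    Code:
    local codes A B C D
    foreach c of local codes {
        * 1 if this observation involves the current code on either side
        gen byte `c' = (var1 == "`c'" | var2 == "`c'")
    }
    * group() is an egen function, not a gen function
    egen id = group(`codes')
    drop `codes'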



    • #3
      This problem was discussed in


      Code:
      SJ-8-4  dm0043  . Tip 71: The problem of split identity, or how to group dyads
              . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox
              Q4/08   SJ 8(4):588--591                                 (no commands)
              tip on how to handle dyadic identifiers
      http://www.stata-journal.com/sjpdf.h...iclenum=dm0043 takes you straight to the paper.


      Code:
      clear
      input str1(var1 var2)
      "A" "A"
      "A" "B"
      "A" "C"
      "B" "A"
      "B" "B"
      "B" "C"
      "C" "A"
      "C" "B"
      "C" "C"
      end
      
      gen newid = cond(var1 <= var2, var1, var2) + cond(var1 >= var2, var1, var2) 
      
      list 
      
           +---------------------+
           | var1   var2   newid |
           |---------------------|
        1. |    A      A      AA |
        2. |    A      B      AB |
        3. |    A      C      AC |
        4. |    B      A      AB |
        5. |    B      B      BB |
           |---------------------|
        6. |    B      C      BC |
        7. |    C      A      AC |
        8. |    C      B      BC |
        9. |    C      C      CC |
           +---------------------+
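With one-character codes, as here, plain concatenation is safe. If the codes can be longer, a separator avoids ambiguity ("A" + "BC" and "AB" + "C" would both give "ABC"), and egen, group() then turns the string into a numeric ID. A minimal sketch, assuming string identifiers; newid2 and pairid are illustrative names, not part of the code above:

```stata
* Sketch, assuming string codes that may be longer than one character.
* The space separator stops "A"+"BC" and "AB"+"C" both becoming "ABC".
clear
input str1(var1 var2)
"A" "A"
"A" "B"
"A" "C"
"B" "A"
"B" "B"
"B" "C"
"C" "A"
"C" "B"
"C" "C"
end

* put the smaller code first, then join the two with a space
gen newid2 = cond(var1 <= var2, var1 + " " + var2, var2 + " " + var1)

* map each distinct unordered pair to 1, 2, 3, ... in sorted order of newid2
egen pairid = group(newid2)

list, sepby(pairid)
```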



      • #4
        Thank you very much for your prompt response!



        • #5
          Dear Stata users,

          I tried to follow Nick Cox's suggestion, using the following code:

          Code:
          gen pairid_s = cond(cnum_i <= cnum_n, cnum_i, cnum_n) + cond(cnum_i >= cnum_n, cnum_i, cnum_n)
          But without success, because it creates the same ID for different pairs.
          I suppose the problem comes from having a panel.
          I would like that each dyad has the same id in different years.
          My data looks like the following example:

          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input int year long(cnum_i cnum_n)
          2002 47  19
          2003 47  19
          2004 47  19
          2002 19  47
          2003 19  47
          2004 19  47
          2002 60  47
          1998 60  19
          1999 19  60
          1999 55 110
          end
          (The cnum variables are country codes.)

          Can someone help me?

          Best regards,
          Anthony



          • #6
            The code in #3 was recommended for strings, for which + means concatenation. With numeric ids, as you have, + means addition, so it won't work: the same sum can arise from different pairs of identifiers (for example, 1 + 5 and 2 + 4 both give 6).

            Does this work for you?

            Code:
            * Example generated by -dataex-. To install: ssc install dataex
            clear
            input int year long(cnum_i cnum_n)
            2002 47  19
            2003 47  19
            2004 47  19
            2002 19  47
            2003 19  47
            2004 19  47
            2002 60  47
            1998 60  19
            1999 19  60
            1999 55 110
            end
            
            
            gen wanted = cond(cnum_i < cnum_n, string(cnum_i) + " " + string(cnum_n), string(cnum_n) + " " + string(cnum_i)) 
            
            list 
            
                 +---------------------------------+
                 | year   cnum_i   cnum_n   wanted |
                 |---------------------------------|
              1. | 2002       47       19    19 47 |
              2. | 2003       47       19    19 47 |
              3. | 2004       47       19    19 47 |
              4. | 2002       19       47    19 47 |
              5. | 2003       19       47    19 47 |
                 |---------------------------------|
              6. | 2004       19       47    19 47 |
              7. | 2002       60       47    47 60 |
              8. | 1998       60       19    19 60 |
              9. | 1999       19       60    19 60 |
             10. | 1999       55      110   55 110 |
                 +---------------------------------+
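Another route for numeric identifiers, sketched here as an alternative rather than as the code above, keeps the smaller and the larger code in two separate numeric variables and passes both to egen, group(). Unlike a sum, the pair (low, high) cannot coincide for two different country pairs, and no string conversion is needed. The names low, high and pairid are illustrative:

```stata
* Sketch for numeric identifiers: keep the smaller and larger code in
* separate variables instead of adding them, so no two pairs can collide.
clear
input int year long(cnum_i cnum_n)
2002 47  19
2003 47  19
2004 47  19
2002 19  47
2003 19  47
2004 19  47
2002 60  47
1998 60  19
1999 19  60
1999 55 110
end

gen long low  = min(cnum_i, cnum_n)
gen long high = max(cnum_i, cnum_n)

* one integer per unordered pair, the same in every year
egen pairid = group(low high)

list, sepby(pairid)
```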



            • #7
              Thank you for the prompt response.
              It is not quite that yet, but almost.
              I need wanted to be a numeric variable. Using the example above:
              when wanted=="19 47" I would like to have wanted==1 ;
              when wanted=="47 60" I would like to have wanted==2 ;
              when wanted=="19 60" I would like to have wanted==3 ;
              when wanted=="55 110" I would like to have wanted==4.



              • #8
                I can't see any trace of your asking for that before. And if you wanted that, why did you think that the code in #3 would do it for you?

                But no matter. What you now need to explain is why that order is the right order for those identifiers. If you can explain a rule that maps identifiers like those to 1 2 3 4, there will be code to match.
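For what it is worth, the order asked for in #7 (19 47, then 47 60, then 19 60, then 55 110) matches the order in which each pair first appears in the example data. If that is the rule, which is only a guess here, one sketch is to record each pair's first observation number and group on that. The result then depends on how the data happen to be sorted. The variable names obsno, low, high, firstobs and pairid are illustrative:

```stata
* Sketch, assuming the rule wanted in #7 is "number pairs in the order
* they first appear in the data"; that rule is an assumption here.
clear
input int year long(cnum_i cnum_n)
2002 47  19
2003 47  19
2004 47  19
2002 19  47
2003 19  47
2004 19  47
2002 60  47
1998 60  19
1999 19  60
1999 55 110
end

* remember the current order of the observations
gen long obsno = _n
gen long low  = min(cnum_i, cnum_n)
gen long high = max(cnum_i, cnum_n)

* for each unordered pair, the observation number where it first appears
bysort low high (obsno): gen long firstobs = obsno[1]

* numbering the first-appearance positions 1, 2, 3, ... gives the pair IDs
egen pairid = group(firstobs)

* restore the original order
sort obsno
list, sepby(pairid)
```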



                • #9
                  I will try to explain better.
                  I have 68754 trade flows with 160 exporters and 235 importers during 20 years.
                  I would like to create a pair ID for each exporter-importer pair.
                  But, for example, if the ID assigned is 1 when the exporter is France and the importer is Spain, I want the same ID, 1, when the exporter is Spain and the importer is France.
                  Additionally, I want this pair ID to remain constant over time, so that France-Spain (or Spain-France) has the same pair ID in every year of the sample.

                  The best I have managed so far is:
                  Code:
                  egen pairid_a = group(cnum_i cnum_n)
                  However, it creates a pair ID for France-Spain that differs from the pair ID for Spain-France.



                  • #10
                    Feeding my wanted from #6 to egen, group() will respect the pairing. It will just give you a different order from your example in #7.



                    • #11
                      I'm sorry to take up your time, but I do not understand what the code should be.



                      • #12
                        Just using the data from post #6:

                        Code:
                        gen wanted = cond(cnum_i < cnum_n, string(cnum_i) + " " + string(cnum_n), string(cnum_n) + " " + string(cnum_i))
                        egen pairid_a = group(wanted)
                        
                        . list, sepby(pairid_a) noobs
                        
                          +--------------------------------------------+
                          | year   cnum_i   cnum_n   wanted   pairid_a |
                          |--------------------------------------------|
                          | 2002       47       19    19 47          1 |
                          | 2003       47       19    19 47          1 |
                          | 2004       47       19    19 47          1 |
                          | 2002       19       47    19 47          1 |
                          | 2003       19       47    19 47          1 |
                          | 2004       19       47    19 47          1 |
                          |--------------------------------------------|
                          | 2002       60       47    47 60          3 |
                          |--------------------------------------------|
                          | 1998       60       19    19 60          2 |
                          | 1999       19       60    19 60          2 |
                          |--------------------------------------------|
                          | 1999       55      110   55 110          4 |
                          +--------------------------------------------+



                        • #13
                          Thank you for your comment, David.
                          With the example data in #6 it works, but with my full data it does not create the same ID for France-Spain and Spain-France.
                          Here is a bigger example of my data for which that code does not work:

                          Code:
                          * Example generated by -dataex-. To install: ssc install dataex
                          clear
                          input int year long(cnum_i cnum_n)
                          2001  50  76
                          2002  50  76
                          2003  50  76
                          2004  50  76
                          2001  50 107
                          2002  50 107
                          2003  50 107
                          2004  50 107
                          2001  50 172
                          2002  50 172
                          2003  50 172
                          2004  50 172
                          2001  55  70
                          2002  55  70
                          2003  55  70
                          2004  55  70
                          2001  55 107
                          2002  55 107
                          2003  55 107
                          2004  55 107
                          2001  55 172
                          2002  55 172
                          2003  55 172
                          2004  55 172
                          2001  77  70
                          2002  77  70
                          2003  77  70
                          2004  77  70
                          2001  77  76
                          2002  77  76
                          2003  77  76
                          2004  77  76
                          2001  77 172
                          2002  77 172
                          2003  77 172
                          2004  77 172
                          2001 122  70
                          2002 122  70
                          2003 122  70
                          2004 122  70
                          2001 122  76
                          2002 122  76
                          2003 122  76
                          2004 122  76
                          2001 122 107
                          2002 122 107
                          2003 122 107
                          2004 122 107
                          end
                          label values cnum_i cnum_i
                          label def cnum_i 50 "ESP", modify
                          label def cnum_i 55 "FRA", modify
                          label def cnum_i 77 "ITA", modify
                          label def cnum_i 122 "PRT", modify
                          label values cnum_n cnum_n
                          label def cnum_n 70 "ESP", modify
                          label def cnum_n 76 "FRA", modify
                          label def cnum_n 107 "ITA", modify
                          label def cnum_n 172 "PRT", modify
                          Code:
                          gen wanted = cond(cnum_i < cnum_n, string(cnum_i) + " " + string(cnum_n), string(cnum_n) + " " + string(cnum_i))
                          egen pairid_s = group(wanted)
                          list in 1/13
                          You can now see that the pair ESP-FRA does not have the same pairid_s (nor the same wanted) as the pair FRA-ESP.
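In this example the same country has different codes on the two sides (ESP is 50 in cnum_i but 70 in cnum_n), so comparing the codes cannot line the pairs up. One sketch, assuming every code carries a value label as in the dataex output, is to compare the labels instead. It uses only the 2001 rows of the example above; iso_i and iso_n are illustrative names:

```stata
* Sketch: compare countries by their labels rather than their codes,
* because the same country has different codes in cnum_i and cnum_n.
clear
input int year long(cnum_i cnum_n)
2001  50  76
2001  50 107
2001  50 172
2001  55  70
2001  55 107
2001  55 172
2001  77  70
2001  77  76
2001  77 172
2001 122  70
2001 122  76
2001 122 107
end
label values cnum_i cnum_i
label define cnum_i 50 "ESP" 55 "FRA" 77 "ITA" 122 "PRT"
label values cnum_n cnum_n
label define cnum_n 70 "ESP" 76 "FRA" 107 "ITA" 172 "PRT"

* string versions holding the country labels
decode cnum_i, gen(iso_i)
decode cnum_n, gen(iso_n)

* alphabetically smaller label first, then one integer per unordered pair
gen wanted = cond(iso_i < iso_n, iso_i + " " + iso_n, iso_n + " " + iso_i)
egen pairid_s = group(wanted)

list, sepby(pairid_s)
```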



                          • #14
                            I found what the problem was.
                            I had used the following code to create the cnum* variables from string variables holding each country's ISO code (iso_o for exporters and iso_d for importers):
                            Code:
                            encode iso_o, gen(cnum_i)
                            encode iso_d, gen(cnum_n)
                            However, that gives the same country a different code depending on whether it is an exporter or an importer.
                            I thought that creating the ID of each country using the following code would work:
                            Code:
                            egen cnum_n=group(iso_d)
                            egen cnum_i=group(iso_o)
                            But that does not work either, because the number of exporters differs from the number of importers, so a country does not always get the same value of cnum_i and cnum_n.



                            • #15
                              multencode from SSC ensures consistent encoding of two or more variables.
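Stata's own encode can also do this: its label() option names a value label that is created on first use and extended on later use, so encoding both ISO variables with the same label gives each country one code on both sides. A minimal sketch with a few made-up rows (the label name iso and the variables low, high and pairid are illustrative):

```stata
* Sketch with made-up example strings: encode both ISO variables with
* one shared value label so a country gets the same code on both sides.
clear
input str3(iso_o iso_d)
"ESP" "FRA"
"FRA" "ESP"
"ITA" "PRT"
"PRT" "ESP"
end

* the first encode creates the label iso; the second reuses and extends it
encode iso_o, gen(cnum_i) label(iso)
encode iso_d, gen(cnum_n) label(iso)

* with consistent codes, the numeric approach works: smaller code first
gen long low  = min(cnum_i, cnum_n)
gen long high = max(cnum_i, cnum_n)
egen pairid = group(low high)

list, sepby(pairid)
```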

