  • Combine two string variables into one, such that both a + b and b + a become ab

    Hello,

    I want to create an identifier for the flight route of each observation, so I would like to generate a variable that combines two string variables (origin and destination). Normally I could of course do this with

    gen citypair = origin_city + dest_city

    The problem is that this gives the following result:

    origin_city  des_city  citypair
    NY           WA        NYWA
    WA           NY        WANY

    However, both observations serve the same route, so I would like both to carry an identical citypair value.

    Any suggestions on how to do that?

    Thank you in advance,

    Frank

  • #2
    Frank:
    welcome to the list.
    I would take a look at the -concat()- function under -help egen-.
    Kind regards,
    Carlo
    (Stata 19.0)


    • #3
      Thank you for your quick response!

      I've read it and tried it that way with the following command:

      egen citypairairline2 = concat(origin_city dest_city)

      However, it does not solve the problem: Stata still treats (origin_city=NY, dest_city=WA) and (origin_city=WA, dest_city=NY) as two distinct values. I also see no option of the egen concat() function that could help with this.

      Any other ideas? Or am I missing something?

      Frank


      • #4
        Hi Frank,

        Others may have better suggestions, but you could use reshape, as follows:

        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input str2(origin_city des_city)
        "NY" "WA"
        "WA" "NY"
        "NY" "SF"
        "WA" "SF"
        "SF" "WA"
        "SF" "NY"
        end
        Code:
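        * Tag each observation so the pair can be reassembled after reshaping
        gen obs = _n
        rename (origin_city des_city) (city1 city2)
        * Stack the two cities of each observation into one column
        reshape long city, i(obs) j(old_order)
        * Within each observation, number the cities in alphabetical order
        bys obs (city): gen new_order = _n
        drop old_order
        * Spread back to wide form, now in alphabetical order
        reshape wide city, i(obs) j(new_order)
        rename (city1 city2) (origin_city des_city)
        gen citypair = origin_city + " " + des_city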
        David.


        • #5
          You can compare strings using relational operators. For greater than or less than operators, the sort order determines the results.
          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input str2(origin_city des_city)
          "NY" "WA"
          "WA" "NY"
          "NY" "SF"
          "WA" "SF"
          "SF" "WA"
          "SF" "NY"
          end
          
          gen route = cond(origin_city < des_city, origin_city + des_city, ///
                              des_city + origin_city)
                              
          sort route
          list, sepby(route)
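
A possible variation on the comparison approach above, offered only as a sketch: if city codes could differ in length, a separator between the two parts keeps pairs such as "AB"+"C" and "A"+"BC" distinct, and -egen, group()- can turn the string into a numeric route identifier. The names first, second and route_id are made up for this illustration, and the variable names follow the example data rather than the original poster's dataset.

```stata
clear
input str2(origin_city des_city)
"NY" "WA"
"WA" "NY"
"NY" "SF"
"SF" "NY"
end

* Put the alphabetically smaller code first, whichever way the flight goes
gen first  = cond(origin_city < des_city, origin_city, des_city)
gen second = cond(origin_city < des_city, des_city, origin_city)

* The space keeps codes of different lengths from running together
gen route = first + " " + second

* One integer per distinct route (illustrative name)
egen route_id = group(route)

sort route_id
list, sepby(route_id)
```

Both directions of a route then share the same route and route_id values, while origin_city and des_city keep their original contents.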


          • #6
            Thanks for the suggestions. It worked.

            Frank


            • #7
              See http://www.stata-journal.com/sjpdf.h...iclenum=dm0043 for a systematic discussion enlarging on Robert Picard's solution.
