Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Using Concatenate

    Hi all,

    I have a dataset that looks something like this:

    ID Var1 Var2 Var3
    1 ABC DEF GHI
    2 XYZ
    3 ABC XYZ

    I'd like to create a new variable, concatenate, that looks like this:
    ID concatenate
    1 ABC, DEF, GHI
    2 XYZ
    3 ABC, XYZ

    However, when I write the code
    Code:
    concat(Var*), punct(", ")
    I get this:

    ID concatenate
    1 ABC, DEF, GHI,
    2 XYZ,,,
    3 ABC, XYZ,,

    I'd like to find a way to avoid concatenating if the values of any Var variable are blank. Any suggestions?

    Thanks,
    Erika

  • #2
    Code:
    clear
    input str20 s1 str20 s2 str20 s3
    ABC DEF GHI
    XYZ
    ABC XYZ
    end
    
    list
    
    gen c=""
    foreach v of varlist s* {
      replace c=c+cond(missing(c),`v',cond(missing(`v'),"",","+`v'))
    }
    
    list
    Produces:
    Code:
         +-------------------------------+
         |  s1    s2    s3             c |
         |-------------------------------|
      1. | ABC   DEF   GHI   ABC,DEF,GHI |
      2. | XYZ                       XYZ |
      3. | ABC   XYZ             ABC,XYZ |
         +-------------------------------+
    Nick's solution below implies that missings are always following non-missings. If this is not true, his solution will yield leading commas. My solution does not make this assumption, and will tolerate missings in any position.
    Last edited by Sergiy Radyakin; 25 Mar 2015, 13:22.

    Comment


    • #3
      You should give the full code that you used. concat() requires the full syntax of an egen call.

      I wrote the original version of concat() so I will atone for the lack of options to do what you want.

      Code:
       
      clear 
       
      input ID str3 Var1 str3 Var2 str3 Var3
      1 ABC DEF GHI
      2 XYZ ""  ""
      3 ABC XYZ ""
      end 
      
      gen wanted = Var1 
      
      quietly forval j = 2/3 { 
           replace wanted = wanted + "," + Var`j' if Var`j' != "" 
      } 
      
      . list 
      
           +---------------------------------------+
           | ID   Var1   Var2   Var3        wanted |
           |---------------------------------------|
        1. |  1    ABC    DEF    GHI   ABC,DEF,GHI |
        2. |  2    XYZ                         XYZ |
        3. |  3    ABC    XYZ              ABC,XYZ |
           +---------------------------------------+

      Comment


      • #4
        Or, you could keep your original approach and then patch it:

        Code:
        egen concatenate = concat((Var*), punct(", ")
        replace concatenate = regexr(concatenate, ",,+", ",")

        Comment


        • #5
          Clyde,

          1) your approach leaves leading commas and trailing commas if they are not multiples.
          Code:
               +--------------------------------------------------+
               | ID    s1    s2    s3             c   concatenate |
               |--------------------------------------------------|
            1. |  1   ABC   DEF   GHI   ABC,DEF,GHI   ABC,DEF,GHI |
            2. |  2         XYZ                 XYZ         ,XYZ, |
            3. |  3   ABC   XYZ             ABC,XYZ      ABC,XYZ, |
               +--------------------------------------------------+
          2) It is not clear from Erika's email whether the content (XYZ, etc) may include commas or double commas. If so, regex may destroy the content.
          3) parentheses unbalanced in first line - one extra opening brace.
          4) As far as I understand, if you are specifying a space in the first statement, then the duplicate commas would be always separated by the space as well, and so the regex as written will never work (never find a match to replace). Regex solution is really nice and overcomes the simpler subinstr() solution's limitation, which only does one replacement at a time and needs to be called iteratively, while the number of changes is positive.

          Best, Sergiy

          Comment


          • #6
            Thank you for the suggestions - very helpful.

            Best,
            Erika

            Comment


            • #7
              Sergiy's point in #2 is quite correct. The context seems to imply that fields are occupied in order from first onwards. But as he pointed out the code in my #3 is not general.

              Comment

              Working...
              X