Using Concatenate

Erika Kociolek

Join Date: Apr 2014

Posts: 83
#1

Using Concatenate

25 Mar 2015, 12:53

Hi all,

I have a dataset that looks something like this:

ID Var1 Var2 Var3
1 ABC DEF GHI
2 XYZ
3 ABC XYZ

I'd like to create a new variable, concatenate, that looks like this:
ID concatenate
1 ABC, DEF, GHI
2 XYZ
3 ABC, XYZ

However, when I write the code

Code:

concat(Var*), punct(", ")

I get this:

ID concatenate
1 ABC, DEF, GHI,
2 XYZ,,,
3 ABC, XYZ,,

I'd like to find a way to avoid concatenating if the values of any Var variable are blank. Any suggestions?

Thanks,
Erika
Tags: None

Sergiy Radyakin

Join Date: Apr 2014
Posts: 1867

25 Mar 2015, 13:11

Code:

clear
input str20 s1 str20 s2 str20 s3
ABC DEF GHI
XYZ
ABC XYZ
end

list

gen c=""
foreach v of varlist s* {
  replace c=c+cond(missing(c),`v',cond(missing(`v'),"",","+`v'))
}

list

Produces:

Code:

     +-------------------------------+
     |  s1    s2    s3             c |
     |-------------------------------|
  1. | ABC   DEF   GHI   ABC,DEF,GHI |
  2. | XYZ                       XYZ |
  3. | ABC   XYZ             ABC,XYZ |
     +-------------------------------+

Nick's solution below implies that missings are always following non-missings. If this is not true, his solution will yield leading commas. My solution does not make this assumption, and will tolerate missings in any position.

Last edited by Sergiy Radyakin; 25 Mar 2015, 13:22.

Comment

Nick Cox

Join Date: Mar 2014
Posts: 35696

25 Mar 2015, 13:12

You should give the full code that you used. concat() requires the full syntax of an egen call.

I wrote the original version of concat() so I will atone for the lack of options to do what you want.

Code:

 
clear 
 
input ID str3 Var1 str3 Var2 str3 Var3
1 ABC DEF GHI
2 XYZ ""  ""
3 ABC XYZ ""
end 

gen wanted = Var1 

quietly forval j = 2/3 { 
     replace wanted = wanted + "," + Var`j' if Var`j' != "" 
} 

. list 

     +---------------------------------------+
     | ID   Var1   Var2   Var3        wanted |
     |---------------------------------------|
  1. |  1    ABC    DEF    GHI   ABC,DEF,GHI |
  2. |  2    XYZ                         XYZ |
  3. |  3    ABC    XYZ              ABC,XYZ |
     +---------------------------------------+

Comment

Clyde Schechter

Join Date: Apr 2014

Posts: 30097
#4

25 Mar 2015, 13:23

Or, you could keep your original approach and then patch it:

Code:

egen concatenate = concat((Var*), punct(", ") replace concatenate = regexr(concatenate, ",,+", ",")
Comment
Sergiy Radyakin

Join Date: Apr 2014

Posts: 1867
#5

25 Mar 2015, 14:19

Clyde,

1) your approach leaves leading commas and trailing commas if they are not multiples.

Code:

+--------------------------------------------------+ | ID s1 s2 s3 c concatenate | |--------------------------------------------------| 1. | 1 ABC DEF GHI ABC,DEF,GHI ABC,DEF,GHI | 2. | 2 XYZ XYZ ,XYZ, | 3. | 3 ABC XYZ ABC,XYZ ABC,XYZ, | +--------------------------------------------------+

2) It is not clear from Erika's email whether the content (XYZ, etc) may include commas or double commas. If so, regex may destroy the content.
3) parentheses unbalanced in first line - one extra opening brace.
4) As far as I understand, if you are specifying a space in the first statement, then the duplicate commas would be always separated by the space as well, and so the regex as written will never work (never find a match to replace). Regex solution is really nice and overcomes the simpler subinstr() solution's limitation, which only does one replacement at a time and needs to be called iteratively, while the number of changes is positive.

Best, Sergiy
Comment
Erika Kociolek

Join Date: Apr 2014

Posts: 83
#6

25 Mar 2015, 14:20

Thank you for the suggestions - very helpful.

Best,
Erika
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35696
#7

25 Mar 2015, 14:35

Sergiy's point in #2 is quite correct. The context seems to imply that fields are occupied in order from first onwards. But as he pointed out the code in my #3 is not general.
Comment

Announcement

Comment

Comment

Comment

Comment

Comment

Comment