Separating string variable with parenthesis and commas

kanika mahajan

Join Date: Oct 2021

Posts: 2
#1

Separating string variable with parenthesis and commas

12 Oct 2021, 00:44

Hello,

I have variable that looks like

Experience
('Santo Engineering', '2017-04-30', '2017-05-30', '')('Standard Drum and Bucket Factory', '2019-06-01', '2019-07-01', '')('Team Eclipse Racing', '2020-06-15', '2021-05-08', '')
('nA', '2021-07-01', '2021-07-02', '')
('One Hour Translation', '2021-09-08', '2021-09-14', '')
('Acc Cement', '1996-03-17', '', 30.0)
('Check N Click Learning and Technologies', '2021-07-18', '', 1.0)('No work any organisation ', '2021-07-18', '2021-07-18', '')

Each successive () represents the past experience details of a person, expressions within the parentheses contain: company, start date, end date, months
I would like to extract each string within parenthesis with first parenthesis getting variables named as e1_com, e1_sd, e1_ed, e1_m and if there are more than one parenthesis then the information contained in the second parenthesis should be stored in e2_com, e2_sd, e2_ed, e2_m and so on.....

Will appreciate any help on this. I think regular expressions should resolve this, but unable to figure this one out.

Thanks!
K
Tags: regex string extract

Andrew Musau

Join Date: Oct 2014
Posts: 10214

12 Oct 2021, 09:26

Welcome to Statalist. Your question would be easier to answer if you provided a data example using the dataex command. Please review FAQ Advice #12 for details on this. I would add "na" to missing values prior to splitting to ensure that there are 4 entries within each pair of parentheses. After splitting, the rest is a matter of renaming.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input strL experience
"('Santo Engineering', '2017-04-30', '2017-05-30', '')('Standard Drum and Bucket Factory', '2019-06-01', '2019-07-01', '')('Team Eclipse Racing', '2020-06-15', '2021-05-08', '')"
"('nA', '2021-07-01', '2021-07-02', '')"                                                                                                                                          
"('One Hour Translation', '2021-09-08', '2021-09-14', '')"                                                                                                                        
"('Acc Cement', '1996-03-17', '', 30.0)"                                                                                                                                          
"("                                                                                                                                                                              
end

replace experience= ustrregexra(experience, "[']{2}", "na,", .)
replace experience= ustrregexra(experience, "[,]+", ",", .)
replace experience= ustrregexra(experience, "[( ' )]", "", .)
split experience, p(,) g(e)
rename (e1-e12) (e1_com e1_sd e1_ed e1_m  e2_com e2_sd e2_ed e2_m e3_com e3_sd e3_ed e3_m)

With too many variables, you can write a loop to do the renaming.

Res.:

Code:

. l e1_com- e2_m

     +---------------------------------------------------------------------------------------------------------------------+
     |             e1_com        e1_sd        e1_ed   e1_m                         e2_com        e2_sd        e2_ed   e2_m |
     |---------------------------------------------------------------------------------------------------------------------|
  1. |   SantoEngineering   2017-04-30   2017-05-30     na   StandardDrumandBucketFactory   2019-06-01   2019-07-01     na |
  2. |                 nA   2021-07-01   2021-07-02     na                                                                 |
  3. | OneHourTranslation   2021-09-08   2021-09-14     na                                                                 |
  4. |          AccCement   1996-03-17           na   30.0                                                                 |
  5. |                                                                                                                     |
     +---------------------------------------------------------------------------------------------------------------------+

Comment

kanika mahajan

Join Date: Oct 2021

Posts: 2
#3

13 Oct 2021, 04:38

Dear Andrew,

This is such a clever way to think about the problem. The only thing is that spaces within company names also get eliminated by this line of the code. The below code however works:

replace experience= ustrregexra(experience, "[']{2}", "na,", .)
replace experience= ustrregexra(experience, "[,]+", ",", .)
replace experience= ustrregexra(experience, "[(')]", "", .)
replace experience= subinstr(experience, ".0", ".0 ,", .)

Thanks you so much!
Kanika

Last edited by kanika mahajan; 13 Oct 2021, 04:42.
Comment

Announcement

Separating string variable with parenthesis and commas

Comment

Comment