Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Separating string variable with parenthesis and commas

    Hello,

    I have variable that looks like

    Experience
    ('Santo Engineering', '2017-04-30', '2017-05-30', '')('Standard Drum and Bucket Factory', '2019-06-01', '2019-07-01', '')('Team Eclipse Racing', '2020-06-15', '2021-05-08', '')
    ('nA', '2021-07-01', '2021-07-02', '')
    ('One Hour Translation', '2021-09-08', '2021-09-14', '')
    ('Acc Cement', '1996-03-17', '', 30.0)
    ('Check N Click Learning and Technologies', '2021-07-18', '', 1.0)('No work any organisation ', '2021-07-18', '2021-07-18', '')

    Each successive () represents the past experience details of a person, expressions within the parentheses contain: company, start date, end date, months
    I would like to extract each string within parenthesis with first parenthesis getting variables named as e1_com, e1_sd, e1_ed, e1_m and if there are more than one parenthesis then the information contained in the second parenthesis should be stored in e2_com, e2_sd, e2_ed, e2_m and so on.....

    Will appreciate any help on this. I think regular expressions should resolve this, but unable to figure this one out.

    Thanks!
    K





  • #2
    Welcome to Statalist. Your question would be easier to answer if you provided a data example using the dataex command. Please review FAQ Advice #12 for details on this. I would add "na" to missing values prior to splitting to ensure that there are 4 entries within each pair of parentheses. After splitting, the rest is a matter of renaming.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input strL experience
    "('Santo Engineering', '2017-04-30', '2017-05-30', '')('Standard Drum and Bucket Factory', '2019-06-01', '2019-07-01', '')('Team Eclipse Racing', '2020-06-15', '2021-05-08', '')"
    "('nA', '2021-07-01', '2021-07-02', '')"                                                                                                                                          
    "('One Hour Translation', '2021-09-08', '2021-09-14', '')"                                                                                                                        
    "('Acc Cement', '1996-03-17', '', 30.0)"                                                                                                                                          
    "("                                                                                                                                                                              
    end
    
    replace experience= ustrregexra(experience, "[']{2}", "na,", .)
    replace experience= ustrregexra(experience, "[,]+", ",", .)
    replace experience= ustrregexra(experience, "[( ' )]", "", .)
    split experience, p(,) g(e)
    rename (e1-e12) (e1_com e1_sd e1_ed e1_m  e2_com e2_sd e2_ed e2_m e3_com e3_sd e3_ed e3_m)

    With too many variables, you can write a loop to do the renaming.

    Res.:

    Code:
    . l e1_com- e2_m
    
         +---------------------------------------------------------------------------------------------------------------------+
         |             e1_com        e1_sd        e1_ed   e1_m                         e2_com        e2_sd        e2_ed   e2_m |
         |---------------------------------------------------------------------------------------------------------------------|
      1. |   SantoEngineering   2017-04-30   2017-05-30     na   StandardDrumandBucketFactory   2019-06-01   2019-07-01     na |
      2. |                 nA   2021-07-01   2021-07-02     na                                                                 |
      3. | OneHourTranslation   2021-09-08   2021-09-14     na                                                                 |
      4. |          AccCement   1996-03-17           na   30.0                                                                 |
      5. |                                                                                                                     |
         +---------------------------------------------------------------------------------------------------------------------+

    Comment


    • #3
      Dear Andrew,

      This is such a clever way to think about the problem. The only thing is that spaces within company names also get eliminated by this line of the code. The below code however works:


      replace experience= ustrregexra(experience, "[']{2}", "na,", .)
      replace experience= ustrregexra(experience, "[,]+", ",", .)
      replace experience= ustrregexra(experience, "[(')]", "", .)
      replace experience= subinstr(experience, ".0", ".0 ,", .)

      Thanks you so much!
      Kanika
      Last edited by kanika mahajan; 13 Oct 2021, 04:42.

      Comment

      Working...
      X