  • splitting names under various conditions

    Hi Stata community,

    I am struggling to split names under various conditions.
    I have a name variable that I want to split into a mainname and a subname.
    The classification is as follows: the part after the last comma is the subname, but only if this part does not include "#".
    If there is a "#" after the last comma and exactly two words follow the "#", the part after the "#" is the subname.
    If more or fewer than two words follow the "#", there is no subname at all.
    The mainname should basically be the initial name without the subname. If there is no subname, the mainname is the whole name.
    So far I have tried split with the parse() option, but I couldn't cover all the conditions with this command.

    All help is much appreciated.

    Many thanks,

    Sandra

  • #2
    I am familiar with split but am too stupid to hold all your rules in my head at once. Please give real(istic) examples of possible cases.



    • #3
      Hi Nick,
      I have the following names:

      AMC, Concord #90 shp

      AMC Pacer Buick Riviera, 150 shp

      Chev., Malibu #90 shp
      Chev. Monte Carlo
      Ford Mustang, 90 shp
      AMC, #NA

      What I am interested in is getting the shp information into the subname; however, this information occurs under different circumstances.

      The information comes either directly after a comma or after a "#" that follows a comma. However, the information after the "#" may be wrong (identifiable when it does not consist of exactly two words).
      Last edited by Sandra Miller; 03 Jun 2019, 12:43.



      • #4
        -split- allows alternative parsing characters. That may help. Otherwise, sorry, but I am not finding this clearer.
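
A minimal sketch of what parsing on alternative characters could look like, assuming a few of the example names from #3 are held in a string variable called name. The stub part is a hypothetical choice. The pieces may keep surrounding spaces, hence the trimming step.

```stata
* Sketch only: split on either "," or "#"
clear
input str40 name
"AMC, Concord #90 shp"
"Ford Mustang, 90 shp"
"AMC, #NA"
end

* parse() accepts alternative separators, separated by spaces
split name, parse("," "#") generate(part)

* remove leading and trailing spaces from each piece
foreach v of varlist part* {
    replace `v' = strtrim(`v')
}
list, noobs
```

This alone does not encode the two-word rule; the pieces still have to be recombined depending on where the "#" falls.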



        • #5
          The classification is as follows: the part after the last comma is the subname, but only if this part does not include "#".
          If there is a "#" after the last comma and exactly two words follow the "#", the part after the "#" is the subname.
          If more or fewer than two words follow the "#", there is no subname at all.
          The mainname should basically be the initial name without the subname. If there is no subname, the mainname is the whole name.
          Maybe the following will get you on the way.

          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input str199 name
          "AMC, Concord #90 shp"            
          "AMC Pacer Buick Riviera, 150 shp"
          "Chev., Malibu #90 shp"           
          "Chev. Monte Carlo"               
          "Ford Mustang, 90 shp"            
          "AMC, #NA"                        
          end
          
          * Keep the full text after the last comma (including the space after the comma)
          gen subname = regexs(1) if regexm(name,",([^,]+$)")
          * Blank the subname if it contains "#" and the text after the "#" is not exactly two words
          replace subname= "" if strpos(subname, "#") & wordcount(substr(subname, strpos(subname, "#") + 1, .))!=2
          * Otherwise, if it contains "#", keep only the text following the "#"
          replace subname= substr(subname, strpos(subname, "#") + 1, .) if strpos(subname, "#")  
          * Generate the main name: drop the subname text, then every comma and "#"
          * (this also strips them from names without a subname, e.g. "AMC, #NA")
          gen mainname= subinstr(subinstr(subinstr(name,subname,"",.), ",", "", .), "#", "",.)

          Result:

          Code:
          . l, sep(6) noobs
          
            +-----------------------------------------------------------------------+
            |                             name    subname                  mainname |
            |-----------------------------------------------------------------------|
            |             AMC, Concord #90 shp     90 shp              AMC Concord  |
            | AMC Pacer Buick Riviera, 150 shp    150 shp   AMC Pacer Buick Riviera |
            |            Chev., Malibu #90 shp     90 shp             Chev. Malibu  |
            |                Chev. Monte Carlo                    Chev. Monte Carlo |
            |             Ford Mustang, 90 shp     90 shp              Ford Mustang |
            |                         AMC, #NA                               AMC NA |
            +-----------------------------------------------------------------------+
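
The listing above shows commas and "#" removed from every mainname, including "AMC, #NA", where no subname is found. A possible variant that follows the stated rules more literally is sketched below. It assumes a reasonably recent Stata that provides strrpos() and strtrim(), and it assumes the mainname should be the text before the last comma, or before the "#" in the hash case. The helper variables lastcomma, tail, hashpos, and afterhash are hypothetical names.

```stata
* Sketch only, under the assumptions stated above
clear
input str199 name
"AMC, Concord #90 shp"
"AMC Pacer Buick Riviera, 150 shp"
"Chev., Malibu #90 shp"
"Chev. Monte Carlo"
"Ford Mustang, 90 shp"
"AMC, #NA"
end

* position of the last comma (0 if there is none)
gen lastcomma = strrpos(name, ",")
* untrimmed text after the last comma (empty if there is no comma)
gen tail = substr(name, lastcomma + 1, .) if lastcomma > 0
* position of "#" within that text (0 if absent)
gen hashpos = strpos(tail, "#")
* trimmed text after the "#"
gen afterhash = strtrim(substr(tail, hashpos + 1, .)) if hashpos > 0

* rule 1: no "#" after the last comma, so the tail is the subname
gen subname = strtrim(tail) if lastcomma > 0 & hashpos == 0
* rule 2: "#" followed by exactly two words, so those words are the subname
replace subname = afterhash if hashpos > 0 & wordcount(afterhash) == 2

* the mainname defaults to the whole name (covers the no-subname rows)
gen mainname = name
* comma case: keep the text before the last comma
replace mainname = strtrim(substr(name, 1, lastcomma - 1)) if subname != "" & hashpos == 0
* hash case: keep the text before the "#", earlier commas included
replace mainname = strtrim(substr(name, 1, lastcomma + hashpos - 1)) if subname != "" & hashpos > 0

list name subname mainname, sep(6) noobs
```

With this variant, rows without a subname, such as "AMC, #NA" and "Chev. Monte Carlo", keep the whole name as mainname. Whether the comma in "AMC, Concord" should stay in the mainname is a judgment call that the original rules leave open.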
