  • Breaking Up Strings with Regexm

    It doesn't really matter, but I'm a little obsessive about details like this. I want the names of these brands to have spaces in them, so that "BankofAmerica", "BankofNewYork", and "UnionBank" become Bank of America, Bank of New York, Union Bank, and so on.

    How might I do this? Presumably, regexm would be involved in some way? I guess the idea would be to break up the string at each new capital letter, as long as the previous letter wasn't also a capital?
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str46 yougovname
    "BankofAmerica"      
    "BankofNewYork"      
    "Barclay's"          
    "BB&T"                
    "Chase"              
    "Citibank"            
    "Comerica"            
    "Fifth-Third"        
    "HSBC"                
    "HuntingtonBank"      
    "KeyBank"            
    "M&TBank"            
    "BMOHarrisBank"      
    "PNCBank"            
    "RegionsBank"        
    "SunTrust"            
    "UnionBank"          
    "USBank"              
    "WellsFargo"          
    "ZionsBank"          
    "BlackAngus"          
    "Bonefish"            
    "BucadiBeppo"        
    "ChartHouse"          
    "Fleming's"          
    "J.Alexander's"      
    "KonaGrill"          
    "LoneStarSteakhouse"  
    "LongHornSteakhouse"  
    "Maggiano's"          
    "McCormick&Schmick's"
    "Morton's"            
    "OutbackSteakhouse"  
    "P.F.Chang's"        
    "RainforestCafe"      
    "Ruth'sChris"        
    "SaltgrassSteakhouse"
    "Smith&Wollensky"    
    "SmokeyBonesBBQ&Grill"
    "TexasRoadhouse"      
    "ThePalm"            
    "TonyRoma's"          
    "Arby's"              
    "BajaFresh"          
    "BurgerKing"          
    "Carl'sJr"            
    "Chipotle"            
    "Church's"            
    "Hardee's"            
    "In-N-Out"            
    "JackintheBox"        
    "KFC"                
    "Krystal"            
    "LongJohnSilvers"    
    "McDonald's"          
    "Nathan'sFamous"      
    "Popeyes"            
    "Quiznos"            
    "Rubio's"            
    "Schlotzsky's"        
    "Subway"              
    "TacoBell"            
    "Wendy's"            
    "Whataburger"        
    "WhiteCastle"        
    "Wienerschnitzel"    
    "Advair"              
    "Ambien"              
    "Avandia"            
    "Avodart"            
    "Boniva"              
    "Cialis"              
    "Coreg"              
    "Coricidin"          
    "Crestor"            
    "Imitrex"            
    "Levitra"            
    "Lipitor"            
    "Nexium"              
    "Plavix"              
    "Relpax"              
    "Requip"              
    "Singulair"          
    "Valtrex"            
    "Viagra"              
    "Wellbutrin"          
    "Zantac"              
    "Zocor"              
    "Zoloft"              
    "Zyrtec"              
    "AAMCO"              
    "AdvanceAutoParts"    
    "Arco"                
    "AutoZone"            
    "BP"                  
    "Bridgestone"        
    "Carquest"            
    "Chevron"            
    "Citgo"              
    "ConocoPhillips"      
    end
    So far, I've tried
    Code:
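    foreach L in `c(ALPHA)' {
        replace yougovname = subinstr(yougovname, "`L'", " `L'", .)
    }
    which gets us part of the way there, but it puts a space in front of every capital letter, so names like "HSBC" get broken up too.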
    Last edited by Jared Greathouse; 22 Jun 2022, 07:34.

  • #2
    There are various patterns here that you can identify. First, insert a space wherever a lowercase letter is followed by an uppercase letter (e.g., UnionBank becomes Union Bank):

    Code:
    replace yougovname = ustrregexra(yougovname, "([a-z])([A-Z])", "$1 $2")
    Second, BMOHarrisBank should be BMO Harris Bank. Where two consecutive uppercase letters are followed by a lowercase letter, insert a space between the two uppercase letters:

    Code:
    replace yougovname = ustrregexra(yougovname, "([A-Z])([A-Z])([a-z])", "$1 $2$3")
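
The two substitutions above can probably be folded into one call with zero-width lookarounds, which the ICU regex engine behind ustrregexra() should accept. This is only a sketch on a toy dataset: the variable name and the four test strings are chosen for illustration, and the single pattern is an alternative to the two-step approach, not a replacement tested on the full list.

```stata
* Toy data (illustrative subset of the names in #1)
clear
input str20 name
"UnionBank"
"BMOHarrisBank"
"USBank"
"HSBC"
end

* Insert a space at any position that is either
*   (a) after a lowercase letter and before an uppercase letter, or
*   (b) after an uppercase letter and before an uppercase+lowercase pair.
* Lookarounds match positions only, so nothing needs to be captured.
replace name = ustrregexra(name, "(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])", " ")

list, sep(0)
```

Because the pattern only matches positions, no characters are consumed and overlapping cases are not skipped. All-caps names such as HSBC are left alone because rule (b) needs a lowercase letter after the second capital.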
    Third, put spaces around "of". This one is RISKY: the pattern also matches "of" inside longer words such as "offshore" or "goofy", and in the listing below it turns "Zoloft" into "Zol of t":

    Code:
    replace yougovname = ustrregexra(yougovname, "([A-Za-z\s])(of)([A-Za-z\s])", "$1 $2 $3")
    and so on.
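
One way to make the "of" rule less risky, sketched here under the assumption that it is applied to the original run-together names before any other step: require a lowercase letter before "of" and an uppercase letter after it. The variable name and the three test strings are illustrative only, and a name like "RoofTop" would still be split wrongly, so this narrows the problem rather than removing it.

```stata
* Toy data (illustrative subset of the names in #1)
clear
input str20 name
"BankofAmerica"
"BankofNewYork"
"Zoloft"
end

* Split "of" only when it sits between a lowercase and an uppercase letter,
* so "Zoloft" (lowercase on both sides) is not touched.
replace name = ustrregexra(name, "([a-z])of([A-Z])", "$1 of $2")

* Then apply the generic lowercase-uppercase split.
replace name = ustrregexra(name, "([a-z])([A-Z])", "$1 $2")

list, sep(0)
```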

    At the end, clean out extra spaces with:

    Code:
    replace yougovname = trim(itrim(yougovname))
    Result:

    Code:
    . l, sep(0)
    
         +------------------------+
         |             yougovname |
         |------------------------|
      1. |        Bank of America |
      2. |       Bank of New York |
      3. |              Barclay's |
      4. |                   BB&T |
      5. |                  Chase |
      6. |               Citibank |
      7. |               Comerica |
      8. |            Fifth-Third |
      9. |                   HSBC |
     10. |        Huntington Bank |
     11. |               Key Bank |
     12. |               M&T Bank |
     13. |        BMO Harris Bank |
     14. |               PNC Bank |
     15. |           Regions Bank |
     16. |              Sun Trust |
     17. |             Union Bank |
     18. |                US Bank |
     19. |            Wells Fargo |
     20. |             Zions Bank |
     21. |            Black Angus |
     22. |               Bonefish |
     23. |           Bucadi Beppo |
     24. |            Chart House |
     25. |              Fleming's |
     26. |          J.Alexander's |
     27. |             Kona Grill |
     28. |   Lone Star Steakhouse |
     29. |   Long Horn Steakhouse |
     30. |             Maggiano's |
     31. |   Mc Cormick&Schmick's |
     32. |               Morton's |
     33. |     Outback Steakhouse |
     34. |            P.F.Chang's |
     35. |        Rainforest Cafe |
     36. |           Ruth's Chris |
     37. |   Saltgrass Steakhouse |
     38. |        Smith&Wollensky |
     39. | Smokey Bones BBQ&Grill |
     40. |        Texas Roadhouse |
     41. |               The Palm |
     42. |            Tony Roma's |
     43. |                 Arby's |
     44. |             Baja Fresh |
     45. |            Burger King |
     46. |              Carl's Jr |
     47. |               Chipotle |
     48. |               Church's |
     49. |               Hardee's |
     50. |               In-N-Out |
     51. |          Jackinthe Box |
     52. |                    KFC |
     53. |                Krystal |
     54. |      Long John Silvers |
     55. |            Mc Donald's |
     56. |        Nathan's Famous |
     57. |                Popeyes |
     58. |                Quiznos |
     59. |                Rubio's |
     60. |           Schlotzsky's |
     61. |                 Subway |
     62. |              Taco Bell |
     63. |                Wendy's |
     64. |            Whataburger |
     65. |           White Castle |
     66. |        Wienerschnitzel |
     67. |                 Advair |
     68. |                 Ambien |
     69. |                Avandia |
     70. |                Avodart |
     71. |                 Boniva |
     72. |                 Cialis |
     73. |                  Coreg |
     74. |              Coricidin |
     75. |                Crestor |
     76. |                Imitrex |
     77. |                Levitra |
     78. |                Lipitor |
     79. |                 Nexium |
     80. |                 Plavix |
     81. |                 Relpax |
     82. |                 Requip |
     83. |              Singulair |
     84. |                Valtrex |
     85. |                 Viagra |
     86. |             Wellbutrin |
     87. |                 Zantac |
     88. |                  Zocor |
     89. |               Zol of t |
     90. |                 Zyrtec |
     91. |                  AAMCO |
     92. |     Advance Auto Parts |
     93. |                   Arco |
     94. |              Auto Zone |
     95. |                     BP |
     96. |            Bridgestone |
     97. |               Carquest |
     98. |                Chevron |
     99. |                  Citgo |
    100. |        Conoco Phillips |
         +------------------------+
    
    .
    Last edited by Andrew Musau; 22 Jun 2022, 09:12.

    • #3
      Below is a solution following #1's original idea. It also handles a few other patterns, such as keeping "Mc" and "Donald's" together and putting spaces around the "&" in "Smith & Wollensky". Of course, #2's solution is much more efficient.

      Code:
      * lowercase followed by uppercase: insert a space, but keep "Mc" names together
      foreach l in `c(alpha)' {
          foreach L in `c(ALPHA)' {
              replace yougovname = subinstr(yougovname, "`l'`L'", "`l' `L'", .)
              replace yougovname = subinstr(yougovname, "Mc `L'", "Mc`L'", .)
              replace yougovname = subinstr(yougovname, "`l'&", "`l' & ", .)
          }
      }
      * "of" now sits at the end of a word (e.g. "Bankof America"): split it off
      replace yougovname = subinstr(yougovname, "of ", " of ", .)

      * two capitals followed by a lowercase letter, with or without "&" in between
      foreach l in `c(alpha)' {
          foreach L1 in `c(ALPHA)' {
              foreach L2 in `c(ALPHA)' {
                  replace yougovname = subinstr(yougovname, "`L1'`L2'`l'", "`L1' `L2'`l'", .)
                  replace yougovname = subinstr(yougovname, "`L1'&`L2'`l'", "`L1' & `L2'`l'", .)
              }
          }
      }

      * add a space after each period (e.g. "J.Alexander's")
      replace yougovname = subinstr(yougovname, ".", ". ", .)
      Code:
                           yougovname  
        1.            Bank of America  
        2.           Bank of New York  
        3.                  Barclay's  
        4.                       BB&T  
        5.                      Chase  
        6.                   Citibank  
        7.                   Comerica  
        8.                Fifth-Third  
        9.                       HSBC  
       10.            Huntington Bank  
       11.                   Key Bank  
       12.                   M&T Bank  
       13.            BMO Harris Bank  
       14.                   PNC Bank  
       15.               Regions Bank  
       16.                  Sun Trust  
       17.                 Union Bank  
       18.                    US Bank  
       19.                Wells Fargo  
       20.                 Zions Bank  
       21.                Black Angus  
       22.                   Bonefish  
       23.               Bucadi Beppo  
       24.                Chart House  
       25.                  Fleming's  
       26.             J. Alexander's  
       27.                 Kona Grill  
       28.       Lone Star Steakhouse  
       29.       Long Horn Steakhouse  
       30.                 Maggiano's  
       31.      McCormick & Schmick's  
       32.                   Morton's  
       33.         Outback Steakhouse  
       34.              P. F. Chang's  
       35.            Rainforest Cafe  
       36.               Ruth's Chris  
       37.       Saltgrass Steakhouse  
       38.          Smith & Wollensky  
       39.   Smokey Bones BBQ & Grill  
       40.            Texas Roadhouse  
       41.                   The Palm  
       42.                Tony Roma's  
       43.                     Arby's  
       44.                 Baja Fresh  
       45.                Burger King  
       46.                  Carl's Jr  
       47.                   Chipotle  
       48.                   Church's  
       49.                   Hardee's  
       50.                   In-N-Out  
       51.              Jackinthe Box  
       52.                        KFC  
       53.                    Krystal  
       54.          Long John Silvers  
       55.                 McDonald's  
       56.            Nathan's Famous  
       57.                    Popeyes  
       58.                    Quiznos  
       59.                    Rubio's  
       60.               Schlotzsky's  
       61.                     Subway  
       62.                  Taco Bell  
       63.                    Wendy's  
       64.                Whataburger  
       65.               White Castle  
       66.            Wienerschnitzel  
       67.                     Advair  
       68.                     Ambien  
       69.                    Avandia  
       70.                    Avodart  
       71.                     Boniva  
       72.                     Cialis  
       73.                      Coreg  
       74.                  Coricidin  
       75.                    Crestor  
       76.                    Imitrex  
       77.                    Levitra  
       78.                    Lipitor  
       79.                     Nexium  
       80.                     Plavix  
       81.                     Relpax  
       82.                     Requip  
       83.                  Singulair  
       84.                    Valtrex  
       85.                     Viagra  
       86.                 Wellbutrin  
       87.                     Zantac  
       88.                      Zocor  
       89.                     Zoloft  
       90.                     Zyrtec  
       91.                      AAMCO  
       92.         Advance Auto Parts  
       93.                       Arco  
       94.                  Auto Zone  
       95.                         BP  
       96.                Bridgestone  
       97.                   Carquest  
       98.                    Chevron  
       99.                      Citgo  
      100.            Conoco Phillips
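
The "Mc" and "&" patterns from #3 could also be expressed with ustrregexra() rather than nested loops. The following is a minimal sketch on a toy dataset: the variable name and test strings are illustrative, and the "&" rule assumes that an ampersand followed by a capitalised word (uppercase then lowercase) should be padded, while abbreviations such as "BB&T" should not.

```stata
* Toy data (illustrative subset of the names in #1)
clear
input str30 name
"McDonald's"
"McCormick&Schmick's"
"Smith&Wollensky"
"SmokeyBonesBBQ&Grill"
"BB&T"
end

* Generic lowercase-uppercase split (as in #2).
replace name = ustrregexra(name, "([a-z])([A-Z])", "$1 $2")

* Undo the split after a word-initial "Mc".
replace name = ustrregexra(name, "\bMc ([A-Z])", "Mc$1")

* Pad "&" with spaces when it is followed by a capitalised word.
replace name = ustrregexra(name, "&([A-Z][a-z])", " & $1")

list, sep(0)
```

Each rule is one pass over the variable, against the 26 x 26 (or more) subinstr() calls the nested loops make.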
