Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Help for replacing parts of a string variable

    Dear community,

    I want to clean a string variable that contains the name of a bank. Sometimes, it also includes some numeric content (a percentage indication) as the example from dataex shows below:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str79 Lender str20 LenderCountry_clean
    "ABN AMRO BIBF"                 ""          
    "ABN AMRO BIBF"                 ""          
    "ABN AMRO BIBF"                 ""          
    "ABN AMRO BIBF"                 ""          
    "ABN AMRO BIBF"                 "Thailand"  
    "ABN AMRO BIBF 10.00%"          ""          
    "ABN AMRO BIBF 5.40%"           ""          
    "ABN AMRO BIBF 6.00%"           ""          
    "ABN AMRO BIBF 6.13%"           ""          
    "ABN AMRO BIBF 6.29%"           ""          
    "ABN AMRO BIBF 7.42%"           ""          
    "ABN AMRO Bank Bhd"             ""          
    "ABN AMRO Bank Bhd"             ""          
    "ABN AMRO Bank Bhd"             ""          
    "ABN AMRO Bank Bhd"             ""          
    "ABN AMRO Bank Bhd"             ""          
    "ABN AMRO Bank Bhd"             ""          
    "ABN AMRO Bank Bhd"             ""          
    "ABN AMRO Bank Bhd 1.50%"       ""          
    "ABN AMRO Bank Bhd 1.50%"       ""          
    "ABN AMRO Bank NV"              ""          
    "ABN AMRO Bank NV [RBS]"        ""          
    "ABN AMRO Bank NV [RBS]"        ""          
    "ABN AMRO Bank NV [RBS]"        ""          
    "ABN AMRO Bank NV [RBS]"        ""          
    "ABN AMRO Bank NV [RBS]"        ""          
    "ABN AMRO Bank NV [RBS]"        ""          
    "ABN AMRO Bank NV [RBS]"        ""          
    "ABN AMRO Bank NV [RBS]"        ""            
    "ABN AMRO Bank NV [RBS]"        ""          
    "ABN AMRO Bank NV [RBS]"        ""          
    "ABN AMRO Bank NV [RBS]"        "Netherlands"
    "ABN AMRO Bank NV [RBS]"        "Netherlands"
    "ABN AMRO Bank NV [RBS]"        "Netherlands"
    "ABN AMRO Bank NV [RBS] 1.10%"  ""          
    "ABN AMRO Bank NV [RBS] 20.00%" ""          
    "ABN AMRO Bank NV [RBS] 32.62%" ""          
    "ABN AMRO Bank NV [RBS] 5.00%"  ""          
    "ABN AMRO Bank NV [RBS] 6.67%"  ""          
    "ABN AMRO Inc"                  ""          
    "ABN AMRO Inc"                  ""          
    "ANZ Grindlays Bank Plc"        ""          
    end

    My objective is to complete the observations of the variable LenderCountry_clean if the bank name is the same. Now, what I did so far was to use the following command to fill up the observations:

    Code:
    bysort Lender (LenderCountry_clean) : replace LenderCountry_clean = LenderCountry_clean[_N] if missing(LenderCountry_clean) & _n < _N
    However, the problem are now the observations that contain some values (percentages), i.e. 1.10%, 20.00%. In this case, the bank name is not exactly the same and I cannot fill up the observations as I would like to. For example, for the lender "ABN AMRO Bank NV [RBS] 1.10%" I would like to have the information "Netherlands" for the variable LenderCountry_clean.

    Hence, my question is if there is a way to "clean" the variable Lender to get rid of the percentages. Note that these percentage indications, if there are any, are always at the end of the string. Is there a way to clean the variable Lender such that I can fill up the observations with the information from LenderCountry_clean ?

    I greatly appreciate any help or comments and thank you in advance.

    Best wishes,
    Elio


  • #2
    There is little doubt in my mind that this can be done as a one-liner by using regular expressions. But regular expressions syntax has always eluded me. So here's a four-line solution using "ordinary" Stata commands.

    Code:
    gen to_erase = word(reverse(Lender), 1)
    replace to_erase = "" if substr(to_erase, 1, 1) != "%"
    replace Lender = subinstr(Lender, reverse(to_erase), "", .) if !missing(to_erase)
    replace Lender = trim(Lender)

    Comment


    • #3
      Thank you very much, this works beautifully! 4 lines or one line, what works is best. Thank you.

      Comment

      Working...
      X