
  • first letters of each element

    Dear All, I have this data set:
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str32 ipc str20 wanted
    "H02K21/12 H02K21/24"              "H02K H02K"          
    "A61K35/78 A61P39/02"              "A61K A61P"          
    "H02J15/00"                        "H02J"               
    "C08L23/06 C08K3/34 C08J5/18 C09K" "C08L C08K C08J C09K"
    end
    For each space-separated element of the `ipc' variable, I wish to take the first 4 characters to construct the `wanted' variable. Any suggestions? Thanks.
    Ho-Chuan (River) Huang
    Stata 19.0, MP(4)

  • #2
    Code:
    gen wanted2 = strtrim(ustrregexra(ipc,"(.{4})(.*?)( |$)","$1 "))
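
    In case it helps to see what the pattern does, here is a minimal sketch that reruns the line above on the example data and checks it against `wanted'. The comments are my reading of the regex, and the assert is only an added sanity check. It assumes every element has at least four characters; with a shorter element, (.{4}) could run across the space into the next element.

    ```stata
    clear
    input str32 ipc str20 wanted
    "H02K21/12 H02K21/24"              "H02K H02K"
    "A61K35/78 A61P39/02"              "A61K A61P"
    "H02J15/00"                        "H02J"
    "C08L23/06 C08K3/34 C08J5/18 C09K" "C08L C08K C08J C09K"
    end

    * (.{4})  captures the first four characters of an element
    * (.*?)   lazily consumes the rest of that element
    * ( |$)   stops at the next space or at the end of the string
    * Each match is replaced by the captured group plus one space;
    * strtrim() then drops the trailing space.
    gen wanted2 = strtrim(ustrregexra(ipc, "(.{4})(.*?)( |$)", "$1 "))

    * Stops with an error if any observation differs from wanted
    assert wanted2 == wanted
    ```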



    • #3
      Dear Ali, Thanks a lot. It works fine.

      Ho-Chuan (River) Huang
      Stata 19.0, MP(4)



      • #4
        Ali's method is a slick bit of regex. An alternative, in case you wish to do some downstream processing with each 4-character code, is to use split.

        Code:
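        * One new variable per space-separated element: ipcs1, ipcs2, ...
        split ipc, gen(ipcs)
        * Keep only the first 4 characters of each element
        foreach v of varlist ipcs* {
          qui replace `v' = substr(`v', 1, 4)
        }
        * Shrink the string variables to the shortest storage type that fits
        qui compress ipcs*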
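
        If a single string variable is still wanted at the end, the pieces can be glued back together. A rough sketch, repeating the steps above on the example data so that it runs on its own; `wanted3' is just a name made up for illustration, and the assert is an added check against `wanted'.

        ```stata
        clear
        input str32 ipc str20 wanted
        "H02K21/12 H02K21/24"              "H02K H02K"
        "A61K35/78 A61P39/02"              "A61K A61P"
        "H02J15/00"                        "H02J"
        "C08L23/06 C08K3/34 C08J5/18 C09K" "C08L C08K C08J C09K"
        end

        split ipc, gen(ipcs)
        gen wanted3 = ""
        foreach v of varlist ipcs* {
          qui replace `v' = substr(`v', 1, 4)
          * Append the piece; strtrim() drops the stray space when
          * wanted3 is still empty or the piece itself is empty
          qui replace wanted3 = strtrim(wanted3 + " " + `v')
        }

        * Stops with an error if any observation differs from wanted
        assert wanted3 == wanted
        ```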
