
  • Extracting first five and last three parts from string variable.

    Dear Stata users,

    I hope you are staying safe.
    I have a dataset with code and name variables like this:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str12 code str36 name
    "160610117117" "A"                  
    "160610118118" "B"           
    "160615001003" "C"                 
    "160615001015" "D"                       
    "160615001026" "E"                 
    "160615001031" "F"                        
    "160615001094" "G"                    
    "160615001103" "H"                       
    "160615001114" "I"                  
    end
    Hi, I'd like to change these codes as follows:
    160610117117 -> 16061117

    That is, I want to rebuild each code from its first five and last three characters.

    I tried some code with substr() but could not figure it out. By the way, the codes are stored as a string variable! If you think converting to numeric would be better, let me know!

  • #2
    Code:
    gen wanted1 = substr(code, 1, 5) + substr(code, -3, 3)
    gen wanted2 = subinstr(code, substr(code, 6, 4), "", 1)
    
    l wanted?
    We'd need to see your code to tell you what you did wrong.
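A minimal, self-contained sketch that rebuilds the example data from #1 and checks that the two lines above agree. The assert checks are additions for illustration only. One caution on wanted2: subinstr() with a count of 1 removes the first occurrence of characters 6 to 9, so it would give a different answer from wanted1 if that four-character substring also appeared earlier in a code. On the posted data it does not, so the two methods match.

```stata
* Rebuild the example data posted with -dataex- in #1
clear
input str12 code str36 name
"160610117117" "A"
"160610118118" "B"
"160615001003" "C"
"160615001015" "D"
"160615001026" "E"
"160615001031" "F"
"160615001094" "G"
"160615001103" "H"
"160615001114" "I"
end

* Both methods assume every code is exactly 12 characters long
assert strlen(code) == 12

* Method 1: concatenate the first five and the last three characters
gen wanted1 = substr(code, 1, 5) + substr(code, -3, 3)

* Method 2: delete the first occurrence of characters 6-9
gen wanted2 = subinstr(code, substr(code, 6, 4), "", 1)

* Check that the two methods agree on these data
assert wanted1 == wanted2
list code wanted1 wanted2
```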

    • #3
      Hi Nick,

      thank you for the reply.
      That is exactly what I needed. I did not know that
      substr(code, -3, 3) was also available, counting backwards from the end of the string. Thank you!

      • #4
        Yes Ed, backward counting is available in many contexts, including with the -in- qualifier (see the end of this example). But it is needed only if the string in the middle has varying length, which does not seem to be the case in your data; for your data, forward counting would have done just fine. Nick has already illustrated concatenation with the + operator (which in this context gives a one-liner), so I will do it in a more roundabout way, which in turn is handier if you want to operate on general varlists:

        Code:
        . gen first = substr(code,1,5)
        
        . gen last = substr(code,10,3)
        
        . egen shortcode = concat(first last)
        
        . list in -4/l
        
             +-----------------------------------------------+
             |         code   name   first   last   shortc~e |
             |-----------------------------------------------|
          6. | 160615001031      F   16061    031   16061031 |
          7. | 160615001094      G   16061    094   16061094 |
          8. | 160615001103      H   16061    103   16061103 |
          9. | 160615001114      I   16061    114   16061114 |
             +-----------------------------------------------+
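To illustrate the point about varying lengths, here is a sketch with two hypothetical codes (invented for illustration, not taken from the posted data) whose middle parts differ in length. Forward counting from character 10 only works for the 12-character code, while backward counting with a negative start position picks up the last three characters of both.

```stata
* Hypothetical codes whose middle part varies in length
clear
input str12 code
"160610117117"
"1606101117"
end

* Forward counting assumes the last part always starts at character 10
gen last_fwd = substr(code, 10, 3)

* Backward counting starts three characters from the end, whatever the length
gen last_bwd = substr(code, -3, 3)

* Build the short code from the first five and the last three characters
gen shortcode = substr(code, 1, 5) + last_bwd
list
```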

        • #5
          Examples or sources of extra information:


          Code:
          help functions 
          help string functions
          help substr()

          https://www.stata-journal.com/articl...article=dm0058
