  • Extract specific text

    Good morning,

    I start with the string variable 'performancecomment'.

    I use the following code to extract the text after the final comma into 'finish':
    gen finish = substr(performancecomment, strrpos(performancecomment, ",") + 1, length(performancecomment))

    Now I need to extract the text between the first and second commas, counting from the right, into 'wanted'.
    I am having trouble working out the code to achieve this.

    Any help appreciated please.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str232 performancecomment str71 finish str19 wanted
    "in touch, led over 1f out, forged clear"            " forged clear"       "led over 1f out"    
    "mid-field, ridden 2f out, one paced"                " one paced"          "ridden 2f out"      
    "mid-field, pushed along 3f out, beaten over 2f out" " beaten over 2f out" "pushed along 3f out"
    end

  • #2
    Code:
    help split
    is an alternative.

    Otherwise you need to find the position of the first comma and then get the length of the following desired substring by subtracting from the position of the last comma.
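
    The position-based idea can be sketched with strrpos() and substr(). This is only an illustration, not code from the thread: the variable names lastcomma, prevcomma and wanted2 are made up here, and it locates the last two commas rather than the first and last, so that comments with more than two commas are also handled. Rows with fewer than two commas are meant to be left empty.

    ```stata
    * Sketch only: lastcomma, prevcomma and wanted2 are illustrative names.
    clear
    input str50 performancecomment
    "in touch, led over 1f out, forged clear"
    "mid-field, ridden 2f out, one paced"
    "mid-field, pushed along 3f out, beaten over 2f out"
    end

    * Position of the final comma (0 if there is none).
    gen lastcomma = strrpos(performancecomment, ",")
    * Position of the comma before it, searching only the text left of the final comma.
    gen prevcomma = strrpos(substr(performancecomment, 1, lastcomma - 1), ",")
    * Text strictly between the two commas, with surrounding blanks trimmed.
    gen wanted2 = strtrim(substr(performancecomment, prevcomma + 1, lastcomma - prevcomma - 1)) if prevcomma > 0
    ```

    The length passed to substr() is the gap between the two comma positions minus one, which is the subtraction described above.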

    Otherwise you could turn it into a regular expression problem.

    This works for your example with moss from SSC. Trim to taste.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str50 performancecomment str19(finish wanted)
    "in touch, led over 1f out, forged clear"            " forged clear"       "led over 1f out"    
    "mid-field, ridden 2f out, one paced"                " one paced"          "ridden 2f out"      
    "mid-field, pushed along 3f out, beaten over 2f out" " beaten over 2f out" "pushed along 3f out"
    end
    
    moss performancecomment, match(",(.*),") regex 
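
    The pattern ",(.*)," is greedy, so it spans from the first comma to the last one, which suits comments with exactly two commas as in the example. If some comments have more commas, one possible variant with the official regexm() and regexs() functions is sketched below. It is an assumption about what is wanted, not part of the original answer, and wanted3 is an illustrative name.

    ```stata
    * Sketch only: wanted3 is an illustrative name; performancecomment is assumed to be in memory.
    * Capture the text between the last two commas: a comma, a run of non-commas,
    * another comma, then non-commas up to the end of the string.
    gen wanted3 = strtrim(regexs(1)) if regexm(performancecomment, ",([^,]*),[^,]*$")
    ```

    Excluding commas from the captured group is what ties the match to the right-hand end of the string.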
    Last edited by Nick Cox; 27 Sep 2023, 02:41.


    • #3
      Code:
      gen reverse = ustrreverse(performancecomment)
      split reverse, gen(part) parse(",")
      foreach var of varlist part* {
          replace `var' = ustrreverse(`var')
      }
      part1 is the last comment (finish), part2 the part between the first and second comma from the right (wanted), part3 the part between the second and third comma from the right, etc.
      ---------------------------------
      Maarten L. Buis
      University of Konstanz
      Department of history and sociology
      box 40
      78457 Konstanz
      Germany
      http://www.maartenbuis.nl
      ---------------------------------


      • #4
        Thanks, Nick and Maarten.

        The split command will do the job nicely.
