
  • extracting words within a string variable

    All, I have been struggling to work out the precise usage of regexs() and regexm() to generate a new variable that contains a single word from a string variable. More specifically, could anybody offer advice on using these functions to extract "February" from the following: 24 February 2023.

    Many thanks for your time and consideration.

  • #2
    Do you have date strings and you want to extract the month? There are several solutions to extract substrings, but context is key.



    • #3
      Thanks for your reply, Andrew. The string variable with which I am working is "day month year" (e.g., 1 January 2023, 18 March 2023, and so on). I would like to extract the month (i.e., middle word) for a newly generated string variable (let's call it month). I hope this is sufficient context--if not, let me know what I have left out and I will provide additional information.



      • #4
        Given that pattern, one regular-expression approach is to keep everything that is not a digit. The code below deletes every digit from the string; note that the spaces around the month name remain.

        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input str14 date
        "11 March 2020"
        "29 August 2022"
        end
        
        gen wanted= ustrregexra(date, "\d", "")
        Res.:

        Code:
        . l
        
             +---------------------------+
             |           date     wanted |
             |---------------------------|
          1. |  11 March 2020     March  |
          2. | 29 August 2022    August  |
             +---------------------------+
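        The original question asked about regexm() and regexs(), so here is a minimal sketch of that route, assuming every value follows the "day month year" pattern with the parts separated by spaces. The variable names wanted_trim and wanted_rx are hypothetical, and strtrim() assumes Stata 14 or later (trim() in older versions).

        ```stata
        clear
        input str14 date
        "11 March 2020"
        "29 August 2022"
        end

        * Deleting digits leaves the spaces around the month, so trim them
        gen wanted_trim = strtrim(ustrregexra(date, "\d", ""))

        * regexm() tests the pattern; regexs(1) returns the first parenthesized group
        gen wanted_rx = regexs(1) if regexm(date, "^[0-9]+ +([A-Za-z]+) +[0-9]+$")

        list
        ```

        Observations that do not match the pattern get an empty string in wanted_rx, which makes irregular values easy to spot.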

        Alternatively, you can convert the string date to a Stata daily date (SIF, Stata internal form) and then use the -string()- function with a date display format:

        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input str14 date
        "11 March 2020"
        "29 August 2022"
        end
        
        gen Date= daily(date, "DMY")
        format Date %td
        gen month= string(Date, "%tdM")
        Res.:

        Code:
        . l
        
             +-------------------------------------+
             |           date        Date    month |
             |-------------------------------------|
          1. |  11 March 2020   11mar2020    March |
          2. | 29 August 2022   29aug2022   August |
             +-------------------------------------+
        Last edited by Andrew Musau; 31 Mar 2023, 09:23.



        • #5
          That's terrific. I really appreciate the help. Have a great day.



          • #6
            In this case you could read the string in as a daily date and then extract the month of the year, that is, using

            Code:
            gen wanted = month(daily(date, "DMY"))
            which presumes that an answer in 1 … 12 is as helpful as a month name.

            Alternatively, if the month is always surrounded by spaces, then the month is word(date, 2), which returns the month name as a string.
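
            The two suggestions in this post can be compared side by side in a short sketch, assuming the same two example dates used earlier in the thread. The variable names month_num and month_word are hypothetical.

            ```stata
            clear
            input str14 date
            "11 March 2020"
            "29 August 2022"
            end

            * Numeric month (1 to 12) via a daily date
            gen month_num = month(daily(date, "DMY"))

            * Second space-delimited word of the string
            gen month_word = word(date, 2)

            * Strings that daily() cannot parse yield missing values
            count if missing(month_num)

            list
            ```

            word() relies only on position, so it returns whatever sits second even if that is not a valid month name, whereas daily() returns missing for strings it cannot parse.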

