  • Extract numbers and strings

    Hello, I am wondering whether there is a way in Stata to extract substrings that contain both numbers and letters. I want to create a new variable that holds every word made up of numbers followed by a "G". For example, the new variable should contain 300G and 2x150G (see picture).

  • #2
    [Attached image: Skjermbilde 2022-10-10 164329.png]

    Here's the picture


    • #3
      Here's your example data presented in the standard way recommended by the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post.
      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input str32 text
      "STORFE BIFFSTRIMLER 350G FOLKETS"
      "BEEF BURGER 2X150G FOLKETS"      
      end
      Will the letter G always appear as the final character in the string to be extracted, as it does in these two examples? And will it always have a digit immediately before it, as it does in these two examples?
      Last edited by William Lisowski; 10 Oct 2022, 09:09.


      • #4
        Yes, the letter G will always follow the numbers with no space in between.


        • #5
          Perhaps this?

          Code:
          clear
          input str32 text
          "STORFE BIFFSTRIMLER 350G FOLKETS"
          "BEEF BURGER 2X150G FOLKETS"      
          end
          
          gen wanted = ustrregexs(0) if ustrregexm(text,"(\b[a-zA-Z0-9]+G\b)")
          which produces:
          Code:
          . list , noobs
          
            +-------------------------------------------+
            |                             text   wanted |
            |-------------------------------------------|
            | STORFE BIFFSTRIMLER 350G FOLKETS     350G |
            |       BEEF BURGER 2X150G FOLKETS   2X150G |
            +-------------------------------------------+
          This assumes there will be only one such word in each string. Is that okay, or are there possibly multiple such words in each?
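
          If several such words can occur in one string, one possible approach is to pull them out one at a time: record the first match, delete it from a working copy with ustrregexrf(), and repeat until nothing matches. This is only a sketch under assumptions: the second example row and the variable names work and wanted_all are made up for illustration, and the pattern requires a digit directly before the G (as confirmed in #4), so a word like "OG" is skipped.
          Code:
          clear
          input str48 text
          "STORFE BIFFSTRIMLER 350G FOLKETS"
          "BEEF BURGER 2X150G OG BACON 100G"
          end

          * working copy that is consumed match by match
          gen work = text
          gen wanted_all = ""

          count if ustrregexm(work, "\b[a-zA-Z0-9]*[0-9]G\b")
          local more = r(N)
          while `more' > 0 {
              * append the first remaining match in each observation
              replace wanted_all = strtrim(wanted_all + " " + ustrregexs(0)) if ustrregexm(work, "\b[a-zA-Z0-9]*[0-9]G\b")
              * remove that first match so the next pass finds the following one
              replace work = ustrregexrf(work, "\b[a-zA-Z0-9]*[0-9]G\b", "")
              count if ustrregexm(work, "\b[a-zA-Z0-9]*[0-9]G\b")
              local more = r(N)
          }
          drop work
          list, noobs
          With these two rows, wanted_all should hold 350G in the first observation and 2X150G 100G in the second. The loop runs once per match in the observation with the most matches, so it stays cheap unless strings contain many quantities.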


          • #6
            An enhancement to post #5 based on the answers given in post #4.
            Code:
            clear
            input str32 text
            "STORFE BIFFSTRIMLER 350G FOLKETS"
            "BEEF BURGER 2X150G FOLKETS"      
            "BEEF STEAK BIG CHOP"
            end
            
            gen wanted1 = ustrregexs(0) if ustrregexm(text,"(\b[a-zA-Z0-9]+G\b)")
            gen wanted2 = ustrregexs(0) if ustrregexm(text,"(\b[a-zA-Z0-9]+[0-9]G\b)")
            list, clean
            Code:
            . list, clean
            
                                               text   wanted1   wanted2  
              1.   STORFE BIFFSTRIMLER 350G FOLKETS      350G      350G  
              2.         BEEF BURGER 2X150G FOLKETS    2X150G    2X150G  
              3.                BEEF STEAK BIG CHOP       BIG
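
            A caveat on wanted2, offered as a sketch rather than a tested fix: because the pattern uses [a-zA-Z0-9]+ before the final digit, it needs at least two characters in front of the G, so a quantity such as 5G would not match. Assuming single-digit quantities can occur in the data (the "SALT 5G FOLKETS" row and the name wanted2b are made up for illustration), replacing + with * allows them:
            Code:
            clear
            input str32 text
            "STORFE BIFFSTRIMLER 350G FOLKETS"
            "SALT 5G FOLKETS"
            "BEEF STEAK BIG CHOP"
            end

            * original pattern: one or more characters, then a digit, then G
            gen wanted2  = ustrregexs(0) if ustrregexm(text,"(\b[a-zA-Z0-9]+[0-9]G\b)")
            * loosened pattern: zero or more characters, then a digit, then G
            gen wanted2b = ustrregexs(0) if ustrregexm(text,"(\b[a-zA-Z0-9]*[0-9]G\b)")
            list, clean
            Here wanted2 should be empty for the 5G row while wanted2b picks it up; neither pattern should match BIG, since both still require a digit directly before the G.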


            • #7
              William Lisowski Thanks for fixing that issue!

              But in the spirit of further refinement:

              Code:
              * Example generated by -dataex-. For more info, type help dataex
              clear
              input str32 text
              "STORFE BIFFSTRIMLER 350G FOLKETS"
              "BEEF BURGER 2X150G FOLKETS"      
              "BEEF STEAK BIG CHOP"
              "BEEF BURGER 2X150XG CHOP"
              end
              
              gen wanted1 = ustrregexs(0) if ustrregexm(text,"(\b[a-zA-Z0-9]+G\b)")
              gen wanted2 = ustrregexs(0) if ustrregexm(text,"(\b[a-zA-Z0-9]+[0-9]G\b)")
              gen wanted3 = ustrregexs(0) if ustrregexm(text,"(\b[0-9a-zA-Z]*[0-9]+[0-9a-zA-Z]*G\b)")
              which produces:

              Code:
              . list, noobs
              
                +----------------------------------------------------------------+
                |                             text   wanted1   wanted2   wanted3 |
                |----------------------------------------------------------------|
                | STORFE BIFFSTRIMLER 350G FOLKETS      350G      350G      350G |
                |       BEEF BURGER 2X150G FOLKETS    2X150G    2X150G    2X150G |
                |              BEEF STEAK BIG CHOP       BIG                     |
                |         BEEF BURGER 2X150XG CHOP   2X150XG             2X150XG |
                +----------------------------------------------------------------+
              Edit: ah, but perhaps this case is ruled out by the description in #4. If so, never mind. wanted2 is what you want!
              Last edited by Hemanshu Kumar; 10 Oct 2022, 09:59.


              • #8
                Thank you both! That helped us a lot. In our case, #6 was the perfect solution.
