
  • String to numeric value

    I have a string variable that contains a number of days: "3 days", "76 days" and so on.
    I want to extract the numeric part and store it in a new numeric variable.

    I started by removing the space:

    Code:
    replace duration = subinstr(duration, " ", "", .)
    so the values now read "3days", "76days".

    I then tried to extract the number with a forvalues loop but could not get it to work. Any suggestions?

  • #2
    If the characters in "days" are the only nonnumeric characters you need to worry about, use -destring- with the ignore() option. Here is an example, but since you did not supply a data example, I can't be sure it will cover all your issues:
    Code:
    destring duration, gen(duration2) ignore("days")
    Please see
    Code:
    h destring
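A minimal, self-contained sketch of the destring approach, using made-up values in the "# days" pattern (the data are hypothetical, since none were posted). Note that ignore() strips each listed character wherever it occurs, not the word "days" as a unit, so a value such as "2 weeks" would still contain nonnumeric characters and destring would decline to create the new variable unless force is also specified.

```stata
clear
* hypothetical example data in the "# days" pattern
input str10 duration
"3 days"
"76 days"
"1 day"
end

* ignore("days") removes every d, a, y and s, leaving "3 ", "76 " and "1 "
destring duration, generate(duration2) ignore("days")
list, noobs
```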



    • #3
      For your data example, the step you tried (removing the space) makes things worse, since with the space kept the first word of duration is already the number. Try

      Code:
      gen wanted = real(word(duration, 1))
      or

      Code:
      destring duration, ignore("days") gen(wanted)
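A small sketch, on hypothetical data, of why removing the space first backfires with the word() approach: word(duration, 1) returns the text before the first space, so once the space is gone the first word is "3days" and real() returns missing.

```stata
clear
* hypothetical values in the original "# days" form
input str10 duration
"3 days"
"76 days"
end

* word(duration, 1) is the text before the first space: "3", "76"
generate wanted = real(word(duration, 1))

* after removing the space, the first word is "3days", which real() cannot read
generate squeezed = subinstr(duration, " ", "", .)
generate broken = real(word(squeezed, 1))
list, noobs
```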



      • #4
        Hi Morten,

        You can try:

        Code:
        egen duration2 = sieve(duration), keep(numeric)
        This removes every non-numeric character from duration, and the result is still a string variable. Be aware, though, that an observation such as "3 days, 5 hours" would end up as "35".
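If installing egenmore is not an option, a base-Stata stand-in for keeping only digits is ustrregexra() with the pattern [^0-9]. This is a swapped-in technique, not the sieve() function itself. The sketch below, on hypothetical data, reproduces the "35" pitfall described above; this version would also drop decimal points and minus signs.

```stata
clear
* hypothetical data: one clean value and the "3 days, 5 hours" case
input str20 duration
"3 days"
"3 days, 5 hours"
end

* base-Stata stand-in (not egenmore's sieve): delete every non-digit character
generate digits = ustrregexra(duration, "[^0-9]", "")
generate wanted = real(digits)
list, noobs
```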



        • #5
          I would probably do something like

          Code:
          generate duration_numeric = real(subinstr(duration, "days", "", 1))
          which will create missing values for observations with a pattern other than "# days".

          Edit:

          An even more restrictive version:

          Code:
          generate duration_numeric = real(regexs(1)) if regexm(duration, "^([0-9]+) days$")
          Last edited by daniel klein; 03 Jun 2022, 09:50.
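A sketch comparing the two versions above on hypothetical data. Both reject "1 day" and "don't know", but only the first accepts "3days", because subinstr() leaves "3" behind while the anchored regular expression requires exactly one space before "days".

```stata
clear
* hypothetical data covering matching and non-matching patterns
input str12 duration
"3 days"
"76 days"
"3days"
"1 day"
"don't know"
end

* version 1: drop the first "days", then let real() reject whatever remains
generate v1 = real(subinstr(duration, "days", "", 1))

* version 2: accept only digits, exactly one space, then "days"
generate v2 = real(regexs(1)) if regexm(duration, "^([0-9]+) days$")
list, noobs
```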



          • #6
            #4 requires that you have installed a community-contributed package, namely

            Code:
            ssc install egenmore



            • #7
              Thank you all for your suggestions!

              However, they do not solve my problem, because I was too unclear; sorry for wasting your time.

              The string can also contain other values, such as "day", "days", "week", "weeks", "month", "months", "don't know" and so on.
              I have not been able to figure this out myself.



              • #8
                Extracting digits is not too difficult, but if the strings contain irrelevant details, you need to spell out criteria for what is relevant. Assuming that the unit immediately follows the digits, you can write a regular expression along the following lines:

                Code:
                clear
                input str29 duration
                "12 days"
                "20000 steps in 100 MONTHS"
                "20 kilometers"
                "don't know"
                "22 Weeks"
                "33Months"
                end
                
                gen which=regexs(0)+"s" if regexm(lower(duration), "(day|week|month)")
                gen wanted= real(ustrregexs(1)) if ustrregexm(lower(duration),"(\d+)(\s*)(day|week|month)"), before(which)
                Result:

                Code:
                
                . l, sep(0)
                
                     +---------------------------------------------+
                     |                  duration   wanted    which |
                     |---------------------------------------------|
                  1. |                   12 days       12     days |
                  2. | 20000 steps in 100 MONTHS      100   months |
                  3. |             20 kilometers        .          |
                  4. |                don't know        .          |
                  5. |                  22 Weeks       22    weeks |
                  6. |                  33Months       33   months |
                     +---------------------------------------------+
                Last edited by Andrew Musau; 03 Jun 2022, 12:26.
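If, as a possible next step, the extracted numbers need to be on a common scale, the number and the unit can be captured in one pass and converted. This is a sketch on hypothetical data. The factors of 7 days per week and 30 days per month are assumptions (a month has no fixed length), so adjust them to whatever convention your analysis needs. Singular and plural forms both match because the pattern only requires the stem day, week or month.

```stata
clear
* hypothetical data with the kinds of units listed in #7
input str29 duration
"12 days"
"1 day"
"22 Weeks"
"33Months"
"don't know"
end

* group 1 is the number, group 2 the unit stem; "days" matches via "day"
generate num = real(ustrregexs(1)) if ustrregexm(lower(duration), "(\d+)\s*(day|week|month)")
generate unit = ustrregexs(2) if ustrregexm(lower(duration), "(\d+)\s*(day|week|month)")

* assumed factors: 7 days per week, 30 days per month (an approximation)
generate days = num * cond(unit == "day", 1, cond(unit == "week", 7, 30)) if !missing(num)
list, noobs
```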



                • #9
                  Alternatively, you can add more characters to the ignore() option; to find which ones are needed, or which remain after a first trial, use something like:
                  Code:
                  ta duration if real(duration)==.
                  Note that #2 above explained the problems that your not supplying a data example causes for those trying to help.
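One way to run the "first trial" described above, sketched on hypothetical data: force makes destring treat anything that is still nonnumeric as missing, and the tabulation then shows which values (and hence which characters) remain to be dealt with. You could then drop trial and rerun with a longer ignore() list.

```stata
clear
* hypothetical mixed values like those described in #7
input str12 duration
"3 days"
"1 day"
"2 weeks"
"don't know"
end

* first trial: force turns anything still nonnumeric into missing
destring duration, generate(trial) ignore("days") force

* the values listed here show what the ignore() list does not yet handle
tabulate duration if missing(trial)
```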

