  • Using range for string data

    Hi,

    I am trying to generate a new variable from string data. My string data are numbers, each of which codes for a different category. I want to generate a new variable that flags the range 93451-93461. Can I do this using range? In general, would it be better to treat numbers that code for something as numeric rather than string?

    Thanks

  • #2
    you can use -inrange()- for string variables; see
    Code:
    h inrange()
    I don't think that the -range- command will do what you want, but your question is not entirely clear and there is no data example, so the above is my guess.

    added: whether you should leave this variable as a string or make it numeric depends on its role in your data; e.g., if it is an id, then it can, and in most cases should, be left as a string
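
If a numeric version is wanted alongside the string, one possible sketch is below. The toy data and the names cpt_num and cpt_num2 are made up for illustration, and "X9345" is a hypothetical non-numeric code included only to show what happens to values that cannot be converted.

```stata
clear
input str5 cpt
"93451"
"93455"
"X9345"
end

* numeric copy; the original string variable is left untouched
* -force- turns values that are not numbers into missing
destring cpt, generate(cpt_num) force

* same idea with the real() function
gen long cpt_num2 = real(cpt)

list
```

Keeping the string variable and adding a numeric copy leaves both options open: the string remains the identifier, and the numeric copy can be used for range comparisons.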
    Last edited by Rich Goldstein; 06 Sep 2022, 06:56.



    • #3
      Thank you for the response.

      I am trying to do this:

      generate cath=(cpt=="93451"|cpt=="93451") // etc.

      I am trying to do this for string codes from 93451 to 93461.



      • #4
        your example appears to have a typo but I think the following is what you want:
        Code:
        gen byte cath=inrange(cpt,"93541","93461")



        • #5
          Thank you, that worked. What typo do you mean?



          • #6
            if you look at your example in #3, you will see that "93451" appears twice; note also that I made a different typo in my example code in #4 (93541 when I meant 93451)



            • #7
              got it, thanks



              • #8
                The -real()- function may help here

                Code:
                gen byte cath=inrange(real(cpt),93451,93461)
                or a regular expression of the form:

                Code:
                gen wanted= regexm(cpt, "^934(5[1-9]|6[01])$")
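
A pattern like this is easy to get subtly wrong, so it may be worth checking it against the numeric test on a few made-up codes. The sketch below assumes an anchored pattern for 93451 through 93461; the toy data and the names by_regex and by_real are illustrative only.

```stata
clear
input str6 cpt
"93451"
"93461"
"93450"
"93462"
"193455"
end

* anchored pattern: 9345 followed by 1-9, or 9346 followed by 0 or 1
gen byte by_regex = regexm(cpt, "^934(5[1-9]|6[01])$")

* numeric comparison on the same codes
gen byte by_real = inrange(real(cpt), 93451, 93461)

* stops with an error if the two approaches ever disagree
assert by_regex == by_real
list
```

Note that a character class such as [934] matches a single character (9, 3, or 4), not the sequence 934, and that without the ^ and $ anchors the pattern can match inside a longer string.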




                • #9
                  Consider the following three examples, where the first is taken from post #4.
                  Code:
                  . display inrange("93499","93541","93461")
                  0
                  
                  . display inrange("93499","93461","93541")
                  1
                  
                  . display inrange("934999","93461","93541")
                  1
                  
                  .
                  From the first example, we see that inrange() doesn't do what is intended when the third argument is less than (for strings, earlier in the sort sequence than) the second argument.

                  The second example shows that inrange() does what is expected when the second and third arguments are in sort sequence order.

                  The third example shows that if you were to have any 6-digit numbers between 934610 and 935409, they would also be within the given range when treated as strings, because strings are compared character by character from the left.

                  I doubt that happens in this case, but for those who read this post later it's important to note and understand the difference between string comparisons and numeric comparisons.

                  Added in edit: crossed with post #8. For numbers stored as strings, I'd recommend using the real() function to avoid the problem noted in the third example here.
                  Code:
                  . display inrange(real("934999"),93461,93541)
                  0
                  
                  .
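
To see the same difference on a variable rather than on literals, here is a small sketch with made-up data. The names cath_str and cath_num are illustrative, the bounds 93451 and 93461 are the ones the original poster asked about, and "abc" is included only to show how a non-numeric value is handled.

```stata
clear
input str6 cpt
"93451"
"93461"
"934550"
"93462"
"abc"
end

* string comparison: "934550" sorts between "93451" and "93461"
gen byte cath_str = inrange(cpt, "93451", "93461")

* numeric comparison: real("abc") is missing, and inrange() returns 0 for a missing first argument
gen byte cath_num = inrange(real(cpt), 93451, 93461)

list
```

Only the third observation should differ between the two variables: "934550" is flagged by the string comparison but not by the numeric one.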
                  Last edited by William Lisowski; 06 Sep 2022, 08:02.
