  • Strange behavior of comparing to numeric missing / using inrange()

    Hi everyone
    After 10 years of using Stata, I really thought I had mastered handling comparisons with missing values with ease. I was proven wrong today.

    Can anyone explain this behaviour to me? I would honestly have expected test_inrange to be 0, following the logic of test_two_if. When I then tested the condition stated in the inrange() help file (test_one_if), I was very surprised to see a difference between test_one_if and test_two_if.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str20 nacpno double date_visit long a long b
    "test_subject" 22610 . .
    end
    format %d date_visit
    
    gen test_a = a <= date_visit
    gen test_b= date_visit <= b
    gen test_one_if = a<= date_visit<= b
    gen test_two_if = a<= date_visit & date_visit<= b
    gen test_inrange=inrange( date_visit, a, b)
    Thanks so much!

  • #2
    Originally posted by Fabian Fortner
    I would honestly have expected test_inrange to have been = 0, assuming the logic from test_two_if.
    Both a and b are system missing (.), so Rule #2 of the inrange() help file applies: "a >= . and b = . returns 1"

    I was very surprised to see there's a difference between test_one_if and test_two_if
    This comes up on the list occasionally. Unless operator precedence or parentheses direct otherwise, Stata evaluates from left to right, so a <= date_visit is tested first. It is false, because a is missing and missing counts as larger than any number. That Boolean result (zero) is then tested as 0 <= b, which is true because b is missing.
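
    To make the left-to-right evaluation concrete, here is a minimal sketch that plugs the values from the example above (a and b missing, date_visit = 22610) straight into display. The results noted in the comments follow from Stata treating numeric missing as larger than any number.

    ```stata
    * Step 1: a <= date_visit, with a missing, is false
    display (. <= 22610)
    * shows 0

    * Step 2: the 0 from step 1 is then compared with b, which is missing
    display (0 <= .)
    * shows 1

    * The chained form is therefore read as (a <= date_visit) <= b
    display (. <= 22610 <= .)
    * shows 1

    * Spelling out both comparisons gives the algebraic meaning
    display ((. <= 22610) & (22610 <= .))
    * shows 0
    ```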

    • #3
      The second point just above is -- as Joseph Coveney says -- raised from time to time.

      Computer code doesn't always follow what you'd expect from mathematical conventions.

      (At some early point we learn that = in most software usually means assignment of what is on the right to what is on the left, and is not an operator in a statement of equality, and we soon take that for granted.)

      For more discussion on the broad theme of compound logical statements see e.g. https://journals.sagepub.com/doi/pdf...6867X231162009

      • #4
        Thanks for your replies! Teaches me again to rely more on stuff that I truly understand.

        I think the rules (i.e. #2) for inrange() feel very wrong, however - or is this unreasonable?:

        - in the context of Stata, where a&b as missing are normally infinitely big, a numeric value can't be between the two - this is not very intuitive and I don't think it's consistent with other functions
        - it also feels wrong from a "content" point of view, when there's missing values for an interval and a function evaluates a numeric value to lie in a simply "non-existent" interval: this should either be missing or 0

        • #5
          With missing values, StataCorp developers want to be consistent -- who doesn't? -- but they also want to implement what researchers are most likely to want. Thus a bare summarize always ignores missings and a bare list always includes missings. Insisting that summarize returns missing if any values are missing is programmable (but I can't recall anyone asking for it in the 32 years I have been using Stata). Insisting that a list excludes missings if you don't want to see them is an easy if condition. Sometimes missings being ignored means treating them as missing and sometimes it means treating them as zero (e.g. in a cumulative sum). And so on.
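
          As a small illustration of those defaults, the sketch below uses a made-up three-observation variable x (the data and the variable name running are only for this example).

          ```stata
          clear
          input x
          1
          .
          3
          end

          * summarize ignores the missing value: it reports 2 observations, mean 2
          summarize x

          * list shows all three observations, including the missing one
          list

          * sum() treats the missing value as zero: running is 1, 1, 4
          generate running = sum(x)
          list
          ```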

          You have a point, clearly, but so does Stata in the sense that if you always want Stata code for the algebra a <= x <= b to mean what it says even when either value a, b is missing, then you can say that, and you could say it before inrange() was ever introduced. You just need to spell it out as (a <= x) & (x <= b). Conversely, inrange() was introduced to implement what researchers are presumed to want. However, there wasn't a survey!
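
          Here is a hedged sketch comparing the two forms side by side on made-up data. The variable names literal, via_inrange and strict are hypothetical, and the last line shows just one possible convention for anyone who would rather get missing when either bound is missing.

          ```stata
          clear
          input x a b
          5 1 10
          5 . 10
          5 1 .
          5 . .
          end

          * Spelled-out algebra: a missing lower bound makes a <= x false
          * literal is 1, 0, 1, 0
          generate byte literal = (a <= x) & (x <= b)

          * inrange() reads a missing bound as "no limit on that side"
          * via_inrange is 1, 1, 1, 1
          generate byte via_inrange = inrange(x, a, b)

          * One possible stricter convention: missing unless both bounds are known
          * strict is 1, ., ., .
          generate byte strict = inrange(x, a, b) if !missing(a, b)

          list
          ```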

          • #6
            Originally posted by Fabian Fortner
            Thanks for your replies! Teaches me again to rely more on stuff that I truly understand.

            I think the rules (i.e. #2) for inrange() feel very wrong, however - or is this unreasonable?:

            - in the context of Stata, where a&b as missing are normally infinitely big, a numeric value can't be between the two - this is not very intuitive and I don't think it's consistent with other functions
            - it also feels wrong from a "content" point of view, when there's missing values for an interval and a function evaluates a numeric value to lie in a simply "non-existent" interval: this should either be missing or 0
            Within the context of -inrange-, a and b are conceptually more like -Inf and +Inf, respectively, when either or both are system missing (.).
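
            A short sketch of that reading, with literal values chosen only for illustration. A missing bound behaves like an open end on that side, while a missing first argument always gives 0.

            ```stata
            * Both bounds missing: treated as (-Inf, +Inf)
            display inrange(5, ., .)
            * shows 1

            * Lower bound missing: only the upper bound is checked
            display inrange(5, ., 7)
            * shows 1
            display inrange(5, ., 3)
            * shows 0

            * Upper bound missing: only the lower bound is checked
            display inrange(5, 3, .)
            * shows 1
            display inrange(5, 7, .)
            * shows 0

            * A missing value being tested is never in range
            display inrange(., 1, 10)
            * shows 0
            ```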

            • #7
              Thanks for the discussion guys!
