  • unexpected results with inrange

    I'm perplexed by some unexpected results I'm getting with the inrange function. The data below simulates the basic issue. My understanding is that Stata always interprets missing values as infinitely large. If that's true, why do row 3 and row 4 below evaluate to "Y" in the gen between... command?

    Code:
    input a z b
    1 50 100
    1 50 .
    . 50 100
    . 50 .
    end
    
    gen between="Y" if inrange(z, a, b)
    I've reviewed Nick Cox's relevant Stata Tip 39 but I still don't understand why I'm getting the results produced above for rows 3 and 4. It seems like those rows should not evaluate to "Y".




  • #2
    the following is from the help file:
    Code:
    a > . and b = . returns 1.
    added in edit: that is supposed to be ">=" after the "a" - but it didn't copy correctly for reasons unknown to me

    the other rules are given in
    Code:
    help inrange()
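    As a minimal sketch of how those ordered rules play out on the data from #1, the snippet below compares inrange() with the pair of plain inequalities it might be expected to mirror. The variable names in_rng and naive are made up here for illustration only.

    ```stata
    clear
    input a z b
    1 50 100
    1 50 .
    . 50 100
    . 50 .
    end

    * inrange() reads a missing bound as "no limit on that side"
    generate byte in_rng = inrange(z, a, b)

    * plain inequalities instead treat missing as larger than any number
    generate byte naive = (z >= a) & (z <= b)

    list a z b in_rng naive
    ```

    Here in_rng is 1 in all four rows, whereas naive is 1 only in rows 1 and 2, because z >= a is false whenever a is missing.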
    Last edited by Rich Goldstein; 18 May 2023, 08:45.



    • #3
      I wish the help file provided a little explanation regarding those rules. Some kind of justification would be useful in helping me understand why missing values seem to be, for purposes of the inrange command, given a different interpretation than in other commands. The results in rows 3 and 4 seem totally counterintuitive (to me anyway) given the typical handling of missing values in other commands.



      • #4
        The Tip I wrote didn’t discuss all the rules, IIRC.

        Although it may sound facetious, I think the Stata philosophy is to give, as far as possible, what a thoughtful user should typically want, but who is to say what that is? Missings are ignored by summarize and many statistical commands; treated as zero by sum() and egen functions that add; treated as missing when you multiply; treated as true by logical operators; treated as positive infinity by > or >=.
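        A small sketch of those different treatments on a toy variable may help. The data and the names n_used, running, tot, doubled, truthy and above2 are invented for illustration; the expected values in the comments follow from the behaviours listed above.

        ```stata
        clear
        input x
        0
        2
        .
        5
        end

        * summarize ignores the missing value: 3 observations are used
        summarize x
        scalar n_used = r(N)

        * sum() and egen's total() add as if missing were zero
        generate running = sum(x)        // 0 2 2 7
        egen tot = total(x)              // 7 in every row

        * arithmetic propagates missing
        generate doubled = 2 * x         // 0 4 . 10

        * as a logical condition, missing is nonzero and so counts as true
        generate byte truthy = 0
        replace truthy = 1 if x          // 0 1 1 1

        * relational operators treat missing as larger than any number
        generate byte above2 = x > 2     // 0 0 1 1

        list
        ```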

        I have twice heard papers on why Stata should adopt a three-way logic, which both ended in heated discussion and session chairs having to move matters on because no-one could agree. As far as I recall, those speaking split into those who favour the present rules, if only because changing them cannot be contemplated; those who think there is a case for a three-way logic, but not that just proposed, which is both arbitrary and complicated; and the speaker….



        • #5
          Thanks Nick, for that context. Maybe my expectation is atypical. I think I'll just need to try to remember that missings are handled this way by inrange.



          • #6
            The Tip cited in #1 was written in 2006 -- and says more about behaviour with missing values than I remembered. I don't feel too defensive about it because it did flag that you needed to watch out for what happens with missing values. But since 2006 I have used inrange() a great deal and have had cause to be grateful that inrange(x, 42, .), say, does not include missing but corresponds, in more mathematical notation, to [42, infinity).

            This is the opposite of something that often surprises and often bites, namely that

            Code:
            ... if x > 42
            does not ignore missing values on x
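
            To make the contrast concrete, here is a minimal sketch on invented data. The variable names gt, rng and gt_ok are hypothetical, and the last line shows one common way to exclude missings explicitly.

            ```stata
            clear
            input x
            10
            42
            99
            .
            end

            * missing counts as larger than 42, so the last row is flagged too
            generate byte gt = x > 42                    // 0 0 1 1

            * inrange() with a missing upper bound selects [42, infinity) and skips missing x
            generate byte rng = inrange(x, 42, .)        // 0 1 1 0

            * an explicit guard keeps the strict inequality but drops missing x
            generate byte gt_ok = x > 42 & !missing(x)   // 0 0 1 0

            list
            ```

            Note that inrange(x, 42, .) includes 42 itself, so it matches x >= 42 rather than x > 42 among nonmissing values.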

