
  • Why are missing values treated as positive infinity in Stata?

    Up until now, I have never found any practical use for this. In fact, it generally results in confusion. At the moment, I have two variables, x and y. Both x and y can be missing. However, I only wish to replace x with y when x>y. So, I type:


    Code:
    replace x=y if x>y
    I realized that this is probably not sufficient. Perhaps I should type:

    Code:
    replace x=y if x>y & !missing(x)
    Surprisingly, both resulted in the same number of values being replaced (coincidentally? I am not sure).

    Is a missing value ever not treated as positive infinity? If so, when? And if not, could there be an option that treats it as truly missing, i.e. one that leaves it out of operations altogether (at least relational ones)?


    Many thanks,
    CS

  • #2
    Chinmay:
    you can take advantage of the extended missing values:
    Code:
    . set obs 10
    number of observations (_N) was 0, now 10
    
    . g A=runiform() in 2/7
    
    . g B=runiform() in 1/5
    
    . replace A=.a if A==.
    
    . replace B=.b if B==.
    
    . replace A=B if A>B
    
    
    . list
    
         +---------------------+
         |        A          B |
         |---------------------|
      1. | .4961259   .4961259 |
      2. | .0674011   .7167162 |
      3. | .3379889    .859742 |
      4. | .1340756   .1340756 |
      5. | .4884419   .4884419 |
         |---------------------|
      6. | .0454151         .b |
      7. | .7459667         .b |
      8. |       .a         .b |
      9. |       .a         .b |
     10. |       .a         .b |
         +---------------------+
    
    .
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      I read the official explanation of why missing values are treated as positive infinity and remember reading that "a missing value must evaluate to something" in order to make logical comparisons. That tells me that it is not possible to treat them as truly missing.
      Could you, however, please try your experiment with
      replace x=y if x>y & x != . to see whether you get the same number of replacements?
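
      A small sketch with made-up data (the values below are hypothetical, not taken from #1) of why the choice of test can matter: x != . excludes only system missing, whereas !missing(x) and x < . also exclude the extended missing values .a to .z.
      Code:
      clear
      input x y
      5  1
      .  1
      .a 1
      end
      * counts 2: 5 > 1, and .a > 1 with .a != .
      count if x > y & x != .
      * counts 1: only 5 > 1 survives
      count if x > y & !missing(x)
      * counts 1: x < . is true only for nonmissing x
      count if x > y & x < .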



      • #4
        No data example in #1 to discuss, but there is a general question that can be discussed.

        System missing is treated as larger than any non-missing value and extended missing values are greater in turn.

        I find the easiest way into understanding this is to wonder what happens when you sort a variable. Where do the missing values go? They won't just fly up into the middle air; they have to go somewhere. Two choices make sense, missing values being treated as arbitrarily small (meaning precisely, negative but arbitrarily large in absolute value) and being treated as arbitrarily large. Stata's developers chose the second.

        There are side-effects of this, most obviously missing values being included in a comparison when you may not want them to be.
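
        A minimal sketch of the ordering and of that side-effect, using made-up values:
        Code:
        clear
        input x
        3
        .
        1
        .a
        2
        end
        sort x
        * sorted order is 1 2 3 . .a
        list
        * both display 1 (true)
        display (. > 1e300)
        display (.a > .)
        * counts 3: the value 3 plus the two missing values
        count if x > 2
        * counts 1: missing values are excluded explicitly
        count if x > 2 & !missing(x)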

        Every now and again, someone proposes a three-way logic for Stata, based usually on true, false and missing (don't know). Then you need as a minimum all the two-way logic tables (e.g. value operator value). Then in practice, people split three ways:

        1. Just keep the present system as the devil we know (most opinions)

        2. Good idea (rare)

        3. We need a three-way logic, it's just that this one is crazy and here's mine instead (more common than #2).



        • #5
          Hi All. Thanks a lot for the valuable comments. Nick Cox: yes, indeed that makes sense. The sorting does indeed clear things up. However, it seems to me that the two need not be mutually exclusive. We could still treat missing values as arbitrarily large while sorting, but have an option, akin to the
          Code:
          , missing
          option when calculating the rowsum with egen. This would be the inverse operation, i.e. if one of the values in value operator value is missing, a missing value is returned.
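
          Something close to this can already be had without a new option, sketched here with hypothetical data: attach an if qualifier to generate, so that the result of the comparison is stored as missing whenever either operand is missing. The variable name x_gt_y is made up for illustration, and this does not change how > itself behaves.
          Code:
          clear
          input x y
          1 2
          3 1
          . 1
          2 .
          end
          * 1 if x > y, 0 if not, missing if x or y is missing
          generate byte x_gt_y = (x > y) if !missing(x, y)
          * x_gt_y is 0, 1, ., . in observations 1 to 4
          list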



          • #6
            On your last point in #1, on ignoring missing in Stata, there is only one universal behaviour with logical operations: 0 is treated as false and non-zero is treated as true (including all missing values).
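
            A quick illustration of that rule, with made-up values. cond() with three arguments follows it, while its optional fourth argument gives missing a branch of its own.
            Code:
            * all three display "true": nonzero and missing both count as true
            display cond(1, "true", "false")
            display cond(-2, "true", "false")
            display cond(., "true", "false")
            * displays "false"
            display cond(0, "true", "false")
            * displays "missing": the fourth argument is returned when the first is missing
            display cond(., "true", "false", "missing")
            clear
            input x
            0
            1
            .
            end
            * counts 2: the 1 and the missing value
            count if x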

            In your original example, since you have missing values, it is most straightforward to handle those situations explicitly in your code. For instance, you seem to ignore the case where -y- is missing. One succinct way to handle this could be

            Code:
            replace x=y if x>y & !missing(x, y)
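
            A sketch with hypothetical data of what each condition selects. With >, checking x alone happens to be enough, because a nonmissing x can never exceed a missing y; the check on y starts to matter once the comparison is < or <=, so testing both is the safer habit.
            Code:
            clear
            input x y
            1 2
            3 1
            . 1
            2 .
            . .
            end
            * counts 2: observation 2, plus observation 3 where x is missing
            count if x > y
            * both count 1: observation 2 only
            count if x > y & !missing(x)
            count if x > y & !missing(x, y)
            * counts 2: observation 1, plus observation 4 where y is missing
            count if x < y
            * counts 1: observation 1 only
            count if x < y & !missing(x, y)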



            • #7
              Originally posted by Carlo Lazzaro
              Chinmay:
              you can take advantage of the extended missing values:
              Code:
              . set obs 10
              number of observations (_N) was 0, now 10
              
              . g A=runiform() in 2/7
              
              . g B=runiform() in 1/5
              
              . replace A=.a if A==.
              
              . replace B=.b if B==.
              
              . replace A=B if A>B
              
              
              . list
              
                   +---------------------+
                   |        A          B |
                   |---------------------|
                1. | .4961259   .4961259 |
                2. | .0674011   .7167162 |
                3. | .3379889    .859742 |
                4. | .1340756   .1340756 |
                5. | .4884419   .4884419 |
                   |---------------------|
                6. | .0454151         .b |
                7. | .7459667         .b |
                8. |       .a         .b |
                9. |       .a         .b |
               10. |       .a         .b |
                   +---------------------+
              
              .
              Thank you Carlo Lazzaro !

