
  • Triple conditions such as relimp==relat==1 do not do what I expect them to do. Why? Where is this behaviour explained?

    Working on a problem posed in this thread
    https://www.statalist.org/forums/for...other-variable
    I discovered to my shock that triple conditions such as relimp==relat==1 and relimp==relat==3 do not do what I expect them to do.

    In my mind, (relimp==relat==1) should be equivalent to (relimp==relat & relat==1). But it is not so, as the example below demonstrates.

    To have some data to work with

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long id byte wave float(relimp relat)
    110 1 1 1
    116 1 1 1
    116 2 1 1
    116 3 1 1
    123 1 3 3
    123 2 3 3
    123 3 3 2
    123 4 3 3
    123 5 3 2
    126 2 3 3
    126 3 3 3
    126 4 3 3
    138 3 3 2
    end
    I want to generate two dummies, dummy1 equal to one when relimp==relat==1, and dummy3 equal to one when relimp==relat==3. And the triple condition fails me in both cases:

    Code:
    . gen dummy1 = relimp==relat==1
    
    . gen dummy11 = relimp==relat & relat==1
    
    . gen dummy3 = relimp==relat==3
    
    . gen dummy33 = relimp==relat & relat==3
    
    . compare dummy1 dummy11
    
                                            ---------- difference ----------
                                count       minimum      average     maximum
    ------------------------------------------------------------------------
    dummy1=dummy11                  7
    dummy1>dummy11                  6             1            1           1
                           ----------
    jointly defined                13             0     .4615385           1
                           ----------
    total                          13
    
    . compare dummy3 dummy33
    
                                            ---------- difference ----------
                                count       minimum      average     maximum
    ------------------------------------------------------------------------
    dummy3<dummy33                  6            -1           -1          -1
    dummy3=dummy33                  7
                           ----------
    jointly defined                13            -1    -.4615385           0
                           ----------
    total                          13
    
    . list, sep(0)
    
         +-------------------------------------------------------------------+
         |  id   wave   relimp   relat   dummy1   dummy11   dummy3   dummy33 |
         |-------------------------------------------------------------------|
      1. | 110      1        1       1        1         1        0         0 |
      2. | 116      1        1       1        1         1        0         0 |
      3. | 116      2        1       1        1         1        0         0 |
      4. | 116      3        1       1        1         1        0         0 |
      5. | 123      1        3       3        1         0        0         1 |
      6. | 123      2        3       3        1         0        0         1 |
      7. | 123      3        3       2        0         0        0         0 |
      8. | 123      4        3       3        1         0        0         1 |
      9. | 123      5        3       2        0         0        0         0 |
     10. | 126      2        3       3        1         0        0         1 |
     11. | 126      3        3       3        1         0        0         1 |
     12. | 126      4        3       3        1         0        0         1 |
     13. | 138      3        3       2        0         0        0         0 |
         +-------------------------------------------------------------------+
    Does anybody know why Stata does not interpret (relimp==relat==1) as equivalent to (relimp==relat & relat==1), and where this behaviour is explained?


  • #2
    All that parentheses around an expression do is oblige Stata to work out the result of that expression before working out the result of any larger expression in which it is embedded. In no sense can they override the fact that within any expression == remains a binary operator. In other words two == within a parenthesised expression don’t become a joint ternary operator any more than three or more would become a joint n-ary operator, or whatever the jargon is. Precedence rules don’t bite in your example as it’s the same operator, so what counts is left-to-right operation.

    As for where this is explained, that is a good question. The best I can do is to say that Stata's documentation does not explain what makes no sense according to its rules. I can’t remember any previous discussion of this. I don’t think this is a quirk of Stata; I would expect exactly the same behaviour in any similar language.

    I am replying to the last question, but the same principles apply without parentheses. Left-to-right evaluation explains what happens.
    Last edited by Nick Cox; 30 Aug 2020, 05:40.
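A minimal sketch of what left-to-right evaluation means here: Stata's answer to the chained expression matches the explicit left grouping, not a grouping from the right.

```stata
* == is a binary operator, so a chain of == is grouped from the left
display 2 == 2 == 1      // read as (2 == 2) == 1, that is 1 == 1
display (2 == 2) == 1    // the same grouping written out explicitly
display 2 == (2 == 1)    // grouping from the right instead: 2 == 0
```

The first two lines display 1 and the third displays 0, so only the left grouping reproduces what Stata does with the unparenthesised chain.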



    • #3
      Thank you for engaging, Nick, and I am sorry if I am being thick here, but I did not understand anything of what you said at all.

      Parentheses have nothing to do with it here; I was just putting parentheses around the in-text expressions to separate them from the rest of the text.

      What I do not understand is
      1) why expressions one and two below give me different results, or rather, why expression one does not give me the same result as expression two (which is the behaviour I expected);
      2) what is expression one giving me at all? I have not yet figured out what expression one is doing; for all I can see it is giving me nonsense.

      Code:
      gen dummy1 = relimp==relat==1
      
      gen dummy11 = relimp==relat & relat==1








      • #4
        2) what is expression one giving me at all?
        Code:
        display 2==2==1
        
        display (2==2)==1
        ref #2
        Left to right evaluation explains what happens.
        Last edited by Bjarte Aagnes; 30 Aug 2020, 07:03.



        • #5
          Bjarte Aagnes extracted the best one-line summary from #2 for anyone like Joro Kolev still puzzled. As I flagged myself

          I am replying to the last question
          so it is a little disconcerting to be told that when Joro asked a question using parentheses he didn't mean what he said. Naturally, he was also asking about a version of the same question not using parentheses.

          But let's back up and build on what we all knew at an early age and add a little jargon that may well not have been used at the time. My formal mathematical education ceased at age 17 so I am not fitted even to attempt a rigorous treatment. Any informality or arm-waving here may need to be excused or to be corrected.

          I suspect that Joro really knows almost everything here; it's just towards the end that there may be something different.

          An expression is something you can evaluate -- finding its value -- so

          Code:
          2
          is an expression, evaluated as 2, and

          Code:
          2 + 3
          is another, evaluated as 5, and

          Code:
          2 + log10(1000)
          is another and in your head you can evaluate that as 5 too and

          Code:
          x + y
          is another, which naturally can't be evaluated unless you know x and y. As expressions get more complex, their ingredients include at a minimum

          numbers like 2

          variables like x

          operators like +

          functions like log10()

          and so forth.

          The distinction between operators and functions is conventional. Functions usually have names, or at least use some notation other than a single non-alphabetic symbol. The other way round, there is a clear case that + (say) is a function mapping two arguments to one result.

          But I will try to use the words the way that Stata uses them, while noting in passing that even secondary school mathematics can be messier, as shown by notation like 3! (factorial) and |x| (absolute value of x). There we have operators that in programming would typically (in languages I know about) be implemented as functions.
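In Stata itself, both of those notations are indeed handled by functions. A small sketch: abs() gives |x|, and since lnfactorial(n) returns ln(n!), exp() of it recovers n!, with round() guarding against any tiny floating-point error.

```stata
display abs(-3)                      // |x| as the function abs(): 3
display round(exp(lnfactorial(3)))   // 3! recovered from ln(3!): 6
```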

          The operators we learn early, such as + - × (times, or multiplication) and ÷ (division), are often represented in computing as + - * /, and these can be used as binary (dyadic) operators, meaning that

          Code:
          2 + 3, 2 - 3, 2 * 3, 2 / 3
          are examples of expressions in which precisely two objects (operands, if you will) are combined according to the elementary rules you know. Placing the operator between the operands is infix notation and by far the most common kind. Operators can also be unary (monadic), and negation

          Code:
          -2
          is a familiar example. (Incidentally, code in which expressions are negated by multiplying by -1 is often found.) Negation is an example of prefix notation, and 2! for factorial is an example of postfix notation.

          Like many languages Stata has logical operators that go beyond the familiar arithmetic operators. Operators for inequalities are elementary, but your early education may have glossed over the fact that expressions that can be true or false can be evaluated -- as true or false! So

          ==
          !=
          >
          <
          >=
          <=


          are all binary operators (Stata's documentation calls them relational operators) that can appear in expressions, the results being either true or false. So 6 > 4 is true and 6 < 4 is false. ! (not) is an example of a unary (monadic) logical operator.

          Stata doesn't have special logical constants that might be written as (say) TRUE or FALSE; it is content to return 1 if an expression is true and 0 if an expression is false. That is in turn a convention which has many useful consequences -- not least in being easy to link to the idea of probability.
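A short illustration of the 1-and-0 convention: each comparison below displays 1 or 0, and because the results are ordinary numbers they can be combined arithmetically, for example to count how many conditions hold.

```stata
display 6 > 4                           // true, so 1
display 6 < 4                           // false, so 0
display !(6 > 4)                        // unary not turns 1 into 0
display (6 > 4) + (3 > 1) + (2 > 5)     // two of the three hold: 2
```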

          What we are taught early on is that long complicated expressions such as

          1 + 2 + 3 + 4

          can be evaluated from left to right or, if you prefer, from right to left, and in this case the result should clearly be the same either way. However, a programming language can't be capricious about working from left to right or from right to left. With nothing else said, it uses one method and not the other. Stata works from left to right, and that is true of most other languages I have encountered, but there are exceptions, such as APL and J.
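Addition gives the same answer whichever way it is grouped, so it hides the direction of evaluation; subtraction does not. A minimal sketch:

```stata
display 1 + 2 + 3 + 4     // 10 whichever way it is grouped
display 10 - 3 - 2        // left to right: (10 - 3) - 2 = 5
display 10 - (3 - 2)      // grouping from the right would give 9
```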

          We also learn at an early age that parentheses can be used (a) to clarify and (b) to force a particular order of evaluation, which can be important in mathematics because an expression such as

          3 - 2 / 6

          is ambiguous otherwise. Do you mean (3 - 2) / 6, which is 1/6, or 3 - (2/6), which is 8/3? If you don't parenthesise, Stata makes the decision according to a publicised order of evaluation (some say, precedence rules). See

          Code:
          help operator
          for the list. Some programmers are taught to learn the precedence rules of their main language, but I have been programming in Stata for nearly 30 years and not got round to doing that yet. Other advice (better advice in my view) is just to parenthesise aggressively and to break calculations down to a series of stages, so that those who need to understand the code can indeed follow it without too much effort.
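The example above in Stata, as a small sketch: without parentheses, / is evaluated before -, so the first two lines agree, while the third, whose parentheses force the other reading, does not.

```stata
display 3 - 2 / 6       // / before -, so this is 3 - (2/6) = 8/3
display 3 - (2 / 6)     // the same thing, parenthesised for clarity
display (3 - 2) / 6     // parentheses force the other reading: 1/6
```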

          Now let's get to the point!

          Code:
          2 == 2 == 1
          is an expression with 2 and 1 sprinkled as operands (what the operators work on) and == repeated. Precedence rules don't bite here, as it's the same operator repeated. And there are no parentheses, so those are irrelevant too.

          What rules then apply are just (1) the definition of == as a binary operator and (2) left-to-right evaluation. So Stata works exactly as if you had written

          (2 == 2) == 1


          and 2 is equal to 2, so that results in true, or 1, and then

          (1) == 1
          1 == 1

          1 is equal to 1 so the result is again true or 1.
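Applied to the data in #1, this explains both dummies. Because relimp==relat is always 0 or 1, relimp==relat==1 just reproduces relimp==relat, and relimp==relat==3 is never true. A sketch using a few of the observations from the -dataex- example, with -assert- checking both claims:

```stata
clear
input long id byte wave float(relimp relat)
110 1 1 1
123 1 3 3
123 3 3 2
126 2 3 3
138 3 3 2
end

gen dummy1 = relimp==relat==1     // read as (relimp==relat)==1
gen dummy3 = relimp==relat==3     // read as (relimp==relat)==3

* (relimp==relat) is 0 or 1, and comparing it with 1 changes nothing
assert dummy1 == (relimp==relat)
* 0 or 1 is never equal to 3
assert dummy3 == 0

* what was intended: the two variables agree and take the stated value
gen dummy11 = relimp==relat & relat==1
gen dummy33 = relimp==relat & relat==3
list, sep(0)
```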

          Now what Joro wants and expects is that Stata will see the entire expression as the object to be evaluated, so that 2 == 2 == 1 is one expression meaning that all three operands are equal -- and, manifestly, they aren't. But for Stata to think that way, == repeated would have to become a ternary operator, so that -- as it were -- in that context Stata holds its breath and works on the entire expression

          2 == 2 == 1

          A ternary (triadic) operator, at least as I define it, has three operands and two symbols for operators. They aren't that thick on the ground, but I can think of one in Mata (which also appears in other programming languages), and one is enough to make the point:

          Code:
          . mata : (2 > 3) ? "foo" : "bar"
            bar


          Mata needs to look at the entire expression before it can make up its mind about the result.
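Outside Mata, Stata's cond() function is the nearest analogue: it takes three arguments and, like the Mata operator, cannot return a result until it has all of them. A minimal sketch mirroring the Mata line:

```stata
display cond(2 > 3, "foo", "bar")   // condition false, so "bar"
display cond(2 < 3, "foo", "bar")   // condition true, so "foo"
```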

          Similarly

          Code:
          1 == 2 == 3 == 0
          would, by the same kind of Joro argument, be treated as one expression to be evaluated all at once. A person might see that as a simple question: are 1, 2, 3 and 0 all equal? To which a smart seven-year-old (*) would tell you No! On the contrary, they are all different. Smart seven-year-olds aside, and older computer users aside too, Stata doesn't think like that. Climb inside Stata and think the way it does:

          (1 == 2) == 3 == 0

          (0) == 3 == 0
          0 == 3 == 0
          (0 == 3) == 0
          (0) == 0
          0 == 0

          the result is 1. It's not nonsense; it is just that Stata's logic is not yours. What you want has to be expressed in other ways and you could even program your own command, egen function or Mata function to test whether all arguments are the same.

          Code:
          display (min(0, 1, 2, 3) == max(0, 1, 2, 3))
          is one way to do it in Stata, and no doubt there are others, perhaps even already written. In Mata, you could use similar ideas:

          Code:
          : x = (0, 1, 2, 3)
          
          : min(x) == max(x)
            0
          
          : y = J(4, 1, 42)
          
          : min(y) == max(y)
            1
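For variables rather than constants, the same min-equals-max idea can be applied observation by observation with -egen-'s rowmin() and rowmax() functions. A sketch on a few values from #1; the names lo, hi and allsame are just illustrative:

```stata
clear
input float(relimp relat)
1 1
3 3
3 2
end

egen lo = rowmin(relimp relat)   // smallest value in each observation
egen hi = rowmax(relimp relat)   // largest value in each observation
gen allsame = lo == hi           // 1 when all the listed variables agree

* the dummies from #1, without chaining ==
gen dummy1 = allsame & relimp == 1
gen dummy3 = allsame & relimp == 3
list, sep(0)
```

One caveat: rowmin() and rowmax() ignore missing values, so an observation in which one of the variables is missing could still get allsame = 1 and would need separate handling.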



          I hope that helps.

          (*) No smart seven-year-olds are accessible to me right now, so the assertion is rhetorical.
          Last edited by Nick Cox; 31 Aug 2020, 06:24.



          • #6
            I will add to Nick's comprehensive discussion that the Mata example of a ternary operator signals its specialness by using "?" and ":" as the two symbols for the operation, making the parsing straightforward. "a == b == c" and "d == e" would use the same operator symbol for two different purposes: as a binary operator and as one of a pair of ternary operators. Worse yet, if "a == b == c" is a legitimate expression, then we'd expect that "p == q == r == s == t" is as well. From a purely pragmatic perspective, I can't imagine a parsing algorithm that works for that syntax.

            Another ternary operator, and a useful one, is SAS's support for constructs such as "5 < x < 7", for which Stata offers inrange(x,5,7).
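In Stata, by contrast, 5 < x < 7 is legal but is read left to right as (5 < x) < 7, and since (5 < x) is 0 or 1 the result is always 1. A sketch with x = 1, ..., 10; note that inrange() includes its endpoints, so it matches 5 <= x <= 7 rather than the strict inequalities:

```stata
clear
set obs 10
gen x = _n                        // x = 1, 2, ..., 10
gen chained = 5 < x < 7           // (5 < x) < 7: 0 < 7 or 1 < 7, so always 1
gen strict  = 5 < x & x < 7       // the strict reading of the SAS construct
gen incl    = inrange(x, 5, 7)    // inclusive: 5 <= x <= 7
list if inrange(x, 4, 8), sep(0)
```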



            • #7
              Thank you, Nick; it is all clear now.

              It seems that all in this very interesting discussion depends on what people expect, and how what people expect relates to what the software can deliver.

              For example, I expect 2==2==2 to evaluate to True, and 2==2==1 to evaluate to False. Or, if one wishes, to 1 for True and 0 for False.

              Similarly, I expect 2==2==2 to evaluate to whatever (2==2==2) evaluates to. So in my original post I put parentheses to a use that Nick himself alluded to: to improve readability and to separate the expression from the rest of the text. The way I have studied mathematics, parentheses give the operation inside them precedence over what is outside, and they do not affect in any way what is going on within them.

              Yet it was interesting that Nick thought that somebody might think that enclosing something in parentheses forces the two binary operators to be interpreted as one ternary operator. That is not part of standard mathematics, but still it is not crazy to think that parentheses should somehow affect what happens inside.

              If I am to summarise what we have learnt here: one cannot construct a ternary operator (or, more generally, an n-ary operator) simply by chaining binary operators together.

              Hence we should not try to be clever and "simplify" (A==B & B==C) to (A==B==C).
