  • Mata compiler ignores eltypes / orgtypes for variable declarations

    Hi,

    I was surprised to see that the Mata compiler seems to ignore much of the eltype and orgtype specifications when explicitly declaring variables inside functions. [Note: This is not the case for function return types and function argument types. Here the compiler works as I would have expected.]

    I've tested the following in Stata 13 to 16.

    The following runs through:

    Code:
    capture noi mata mata drop noerror()
    mata:
    
    void noerror() {
     
     `" --- ASSIGN TO REAL SCALAR --- "'
     real scalar rs
     
     `" --- rs = (1\2\3) --- "'
     rs = (1\2\3)
     rs
     eltype(rs)
     orgtype(rs)
     
     `" --- rs = J(3,3,3) --- "'
     rs = J(3,3,3)
     rs
     eltype(rs)
     orgtype(rs)
    
     `" --- rs = J(3,3,"def") --- "'
     rs = J(3,3,"def")
     rs
     eltype(rs)
     orgtype(rs)
     
     `" --- ASSIGN TO STRING SCALAR --- "'
     string scalar ss
     
     `" --- ss = J(3,3,"def") --- "'
     ss = J(3,3,"def")
     ss
     eltype(ss)
     orgtype(ss)
     
     `" --- ss = J(3,3,3) --- "'
     ss = J(3,3,3)
     ss
     eltype(ss)
     orgtype(ss)
     
    }
    
    noerror()
    
    end

    The output is:


    Code:
    : noerror()
       --- ASSIGN TO REAL SCALAR ---
       --- rs = (1\2\3) ---
           1
        +-----+
      1 |  1  |
      2 |  2  |
      3 |  3  |
        +-----+
      real
      colvector
       --- rs = J(3,3,3) ---
    [symmetric]
           1   2   3
        +-------------+
      1 |  3          |
      2 |  3   3      |
      3 |  3   3   3  |
        +-------------+
      real
      matrix
       --- rs = J(3,3,"def") ---
    [symmetric]
             1     2     3
        +-------------------+
      1 |  def              |
      2 |  def   def        |
      3 |  def   def   def  |
        +-------------------+
      string
      matrix
       --- ASSIGN TO STRING SCALAR ---
       --- ss = J(3,3,"def") ---
    [symmetric]
             1     2     3
        +-------------------+
      1 |  def              |
      2 |  def   def        |
      3 |  def   def   def  |
        +-------------------+
      string
      matrix
       --- ss = J(3,3,3) ---
    [symmetric]
           1   2   3
        +-------------+
      1 |  3          |
      2 |  3   3      |
      3 |  3   3   3  |
        +-------------+
      real
      matrix
    The following two functions generate compiler errors:
    Code:
    capture noi mata mata drop error1()
    mata:
    void error1() {
     string scalar s2
     s2 = 3
    }
    end
    with output:

    Code:
    . mata:
    ------------------------------------------------- mata (type end to exit) -----
    : void error1() {
    >         string scalar s2
    >         s2 = 3
    type mismatch:  string = real not allowed
    (1 line skipped)
    -------------------------------------------------------------------------------
    r(3000);
    Code:
    capture noi mata mata drop error2()
    mata:
    void error2() {
     real scalar r1
     r1 = "def"
    }
    end
    with output:

    Code:
    . mata:
    ------------------------------------------------- mata (type end to exit) -----
    : void error2() {
    >         real scalar r1
    >         r1 = "def"
    type mismatch:  real = string not allowed
    (1 line skipped)
    -------------------------------------------------------------------------------
    r(3000);
    So it seems that the only thing the compiler checks is the assignment of string scalars to numeric scalars and vice versa.

    Unfortunately, the documentation is not clear on this. I haven't found any clear statement anywhere about whether the above code that runs through should generate or should not generate an error.

    Mata's strong type-checking features are wonderful and it would come as a bit of a disappointment if the only benefit of eltype/orgtype variable declarations inside functions were the readability of code. If that is the case, it would be very helpful to have it clearly stated in the manuals.

    Any help or comment on this matter is greatly appreciated. Please let me know if I am overlooking or misstating something.

    Best,
    Daniel



  • #2
    Also surprised about this. Don't have a copy of e.g. Stata 14 on hand, but I wonder if this is new or has always been a part of Mata.



    • #3
      The sample code in #1 runs without error in Stata 11.2 (IC) and 12.1 (IC).

      I believe declaring variables inside of functions only affects how the variables are initialized. I, too, would have expected, and would like to see, different behavior.

      Best
      Daniel



      • #4
        Thanks Sergio and Daniel for looking at this.



        • #5
          A search of the help whatsnew files found no mention of this apparent change in behavior between Stata 12 and Stata 13. But with that said, some further thoughts based on my experimentation.

          I will note that the examples in post #1 involving the J() function cannot be checked by the compiler, because J() is a transmorphic function that can return real, complex, string, or pointer matrices. That would require runtime checking, and it is clear that Stata does not do runtime checking.

          Because Mata does not necessarily have access at compile time to every definition, Mata (correctly, I would argue) declines to type-check assignments even from those built-in functions whose return type is more specific than transmorphic.
          Code:
          : void function demo() {
          > string scalar s
          > s = "1"
          > s
          > eltype(s)
          > orgtype(s)
          > s = isdiagonal(J(2,2,2))
          > s
          > eltype(s)
          > orgtype(s)
          > }
          
          : 
          : demo()
            1
            string
            scalar
            0
            real
            scalar
          Note that this example demonstrates both the lack of compile time checking of the assignment expression, and the lack of runtime checking as well.

          As an aside, it is worth noting that a 1x1 matrix assignment is identified by orgtype() as a scalar.
          Code:
          : rs2 = (1,1)
          
          : orgtype(rs2)
            rowvector
          
          : rs2 = J(1,1,1)
          
          : orgtype(rs2)
            scalar
          
          : rs2 = J(0,1,1)
          
          : orgtype(rs2)
            colvector
          With that said, replacing the J() function in some parts of post #1 causes the Mata compiler to display the expected behavior.
          Code:
          : void demo() {
          >  real scalar rs5
          >  rs5 = ("5","5","5")
          type mismatch:  real = string not allowed
          (1 line skipped)
          It is unfortunate that help m2_declarations explains the argument in favor of using inside declarations:

          By including the inside declaration, we have told Mata what variables we will need and how we will be using them. Mata can do two things with that information: first, it can make sure that we are using the variables correctly (making debugging easier again), and second, Mata can produce more efficient code (making our function run faster).
          which does not appear to be the case. Perhaps the compiler originally did so but, as Mata was extended over time, could no longer do it, and the documentation has not kept up.



          • #6
            Thanks a lot William for your insights. I was confused when drawing up an example using J(), whose declaration has a transmorphic return type and, as you pointed out, is therefore of no use to the compiler in terms of type-checking. I guess I was too taken aback by the fact that the eltype/orgtype part of (inside function) variable declarations carried much less meaning than I was certain it would.

            Thanks also for making it clear to me that there is no runtime type-checking inside function bodies in Mata (or very little type-checking, the extent of which is not clear to me) and how difficult such a thing for Mata would be. Maybe one useful point to reiterate in that context is that function return values and function arguments are checked at runtime. From m2_declarations:
            For function return values:
            Code:
            [...] Mata will verify that the function really is returning a real matrix and, if it is not, abort execution.
            And for function arguments:
            Code:
            We have also told Mata what to expect and, if some other program attempts to use our function incorrectly, Mata will stop execution.
            In my own work I have never seen a case where the above was not true. So that is something.
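            A minimal sketch of these two run-time checks, assuming two hypothetical helper functions, twice() and passthrough(), that are not part of Mata. The first two calls should run normally; the last two are expected to abort with a run-time error, per the m2_declarations passages quoted above:
            Code:
            mata:
            // hypothetical helper: argument declared as real scalar
            real scalar twice(real scalar x)
            {
                return(2*x)
            }
            
            // hypothetical helper: return type declared as real scalar,
            // but the returned value is whatever the caller passes in
            real scalar passthrough(transmorphic x)
            {
                return(x)
            }
            
            twice(2)
            passthrough(3)
            end
            
            // expected to abort: string passed where a real scalar argument is declared
            capture noisily mata: twice("a")
            
            // expected to abort: string returned where a real scalar return type is declared
            capture noisily mata: passthrough("abc")
            The capture noisily prefix is there only so that a do-file continues after each error.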

            Beyond that, I guess we cannot ask for much. I still cringe when I read the lines from your first example in #5 that do not give a compile-time error:
            Code:
            string scalar s
            s = isdiagonal(J(2,2,2))
            The only things that seem to be type-checked in assignments are string/numeric literals or compositions thereof.

            You write in your post
            Because Mata does not necessarily have access at compile time to every definition...
            Could you explain what you mean by that?




            • #7
              In

              Because Mata does not necessarily have access at compile time to every definition
              I meant to write "every function definition". My point was that functions need not be defined at the time the calling function is compiled, and even Stata's non-built-in functions (those defined in .mata files) can be redefined. So the compiler could in theory check the assignments for the results of built-in functions (which apparently cannot be redefined), but all other functions are unknown or changeable, and thus assignments of their return values cannot be checked by the compiler. See the two examples below.

              So rather than do an incomplete and thus misleading job by compile-time checking some but not all function return assignments, the compiler does no such checking. Of course, this discussion arose because, in doing compile-time checking of constant assignments, the compiler is doing an incomplete and thus misleading job of checking some but not all assignments.
              Code:
              : void function demo() {
              > real scalar rs
              > rs = 42
              > rs
              > eltype(rs)
              > orgtype(rs)
              > rs = gnxl()
              > rs
              > eltype(rs)
              > orgtype(rs)
              > }
              
              : 
              : demo()
                                demo():  3499  gnxl() not found
                               <istmt>:     -  function returned error
              r(3499);
              
              : 
              : string function gnxl() {
              > return("GNXL")
              > }
              
              : 
              : demo()
                42
                real
                scalar
                GNXL
                string
                scalar
              
              :
              Code:
              : void function demo() {
              > real matrix rs
              > rs = (1,2\3,4)
              > rs
              > eltype(rs)
              > orgtype(rs)
              > rs = diag((1,1))
              > rs
              > eltype(rs)
              > orgtype(rs)
              > }
              
              : 
              : demo()
                     1   2
                  +---------+
                1 |  1   2  |
                2 |  3   4  |
                  +---------+
                real
                matrix
              [symmetric]
                     1   2
                  +---------+
                1 |  1      |
                2 |  0   1  |
                  +---------+
                real
                matrix
              
              : 
              : mata drop diag()
              
              : string matrix function diag(real vector x) {
              > return(("foo",""\"","bar"))
              > }
              note: argument x unused
              
              : 
              : demo()
                     1   2
                  +---------+
                1 |  1   2  |
                2 |  3   4  |
                  +---------+
                real
                matrix
              [symmetric]
                       1     2
                  +-------------+
                1 |  foo        |
                2 |        bar  |
                  +-------------+
                string
                matrix
              
              :



              • #8
                Thanks again William for your elaborations. They were very helpful. I came across the "function not found" error hundreds of times of course but never thought about it in terms of what it means for the compiler. Think I know Mata a little better now.



                • #9
                  Type declaration in Mata is first and foremost for generating more efficient code. There are few places where Mata performs run-time type checking; one of them is function arguments, whose types are checked at run time.

                  For type checking at compile time, the main issue is that Mata allows the transmorphic eltype, which by definition can be any other type (real, complex, string, or pointer). Type checking in the following code almost has to be a run-time operation:

                  Code:
                  real scalar a
                  transmorphic scalar b
                  
                  a = b
                  Also, a Mata function's return type is almost always assumed to be transmorphic, since Mata does not support forward declaration of functions. A function may not be defined at all when the functions that call it are compiled.

                  We may consider adding more compile-time checking in the future for situations where the type can be determined, for example:

                  Code:
                  real scalar a
                  transmorphic scalar b
                  
                  b = "this is a string"
                  // or b = 1+2i
                  a = b
                  In this case, transmorphic scalar b has already been assigned a literal, so its type is known when a = b is compiled, and a compile error seems appropriate, perhaps only when -mata set matastrict- is on if we are afraid of breaking too many existing programs.

                  For the general case of a = b, where a is a concrete type and b is transmorphic, we will look into whether adding optional run-time checking in the future is possible, i.e., without slowing things down.

                  In the meantime, if you must type check at certain places of the code, something like the following might work:

                  Code:
                  mata:
                  void foo(transmorphic scalar b)
                  {
                      real scalar a
                      
                      a = b
                      mustberealscalar(a)
                  }
                  
                  void mustberealscalar(real scalar a)
                  {
                      return
                  }
                  
                  foo(1+2i)
                  end







                  • #10
                    Hua, thank you for addressing this and for your additional explanations on the Mata compilation process.

                    To summarize and restate the main problem: I was under the impression that I could safely rely on Mata's el/orgtype declarations in the following sense: For example, when declaring
                    Code:
                    real scalar i
                    inside a function body, I thought that I could trust that i would never hold anything other than a real scalar, and that if an assignment to i violated this condition, Mata would give me an error message at some point, often already at compile time. From our discussion in this thread I learned that Mata actually raises an error in this context only rarely (and I also learned the reasons why it is not implemented more tightly).

                    Knowing this is important to me because it has implications for how I go about writing and debugging my Mata code. For example, if orgtypes were enforced strictly, I would have more confidence in a complicated matrix expression that I write. Now I will look at it with more suspicion.
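                    Where that suspicion matters, one option is an explicit check right after the assignment. A minimal sketch, assuming a hypothetical helper check_real_scalar() (not part of Mata) built from Mata's eltype(), orgtype(), and assert() functions; the second call is expected to abort, because the assignment before it silently turns i into a string matrix, as in post #1:
                    Code:
                    mata:
                    // hypothetical helper: abort unless x is currently a real scalar
                    void check_real_scalar(transmorphic x)
                    {
                        assert(eltype(x) == "real" & orgtype(x) == "scalar")
                    }
                    
                    void demo_checked()
                    {
                        real scalar i
                    
                        i = J(1, 1, 7)          // 1 x 1, which orgtype() reports as scalar
                        check_real_scalar(i)    // passes
                    
                        i = J(3, 3, "def")      // compiles and runs despite the declaration
                        check_real_scalar(i)    // assert() is expected to abort here
                    }
                    end
                    
                    capture noisily mata: demo_checked()
                    This complements the argument-based check suggested in post #9; it costs an extra function call per check, so it is probably best reserved for debugging.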

                    While additional type-checking features in Mata would be great, I probably would not put them at the top of my Mata-enhancements wish list. For me and for now, it is enough to have an understanding of what exactly the benefits of el/orgtype declarations are. Still, I think that making this clearer in the documentation with a few additional sentences or paragraphs would serve many users.



                    • #11
                      I agree with the assessment from Daniel Schneider in post #10. I would add that the paragraph I quoted in post #5 would be a good place to make the value of el/org "inside" declarations clearer. It should be rewritten to replace the "first, it can make sure that we are using the variables correctly (making debugging easier again)" with a much more qualified statement along the lines of what Hua Peng (StataCorp) wrote in post #9, and probably demoting the explanation from "first" to "second".

                      As it now stands, that paragraph misleads the reader with unrealistic expectations of what can be accomplished with inside declarations.

                      I will also add that I have not been able to find any mention, either in [M2] op assignment or elsewhere in the Mata documentation, of the assignment operator's power to change the eltype/orgtype of an lval that already exists or has been declared. A note in that documentation would be helpful.



                      • #12
                        Thanks for the suggestions, we will consider making some additions/changes in Mata documentation.

