
  • The Mata Book: A Book for Serious Programmers and Those Who Want to Be

    Just wanted to extend congratulations to Bill Gould (StataCorp) on completing and publishing his tome on Mata, and to thank David Drukker for letting me know that it was available. It definitely provides a much more in-depth exposition of the workings of Mata, covering details that may not always be immediately clear in the manual.

  • #2
    The only question that I have at the moment for Bill Gould (StataCorp) is related to some of the programming practices described in section 9.5.1. Why not use this section to advocate for the use of version control systems instead of creating multiple copies of the same source code on a given machine? Version control systems provide a more robust version of the functionality described in the section (e.g., being able to revert to previous versions seamlessly, switching between development and production code, documenting the logic for changes, etc.), so it just struck me as odd that the first discussion about versioning files went down the road that it did.


    • #3
      Originally posted by wbuchanan
      The only question that I have at the moment for Bill Gould (StataCorp) is related to some of the programming practices described in section 9.5.1. Why not use this section to advocate for the use of version control systems instead of creating multiple copies of the same source code on a given machine? [...]
      I agree with you about version-control systems -- they are an excellent tool for any serious programmer. Here at StataCorp we use svn internally, and many of our developers use GitHub for side projects. I didn't want to spend time teaching people how to use svn or GitHub or any other particular system. I wanted to keep things simple and assumed that programmers who already knew how to use a formal version-control system would do so. I should, however, have mentioned that in the book.


      • #4
        Bill Gould (StataCorp)
        Thanks for the additional info/insight. I definitely can understand the logic behind the decision. Perhaps if there were more integration between Stata and version control systems (e.g., svn, mercurial, cvs, git, etc.), the exposure from the integration would aid the adoption of these tools among a greater portion of the user base? I would guess that RStudio including some VCS integration in their development environment is part of the reason why the use of Git and GitHub has exploded in that user community. The book is definitely a great addition in either case; I didn't realize there were single-line for loops or a ternary operator until I was looking through the book.


        • #5
          Just wanted to second Billy's congratulations; the book looks like an amazing guide! (for sure will be buying it after I'm back from holidays). Also looking forward to the advanced chapters (like #14).

          -Sergio


          • #6
            Just starting the chapter about classes and was wondering if there will ever be the possibility of allowing constructors to be overloaded and/or to accept arguments in general? It isn’t a huge difference in keystrokes, but it would still be a nice-to-have feature. It took me a little while, but I ended up figuring out a hacky way of enforcing a singleton pattern for something I started working on recently, to avoid the overhead of initialization happening multiple times.


            • #7
              Originally posted by wbuchanan
              Just starting the chapter about classes and was wondering if there will ever be the possibility of allowing constructors to be overloaded and/or to accept arguments in general? It isn’t a huge difference in keystrokes, but would still be a nice to have feature. It took me a little while, but I ended up figuring out a hacky way of enforcing a singleton pattern for something I started working on recently to avoid the overhead of initialization happening multiple times.
              You are asking two questions: overloading and arguments.

              Overloading is not allowed, except that classname::new() functions do allow polymorphisms. The class constructor for the superclass should call the constructor for its parent. This is discussed in the chapter on classes. Concerning whether it would ever be allowed, why would you want it?

              Concerning the second question, arguments are not allowed and will not be. Mata does not allow classname::new() to have arguments because Mata's internal memory management routines automatically call it to allocate memory in the setup of the routine being run. That's why later in the book, you see me using classname::init(). My view is that the convenience of automatic memory management more than repays the inconvenience of having to call a separate initializer. The Stata code we write in C invariably has memory leaks in early drafts and we work hard to eliminate them. Mata code never has memory leaks.
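
A minimal sketch of the new()/init() split described above. The class Account, its members, and the function demo_account() are hypothetical names made up for illustration; they do not come from the book.

```mata
mata:
// Hypothetical class for illustration only.
class Account {
    private:
        string scalar owner
        real scalar   balance
    public:
        void          new()
        void          init()
        real scalar   get_balance()
}

// new() is called automatically and cannot take arguments,
// so it only sets neutral defaults.
void Account::new()
{
    owner   = ""
    balance = 0
}

// init() is an ordinary member function, so it can take arguments.
void Account::init(string scalar name, real scalar start)
{
    owner   = name
    balance = start
}

real scalar Account::get_balance()
{
    return(balance)
}

real scalar demo_account()
{
    class Account scalar a      // new() runs here, without arguments

    a.init("Ann", 100)          // explicit second step supplies the values
    return(a.get_balance())
}
end
```

The cost is one extra call after each declaration; in exchange, every instance is in a valid state from the moment it is declared.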


              • #8
                Bill Gould (StataCorp)
                I'm seeing them as a bit more related and thinking about things more in the context of how Java handles classes. As an example (at least as it relates to arguments), for the Geocode class used by the ggeocode command bundled with jsonio, the constructor method has two different signatures:

                Code:
                public Geocode(String address, String apiKey, Long obid, Map<String, Map<String, Integer>> idx, HashSet<String> retvals)
                
                /* AND */
                
                public Geocode(String address, Long obid, Map<String, Map<String, Integer>> idx, HashSet<String> retvals)
                Similar to the discussion about self threading code, the first constructor method calls all of the subsequent methods needed to handle all of the heavy lifting with constructing the API calls, getting and parsing the results, etc... In the case that a user did not have an API key, the second signature can be called, which ends up passing a default zero-length string to the apiKey parameter when it calls the first constructor. Again it isn't a huge difference between doing that or having a separate method that would do the same thing, but ends up being more of a convenience factor (e.g., the constructor ends up also serving the same role as the ::init() methods).
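
For comparison, a rough Mata sketch of the same convenience using an initializer with an optional argument instead of two constructor signatures. The class Lookup and everything in it are hypothetical and are not part of jsonio or ggeocode; the sketch relies only on Mata's optional-argument syntax (the | in the argument list) and args().

```mata
mata:
// Hypothetical class; stands in for the two-signature Java constructor.
class Lookup {
    private:
        string scalar address, apikey
    public:
        void          init()
        string scalar get_key()
}

// Arguments after | are optional; args() reports how many were passed.
void Lookup::init(string scalar addr, | string scalar key)
{
    address = addr
    apikey  = ""                     // default: zero-length key
    if (args() == 2) apikey = key    // use the key only if one was passed
}

string scalar Lookup::get_key()
{
    return(apikey)
}

string rowvector demo_lookup()
{
    class Lookup scalar a, b

    a.init("123 Main St")            // no API key supplied
    b.init("123 Main St", "my-key")  // API key supplied
    return((a.get_key(), b.get_key()))
}
end
```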

                Getting back to the overloading question it would be more about modifying the behavior when the class is instantiated. Separate initializer methods can be used of course, but if there was a way to directly overload methods it would avoid having to write a function to handle overloading with transmorphic types manually (e.g., could just write the two different versions of the method that differ based on input type instead of creating subroutines that do that in addition to the method that users would end up calling).
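
A small sketch of the manual approach mentioned above, in which a single method takes a transmorphic argument and branches on eltype(). The class Tag and its members are hypothetical names used only for illustration.

```mata
mata:
// Hypothetical class: one init() that accepts either a string or a real.
class Tag {
    private:
        string scalar text
    public:
        void          init()
        string scalar get()
}

// Manual "overloading": dispatch on the element type of the argument.
void Tag::init(transmorphic scalar x)
{
    if (eltype(x) == "string")    text = x
    else if (eltype(x) == "real") text = strofreal(x)
    else _error(3250, "Tag::init(): real or string scalar expected")
}

string scalar Tag::get()
{
    return(text)
}

string rowvector demo_tag()
{
    class Tag scalar a, b

    a.init("abc")      // string version
    b.init(42)         // real version, stored as its string form
    return((a.get(), b.get()))
}
end
```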

                I hadn't thought too much about the memory leak/management side of things and can definitely appreciate how that would affect things. The singleton pattern that I was mentioning before can be seen here in case other users are interested. Since the constructor ends up reading and parsing a bigger file it was the easiest way that I could think of to prevent new instances of the class from being instantiated multiple times by users.

                Definitely appreciate the book and am hoping that you're planning something for the conference in Columbus this year.


                • #9
                  Bill Gould (StataCorp)
                  On a slightly different note, do you have any ideas about whether it is possible to define private classes and/or nested classes? I was trying to see if I could implement some programming examples from other languages in Mata, but I get a "class nested too deeply" error when trying to initialize an instance of the class. Here is the relevant portion of the example from https://introcs.cs.princeton.edu/jav...ings.java.html:

                  Code:
                  class Node {
                      private String item;
                      private Node next;
                  }
                  And here is what I tried to define in Mata:

                  Code:
                  : class Node {
                  > public: 
                  > transmorphic scalar datum
                  > class Node scalar next
                  > }
                  
                  : x = Node()
                                    Node():  3305  class nested too deeply
                                   <istmt>:     -  function returned error
                  Not sure if there is a different way to approach defining private classes and/or recursive class definitions, but figure it might be worth it to ask.


                  • #10
                    Originally posted by wbuchanan
                    Bill Gould (StataCorp)
                    Not sure if there is a different way to approach defining private classes and/or recursive class definitions, but figure it might be worth it to ask.
                    I think you can do it with structures. These examples are for linked lists:

                    Also, I haven't checked, but structures might have a smaller overhead than classes, so you might want to use structs instead.


                    • #11
                      In a different direction, I would be greatly interested in how to speed up Mata. I already have some notes here (see examples 1-5), but I'm pretty sure there are other tips around.

                      One of the reasons why I care is that it would allow some Stata features to remain competitive with R/Julia. For instance, on this benchmark there are several operations that are 5-10x slower on Stata. Half of the issue is the lack of integer/uint types, but the other half is that the compiler is not GCC/LLVM (i.e. incredibly smart), so we need to be careful with how we write hot loops.


                      PS: On the benchmark, the results are partly because it sometimes uses strings instead of ints/longs as identifiers (following R convention), which is suboptimal on Stata. However, even without that, some large differences remain.
                      Last edited by Sergio Correia; 08 Mar 2018, 06:22. Reason: typo


                      • #12
                        Originally posted by wbuchanan
                        Bill Gould (StataCorp)
                        I was trying to see if I could implement some programming examples from other languages in Mata, but get a class nested too deeply error when trying to initialize an instance of the class. Here is the relevant portion of the example from https://introcs.cs.princeton.edu/jav...ings.java.html:

                        Code:
                        class Node {
                        private String item;
                        private Node next;
                        }
                        And here is what I tried to define in Mata:

                        Code:
                        : class Node {
                        > public:
                        > transmorphic scalar datum
                        > class Node scalar next
                        > }
                        
                        : x = Node()
                        Node(): 3305 class nested too deeply
                        <istmt>: - function returned error
                        Class Node needs to be defined a little differently in Mata than in Java. You knew that when you made the translation:
                        Code:
                        class Node {
                            private:
                                transmorphic scalar datum
                                class Node scalar   next
                        }
                        This translation did not work. It did not work because you translated the Java code incorrectly. Read the translation literally. The second member of class Node is a scalar. You were explicit about that. A scalar is a 1x1 object. Ergo, it requires memory, the memory necessary to store member variable next. How much is that? Well, it's another class Node scalar that itself contains a class Node scalar, and on and on it goes. A class Node scalar requires an infinite amount of memory. When you tried to create a class Node scalar by executing the line x = Node(), Mata followed the literal interpretation and eventually said, "class nested too deeply". It was either that or continue and later say, "out of memory".

                        Java works differently. When you declared next as a class Node in Java, that meant that variable next contains nothing to start, but might be filled in later with another class Node. If you filled it in later, the variable next in that nested object would itself contain nothing to start, and you could fill it in even later. Obviously, you never intended to fill them all in, and so the final object was of finite size.

                        Mata has two ways of dealing with this. The way I prefer is the C way of handling the problem, which is to use pointers. I say that because I come from a C background. The other way to handle it is the Java way. Yes, Mata can do that! Had you translated the Java code to Mata as shown below, it would have worked just as you would have expected!
                        Code:
                        class Node {
                            private:
                                transmorphic scalar datum
                                class Node matrix   next
                        }
                        Now when you define x = Node(), x.next would contain nothing! Why? Because matrices are 0x0 by default. This also explains to me why you have been asking questions about class function new(). Obviously, some of x.next needs filling in.

                        By the way, the above code could instead be coded
                        Code:
                        class Node {
                            private:
                                transmorphic scalar datum
                                class Node          next
                        }
                        because matrix is assumed when the orgtype is not specified.
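
A minimal sketch of the matrix-based translation in use. It assumes the members are declared public so they can be reached from outside the class, and it uses the hypothetical names NodeM and demo_nodem() to avoid clashing with any existing class Node.

```mata
mata:
// Hypothetical variant of class Node with a matrix-typed next member.
class NodeM {
    public:
        transmorphic scalar datum
        class NodeM matrix  next
}

// Returns the number of rows of next before and after it is filled in.
real rowvector demo_nodem()
{
    class NodeM scalar x
    real scalar        before

    before = rows(x.next)      // next starts out 0 x 0, so nothing is nested
    x.next = NodeM()           // fill it in later with a single node
    return((before, rows(x.next)))
}
end
```

Testing rows(x.next) == 0 plays the role that a null check plays in Java.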

                        I said that I prefer the C way. I do, but I admit there's nothing better about it, it is just a matter of how you are used to thinking. The C way of translating the code is
                        Code:
                        class Node {
                            private:
                                transmorphic scalar                   datum
                                pointer(class Node scalar) scalar     next
                        }
                        Pointers are filled in by default with NULL.
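
A short sketch of how the pointer version could be used to build and walk a two-node list. The names NodeP, list_length(), and demo_nodep() are hypothetical, and the members are assumed to be public so that the helper functions can reach them.

```mata
mata:
// Hypothetical variant of class Node with a pointer-typed next member.
class NodeP {
    public:
        transmorphic scalar                datum
        pointer(class NodeP scalar) scalar next
}

// Follow next pointers from head until NULL, counting nodes along the way.
real scalar list_length(pointer(class NodeP scalar) scalar head)
{
    real scalar                        n
    pointer(class NodeP scalar) scalar p

    n = 0
    for (p = head; p != NULL; p = p->next) n++
    return(n)
}

real scalar demo_nodep()
{
    class NodeP scalar a, b

    a.datum = "first"
    b.datum = "second"
    a.next  = &b               // link a to b; b.next is left as NULL
    return(list_length(&a))
}
end
```

The loop stops at b because its next member was never assigned and so still holds NULL.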
                        Last edited by Bill Gould (StataCorp); 12 Mar 2018, 10:13.


                        • #13
                          Bill Gould (StataCorp)
                          Thanks for the clarification. I guess I was thinking that the scalar would be initialized as a 1x1 matrix and that it would lead to the initialization that you mentioned for matrices above. I've started to use pointers a bit more in cases where I've been a bit more concerned about memory management, but like you mentioned, in this particular case it doesn't seem that there would be a major benefit one way or the other. The only minor benefit I could see is saving a few keystrokes by testing whether the retrieved value was NULL vs J(0, 0, .), and potentially a nominal amount of memory that wouldn't be allocated in the case of the pointer.

                          In either case, I imagine between the book and discussion here on Statalist there will be a wealth of knowledge about using Mata that did not previously exist.
