  • Mata debugging strategies

    Might anyone suggest helpful references on strategies for Mata debugging?

  • #2
    John, I feel your pain <grin>. I find debugging Mata more difficult than other languages, so this is a great topic. I don't have any references, but what about just sharing our own strategies? I hope others have better ideas than my own pedestrian practices.


    • #3
      Thanks Mike.

      I'm embarrassed to admit that right after I posted this query I discovered matalnum (which I can't believe I didn't know after so many years of working with Mata).
      help matalnum
      which by default is set off. This looks like it'll be helpful for some purposes but I'd still be keen to learn about others' general strategies.


      • #4
        My recollection is that the line numbering scheme for matalnum is sufficiently obscure that I prefer to just sprinkle "spot 1", "spot 2", ..., "spot N" statements throughout my code. Also, rather than using -printf()- to echo values in a debugging situation, I like to do this:
        strofreal(x) + " " + strofreal(y) + " " + strofreal(z)
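
        A minimal sketch of the sprinkling approach, assuming a hypothetical function demo() with three real scalar arguments. A bare string or expression on its own line inside a Mata function is displayed when that line runs:

        ```mata
        mata:
        // demo() is a hypothetical function used only for illustration
        void demo(real scalar x, real scalar y, real scalar z)
        {
            "spot 1"                 // displayed when execution reaches this line
            strofreal(x) + " " + strofreal(y) + " " + strofreal(z)
            "spot 2"
        }
        demo(1, 2, 3)
        end
        ```

        The last marker displayed before a crash brackets the offending code.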


        • #5
          This will make me sound cocky and look like a smartass ... While I agree that debugging in Mata can be (well, it is) a pain and there should be tools to assist in that, I would also like to note that having to debug more than, say, 20-something lines of code might indicate bad programming style. Good programming style includes writing and calling (many) subroutines, or so I have been told. Such subroutines typically do not span much more than about 20 lines of code (look at the code of rename or adoupdate to get the idea). Good programming style also includes testing subroutines separately from the main routine, or so I have been told. Following this line of thought, there should never be more than a couple of lines of code to be debugged. It then appears that the basic strategy suggested by Gould (2010, 62) (start at slide 56) would do just fine.
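
          A minimal sketch of testing a small subroutine on its own, assuming a hypothetical helper my_sumsq(); Mata's assert() aborts with an error when its argument is false:

          ```mata
          mata:
          // my_sumsq() is a hypothetical subroutine, short enough to check by eye
          real scalar my_sumsq(real colvector v)
          {
              return(v' * v)
          }

          // test the subroutine separately from any main routine
          assert(my_sumsq((3 \ 4)) == 25)
          assert(my_sumsq((0 \ 0 \ 0)) == 0)
          end
          ```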


          • #6
            I also tend to use the "sprinkling" approach Mike discusses in #4 (note also Gould's slide #63), as well as strofreal to echo values. I also see no obvious reason not to use matalnum along with sprinkling, however.

            E.g. if between "spot 2" and "spot 3" there are lots of lines involving multiplication but a conformability error involving " * " then the line number ID'd by matalnum might speed up debugging. Of course in the limit with "spot x" sprinkled in after every line of code then there'd be no separate need for line numbers to be ID'd with matalnum.

            All that said, if I was the programmer that I'd like to be instead of the programmer that I am then what Daniel proposes in #5 makes a lot of sense.


            • #7
              The advice from daniel klein is good.
              My approach to programming goes even further.
              I design (preferably) simple classes that keep and transform data.
              If the classes are complex, I might add methods/functions for validation purposes.
              Then I write code to validate the classes, i.e. a sort of unit testing.
              With the functionality of the classes, it is often quite easy to write a Stata wrapper whose main purpose is to validate inputs.

              Try at least Daniel's advice; it is not as hard as it looks.
              I'm afraid this sounds cocky as well, sorry
              Good luck
              Kind regards



              • #8
                Splitting up your program into subprograms is a great idea, and I usually save them in an additional ado file so that more than one program (or ado) can access them. What I find hard with many small programs is keeping an overview of which programs I have in an ado file. Unfortunately, Stata's do-file editor does not list all functions.


                • #9
                  My take on that is that if you want to make a Mata function more generally available, so it can be used in multiple .ado files, then it needs its own help-file. I would distribute this as a Mata library with multiple help-files, including one overview help-file that lists all the functions it contains with links to the help-files for those functions. I would also include a .do file (with extension .mata) containing the source code as an ancillary file. So if I forgot what function is where, I would only have to look at the overview help-files of the limited number of libraries I created.
                  Maarten L. Buis
                  University of Konstanz
                  Department of history and sociology
                  box 40
                  78457 Konstanz


                  • #10
                    Definitely use "mata set matalnum on". Then turn it back off when you're done so as not to impede the compile optimization ("mata set mataoptimize off").

                    The line numbers in the crashes when matalnum is on are not obscure. They just refer to lines in your source code routine, with 1 = the first line of the function. Exception: multiple lines commented out with a single /* */ are counted as 1 line. So avoid them. When working intensively with one function insert blank lines before it so its first line is, say 501. Then line 37 in your crash output is line 537 in the editor.
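
                    A minimal sketch of that workflow, assuming a hypothetical function f(). The setting has to be on when the function is compiled for line numbers to be available in error messages:

                    ```mata
                    mata:
                    mata set matalnum on          // compile with line-number tracking

                    // f() is a hypothetical function; non-conformable A and B
                    // would make the multiplication line below abort
                    real matrix f(real matrix A, real matrix B)
                    {
                        real matrix C
                        C = A * B
                        return(C)
                    }

                    mata set matalnum off         // restore the default when done
                    end
                    ```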

                    Always use "mata set matastrict on": force yourself to declare all variables. Favor stricter typing, such as real rather than numeric.
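
                    A minimal sketch of what strict mode asks for, assuming a hypothetical function dbg_mean(). Every variable is declared, with real rather than numeric and colvector rather than matrix:

                    ```mata
                    mata:
                    mata set matastrict on

                    // dbg_mean() is a hypothetical name used only for illustration
                    real scalar dbg_mean(real colvector v)
                    {
                        real scalar s             // undeclared variables are rejected under matastrict
                        s = sum(v)
                        return(s / rows(v))
                    }
                    end
                    ```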

                    Pay attention to compile-time warnings such as "assigned but not used" or "declared but not used" or "used before assignment".

                    You don't need printf() to dump output. Just list the things you want printed on a blank line. Precede that line with a label. E.g. if you want the dimensions of X, do:
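
                    A minimal sketch of such a labeled dump, assuming X is a real matrix (a 3 x 2 matrix of zeros stands in for it here):

                    ```mata
                    mata:
                    X = J(3, 2, 0)                // stand-in for the matrix being debugged
                    "rows(X), cols(X)"
                    rows(X), cols(X)
                    end
                    ```

                    The same two lines work unchanged inside a function body.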
                    If a complex expression is causing a crash and you can't quite tell where, build it up in dump statements like this, and see how far it gets before it crashes:
                    "(tlambdalnu :+ K)[ot,]"
                    (tlambdalnu :+ K)[ot,]
                    "(tlambdalnu :+ K)[ot,] + phi[ox,] + phi[oy,]"
                    (tlambdalnu :+ K)[ot,] + phi[ox,] + phi[oy,]
                    "Re(asdfLogRowSumExpC((lnp0 , lnm + asdfLogRowSumExpC((tlambdalnu :+ K)[ot,] + phi[ox,] + phi[oy,]))))"
                     Re(asdfLogRowSumExpC((lnp0 , lnm + asdfLogRowSumExpC((tlambdalnu :+ K)[ot,] + phi[ox,] + phi[oy,]))))
                    Occasionally it's worth declaring variables as external, and saving objects to them so that you can examine them after the run. You can use "X=X, Y" to accumulate copies of multiple instances of an object if the routine in question is being called many times. You have to initialize X each time though.
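
                    A minimal sketch of the external-variable approach, assuming a hypothetical routine dbg_step() and a hypothetical global named DBG_LOG that is initialized before each run:

                    ```mata
                    mata:
                    // dbg_step() and DBG_LOG are hypothetical names for illustration
                    real scalar dbg_step(real scalar a)
                    {
                        external real matrix DBG_LOG
                        real scalar r

                        r = a^2
                        DBG_LOG = DBG_LOG, r      // accumulate one column per call
                        return(r)
                    }

                    DBG_LOG = J(1, 0, .)          // initialize before each run
                    for (i = 1; i <= 3; i++) (void) dbg_step(i)
                    DBG_LOG                       // examine the saved values after the run
                    end
                    ```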

                    If the problem is not causing a crash, but subtler, like numerical stability, you can dump or save output, copy it into excel, clean it up, and make graphs.

                    To study likelihood evaluators, start an ml estimation, interrupt it, then use ml plot to call the evaluator while varying one parameter over values you control. (Syntax is ml plot [param] [start] [stop] [steps].) E.g., in diagnosing instability in a likelihood evaluator, I have sometimes dumped the computed observation-level log likelihoods for four trial parameter vectors, arranged them in Excel as an N x 4 grid, and then generated sparkline graphs along the right edge to find anomalous behavior at the observation level.

                    Other graphs of such accumulated output or saved values, such as line and scatter graphs, can also be useful in identifying anomalies.

                    Almost always, you need to be able to drill down to see exactly what is happening, then challenge yourself with the question of whether it is happening the way it should.

                    Sure would be nice if there were an IDE though...
                    Last edited by David Roodman; 13 Nov 2020, 12:40.


                    • #11
                      And whenever you run into a bug, try to find the simplest situation that creates it. If it crashes on a 500x200 matrix, see if you can make it crash on a 5x2 matrix. If passing some list of 10 variables crashes it, see if you can strip it down to two. Then do the same in the vertical direction, restricting the sample. Smaller examples are easier to debug.
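
                      A minimal sketch of shrinking the input, assuming the failing input is a 500x200 real matrix X (random data stands in for it here). Range subscripts pull out the top-left 5x2 block to pass to the failing routine instead:

                      ```mata
                      mata:
                      X  = runiform(500, 200)     // stand-in for the matrix that triggers the bug
                      Xs = X[|1, 1 \ 5, 2|]       // rows 1 to 5, columns 1 to 2
                      rows(Xs), cols(Xs)
                      end
                      ```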