
  • Is there any other wrapper/executer/(executioner?) apart from -rangestat- that can run Mata functions?

    Good morning,

    I recently discovered a nice feature of the user-contributed -rangestat-: one can use it as a wrapper/executer/(executioner?) of Mata functions. I was wondering whether there is any other wrapper that can do this. Stata's native -statsby- cannot do it.

    Example of what I have in mind is the following code:

    Code:
    . webuse nlswork, clear 
    (National Longitudinal Survey.  Young Women 14-26 years of age in 1968)
    
    .         mata:  
    ------------------------------------------------- mata (type end to exit) --------------------------------
    :             mata clear
    
    :             real rowvector mymean(real colvector X) {
    >                 return(mean(X))
    >             }
    
    :         end 
    ----------------------------------------------------------------------------------------------------------
    
    .  rangestat (mymean) ln_wage, interval(ln_wage . .) by(id) 
    
    .  
    .  egen mean = mean(ln_wage), by(id)
    
    .  
    .  summ mymean mean
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
         mymean1 |     28,534    1.674907    .3780522          0   3.912023
            mean |     28,534    1.674907    .3780522          0   3.912023
    
    .
    I have two issues with the solution above involving -rangestat-:

    1. I seem to be misusing the command a bit. It does not seem to be intended for this purpose: it requires the -interval()- option, and I am more or less overriding it.

    2. In other tests unreported here, I found this solution involving -rangestat- to be substantially slower than an explicit loop in which the Mata function is embedded, of the sort that Daniel Klein shows in #3, or Leonardo Guizzetti shows in #2 on this thread here: https://www.statalist.org/forums/for...d-of-a-command

    In short, is there no other wrapper apart from -rangestat- that can run a Mata function by groups? And is there anything that I can do to speed up the execution above using -rangestat-?

  • #2
    In principle any program written for Stata 9 up can (define and) call a Mata function, so the detail is just how you call it up. rangerun and runby (both SSC) share some similar philosophy. More and more functions of egen use Mata and you've looked at that yourself.

    To the genuine question "how can I make it run faster?" there are empty answers, all too likely to seem flippant, such as "get a faster computer", "upgrade your Stata to maximise its use of your existing computer", or "look at the code and rewrite it if you can make it faster". Seriously, I don't know what kind of extra tips you're expecting there.

    Your example misuses rangestat only to the extent that there is already built-in functionality to calculate means. The starting point for rangestat was for solving easy to define but awkward questions such as how many clinic visits in the last 100 days or what is the mean rate over the last 3 years (easy only with regularly spaced series) or what is the average over the entire history to the current observation. The machinery for doing that allowed the same syntax for intervals that are just the distinct values of a variable.
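
    A hedged sketch of those interval questions, using the nlswork data from #1. The new variable names are made up for illustration, and rangestat (SSC) is assumed to be installed:

    ```stata
    * Assumes rangestat is installed: ssc install rangestat
    webuse nlswork, clear

    * Mean of ln_wage over the three preceding years (year-3 through year-1)
    rangestat (mean) wage_last3 = ln_wage, interval(year -3 -1) by(idcode)

    * Mean over the entire history up to and including the current year
    rangestat (mean) wage_todate = ln_wage, interval(year . 0) by(idcode)

    * Both bounds open: one interval per group, the same idea as in #1
    rangestat (mean) wage_all = ln_wage, interval(year . .) by(idcode)
    ```

    The last call should reproduce what egen's mean() with by() gives, which is the sense in which the example in #1 uses rangestat for something already built in.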

    Some other programs give you a large menu of (e.g.) statistics you can calculate but offer no extensibility. We don't feel embarrassed about not having a very large menu where there is extensibility.



    • #3

      Nick, I actually have not seen any good simple programs that integrate Mata in Stata, and I suffered a lot in the last couple of days from this lack of examples, because this is how I learn: through worked examples. All the progress I made in the last couple of days was thanks to the examples that Daniel and Leonardo provided on https://www.statalist.org/forums/for...d-of-a-command, and thanks to one presentation by Kit Baum on Mata http://www.ncer.edu.au/events/docume...5S2.slides.pdf.

      If you have in mind any simple programmes that use Mata in Stata, maybe -egen- functions, maybe some other commands that are not overly complicated, I would appreciate it if you pointed me to them.

      I will look at -rangerun- and -runby- again, but I already looked at them before I wrote this post, and I think they could not execute Mata functions. I think both of those require a Stata programme that returns results in r() or e(). That was the painful part for me to learn. Once I managed to write a Stata wrapper over a Mata function that returns results in r(), I could use the native -statsby-.
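
      For anyone looking for the same kind of example, here is a minimal sketch of such a wrapper. The names mymean_of() and mymeanwrap are hypothetical, chosen for illustration; this is one way to do it, not necessarily the exact code I ended up with:

      ```stata
      webuse nlswork, clear

      // Drop any earlier definition so the do-file can be rerun
      capture mata: mata drop mymean_of()

      mata:
      // Hypothetical Mata function: mean of one variable over the marked sample
      real scalar mymean_of(string scalar varname, string scalar touse)
      {
          return(mean(st_data(., varname, touse)))
      }
      end

      // Hypothetical r-class wrapper that exposes the Mata result as r(mean)
      capture program drop mymeanwrap
      program define mymeanwrap, rclass
          syntax varname(numeric) [if] [in]
          marksample touse
          tempname m
          mata: st_numscalar("`m'", mymean_of("`varlist'", "`touse'"))
          return scalar mean = `m'
      end

      // -statsby- runs the wrapper once per idcode and collects r(mean)
      statsby mymean = r(mean), by(idcode) clear: mymeanwrap ln_wage
      ```

      The Mata function reads the data with st_data(), the Stata program passes the variable name and the sample marker, and the result travels back through a temporary scalar into r().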

      About the question of how I can make -rangestat- faster, my intuition is that -rangestat- is a sophisticated programme that does a lot of stuff. I was just hoping that somebody might come forth with a simpler programme that just runs Mata functions by groups like -rangestat-, but faster and without the added functionality.

      Or maybe I was hoping that -rangestat- has a secret option, -rangestat, runfaster- that can do the job for me and do so faster :-).

      As to buying better computers and faster Statas, this is not my style. I work on whatever the university gives me for free. I am the kind of guy that likes cheap stuff, best if possible free stuff. Great many thanks to the university that they gave me the laptop they gave me and Stata 15, because I would otherwise be calculating on an abacus. :-)




      • #4
        It is a programming trick to make extra hooks that you don't document. Good reasons are that you want it yourself but don't imagine that anyone else will or that you are just too busy to document it. Other reasons exist. But sorry, there is no hidden switch in rangestat to make it go faster.

        I think you're seeking examples of Mata functions called directly from Stata code to do rather serious things. The criteria seem contradictory in that the typical statistics program, as pointed out by John Nelder 50 years ago, is in essence

        GET
        DO
        PUT

        (my copy of the 1972 paper concerned is elusive under lockdown).

        So, a program has to perform all of those, usually, and can't perform them all without several steps. rangestat isn't an exception, as the rest of the program does the GET and PUT.

        With rangerun and runby the point is that Mata may be used within the program you write.



        • #5
          Joro,
          A simple loop like the one below was slower than rangerun for a particular set of commands, but because of problems I could not overcome with passing locals from outside the program into it, I am using a variant of the following right now. It may be worth a try to see if this produces satisfactory results in your case.

          Code:
          local wndw = 30
          gen yhat = .
          foreach i of numlist `=`wndw'+1'/`=_N' {
              // estimate on the wndw observations just before i (same number of obs each time),
              // then predict one step ahead, out of sample, for observation i
              cap reg y x if _n >= `i' - `wndw' & _n < `i'
              cap predict yhattemp if _n == `i'
              cap replace yhat = yhattemp if _n == `i'
              cap drop yhattemp
          }
          Last edited by Oscar Ozfidan; 12 May 2021, 13:59.



          • #6
            A tangential point in response to #3: while you can have a program called by -rangerun- or -runby- that returns results in r() or e(), -rangerun- and -runby- ignore and discard those returned results. Information to be retained after -rangerun- or -runby- has to be put into the data set in active memory, typically in the form of new variables, or posted to a data set in another frame.
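
            As a hedged illustration of putting the result into the data rather than into r(), here is a sketch of a program for -runby- that calls Mata and stores its output in a new variable. The program name grpmean and the variable name mymean are made up for the example, and runby (SSC) is assumed to be installed:

            ```stata
            * Assumes runby is installed: ssc install runby
            webuse nlswork, clear

            capture program drop grpmean
            program define grpmean
                // When runby calls this, only one idcode's observations are in memory
                generate double mymean = .
                // Compute the group mean in Mata and write it to every row of the group
                mata: st_store(., "mymean", J(st_nobs(), 1, mean(st_data(., "ln_wage"))))
            end

            runby grpmean, by(idcode)
            ```

            Whatever is left in memory when the program finishes for a group is what runby keeps, so the new variable survives while anything in r() would not.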



            • #7
              Thank you, Clyde, for the guidance, and thanks to Nick too for encouraging me to try your -runby-.

              I made it work through your -runby- too. It was relatively painless, and the execution is pretty fast.

              I wonder, would it be too hard for you to augment -runby- to accept Mata functions directly, as -rangestat- does in my example in #1?

              The problem is not that the solution with -runby- is too complicated -- the program I am passing to -runby- is two lines of code. However, using -runby- does require that I know how to move between Mata and Stata, and how to pass data and results between Stata and Mata. On the other hand, I can directly plug Mata functions into -rangestat- as in #1, while being absolutely clueless about how the interface between Stata and Mata works.

              I think that making -runby- able to operate not only on variables, but also on outputs of Mata functions, would be a great and very useful expansion of the functionality of -runby-.



              • #8
                I wonder, would it be too hard for you to augment -runby- to accept Mata functions directly like -rangestat- does in my example in #1 ?
                I know it sounds like a simple modification, but it isn't. It would be a heavy lift. To be honest, it's not even something I would be capable of. Perhaps Robert Picard, who wrote the guts of the program, would know how to do it.

                That said, this functionality doesn't strike me as a good fit for -runby-. -runby- was designed to overcome a particular problem. Stata's -by- prefix is capable of iterating only a single command over groups of observations defined by values of variables. When anything more complicated needed to be iterated, it was necessary to gather levels of the variables and iterate over those with -foreach- loops. The difficulty here is that the code inside those loops had to single out the observations corresponding to the current values of the grouping variables with -if- conditions. And -if- conditions are slow. -runby- implemented a way (actually, two different ways) of getting around the use of -if- conditions in the iteration, and resulted in a marked speed-up for this kind of problem. Its user interface is clean, and users seem to like it. It is not designed to do any particular calculations, just to manage iteration; it is general purpose.
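
                For concreteness, the slow pattern described above looks roughly like the sketch below. The variable name grpmean is made up, and the data are cut down to a small subset only so that the loop finishes quickly:

                ```stata
                webuse nlswork, clear
                keep if idcode <= 50    // small subset, for illustration only

                // Gather the levels of the grouping variable, then loop over them
                levelsof idcode, local(ids)
                generate double grpmean = .
                foreach i of local ids {
                    // Each -if- is evaluated over every observation in memory
                    quietly summarize ln_wage if idcode == `i', meanonly
                    quietly replace grpmean = r(mean) if idcode == `i'
                }
                ```

                With many groups, the repeated -if- evaluations over the whole data set are what dominate the run time, and that is the cost -runby- was written to avoid.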

                I was not involved in the creation of -rangerun- and I don't know anything about how it works, so I don't know what it would take to modify it as Joro Kolev requests. But the same notion of designing a program for a particular purpose and making it as clean and simple as possible, seems to me to be applicable here as well. But, really, it would be Robert Picard's opinion on this that matters.

                Or, evidently, any other member of the Stata community who wants to create a new program that incorporates these capabilities into -runby- or -rangerun- is free to do so: the code for both is public domain.



                • #9
                  Originally posted by Joro Kolev View Post
                  However using -runby- does require that I know how to move between Mata and Stata, and how to pass data and results between Stata and Mata.
                  For that, you might be interested in

                  Code:
                  help putmata
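
                  A minimal sketch of the round trip, assuming the nlswork data from #1 (the names w, dev and ln_wage_dev are made up for illustration):

                  ```stata
                  webuse nlswork, clear

                  * Copy a Stata variable into a Mata column vector named w
                  putmata w = ln_wage, replace

                  * Work on it in Mata: deviations from the overall mean
                  mata: dev = w :- mean(w)

                  * Bring the Mata vector back into Stata as a new variable
                  getmata ln_wage_dev = dev, replace
                  ```

                  This avoids calling st_data() and st_store() yourself, at the cost of holding a copy of the data in Mata.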

