  • Is there a way to map/apply a list to a program?

    Hi STATALIST,

    I want to map a local list over a program instead of writing a for loop. Please see the example below.
    Ideally, I would get a list of results back. Thank you in advance.

    Code:
    local nums "1 2 3 4"
    
    program squarenum, rclass
        args num
    
        return local numsq `num'^2
        end
    
    map nums squarenum // here I don't know what's the way to do it
    I want to avoid for loops as much as possible.
    Last edited by Chun Kit Dai; 19 Sep 2023, 10:57.

  • #2
    Stata does not work that way; Mata does:

    Code:
    . mata
    ------------------------------------------------- mata (type end to exit) -------------------------------------------------
    : real rowvector squarenum(real rowvector nums) return(nums:^2)
    
    : squarenum((1,2,3,4))
            1    2    3    4
        +---------------------+
      1 |   1    4    9   16  |
        +---------------------+
    I imagine that at some level there is a loop here, too.*

    Anyway, in Stata, you either need an explicit loop, inside the program or outside of it, or you need recursive programming:

    Code:
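    program squarenum , rclass
        
        version 17
        
        * peel the first token off the argument list (local macro 0)
        gettoken num 0 : 0
        
        * recurse on the remaining tokens, if any
        if (`"`0'"' != "") {
            
            squarenum `0'
            
        }
        
        * prepend this square to the squares left in r() by the recursive call
        return local numsq `=`num'^2' `r(numsq)'
        
    end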
    Code:
    . squarenum 1 2 3 4
    
    . return list
    
    macros:
                  r(numsq) : "1 4 9 16"
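The explicit-loop alternative mentioned above is not shown in the thread. A minimal sketch of it, assuming the loop lives inside the program, might look like the following. The program name squarenum_loop is hypothetical and chosen only to avoid clashing with squarenum.

```stata
* Hypothetical loop-based counterpart to the recursive squarenum
capture program drop squarenum_loop
program squarenum_loop, rclass
    version 17

    * Walk the space-separated arguments one token at a time
    foreach num in `0' {
        * Append the square of the current token to the result list
        local numsq `numsq' `=`num'^2'
    }

    return local numsq `numsq'
end

local nums "1 2 3 4"
squarenum_loop `nums'
display "`r(numsq)'"
```

The loop is still there, but it is hidden from the caller: passing the local list to the program once returns the whole list of results in r(numsq).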

    * Edit: According to Wikipedia, Mata is indeed vectorized, so there is no loop at any level here.
    Last edited by daniel klein; 19 Sep 2023, 12:15.


    • #3
      I can see that you are coming from R. I imagine you are trying to do something in Stata the way you would in R. Don't. Instead of asking for advice on pieces of code in several threads, and trying to replicate your R-style functions, it is probably better to explain your ultimate goal in one thread and ask for advice on implementing that in Stata from scratch. There is a good chance that the Stata approach is fundamentally different from the R approach.


      • #4
        See matmap from SSC for an example of how this was once done in 2000 before foreach and forvalues loops and Mata.


        The syntax could be modernised, but otherwise my advice resembles Daniel's: use loops or Mata (or loops in Mata).


        • #5
          Quoting #3: "There is a good chance that the Stata approach is fundamentally different from the R approach."
          The core issue is that R and Stata come from two different programming language paradigms: R is a functional language whereas Stata is an imperative language. Trying to do things the R way in Stata is a bit like applying the rules of Romance languages to Mandarin. map() (as in the JavaScript or Python function) and apply() (as in the R function) come at the problem of iteration from a functional "lambda calculus" perspective, where iteration with a for loop is undefined and iteration should be done by recursion through a list. In reality, last I checked (more than 5 years ago), R's implementation of apply() was actually backed by a for loop, but it doesn't matter, because both techniques for going through the items in a list take O(N) time. I mention this because R users sometimes have a misconception that apply() is "faster" or "more efficient" than a for loop. apply() can be faster in R, but this is really because R is memory inefficient and apply() tends to rely on more C code when it iterates through a vector than an equivalent for loop does.

          For loops are well defined under the Turing machine paradigm, which tends to better capture the low-level machine logic and which inspires most imperative languages. Turing machines and the lambda calculus have been proven equivalent in computational power, the result that underlies the Church-Turing thesis. It follows that every recursive solution has an equivalent iterative solution, even if the equivalent solution isn't obvious or isn't as efficient (recursion is often memory inefficient). I routinely write lapply() calls in R that are neither easier to read nor more efficient than a for loop. They are just more idiomatic, not necessarily better, faster, or more memory efficient.
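To make the recursion/iteration equivalence concrete, here is a small Mata sketch that squares each element of a row vector both ways. The function names sq_recursive() and sq_iterative() are hypothetical and are not taken from any post in this thread.

```stata
mata:
// Hypothetical helper: square the first element, then recurse on the rest
real rowvector sq_recursive(real rowvector x)
{
    if (cols(x) == 0) return(J(1, 0, .))
    if (cols(x) == 1) return(x[1]^2)
    return((x[1]^2, sq_recursive(x[|2 \ cols(x)|])))
}

// Hypothetical helper: the same computation with an explicit for loop
real rowvector sq_iterative(real rowvector x)
{
    real scalar i
    real rowvector res

    res = J(1, cols(x), .)
    for (i = 1; i <= cols(x); i++) {
        res[i] = x[i]^2
    }
    return(res)
}

sq_recursive((1, 2, 3, 4))
sq_iterative((1, 2, 3, 4))
end
```

Both functions should display the same row vector. The recursive version builds a new, shorter vector at every call, which illustrates why recursion is often the less memory-efficient of the two.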

          Quoting #2: "Edit: According to Wikipedia, Mata is indeed vectorized, so there is no loop at any level here."
          I think the Wikipedia article may be a bit misleading. My understanding is that, for a user of Mata (or of Stata, for many operations on variables), the operation is vectorized, so as far as the user is concerned the implementation contains no loops. However, at a low level there are exactly three ways to go through a list and perform an operation on each element: iteration, recursion, or some multithreading operation. I think the logic here is intuitive: you need a way to "touch" every element of the list. You can do that by iterating through the list one by one, by recursing through the list, or by sending each element of the list down its own "channel" for processing.

          Multithreading is theoretically great for problems expressible in terms of linear algebra (I mean matrix multiplications and the like, not necessarily things like finding the solution to a system of equations). As a practical matter, multithreading can carry a lot of overhead. It also doesn't work at all when the solution on the current iteration depends on the solution to the previous iteration, because the calculations are performed concurrently. This is all to say that most of the vectorized operations we work with are almost certainly implemented iteratively at a low level with a goto in assembly (a for loop is really a kind of goto). Compilers are designed to write highly optimized assembly, and a "goto" iterative solution is as fast and memory efficient as the machine can be on a single core. The pricier Stata/MP edition parallelizes some commands across CPU cores, but there is still iteration involved, especially when the number of operations needed is large enough to exceed the number of cores available. R has some packages for multithreading, but it doesn't implement multithreading at a low level, despite the fact that it is a vectorized language.
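The point about one iteration depending on the previous one can be illustrated with a running total, where each element needs the result computed just before it. This is a minimal Mata sketch. The function name running_total() is hypothetical, and the comparison with Mata's built-in runningsum() is only there as a sanity check.

```stata
mata:
// Hypothetical helper: each step reads the result of the previous step,
// so the iterations cannot simply be run concurrently as written
real rowvector running_total(real rowvector x)
{
    real scalar i
    real rowvector res

    res = x
    for (i = 2; i <= cols(x); i++) {
        res[i] = res[i-1] + x[i]
    }
    return(res)
}

running_total((1, 4, 9, 16))

// Mata's built-in runningsum() should display the same vector
runningsum((1, 4, 9, 16))
end
```

Squaring each element, by contrast, has no such dependence: every element can be processed without knowing anything about its neighbours, which is what makes it a candidate for vectorized or concurrent execution.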
          Last edited by Daniel Schaefer; 19 Sep 2023, 13:37.


          • #6
            Sorry, I know #5 is probably more detail than anyone wants or needs here. tl;dr: R and Stata follow two different programming language paradigms, which is why it can be difficult to translate directly from one to the other. Also, even vectorized operations probably do some kind of iteration at a low level.


            • #7
              Originally posted by Daniel Schaefer:
              "Sorry, I know #5 is probably more detail than anyone wants or needs here."
              On the contrary. Thanks for the insights.


              • #8
                Thank you so much, Daniel and Nick. Your comments helped me to understand some of my frustrations in working with Stata so far. Thank you for the insights.
