Is there a way to map/ apply a list to a program?

Chun Kit Dai

Join Date: Mar 2023

Posts: 6
#1

Is there a way to map/ apply a list to a program?

19 Sep 2023, 10:54

Hi STATALIST,

I want to map a local list to a program instead of writing a for loop. Please see the example below.
Ideally, I will get a list of results back. Thank you in advance.

Code:

local nums "1 2 3 4"

program squarenum, rclass
    args num
    return local numsq `num'^2
end

map nums squarenum // here I don't know what's the way to do it

I want to avoid for loops as much as possible.

Last edited by Chun Kit Dai; 19 Sep 2023, 10:57.
Tags: None

daniel klein

Join Date: Mar 2014
Posts: 3885

19 Sep 2023, 11:45

Stata does not work that way; Mata does:

Code:

. mata
------------------------------------------------- mata (type end to exit) ----------------------------------------------------------------------------------------------------------------------
: real rowvector squarenum(real rowvector nums) return(nums:^2)

: squarenum((1,2,3,4))
        1    2    3    4
    +---------------------+
  1 |   1    4    9   16  |
    +---------------------+

I imagine that at some level there is a loop here, too.*

Anyway, in Stata, you either need an explicit loop, inside the program or outside of it, or you need recursive programming:

Code:

program squarenum , rclass
    
    version 17
    
    gettoken num 0 : 0
    
    if (`"`0'"' != "") {
        
        squarenum `0'
        
    }
    
    return local numsq `=`num'^2' `r(numsq)'
    
end

Code:

. squarenum 1 2 3 4

. return list

macros:
              r(numsq) : "1 4 9 16"
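For comparison, the explicit-loop alternative might look like the sketch below. The program name squarenum_loop is made up for this illustration; it assumes every argument is a number and simply walks through the tokens in `0'.

```stata
capture program drop squarenum_loop
program squarenum_loop, rclass
    version 17

    // `0' holds everything typed after the program name;
    // visit each token and append its square to the result list
    foreach num of local 0 {
        local numsq `numsq' `=`num'^2'
    }

    return local numsq `numsq'
end

local nums "1 2 3 4"
squarenum_loop `nums'
display "`r(numsq)'"    // shows: 1 4 9 16
```

The caller passes the list with `nums' and reads the mapped list back from r(numsq), which is about as close to map as plain Stata gets.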

* Edit: According to Wikipedia, Mata is indeed vectorized, so there is no loop at any level here.

Last edited by daniel klein; 19 Sep 2023, 12:15.

Comment

daniel klein

Join Date: Mar 2014

Posts: 3885
#3

19 Sep 2023, 11:58

I can see that you are coming from R. I imagine you are trying to do something in Stata the way you would in R. Don't. Instead of asking for advice on pieces of code in several threads, and trying to replicate your R-style functions, it is probably better to explain your ultimate goal in one thread and ask for advice on implementing that in Stata from scratch. There is a good chance that the Stata approach is fundamentally different from the R approach.
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35775
#4

19 Sep 2023, 12:27

See matmap from SSC for an example of how this was once done in 2000 before foreach and forvalues loops and Mata.

The syntax could be modernised, but otherwise my advice resembles Daniel's: use loops or Mata (or loops in Mata).
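As a rough sketch of the "loops in Mata" option: the helper below (square_local is a hypothetical name, not part of Stata or matmap) reads a local macro, squares each number in a Mata loop, and writes the results to another local macro. It assumes the list holds only numbers.

```stata
local nums "1 2 3 4"

mata:
// Hypothetical helper: from and to are the names of local macros
// in the calling do-file.
void square_local(string scalar from, string scalar to)
{
    real rowvector x
    real scalar    i

    // split the macro contents into tokens and convert them to numbers
    x = strtoreal(tokens(st_local(from)))

    // explicit loop over the elements
    for (i = 1; i <= cols(x); i++) {
        x[i] = x[i]^2
    }

    // join the results back into one space-separated string
    st_local(to, invtokens(strofreal(x)))
}

square_local("nums", "numsq")
end

display "`numsq'"    // shows: 1 4 9 16
```

The loop body could be replaced by the vectorized x:^2; the loop is written out here only to show the pattern.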
Comment
Daniel Schaefer

Join Date: Mar 2020

Posts: 822
#5

19 Sep 2023, 13:33

There is a good chance that the Stata approach is fundamentally different from the R approach.

The core issue is that R and Stata come from two different programming language paradigms: R is a functional language whereas Stata is an imperative language. Trying to do things the R way in Stata is a bit like applying the rules of Romance languages to Mandarin. map() (as in the JavaScript or Python function) and apply() (as in the R function) come at the problem of iteration from a functional "lambda calculus" perspective, where iteration with a for loop is undefined and iteration should be done by recursion through a list. In reality, last I checked (more than 5 years ago), R's implementation of apply() was actually backed by a for loop, but it doesn't matter because these two techniques for going through the items in a list both (obviously) run in O(N) time. I mention this because R users sometimes have the misconception that apply is "faster" or "more efficient" than a for loop. Apply can be faster in R, but this is really because R is memory inefficient and apply tends to rely on more C code when it iterates through a vector than an equivalent for loop does.

For loops are well defined under the Turing machine paradigm, which tends to better capture the low-level machine logic and which inspires most imperative languages. Turing machines and the lambda calculus are provably equivalent in computational power, which is the result that underlies the Church-Turing thesis. It follows that every recursive solution has an equivalent iterative solution, even if the equivalent solution isn't obvious or isn't as efficient (recursion is often memory inefficient). I routinely write lapply() function calls in R that are no easier to read nor more efficient than a for loop. It's just more idiomatic, not necessarily better, faster, or more memory efficient.
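To make the recursion/iteration equivalence concrete, here is a small Mata sketch that sums a list both ways. The function names sum_recursive and sum_iterative are invented for this example.

```stata
mata:
// Recursive version: first element plus the sum of the rest.
real scalar sum_recursive(real rowvector x)
{
    if (cols(x) == 0) return(0)
    if (cols(x) == 1) return(x[1])
    return(x[1] + sum_recursive(x[|2 \ cols(x)|]))
}

// Iterative version: accumulate in a for loop.
real scalar sum_iterative(real rowvector x)
{
    real scalar i, total

    total = 0
    for (i = 1; i <= cols(x); i++) {
        total = total + x[i]
    }
    return(total)
}

x = (1, 4, 9, 16)
sum_recursive(x), sum_iterative(x)

// both approaches give the same answer (30)
assert(sum_recursive(x) == sum_iterative(x))
end
```

Both visit each element once, so both are O(N); the recursive version additionally copies a subvector and adds a call frame at every step.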

Edit: According to Wikipedia, Mata is indeed vectorized, so there is no loop at any level here.

I think the Wikipedia article may be a bit misleading. My understanding is that, for a user of Mata (or of Stata for many operations on variables), the operation is vectorized, so as far as the user is concerned the implementation contains no loops. However, at a low level there are exactly three ways to go through a list and perform an operation on each element: iteration, recursion, or some multithreading operation. I think the logic here is intuitive: you need a way to "touch" every element of the list. You can do that by iterating through the list one by one, recursing through the list, or by sending each element of the list down its own "channel" for processing.

Multithreading is theoretically great for problems expressible in terms of linear algebra (I mean matrix multiplications and the like, not necessarily things like finding the solution to a system of equations). As a practical matter, there can be a lot of overhead in multithreaded applications. Multithreading doesn't work at all when the solution on the current iteration depends on the solution to the previous iteration, because the calculations are performed concurrently. This is all to say that most of the vectorized operations we work with are almost certainly implemented iteratively at a low level with a goto in assembly (a for loop is really a kind of goto). Compilers are designed to write highly optimized assembly, and a "goto" iterative solution is as fast and memory efficient as the machine can be on a single core. The pricier Stata/MP edition does parallelize some work across multiple cores, but there may still be some iteration involved, especially if the number of operations needed is large enough to exceed the number of cores available. R has some packages for multithreading, but it doesn't implement multithreading at a low level, despite the fact that it is a vectorized language.
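A minimal Mata sketch of the dependency point: squaring each element needs nothing from its neighbours, while a running total needs the previous result at every step. The function running_total is a hypothetical name written out for illustration; Mata's built-in runningsum() is used only to check it.

```stata
mata:
x = (1, 2, 3, 4)

// Independent: each output depends only on its own input,
// so in principle the elements could be processed in any order.
x:^2

// Dependent: element i needs the finished result for element i-1.
real rowvector running_total(real rowvector x)
{
    real rowvector res
    real scalar    i

    res = x
    for (i = 2; i <= cols(x); i++) {
        res[i] = res[i-1] + x[i]
    }
    return(res)
}

running_total(x)

// agrees with the built-in running sum
assert(running_total(x) == runningsum(x))
end
```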

Last edited by Daniel Schaefer; 19 Sep 2023, 13:37.
3 likes
Comment
Daniel Schaefer

Join Date: Mar 2020

Posts: 822
#6

19 Sep 2023, 14:02

Sorry, I know #5 is probably more detail than anyone wants or needs here. tl;dr: R and Stata follow two different programming language paradigms, which is why it can be difficult to translate directly from one to the other. Also, even vectorized operations probably do some kind of iteration at a low level.
Comment
daniel klein

Join Date: Mar 2014

Posts: 3885
#7

19 Sep 2023, 15:15

Originally posted by Daniel Schaefer

Sorry, I know #5 is probably more detail than anyone wants or needs here.

On the contrary. Thanks for the insights.
1 like
Comment
Chun Kit Dai

Join Date: Mar 2023

Posts: 6
#8

19 Sep 2023, 15:35

Thank you so much, Daniel and Nick. Your comments helped me to understand some of my frustrations in working with STATA so far. Thank you for the insights
Comment
