  • How to tokenize a stored result macro

    Dear Stata listers,

    After running the community-contributed command uirt, like:
    Code:
    use alike, clear
    qui uirt v*
    di "`e(depvar)'"
    * that results in:
    v1 v2 v3 v4 v5 v6 v7 v8
    I tried to tokenize the result macro e(depvar) that holds the names of dependent variables (items) separated by a space character, using:
    Code:
    tokenize e(depvar), parse()
    dis `1'
    v1 v2 v3 v4 v5 v6 v7 v8
    which is not what I expected, namely the first (tokenized) word: v1
    The help file states:
    If parse() is not specified, parse(" ") is assumed, and string is split into words.
    So, I assume that my line of code is technically correct (using parse(" ") also produces the whole string of words).
    Nevertheless, the whole string is replicated by the first token and not the first word.

    Can somebody explain what I am doing wrong here, or suggest a coding alternative that extracts the items from the stored result macro one at a time (using a loop)?
    http://publicationslist.org/eric.melse

  • #2
    I think you just need
    Code:
    tokenize `e(depvar)', parse()
    so that you can get
    Code:
    . dis "`1'"
    v1
    Last edited by Hemanshu Kumar; 17 May 2025, 12:52.


    • #3
      Here's what happens: tokenize works on string literals. "e(depvar)" (quotes added for emphasis) is just one word; hence `1' evaluates to "e(depvar)". display evaluates its arguments; hence e(depvar) evaluates to v1 v2 ...

      You probably want
      Code:
      tokenize `e(depvar)'
      Edit: crossed with #2 which cuts straight to the solution.
      Last edited by daniel klein; 17 May 2025, 12:54.
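
      A minimal sketch of the difference, using only commands shipped with Stata. The auto dataset, the regress model, and e(cmdline) are stand-ins chosen for illustration and are not from the original question; the same reasoning should carry over to e(depvar) after uirt.

      ```stata
      sysuse auto, clear
      quietly regress price mpg weight

      * No macro quotes: tokenize receives the literal text e(cmdline),
      * which contains no spaces and so is stored as a single token
      tokenize e(cmdline)
      display "`1'"

      * Macro quotes: the contents of e(cmdline) are substituted first,
      * so tokenize splits the command line into its words
      tokenize `e(cmdline)'
      display "`1'"
      display "`2'"
      ```

      The first display should show the unexpanded text e(cmdline); the later ones should show the first two words of the command line.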


      • #4
        Dear Hemanshu & Daniel,

        Thank you for your advice. I ran each suggestion but the result is a rather mysterious digit 2 instead of v1, like:
        Code:
        . tokenize `e(depvar)'
        . dis `1'
        2
        * and
        . tokenize `e(depvar)', parse()
        . dis `1'
        2
        * my check thereafter of the content of the macro:
        . di "`e(depvar)'"
         v1 v2 v3 v4 v5 v6 v7 v8
        So, the result macro appears to be unchanged, naturally, but tokenize still is a disappointment for me.
        Any other suggestions or explanation why this is happening?


        • #5
          While searching further, I drew inspiration from a nine-year-old Stack Overflow post, "Stata: How to delimit elements of a local macro", which helped me write code that produces the desired result:
          Code:
          forvalues i=1/`e(N_items)' {
            local element `: word `i' of `e(depvar)''
            dis "`element'"
          }
          * which produces:
          
          . forvalues i=1/`e(N_items)' {
            2.   local element `: word `i' of `e(depvar)''
            3.   dis "`element'"
            4. }
          v1
          v2
          v3
          v4
          v5
          v6
          v7
          v8
          So, by using another uirt result macro e(N_items) the loop is controlled to stop at the last used variable/item (v8).
          Now, I have the name (string) of each variable/item available for further use.
          If anyone can think of a more elegant solution, well, I am interested to learn about it.


          • #6
            Eric, display evaluates what you pass. When you pass v1, it will display the value of the first observation of v1, which is 2 in your data. tokenize works as expected. It's display that confuses you. Type
            Code:
            macro list
            after tokenize to see the contents of local macros.
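
            A short sketch of the point about display, using the auto dataset shipped with Stata instead of the thread's data (the dataset and the variable names price and mpg are assumptions for illustration only):

            ```stata
            sysuse auto, clear
            tokenize price mpg

            * The macro is substituted first, so this line becomes: display price
            * display then shows the value of price in the first observation
            display `1'

            * The same request with the observation made explicit
            display price[1]

            * With double quotes the line becomes: display "price"
            * so the name itself is shown as text
            display "`1'"

            * tokenize stored the names as intended
            assert "`1'" == "price"
            assert "`2'" == "mpg"
            ```

            The two assert lines pass silently when the tokens hold the variable names, which is one way to check tokenize without involving display at all.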


            • #7
              Eric, all you need to do is enclose the `1' in double quotes in the display command, as I did in #2, so that it is parsed as dis "v1" and thus treated as a string. Without the quotes, it is parsed as dis v1, which display interprets as explained in #6.


              • #8
                Dear Daniel & Hemanshu,
                I really appreciate your effort and I follow your suggestion now running a replicable example:
                Code:
                webuse masc2, clear
                qui uirt q*
                di "`e(depvar)'"
                tokenize `e(depvar)'
                dis `1'
                macro list
                
                * Which results in:
                . di "`e(depvar)'"
                 q1 q2 q3 q4 q5 q6 q7 q8 q9
                . dis `1'
                1
                
                . macro list
                T_gm_fix_span:  1
                S_level:        95
                F1:             help advice;
                F2:             describe;
                F7:             save
                F8:             use
                S_ADO:          BASE;SITE;.;PERSONAL;PLUS;OLDPLACE
                S_StataMP:      MP
                S_StataSE:      SE
                S_OS:           Windows
                S_OSDTL:        64-bit
                S_MACH:         PC (64-bit x86-64)
                _9:             q9
                _8:             q8
                _7:             q7
                _6:             q6
                _5:             q5
                _4:             q4
                _3:             q3
                _2:             q2
                _1:             q1
                S_FN:           https://www.stata-press.com/data/r19/masc2.dta
                S_FNDATE:        1 Apr 2022 13:07
                So, using dis `1' does not provide q1.
                Next, I tried dis `_1', but that does not display anything.

                So I resorted to my fallback code:
                Code:
                forvalues i=1/`e(N_items)' {
                  local element `: word `i' of `e(depvar)''
                  dis "`element'"
                }
                
                * And this produces:
                q1
                q2
                q3
                q4
                q5
                q6
                q7
                q8
                q9
                So, I am happy that this does work but certainly I am a bit baffled about the particulars of this issue.


                • #9
                  You just need
                  Code:
                  dis "`1'"
                  This is what I had suggested in #7 and #2. Sorry if it was not clear. You need to do this for the exact same reason that in your forvalues loop, you use dis "`element'" rather than dis `element'
                  Last edited by Hemanshu Kumar; 18 May 2025, 10:15.


                  • #10
                    Confession: On first reading this I hung back, guessing that uirt was doing something unusual and that it needed scrutiny of the code to find out what that was.

                    That was quite wrong, and indeed Hemanshu Kumar and daniel klein have pointed to the main issue, which is rather: what is display doing here?

                    I can go a tiny bit beyond their explanations.

                    After tokenize the local macro 1 contains the variable name q1. So the syntax

                    Code:
                    di `1'
                    evaluates first as

                    Code:
                    di q1
                    and that is interpreted as

                    Code:
                    di q1[1]
                    i.e. the value of q1 in the first observation

                    The developers of Stata decided that asking to display a variable is not an error, but nevertheless you just get shown the value of the variable in the first observation.

                    As you have provided a reproducible example (thanks!) the data can be checked:


                    Code:
                    . webuse masc2, clear
                    (Data from De Boeck & Wilson (2004))
                    
                    . di q1
                    1
                    
                    . di q1[1]
                    1
                    EDIT = #7


                    • #11
                      Dear Daniel & Hemanshu & Nick,

                      Thanks to you all for educating me about the intricacies of using tokenize to collect variable names from a stored result macro (in my case `e(depvar)' after running uirt).
                      I suppose my confusion originated from my long-term experience of using tokenize to set a series of numbers or text strings for follow-up use.
                      To wrap up this post, here is my code example, which includes everything discussed above:

                      Code:
                      * Set up
                      ssc install uirt, replace // Stata module to fit unidimensional Item Response Theory models
                      
                      * Example
                      webuse masc2, clear
                      qui uirt q*
                      di "`e(depvar)'"
                      
                      * Set tokens manually (to compare with the coding below)
                      tokenize "q1 q2 q3 q4 q5 q6 q7 q8 q9"
                      forvalues i = 1/9 {
                          dis "`1'"
                          macro shift
                      }
                      
                      * Set tokens by using the stored result macro
                      tokenize `e(depvar)'
                      * Get (display) each variable name
                      * Note that the forvalues range maximum is set manually (i.e. 9)
                      forvalues i = 1/9 {
                          dis "`1'" // get variable name
                          macro shift
                      }
                      
                      * Same as above but now using the uirt model items scalar to set the forvalues range maximum
                      forvalues i=1/`e(N_items)' {
                        local element `: word `i' of `e(depvar)''
                        dis "`element'"
                      }
                      
                      * Get (display) the first value of each variable by using the stored result macro
                      tokenize `e(depvar)'
                      forvalues i = 1/9 {
                          dis `1'
                          macro shift
                      }
                      * Same as above but now using the first case identifier between straight brackets [1]
                      tokenize `e(depvar)'
                      forvalues i = 1/9 {
                          dis `1'[1]
                          macro shift
                      }
                      * Same as above but now using the second case identifier between straight brackets [2]
                      tokenize `e(depvar)'
                      forvalues i = 1/9 {
                          dis `1'[2]
                          macro shift
                      }


                      • #12
                        Seeing macro shift was evocative because before Stata 7 it featured heavily in looping through lists in a macro or a series of macros.

                        I think I remember a post from Alan Riley (now naturally the President of StataCorp) pointing out that it is fairly inefficient and you are better off avoiding it. You would be pushed to notice the inefficiency in a problem of this size.


                        • #13
                          Dear Nick,
                          Maybe my coding is of the more humble type or my projects are indeed too small to notice any inefficiency using macro shift to cycle through elements stored in memory after using tokenize.
                          Note that, instead of the simple dis "`1'" in the example code below, I usually go through all sorts of manipulations that use the element(s) provided by tokenize.
                          But I am certainly interested to learn what more efficient code would look like in place of this code with macro shift:
                          Code:
                          * Set tokens manually
                          tokenize "q1 q2 q3 q4 q5 q6 q7 q8 q9"
                          forvalues i = 1/9 {
                              dis "`1'"
                              macro shift
                          }


                          • #14
                            Actually, I didn't quite understand the reason to do a macro shift here. Couldn't we simply do

                            Code:
                            * Set tokens manually
                            tokenize "q1 q2 q3 q4 q5 q6 q7 q8 q9"
                            forvalues i = 1/9 {
                                dis "``i''"
                            }
                            or am I missing something?


                            • #15
                              Hemanshu Kumar's code is essentially what I might do here.

                              Another useful approach is

                              Code:
                              local foo q1 q2 q3 q4 q5 q6 q7 q8 q9
                              
                              local wc : word count `foo'
                              
                              forval w = 1/`wc' { 
                                  di "`: word `w' of `foo''"
                              }
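
                              If the goal is only to visit each name in turn, foreach may be a simpler alternative, since it needs neither tokenize nor a word count. This is a sketch under the same assumption of a local macro named foo; after an estimation command, the list after the word in could presumably be the expanded stored result macro instead of a literal list.

                              ```stata
                              local foo q1 q2 q3 q4 q5 q6 q7 q8 q9

                              * foreach with "of local" walks through the words of the macro one at a time
                              foreach w of local foo {
                                  di "`w'"
                              }

                              * foreach with "in" takes the list as typed on the line,
                              * which is where an expanded macro could be placed
                              foreach w in q1 q2 q3 {
                                  di "`w'"
                              }
                              ```

                              As elsewhere in this thread, the double quotes around `w' matter: without them display would show the first observation of each variable rather than its name.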
