  • To generate a new variable and grammar of args or syntax

    Dear Stata users,
    I want to write a command that wraps the -egen seq()- function. Its syntax is -genseq newvarname=from to block-, where "from", "to", and "block" are three numbers corresponding to the parameters of the egen seq() function. My code is below; however, it fails to execute. Can anyone help me debug and rewrite it? Thank you very much.

    Code:
    *! Written in 2022 August 18
    
    program define genseq
    
    args newvarname first second third
     gettoken newvarname 0 : 0, parse("= ") bind
     gettoken first 0 : 0, parse("= ")
     gettoken second 0 : 0, parse("= ")
     gettoken third 0 : 0, parse("= ")
    
     if "`first'`second'`third'"=="" {
      display as text "enter first value as number of from(), second value as number of to(), third value as number of block()"
     }
    
     set obs `second'/`third'
     egen `newvarname'=seq(), from(`first') to(`second') block(`third')
    
    end

  • #2
    There are several problems. Starting with the syntax diagram/grammar, you have five "arguments": newvarname, =, from, to, and block. Your args command is written in terms of three arguments. Moreover, the args command does nothing at all for your program. Let's review. When you type

    Code:
    args first second third
    Stata takes the first, second, and third word (separated by spaces) from local 0 (i.e., what the user has typed), and puts them into local macros first, second, and third.

    EDIT
    I am wrong! args has nothing to do with local 0. It applies directly to locals 1, 2, 3, ...
    /EDIT

    That is all. It does not check whether there are three arguments or four or none at all. The command is not well suited for your purposes because

    Code:
    genseq foo=1 2 3
    would put foo=1 into local macro first, 2 into local macro second, and 3 into local macro third, while

    Code:
    genseq foo =  1 2 3
    would put foo into local macro first, = into local macro second, and 1 into local macro third. None of that matters in your case, because your series of gettoken commands, which are generally better suited, overwrite the local macros anyway.
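
    A minimal sketch to see this behavior directly. The program name showargs is hypothetical and exists only for illustration; it displays whatever args has copied from the positional locals 1, 2, and 3.

    Code:
    capture program drop showargs
    program define showargs
        // args copies positional locals 1, 2, 3 into named locals
        args first second third
        display `"first  = [`first']"'
        display `"second = [`second']"'
        display `"third  = [`third']"'
    end
    
    showargs foo=1 2 3
    showargs foo = 1 2 3
    Comparing the two calls shows how the spacing around the equals sign changes which word lands in which local macro.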

    Moving on to the gettoken series: for both of the examples above, you would end up with foo put into local macro first, = put into local macro second, and 1 put into local macro third. This is not what you want. You probably want something like

    Code:
    gettoken newvarname 0 : 0 , parse("=")
    gettoken equalssign 0 : 0 , parse("=")
    
    args from to block
    Here, I strip everything before the equals sign, i.e., the expected variable name from 0, then I strip the equals sign itself. Local 0 is then expected to contain three remaining arguments from, to, and block, which I assign to the respective local macro names using the args command.

    EDIT
    I am wrong, again! Because args applies directly to locals 1, 2, 3, ..., we would need to reassign those:

    Code:
    tokenize `0'
    args from to block
    /EDIT


    Obviously, you would want to add a couple of checks. Your attempted check

    Code:
    if "`first'`second'`third'"==""
    is probably not functional, because the statement is only true when all three arguments are missing, i.e., when the user has typed

    Code:
    genseq
    Instead, you want something like this:

    Code:
    gettoken newvarname 0 : 0 , parse("=")
    gettoken equalssign 0 : 0 , parse("=")
    
    tokenize `0' // <- added in EDIT
    args from to block void // <- note fourth argument
    
    confirm new variable `newvarname'
    
    if (`"`equalssign'"' != "=") {
        display as err "= required"
        exit 198
    }
    
    if (`"`void'"' != "") {
        display as err `"`void' not allowed"'
        exit 198
    }
    
    local nargs : word count `from' `to' `block'
    if (`nargs' != 3) {
        display as err "three arguments are required"
        exit 198
    }
    You could go further. Usually, a new variable may be specified as [type] newvarname, e.g., double foo. Your command cannot handle that yet. Anyway, moving on, the line

    Code:
    set obs `second'/`third'
    is invalid syntax; see

    Code:
    help set obs
    Even with the correct syntax, the command will fail if the specified number is smaller than the number of observations already present. I am not taking the time to spell everything else out here, but you probably want to check for a sufficient number of observations before (thinking about) adding new ones. EDIT: You do not want to change the number of observations and then have the egen command fail. That would leave you with an altered dataset despite an error, something that should never happen.
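
    A minimal sketch of such a check, assuming the target count is the ratio from the original post rounded up to a whole number. The local name needed and the example values are hypothetical; the point is only that the expression is evaluated into a local macro first and that set obs is called only when observations actually have to be added.

    Code:
    clear
    local second 10
    local third  2
    
    // evaluate the expression first; set obs is then given a plain number
    local needed = ceil(`second'/`third')
    
    // only add observations; never ask for fewer than are already there
    if (`needed' > c(N)) {
        set obs `needed'
    }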

    Overall, while this is a nice programming exercise, it might not be worth the effort merely to introduce a new grammar to an already existing command.
    Last edited by daniel klein; 18 Aug 2022, 02:17.

    • #3
      Here is a lazy wrapper that leaves error checking almost entirely to egen.

      Code:
      program genseq
          version 17
          
          gettoken before 0 : 0 , parse("=")
          gettoken equals 0 : 0 , parse("=")
          tokenize `0'
          args from to block void
          
          if (`"`equals'"' != "=") {
              display as err "= required"
              exit 198
          }
          
          if (`"`void'"' != "") {
              display as err `"`void' not allowed"'
              exit 198
          }
          
          preserve
          
          if (c(N) < `to') set obs `to'
          
          egen `before' = seq() , from(`from') to(`to') block(`block')
          
          restore , not
      end
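
      For comparison, a minimal sketch of the direct egen call that the wrapper is meant to replace, followed by the same request in the proposed grammar. This assumes genseq has been defined as above; the variable names x and y are placeholders.

      Code:
      clear
      set obs 6
      
      // each value from 1 to 3 is repeated in blocks of 2: 1 1 2 2 3 3
      egen x = seq() , from(1) to(3) block(2)
      
      // same request using the wrapper
      genseq y = 1 3 2
      
      list x y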
      Last edited by daniel klein; 18 Aug 2022, 02:55.

      • #4
        See also the seq command introduced in

        1997. Sequences of integers. Stata Technical Bulletin 37: 2-4.
        https://www.stata-press.com/journals...ents/stb37.pdf

        and the seqvar command introduced in

        2008. Speaking Stata: Between tables and graphs. Stata Journal 8: 269--289
        https://www.stata-journal.com/articl...article=gr0034

        • #5
          Dear daniel klein, thank you so much. I'm tired of typing the from-to-block parameters when using the egen seq() function, and I cannot understand why Stata uses a blank seq() expression without parameter inputs. So I intended to write the neat command that I proposed in this post. Thanks so much for what you presented above. Your replies are so detailed and enlightening that they not only solve my particular question but also add to my knowledge in general.
          And thank you Nick Cox you always teach us a lot in various ways.

          • #6
            StataCorp’s reason for requiring no argument for seq() is the same as mine when I first published the function as community-contributed code around 1999. The function is unusual in needing no variables or expressions as arguments. There is a clear precedent in runiform().

            Not liking the syntax (much) is a different deal. Full confession mode: when I wrote that function and its predecessor seq I had a careful look at functions in S, but what looks good compared with what is conventional and congenial in S does not translate easily to conventional or congenial Stata syntax.

            But no argument (pun intended) from me: different syntaxes are possible and (for example) the syntax of seqvar could be extended to allow blocks.

            • #7
              The (current) documentation of egen states that

              Depending on fcn, arguments refers to an expression, varlist, or numlist, [...]
              and the fill() function takes a numlist as arguments. The function might have been added (long) after 1999, and it is the exception. However, in general, there is no (modern) convention that would prohibit a syntax such as

              Code:
              egen [type] newvar = seq(# # #)
              By the way, seq() is also an exceptional case in that it is the only function that does not take any arguments.
              Last edited by daniel klein; 18 Aug 2022, 04:59.

              • #8
                daniel klein I agree with #7 and in 2022 would regard that as better syntax. But StataCorp would, I guess, be wary of changing the syntax now.

                Mention of fill() reminds me that I had difficulty in understanding that syntax, which was a motive for writing my own command and later my own function.

                • #9
                  Thank you, daniel and Nick. I am happy to have initiated a meaningful discussion that traces back to the early days of Stata's egen functions.

                  • #10
                    In stb37.pdf, just following -seq- (dm44) is the famous -destring- (dm45) command written by Nick Cox and William Gould.
