
  • How do I use scalars as breakpoints in egen's cut function?

    I have time series data and a list of quarters that serve as breakpoints; I want to use the -cut- function of -egen- to create a variable indicating which of these periods each observation falls in. For example:

    Code:
    scalar define break1 = tq(1990q1)
    scalar define break2 = tq(2000q3)
    
    sysuse gnp96, clear
    egen period = cut(date), at(0, break1, break2)

    This fails because -cut()- expects a numlist, not scalar names, and it seems that Stata doesn't understand scalars the way other languages understand variables. Is my only option this (excessively verbose) solution:

    Code:
    scalar define break1 = tq(1990q1)
    scalar define break2 = tq(2000q3)
    
    sysuse gnp96, clear
    egen period = cut(date), at(0, `=break1', `=break2')

    Basically, I want Stata to function like every other major programming language/statistical package (R, Python, MATLAB, C family). If this syntax isn't supported, what exactly is the purpose of scalars?



  • #4
    The purpose of scalars is to hold scalars, naturally enough. The syntax here need not be quite as complicated as you fear. Putting constants into scalars is here just like putting things in bags only to take them out again immediately.

    Code:
      
    egen period = cut(date), at(0, `=tq(1990q1)', `=tq(2000q3)')



    • #5
      Originally posted by Nick Cox
      The purpose of scalars is to hold scalars, naturally enough. The syntax here need not be quite as complicated as you fear. Putting constants into scalars is here just like putting things in bags only to take them out again immediately.

      Code:
      egen period = cut(date), at(0, `=tq(1990q1)', `=tq(2000q3)')

      This syntax is equally verbose, however, because the reason I'm using scalars is so I can reuse the values elsewhere. In other words, I'm trying to use Stata's scalars the way other languages, *including Mata*, use variables. Many other languages, including Mata, allow passing variables into functions without needless syntax that evaluates the expression, e.g. `=break1' or `=tq(2000q1)'. This appears to be something Stata's designers chose not to implement, although I don't understand why Stata and Mata are inconsistent on this point.

      If I have a scalar variable in Mata, I can pass the NAME of that variable into a function, and Mata knows to look up its value. Stata, however, doesn't do this, for reasons I can't fathom. I found a Statalist post (http://www.stata.com/statalist/archi.../msg00648.html) that discusses this, but its answer, that using scalars in this fashion is "indirect", contradicts how many other languages, including Mata, use variables.

      Once again, compare Stata's syntax, and Stata's apparent internal understanding of scalars, to *every other major programming language, including Mata*, and the inconsistent design choice should become clear. In addition, Stata commands that take a varlist are able to unbox those names and understand that they refer to variables in the current Stata dataset. Why Stata can't do the same checking for commands that take numbers (i.e. if a number is passed in, use it; if a name is passed in, check whether it's a scalar) is beyond me. The purpose of scalars may be to hold scalars, as is tautologically obvious, but placing the burden of unboxing them manually on the user seems pointless.

      Note that I use "variables" here to mean variables like those in Mata, not the variables that comprise Stata's datasets.
      Last edited by Michael Anbar; 30 Sep 2014, 12:57.



      • #6
        There are long answers to this and short answers. I won't try a long answer.

        Stata's approach here (which literally consists of presenting the command line to -syntax-) is not to evaluate expressions when parsing command syntax. After all, variable names aren't evaluated at that time either.

        The `=exp' syntax is a way of subverting that, as such evaluations are carried out first, before the command sees its arguments.
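To make the order of evaluation concrete, here is a minimal sketch reusing the scalars from the original question. Because `=exp' is expanded before the command line is parsed, -egen- would never see the names break1 and break2, only the numbers they hold; -display- is used here just to show the expanded text.

```stata
* Sketch: show that `=exp' is expanded before any command parses the line.
* The breakpoints are the ones from the original question.
scalar define break1 = tq(1990q1)
scalar define break2 = tq(2000q3)

* Substitution happens first, so the command that actually runs
* contains only literal numbers.
display "at(0, `=break1', `=break2')"
```

Run from a do-file, the -display- line should print at(0, 120, 162): quarterly dates count quarters from 1960q1, so 1990q1 is 120 and 2000q3 is 162.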

        You are right if you want to regard all that as idiosyncratic. Stata started almost 30 years ago and no doubt some things would be different if the developers threw it all away and started again.

        I would use local macros for your purpose if I understand it correctly. (I don't like this egen function at all, but we'll leave that on one side.)

        Code:
        local break1 = tq(1990q1)
        local break2 = tq(2000q3)
        sysuse gnp96, clear
        egen period = cut(date), at(0, `break1', `break2')

        Last edited by Nick Cox; 30 Sep 2014, 13:12.
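A sketch of how a single set of local macros might cover the uses raised in this thread, assuming the gnp96 example data; the names late and break1_tq are hypothetical, chosen only for illustration. The same macro supplies a numlist element for -cut()-, a number in an -if- qualifier, and (via string() with a %tq format) a text form such as 1990q1.

```stata
* Sketch: define each breakpoint once, as a local macro holding the
* integer quarterly date, then reuse it in several places.
* late and break1_tq are hypothetical names used only for illustration.
sysuse gnp96, clear
local break1 = tq(1990q1)
local break2 = tq(2000q3)

* 1. As numlist elements for -egen, cut()-
egen period = cut(date), at(0, `break1', `break2')

* 2. In an -if- qualifier, where the macro becomes a literal number
generate byte late = 0
replace late = 1 if date >= `break2'

* 3. As a %tq-formatted string such as "1990q1"
local break1_tq = string(`break1', "%tq")
display "`break1_tq'"
```

Because each macro holds the integer date, the %tq string is derived from it rather than typed separately, so the two forms cannot drift apart.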



        • #7
          Thank you for the help; I think the local macro syntax is probably the least verbose in this case. I prefer scalars because they make -if- qualifiers, e.g. in -replace- statements, less verbose:

          Code:
          replace x = 2 if date > break1

          On the other hand, scalars are slightly more verbose with this -egen- function and when plotting or importing: some commands, like -import haver-, require dates in %tq format, while plotting commands often require the underlying integer. That makes life difficult for those of us who want to define a single set of time variables at the start of a file and use them consistently throughout. Is there an idiomatic way to do this in Stata without the -egen- function? I could use multiple -replace ... if- statements, but -cut()- seemed the shortest.

          Thank you again,

          Michael Anbar
          Last edited by Michael Anbar; 30 Sep 2014, 13:14.
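One possible approach without -egen-, sketched under the assumption that the same gnp96 data and breakpoints are wanted: scalars are evaluated inside expressions, so they can be used directly with -generate-. The scalar() wrapper is optional; it guarantees the names are read as scalars even if a variable called break1 or break2 exists, since a variable would otherwise take precedence.

```stata
* Sketch: build the period variable with -generate- instead of -egen-,
* using scalars directly inside an expression.
sysuse gnp96, clear
scalar define break1 = tq(1990q1)
scalar define break2 = tq(2000q3)

* Each comparison is 1 (true) or 0 (false), so the sum counts how many
* breakpoints a date has reached: 0 before 1990q1, 1 from 1990q1
* through 2000q2, and 2 from 2000q3 on. The -if- keeps missing dates
* missing, since Stata treats missing as larger than any number.
generate byte period = (date >= scalar(break1)) + (date >= scalar(break2)) if !missing(date)
tabulate period
```

The codes differ from those of -cut()-, which reports the left-hand end of each interval (0 and 120 with the breakpoints above) rather than 0, 1, 2; value labels via -label define- can make the codes readable if that matters.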



          • #8
            I don't understand your specific question, so I will just expatiate.

            I don't regard verbosity as an absolute evil. I think it's no accident that the most successful computing languages are moderately long-winded.

            I once spent a lot of time with the language J, which was, and is, a second version of APL, designed with the wisdom of hindsight to add extraordinarily, mind-blowingly clever features to what was already a very, very clever language. In J the code

            Code:
            mean =. +/ % #

            defines an entire program (the arithmetic mean: the sum divided by the number of items), which is wonderfully expressive, so long as you use J essentially all the time. You can't really use J occasionally, except trivially. The designers of J regarded brevity as an absolute virtue, to the extent that in an early version the entire documentation of the built-in text editor was one sentence long, which really was all you needed to know, but (in my case) took me 2 hours to understand, as you had to read it in exactly the right way. The net result is that J is an extremely clever language that almost no-one uses.

            MATLAB is closer to Stata, and extraordinarily good in many ways, but I've not often heard it praised as an exemplary modern programming language. It's gloriously old-fashioned so far as I can see.

            That's not an answer to all your questions at all. I know that "because it was written that way" is not a satisfying answer, indeed not even an answer, but otherwise you will have to hope that Stata's designers say exactly why they made some choices and not others.
