
  • egen - by: prefix or by() option

    Up until now I have always used the by: prefix to run egen by groups. I recently came across code that used the by() option instead. Is there any difference between them? The only difference I see is that the by() option does not require the data to be sorted first. Is anything else going on "behind the scenes"?

    Code example:
    Code:
    . sysuse auto
    (1978 Automobile Data)
    
    . sort price
    
    . by foreign: egen max_rep78 = max(rep78)
    not sorted
    r(5);
    
    . bysort foreign: egen max_rep78_1 = max(rep78)
    
    . sort price
    
    . egen max_rep78_2 = max(rep78), by(foreign)
    
    . compare max_rep*
    
                                            ---------- difference ----------
                                count       minimum      average     maximum
    ------------------------------------------------------------------------
    max_rep~1=max_rep~2            74
                           ----------
    jointly defined                74             0            0           0
                           ----------
    total                          74

  • #2
    The by() option, where supported, is historic: it is what many long-term users have internalised, even though it is no longer documented. There is one small difference: you don't have to sort explicitly first.

    You can answer these questions by examining the code. It will emerge that being byable is a property of egen in general, whereas supporting a by() option is a property of only some egen functions.

    Code:
    viewsource egen.ado
    viewsource _gmax.ado
    That said, egen now takes care to change the sort order only temporarily.
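    A minimal Stata sketch of the differences discussed above, using the auto data shipped with Stata. The new variable names (order_before, tmp, max_rep78_opt, max_rep78_pre) are made up for illustration, and the check that the sort order survives assumes, as described above, that egen restores the order when it finishes. A plain by: on unsorted data should fail with r(5), as in the question, while the by() option and bysort should give the same group maxima.

```stata
* Compare egen's by() option with the by: prefix (sketch)
sysuse auto, clear
sort price
* record the current observation order
generate long order_before = _n

* by: without a prior sort fails on these data: "not sorted", r(5)
capture by foreign: egen tmp = max(rep78)
assert _rc == 5

* by() option: no explicit sort needed
egen max_rep78_opt = max(rep78), by(foreign)
* assumption: egen restores the original sort order when it finishes
assert order_before == _n

* bysort sorts on foreign first and leaves the data in that order
bysort foreign: egen max_rep78_pre = max(rep78)
count if order_before != _n
display "observations moved by bysort: " r(N)

* both routes give identical group maxima
assert max_rep78_opt == max_rep78_pre
```

    So the visible difference is in the sort order you are left with: the by() route leaves the data as they were, while bysort leaves them sorted on the grouping variable.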



    • #3
      Nick, can you elaborate on your claim that the by() option is now deprecated? Has Stata issued a guideline? Is anyone following it?

      It seems to me there are still Stata commands that use by() rather than by:, and new user-developed commands that do so as well.



      • #4
        P.S. I'm wondering if the by() option is deprecated for all commands, or if your statement was limited to the egen command.



        • #5
          Paul: Your comments in #3 and #4 are puzzling.

          I didn't use the word "deprecated" anywhere here that I can see, nor do I see that what I said bears that interpretation. "Historic" means what it says. I don't regard the term as pejorative, here or elsewhere.

          StataCorp just quietly changed what is documented for egen, so that the official help no longer documents by() options for the egen functions that support them. Support for the by: prefix was added at some point, several versions ago.

          There is nothing in that to break previous, present or future uses of by() options with egen functions, whether official or community-contributed.

          I don't think there was ever a proclamation about good or bad practice here. It's just that introducing by: as a programmable prefix command made explaining egen in this way more nearly consistent with other commands.

          by() options remain common throughout Stata. In graphical contexts there is often a distinction between over(), which subdivides within a panel, and by(), which subdivides into panels. I think it would be hard to spell out the distinction in similar terms for all other problems.

          But it's hard, really hard, to maintain consistency of syntax across even official Stata; not breaking syntax that works is a higher priority for StataCorp, or so I infer.



          • #6
            Thanks for the clarification. The word "historic" often has a disparaging connotation in reference to software -- but not when you use it!

            Is there any predictability regarding which commands use by: and which use by()?



            • #7
              From the output of help by, we see:

              Code:
                  Stata commands that work with the by prefix indicate this immediately
                  following their syntax diagram by reporting, for example, "by is
                  allowed; see [D] by" or "bootstrap, by, etc., are allowed; see prefix".
              Thus, if we see that indication, we can predict that the command in question works with the by: prefix. Similarly, if we see by() among the options allowed for the command, we can safely predict that it supports the by() option. Things are murky for commands that support both, like egen, but do not document the by() option.
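              To make the contrast concrete, here is a small Stata sketch on the auto data. The choice of commands is ours for illustration: tabstat is one command that documents a by() option, summarize is byable and so accepts the by: prefix, and egen accepts both forms even though only by: is documented. Check each command's own help file, as described above, before relying on either form. The variable names mean_price and mean_price_opt are hypothetical.

```stata
* Group-wise results via a by() option and via the by: prefix (sketch)
sysuse auto, clear

* tabstat has a documented by() option; no sort is needed
tabstat price, by(foreign) statistics(mean n)

* summarize is byable, so it works with the by: prefix
bysort foreign: summarize price

* egen works either way; only the by: prefix is documented
bysort foreign: egen mean_price = mean(price)
egen mean_price_opt = mean(price), by(foreign)
assert mean_price == mean_price_opt
```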

