
  • After xtset: Does Stata automatically do all operations by panelvar(id)?

    Hello,

    I'm getting the feeling that Stata does operations by the id variable after xtset. Just an example:

    Code:
    sysuse xtline1, clear
    xtset person day                          // panel variable: person; time variable: day
    gen test = l.calories                     // lag of calories
    bys person: gen test2 = l.calories        // same lag, with an explicit by-prefix
    Here, test and test2 are identical. Is this always the case, or are there cases where that doesn't happen? I'm a little afraid that at some point I'll leave out the bysort and an operation will be performed over the whole sample rather than by category.

    Thanks for your help,
    Frank


  • #2
    When the data have been -xtset-, any expressions that refer to lag, forward, difference, or seasonal difference time-series operators (-help tsvarlist-) are evaluated using the panel structure defined with the preceding -xtset-. There is no need to use -by- or explicitly sort on panelvar and timevar in order to use these operators in expressions.
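    Here is a minimal Python sketch of what "evaluated using the panel structure" means for L. It is not Stata, and the function names and data are hypothetical. The lag is looked up in the same panel at time t - 1, assuming a time delta of 1. A gap in time therefore gives missing, and the result does not depend on row order or on a by: prefix. A naive previous-row lag, shown for contrast, leaks the last value of one panel into the next.

```python
# Hypothetical helpers mimicking Stata's L. operator after xtset (assumes delta = 1).
# Each row is (panel_id, time, value); None stands in for Stata's missing value (.).

def lag_within_panel(rows):
    """Return L.value for each row, in the original row order.

    The lag comes from the same panel at time - 1, so a gap in time gives
    missing and shuffling the rows does not change any row's result.
    """
    lookup = {(pid, t): v for pid, t, v in rows}
    return [lookup.get((pid, t - 1)) for pid, t, _ in rows]

def naive_previous_row(rows):
    """A lag that ignores the panel structure: just the value in the preceding row."""
    return [None] + [v for _, _, v in rows[:-1]]

data = [
    (1, 1, 10), (1, 2, 12), (1, 3, 11),   # panel 1
    (2, 1, 20), (2, 2, 25),               # panel 2
]

print(lag_within_panel(data))    # [None, 10, 12, None, 20]
print(naive_previous_row(data))  # [None, 10, 12, 11, 20]
```

In the naive version, panel 2's first observation wrongly picks up 11 from panel 1.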

    But the commands using those expressions are not executed as if they were preceded by -by panelvar (timevar), sort:- or anything of that nature.
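    The point of the paragraph above can also be sketched in Python. Again, this is not Stata, and the names and data are made up for illustration. An ordinary function such as sum() runs down the whole dataset in its current order whether or not the data are xtset. It restarts per panel only when you add by:.

```python
from itertools import accumulate

def stata_running_sum(values):
    """Like Stata's sum(): a running sum down the rows in their current order,
    with missing values (None) counted as 0. No panel awareness at all."""
    return list(accumulate(0 if v is None else v for v in values))

def running_sum_by(groups, values):
    """Like 'by person: gen x = sum(v)': the running sum restarts whenever
    the group changes (rows assumed already sorted by group)."""
    out, total, prev = [], 0, object()
    for g, v in zip(groups, values):
        if g != prev:
            total, prev = 0, g
        total += 0 if v is None else v
        out.append(total)
    return out

person   = [1, 1, 1, 2, 2]
calories = [10, 12, None, 20, 25]

print(stata_running_sum(calories))       # [10, 22, 22, 42, 67]
print(running_sum_by(person, calories))  # [10, 22, 22, 20, 45]
```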

    Be aware also that a command of the form -by varlist, sort: do_something_with L.x-, where sorting on varlist would differ from sorting on panelvar timevar, will cause Stata to throw an error message and halt. That is, an explicit sort order that conflicts with the -xtset- sort order is not allowed if time-series operators are used in the command.



    • #3
      I still don't get xtset intuitively. Consider the following example:

      Code:
      sysuse xtline1, clear
      xtset person day
      gen dummy = person == 1 & day == td(16jan2002)   // 1 only for person 1 on 16 Jan 2002
      bys person: gen test=sum(dummy)
      Why does test become 1 only from 16 January 2002 onward for Tess? It seems that xtset is overriding my bysort command in the background, no? My intuition is that if I bysort by person, the sum of dummy should be the same for all observations of that person.
      Last edited by Frank Taumann; 23 Mar 2017, 11:54.



      • #4
        No, you misunderstand the sum() function. -sum- is a running sum. When you write -bys person : gen test = sum(dummy)- the result will be 0 until an observation is found where dummy = 1. From there test will be 1, until it encounters another observation where dummy = 1, at which point test will go up to 2 and remain there until finding another observation where dummy = 1, and so on. (The way you've defined dummy, though, it won't actually get higher than 1.) If you want to create a variable that is equal to the total of all values of dummy within person, that's a different command:

        Code:
        by person, sort: egen test = total(dummy)   // each person's total of dummy, on every row
        Finally, the easiest way to see that -xtset- is not doing anything behind the scenes here is to run the exact same code, leaving out the -xtset- command, and you will get the exact same results.
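        Here is a small Python sketch of the difference between the running sum from sum() and the per-person total from egen's total(). The data and helper names are hypothetical, and the code mimics Stata's semantics rather than calling Stata. The dummy equals 1 on only one day for person 1:

```python
from itertools import accumulate, groupby

# Hypothetical data: person 1 has five days with dummy = 1 only on day 3;
# person 2 never has dummy = 1. Rows are already sorted by person.
person = [1, 1, 1, 1, 1, 2, 2, 2]
dummy  = [0, 0, 1, 0, 0, 0, 0, 0]

def by_group(groups, values, func):
    """Apply func to each run of equal group values, like a 'by person:' prefix
    (rows assumed sorted by group)."""
    out = []
    for _, run in groupby(zip(groups, values), key=lambda p: p[0]):
        out.extend(func([v for _, v in run]))
    return out

def stata_sum(vals):
    """Running sum, like Stata's sum() function."""
    return list(accumulate(vals))

def egen_total(vals):
    """Group total repeated on every row, like egen's total()."""
    t = sum(vals)
    return [t] * len(vals)

print(by_group(person, dummy, stata_sum))   # [0, 0, 1, 1, 1, 0, 0, 0]
print(by_group(person, dummy, egen_total))  # [1, 1, 1, 1, 1, 0, 0, 0]
```

The running sum stays 0 until day 3 for person 1, which is exactly the pattern Frank saw. Nothing about xtset is involved.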



        • #5
          Sum confusion here, and sum is StataCorp's fault, or a side-effect of their naming decisions.

          Back in Stata 8 and some earlier versions, the egen function sum() gave totals and the Stata function sum() gave running totals. The former still works, but is undocumented as of Stata 9: the name is now total(). So, that ambiguity was fixed, or improved upon, without breaking existing code.

          It's worth noting that Mata makes the distinction plain. runningsum() is the name for a running sum function!

          So, why wasn't such a name used originally? Two guesses:

          1. There was a perceived virtue originally (late 1980s/early 1990s) in keeping function names as short as possible.

          2. Cumulative or running sum was seen as the more general idea, particularly as understood in this way:

          Code:
          sort somevar
          by somevar: gen mytotal = sum(whatever)       // running sum within each group
          by somevar: replace mytotal = mytotal[_N]     // copy the group's last (complete) value to every row
          That is, the machinery of by: and _N made it easy to get group totals as a side effect of getting a running total.
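          The sort / by: / _N idiom can be mimicked in Python as an illustrative sketch. The helper name is hypothetical, and the sketch assumes no missing values. The running sum restarts in each group. Its last element, the value at row _N, is the group total, which is then copied to every row of the group.

```python
from itertools import accumulate

def total_via_running_sum(groups, values):
    """Mimic:
         sort somevar
         by somevar: gen mytotal = sum(whatever)
         by somevar: replace mytotal = mytotal[_N]
    Returns the sorted group column and the resulting mytotal column."""
    # 'sort somevar' (Python's sort is stable; Stata's need not be, which doesn't matter for totals)
    order = sorted(range(len(groups)), key=lambda i: groups[i])
    g = [groups[i] for i in order]
    v = [values[i] for i in order]
    result, start = [], 0
    while start < len(g):
        end = start
        while end < len(g) and g[end] == g[start]:
            end += 1
        running = list(accumulate(v[start:end]))        # by somevar: gen mytotal = sum(whatever)
        result.extend([running[-1]] * (end - start))    # by somevar: replace mytotal = mytotal[_N]
        start = end
    return g, result

g, mytotal = total_via_running_sum(["b", "a", "b", "a", "a"], [1, 2, 3, 4, 5])
print(g)        # ['a', 'a', 'a', 'b', 'b']
print(mytotal)  # [11, 11, 11, 4, 4]
```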

          egen made it even easier, with the side-effect of a lot of extra interpreted code inside egen.




          • #6
            Yes, the double check with xtset indeed seems like something someone could come up with... ;-) Thanks a lot for your help and also for the background!

            P.S.: I'd like to change the post title to something with "sum" in it, but I can't edit it anymore...

