No announcement yet.
  • Filter
  • Time
  • Show
Clear All
new posts

  • New version of tsegen on SSC

    Thanks to Kit Baum, a new version of tsegen (with Nick Cox) is now available on SSC.

    tsegen is useful for computing descriptive statistics over a rolling window of time using egen row functions. By using a set of time-series variables, tsegen can compute the statistic directly and is orders of magnitude faster than other approaches that require looping over subsets of observations.

    The new version is designed to overcome Stata's limit of 100 unique time-series operators in a command. We became aware of the limit in a recent post by Jean-Marie Meier. See the following post for an example of a rolling average over a window of 1460 days (4 years).

    To update to the new version, type in Stata's command window
    ssc install tsegen, replace

  • #2
    Hi! I have an unbalanced panel. I would like a variable that adds up the last ten years of another variable, by group (though many groups won't have all ten years). Zeros are fine for the missing years. Is there some way to use tsegen for this? Also, is there a way to do this for a whole bunch of consecutive variables at the same time? Thanks!


    • #3
      Is there some way to use tsegen for this?
      Assuming you have annual data (perhaps with gaps) it's
      xtset group year
      tsegen ten_year_total = rowtotal(L(0/9).variable)
      just as it says in the help file.

      You cannot do this for more than one variable at a time, but you can put it in a loop

      foreach v of varlist whatever {
          tsegen `v'_10_yr_total = rowtotal(L(0/9).`v')

      1. Replace whatever by the appropriate combination of variable names and wildcard to cover the variables you want.

      2. The above code will automatically treat missing values as zero.

      3. Also, this code assumes "last ten years" means the current year and the 9 preceding ones. If you mean the ten preceding years, just change L(0/9) to L(1/10) throughout.


      • #4
        Clyde, you are my hero. I have been pulling my hair out over this for two weeks. I had tried two different loop functions, but the dataset is so big that it just ran forever (in fact my work computer is still running it, on its third day). The tsegen formula you gave me above did it almost instantly. You have no idea how grateful I am. -Melissa


        • #5
          I note that the help for tsegen doesn't spell out that it can be repeated in a loop when the same calculation is to be applied to several variables.

          That's a difficult call; many help files could be extended to document how a program could be used in conjunction with others.


          • #6
            Dear all,

            I am trying to use the function as suggested above but am unfortunately running into the following errors. I want to calculate the moving average for a given month, as the average of the 12 previous months (assuming there are at least 6 non-missing values)

            . xtset cusip
            varlist: cusip: string variable not allowed

            . tsegen movingaverage = rowtotal(L(1/10).ret, 6)
            time variable not set

            Would anyone have an idea how to fix this? Have been stuck on this for a while, so would be very grateful for any help!

            Many thanks and best regards,


            • #7
              I should clarify that I want to do this for each cusip, which is an company identifier.

              And of course, my code should be the following: tsegen movingaverage = rowtotal(L(1/12).ret, 6) (with a 12 and not a 10)


              • #8
                Your problem has noting to do with -tsegen- itself. The data must be properly -xtset- or -tsset- before -tsegen- will work with it, as it tells you. You attempted to -xtset cusip-, but it failed because cusip is a string variable. You have to have a numeric variable to use as the group identifier for -xtset-. Also, to use -tsegen-, you must designate a time variable in your -xtset- command so that Stata knows which observations are the lags of any given observation.

                I suggest you create a numeric counterpart to cusip:

                egen n_cusip = group(cusip)
                xtset n_cusip time_var // SUBSTITUTE YOUR TIME VARIABLE FOR time_var
                Now you should be able to use -tsegen-.

                Turning to your -tsegen- command itself, it does not correspond go what you said you want. You said you want a moving average over 12 preceding months, but your (corrected) command would give you a moving sum over 12preceding months. I think what you want is:

                tsegen movingaverage = rowmean(L1/12).ret, 6)


                • #9
                  Clyde is bang on as usual, but extra advice is possible. It's rarely a good idea to keep straight on if you get an error message. Stata's error messages on occasion are cryptic, as when it can't sort out very confused syntax. But here the error messages are both correctly indicating a real problem. If you are stuck, then you need to go back to the help and then to the manual entries (if the help is not detailed enough).


                  • #10
                    Many thanks for your help Clyde, apologies I just saw your message. Anyways, as Nick says, your code works perfectly for me, so thanks again.

                    @Nick: fully understood and completely agree. I do try to go through the help option, but sometimes I guess I must be looking at the problem in the wrong way. In any case, I am making progress daily with Stata, so hope to soon be able to contribute as well to this forum.


                    • #11
                      Not sure if you guys are going to see this here; but since it's related to tsegen I will post it here. If i don't et any response I will post separately.

                      So with tsegen, I am trying to understand what everything in the brackets means. Let's assume I wanna take moving averages of ret variable
                      1. I guess L stands for Lag and F stands for Forward?

                      2.As far as I understand tsegen MA = rowmean(L(0/3).ret) is equivalent of tsegen MA = rowmean(ret L(1/3).ret) and tsegen MA = rowmean(ret.L(1/3).ret) - I just added a dot between ret and L in the last case-

                      3. Then what does this do? : tsegen MA = rowmean(ret L(0/3).ret) - I guess it counts the present value of ret twice?

                      4. My main question : what does the last digit specification exactly do? e.g. tsegen MA = rowmean(L(0/3).ret,3) - My guess, It only calculates the MA if there are 3 or more nonmissing ret values in the L(0/3) ? If that's the case there's something wrong with my results because I don't have any missing values in my ret variable but MA misses values and it looks random to me.. I can't understand why the missing values are generate - The first 4 values for MA I do get because there's no lagged variable. But other than that, there are many missing values in my MA that I cannot explain. Even if I dont use the last specification I would get missing values. but in a different pattern.

                      Thanks in advance


                      • #12
                        Some of your questions are answered in the help files for time-series operators (-help tsvarlist-) and the help for -tsegen- itself (-help tsegen-). I will respond to those that can't.

                        As far as I understand tsegen MA = rowmean(L(0/3).ret) is equivalent of tsegen MA = rowmean(ret L(1/3).ret)
                        Yes that is correct.
                        and tsegen MA = rowmean(ret.L(1/3).ret)
                        No. This is not equivalent to anything; it's a syntax error.

                        3. Then what does this do? : tsegen MA = rowmean(ret L(0/3).ret) - I guess it counts the present value of ret twice?
                        Yes, that's right.

                        As for the problem you are having with missing values, it is not possible to troubleshoot this without seeing the data itself. Use the -dataex- command to post an example of your data that reproduces the problem you describe and perhaps somebody can figure out what is causing the missing values.


                        • #13
                          Thanks as always Clyde.

                          Here's my data sample:

                          * Example generated by -dataex-. To install: ssc install dataex
                          input float(profitability cashflowsd cashflowsd2)
                          -.06926407          .          .
                            .2160494          .          .
                           .08184039  .20174707          .
                            .1727457  .14274009  .14274009
                            .1728894  .12614676  .12614676
                           .16101645  .11396208  .11396208
                            .0856461  .04892841  .04892841
                            .2064098  .04690097  .04690097
                           .20734005  .04475394  .04475394
                            .1880022  .04967097  .04967097
                           .26622045  .05059599  .05059599
                           .22653294 .065708876 .065708876
                           .13651334  .02975651  .02975651
                           .18821023          .          .
                           .19720908          .          .
                            .1834916 .006363141          .
                           .16293547          .          .
                           .08066122          .          .
                           .13409124  .05817667          .
                           .16571806  .04174487  .04174487

                          My code is
                          tsegen cashflowsd = rowsd(L.(1/5).profitability)
                          tsegen cashflowsd2 = rowsd(L.(1/5).profitability, 3)
                          As you can see, there are missing values in the middle of the data.


                          • #14
                            I think I got it Clyde, I was ignoring the xtset command and the data grid. the missing values are for the next entry in my grid.


                            • #15
                              Hi There!

                              I am writing here because my query is related to moving average. I have tried the following command for my data set for which but it returned value of "0"

                              encode ticker, gen(tick)

                              . xtset tick date
                              panel variable: tick (unbalanced)
                              time variable: date, 31jan1975 to 30dec2016, but with gaps
                              delta: 1 day

                              tsegen movage2 = rowtotal(L(1/4).vwretd)

                              date tick vwretd movage2
                              31jan1975 A .1415997 0
                              28feb1975 A .05841149 0
                              31mar1975 A .03019056 0
                              30apr1975 A .04649668 0
                              30may1975 A .05513976 0
                              30jun1975 A .05147306 0
                              31jul1975 A -.06038489 0
                              29aug1975 A -.02355698 0
                              30sep1975 B -.03801003 0
                              31oct1975 B .05585698 0
                              28nov1975 B .03122338 0
                              31dec1975 B -.01099793 0
                              30jan1976 B .1260112 0
                              27feb1976 B .00731444 0
                              31mar1976 B .02679926 0
                              30apr1976 B -.01038881 0
                              28may1976 B -.00936875 0
                              30jun1976 B .04448723 0
                              30jul1976 C -.00615711 0
                              31aug1976 C -.00143513 0
                              30sep1976 C .02448419 0
                              29oct1976 C -.02037277 0
                              30nov1976 C .00536438 0
                              31dec1976 C .06159097 0
                              31jan1977 C -.03633707 0
                              31aug1978 C .04235237 0
                              29sep1978 C -.0068907 0
                              31oct1978 C -.1109973 0
                              30nov1978 D .03379215 0
                              29dec1978 D .01771047 0
                              31jan1979 D .04948888 0
                              28feb1979 D -.02676587 0
                              30mar1979 D .0656175 0
                              30apr1979 D .008522 0
                              31may1979 D -.01357596 0
                              29jun1979 D .04685007 0
                              31jul1979 D .01499138 0
                              31aug1979 D .06472839 0
                              28sep1979 D .00142826 0
                              31oct1979 D -.07265966 0

                              I need to calculate for each tick (company identifier) 4 months moving average (which includes time 0 as well i.e. current year plus last 3 years divided by 4). That means what I am looking for is -
                              date tick vwretd movage2
                              31jan1975 A .1415997 0
                              28feb1975 A .05841149 0
                              31mar1975 A .03019056 0
                              30apr1975 A .04649668 0.0691746075
                              30may1975 A .05513976 0.0475596225
                              30jun1975 A .05147306 0.045825015

                              would really appreciate if some could help me with this