Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • YTD Sum

    I am trying to create running Year-To-Date sums of expenses for multiple accounts in multiple categories.

    A simplified example of the data that I have:
    Code:
    input long account str6(category month) float year str9 date float expense
    001 "A" "Jan" 2016 "1/1/2016" 100
    001 "A" "Jan" 2016 "1/1/2016" 20
    002 "B" "Jan" 2016 "1/1/2016" 50
    002 "B" "Jan" 2016 "1/1/2016" 80
    001 "A" "Feb" 2016 "2/1/2016" 17
    end
    I would like to calculate the YTD expenditure for each unique set, something like as follows, corresponding to the data above.
    Code:
    ytd
    100
    20
    50
    80
    117
    The issue that I am running into is that I can't get things to sort properly to allow me to do so.
    Code:
    bysort account category date : gen ytd = sum(expense)
    Essentially just recreates the expense variable, as it is breaking things down by date, not creating a cumulative sum.

    Code:
    bysort account category year : gen ytd = sum(expense)
    is closer, but the months are out of order, resulting in the ytd being randomly ordered and not useful.

    I am coming very recently from Python, so my instinct is to just use a "where" function, but can't find a suitable equivalent in Stata.
    Last edited by Dylan Angell; 18 Jan 2017, 10:57.

  • #2
    Please do read and act on http://www.statalist.org/forums/help#stata to give data examples that can actually be run. This is part of what you were asked to read before posting.

    In particular, it's completely ambiguous from your example whether month is

    a string variable

    a numeric variable with value label

    a numeric date variable with a particular date format

    and similar comments apply to the date variable.

    There will be an easy answer if and only if you are precise about your data. There are too many possibilities to document otherwise.

    Conversely I have no idea (sorry) what a where function does in Python.



    Comment


    • #3
      Adding to Nick's excellent advice, there would have been no such ambiguity had you posted your example using the -dataex- command, as everyone is asked to do in the FAQ.

      Comment


      • #4
        As per the FAQ, I was providing a fake example as a result of my dataset being confidential. I apologize that it was not in the particular format that you wanted. It has been changed.

        Comment


        • #5
          Dylan,

          I'm assuming category is the same thing as dept in your example.

          I'm also assuming you want to sort by year, not just date (i.e., you want the YTD, not the running count of all years together).

          If so, try (not tested):
          Code:
          bysort account dept year (date) : gen ytd = sum(expense)

          Comment


          • #6
            Thanks for the example.We don't mind fake examples at all if your real data are confidential. This point is already made in the FAQ.

            month being string is awkward but does not bite: it can be ignored because monthly dates can be derived directly from the daily dates.

            This code needs to be complicated for category, which is not in your data example. Or is dept?


            Code:
            clear
            input long account str6(dept month) float year str9 date float expense
            001 "A" "Jan" 2016 "1/1/2016" 100
            001 "A" "Jan" 2016 "1/1/2016" 20
            002 "B" "Jan" 2016 "1/1/2016" 50
            002 "B" "Jan" 2016 "1/1/2016" 80
            001 "A" "Feb" 2016 "2/1/2016" 17
            end
            
            gen mdate = mofd(daily(date, "MDY"))
            format mdate %tm
            
            bysort account year (mdate) : gen ytd = sum(expense)
            by account year mdate : replace ytd = ytd[_N]
            
            list, sepby(account year mdate)
            
                 +-------------------------------------------------------------------+
                 | account   dept   month   year       date   expense    mdate   ytd |
                 |-------------------------------------------------------------------|
              1. |       1      A     Jan   2016   1/1/2016       100   2016m1   120 |
              2. |       1      A     Jan   2016   1/1/2016        20   2016m1   120 |
                 |-------------------------------------------------------------------|
              3. |       1      A     Feb   2016   2/1/2016        17   2016m2   137 |
                 |-------------------------------------------------------------------|
              4. |       2      B     Jan   2016   1/1/2016        50   2016m1   130 |
              5. |       2      B     Jan   2016   1/1/2016        80   2016m1   130 |
                 +-------------------------------------------------------------------+
            The key here is applying by:

            Comment


            • #7
              Ah yes, category and dept were the same. Thanks for pointing the typo out. It has been updated to category.

              Thanks Roger. I missed the distinction of the parenthetical variables in the help page for by. That has done the trick.

              Comment


              • #8
                Roger's code is on similar lines to mine, but is incomplete given the need to provide the same cumulative totals for each month.

                Comment

                Working...
                X