Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Manipulating time series data to fit to quarterly data

    Dear all,

    first off, many thanks for the existence of this forum, I have found several solutions to problems I have already encountered when using Stata.
    Unfortunately, the current problem does not fit the above description...

    I'm using Stata 12, with a dataset containing the following information (leaving out unnecessary variables):
    product_id
    client_id
    invoice_id
    revenue
    quantity
    from
    to

    The variables from and to depict dates(e.g. 01jan2011) indicating the period of exploitation of a product (=licenses).
    So an invoice contains details on which product is consumed for which period to which client_id, the number of units sold and the amount invoiced.

    The periods of exploitation are mostly quarterly data (01jan2011 to 31mar2011), and I would like to format the dataset in quarters. This accounts for around 80% of the observations.
    However I face 3 problems regarding data management:
    1. some invoices cover several quarters of exploitation.
    2. some invoices span 1 quarter, but start on another date (e.g. 21jan2011 to 20apr2011).
    3. I need to spread/impute revenue and quantity of observations spanning several quarters over the to be generated quarterly observations.

    This is how I proceeded up to now:
    1. several quarters:
    I just expand these obs, with imputed values to be calculated later on expand to-from+1, gen(expand)
    2. mismatched quarters:
    I have no idea to attach a mismatched quarter to the closest by quarterly observation. Could you help me out with this?
    3. Impute quantity and revenue:
    The easy, but wrong way is just to divide these quantities equally over the number of quarters we observe.
    A better way would be to use information on the evolution of sales over time to impute the values (eg. sales are not spread equally over time, not even linearly declining over time, but even exponentially declining). The questions are: 1) how to use information on the evolution of sales over time to allocate sales across quarters if they span multiple quarters, and 2) which information to use?
    I have been working on some other dataset, where I imputed missing values for observations at period t, using panel regression and multiple imputation, but here, I know the values, they do not have to be imputed, just allocated correctly to the quarters they span.

    The goal is to do panel regressions on the evolution of sales and the effects of the Coase conjecture and cannibalization of varieties.
    Many thanks in advance,
    Glenn

  • #2
    Glen,

    1. The easy one, on the to-from intervals that do not exactly match quarters: convert your HRF (human readable form) dates to SIF (Stata internal form) data. Stata counts days from 01jan1960 onwards, so you will get a numerical value for each of your to and from observations. Use, e.g.:
    Code:
     gen float to_num = date(to, "DMY")
    Then you can easily assign the mismatched dates to the closest quarter or split them to two quarters.

    2. If you have a good idea about the evolution of sales, I would generate a variable containing weights that reflect the trend - you mention declining sales over time (though I am not quite clear what time refers to, perhaps product lifetime?) So let's say you can extract a general trend about the evolution of sales over a three-month interval, then break this information down into weights for each day of this quarter, weights summing to one. Then you can apportion the the days of each mismatched quarter to two, break down the value (sales!?) to days and multiply by the weights, summing the parts that belong to each "real" quarter.

    Not quite sure if this is what you want, but I still hope it helps.

    Martin

    Comment


    • #3
      Hi Martin,

      thanks for your feedback!
      1. I managed to split the quarters using something like:
      *** split duration over quarters
      expand to-from+1, gen(expand)

      bys product_id client_id invoice_id: gen nr_of_quarters= _n
      replace from=from+nr_of_quarters-1
      replace to=from+1
      drop nr_of_quarters

      *** split quantities equally over quarters
      bys product_id client_id invoice_id: egen nobs=count(_n)
      foreach x in revenue quantity {
      replace `x'= `x'/nobs
      }

      2. The data is already in SIF, I just wrote it in HRF for the post. I guess I will need to do it dirty, and write something like
      replace quarter = ... if date >= ... , and this for each quarter to shift..

      3. I have just distributed the sales equally over the time period of the invoice. It's probably not the best approximation, but then I should estimate the (non-linear) evolution of sales on the whole dataset first and apply the weighting to those observations. In fact, it are sales of online music tracks, so you can conjecture that sales peak when they are released, and then taper off. But I should estimate the exact functional form then first... or approximate it with a f(x)=1/x function or so...

      Comment


      • #4
        Hi Glenn,

        I guess it will be some "dirty work", unless some of the data management gurus who regularly supply advice here have something better to say than I have.

        On 3.: Yes, I would try to estimate the evolution of sales (it would help if you had daily data, at least for some representative tracks) and extract a function out of it. Then you could generate a weight for each number of days, front and back - day 1 in quarter 1 (w1), days 2-90 in quarter 2 (1-w1); days 1 and 2 in quarter 1 (w1+w2), days 3-90 in quarter 2 (1-w1-w2) and so on... Since, for each release, you know the number of days which fall into each of the two possible calendar quarters, you can attach the "correct" weight and split the sales figure accordingly.

        You need a variable holding the calendar quarter border days, so that you can calculate the number of days of your sales quarter belonging to each calendar quarter. Maybe I can think of a way to do that a bit more automatically...

        Martin

        Comment


        • #5
          Hi Martin,

          thanks! I will fiddle around with it a little... If I find some compare code, I'll post it here...

          Comment

          Working...
          X