
  • #16
    Attaullah: Replying to #13: Naturally, missing values complicate the question. Either missing values will show up in products or a user may wish to write code to trap them.

    I am guilty therefore of not discussing all possible problems at once.

    To infer or imply that I deliberately ignore subtleties or show personal animus is unwarranted. However, people must judge for themselves.

    Otherwise, many of my comments and questions about your program seem to remain unanswered.
    Last edited by Nick Cox; 22 Apr 2018, 19:43.



    • #17
      I've been "off the grid" for a couple of days. Here are my thoughts regarding the original poster's problem, my originally proposed solution, and the context.

      Sofia's original post includes a formula that involves serial multiplication of (1+x)-type terms, with 1 subtracted at the end. She also referred to industrial sectors. Although she did not say as much, because running products don't really come up very often in statistical practice, I assumed (and you know what happens when you assume) that the x's were returns on some investment and that the purpose of the formula was to calculate compound returns.
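      For reference, the formula described above written out for a 12-month window ending in month t. The window is taken as the current month plus the 11 preceding ones, as in the code later in this post; the notation is illustrative, not Sofia's own:

```latex
R_t = \left[ \prod_{k=0}^{11} \left(1 + x_{t-k}\right) \right] - 1
    = \exp\!\left( \sum_{k=0}^{11} \ln\left(1 + x_{t-k}\right) \right) - 1,
\qquad \text{where the second form requires } 1 + x_{t-k} > 0 \text{ for every } k .
```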

      If that were correct, then there would be no need to worry about zero or negative values of 1+x, because the lowest possible return is -1, which corresponds to the entire investment being lost. 1+(-1), of course, is zero, so the running product becomes zero at that point and remains zero forever after. The rolling product, computed as a ratio of running products, then becomes undefined due to division by zero. That is a decent model of reality for investment returns: once your stake is wiped out, it is not meaningful to speak of a return on an investment that has been completely lost.

      Up to that point, Sofia had not shown any data. Looking at her example data, it is quite clear that these numbers cannot possibly be returns on investments, because many of them are less than -1. I don't know what these numbers represent, so I cannot rely on the assumption that there will be no negative values of (1+x). More importantly, if x = -1 in some month, my code will set the running product to zero at that point and forever after. If we then look 13 months ahead and see what my code does for the rolling product, it will give a missing value (0/0). But the actual rolling product for the preceding 13 months may well be definable and meaningful. So my code is not suitable for this context, although it would work appropriately if the x's were actually returns.
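      A minimal sketch of that failure on a made-up five-period series. The names t, x, P, ratio2 and direct2 are hypothetical, and a 2-period window stands in for 12 to keep it short. Once a -1 has entered the running product, the ratio of running products breaks down, while the direct product over the same window is still well defined:

```stata
* Hypothetical toy series; x plays the role of a per-period return
clear
input int t double x
1  0.10
2 -0.05
3 -1.00
4  0.20
5  0.30
end
tsset t

* Running product P_t = (1+x_1)(1+x_2)...(1+x_t)
gen double P = 1 + x if _n == 1
replace P = P[_n-1] * (1 + x) if _n > 1

* Two-period rolling product as a ratio of running products, P_t / P_{t-2}
gen double ratio2 = P / L2.P

* The same two-period rolling product computed directly
gen double direct2 = (1 + x) * (1 + L.x)

list, noobs
```

      In period 5 the window (periods 4 and 5) contains no -1, yet ratio2 is missing because it is 0/0, while direct2 is 1.56.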

      Concerning Sofia's error message about the time variable not being set: again, not having seen any data at that point, I assumed that her data were laid out in the Stata-ish long way, as panel data with industries and time. So I had suggested -xtset industry month-, which would set the time variable if my assumption were correct. But apparently her data are wide and there is no industry variable, so -xtset industry month- fails and the time variable never gets set. So Sofia would need either to -reshape- her data to long so that the code would work as posted (probably the best approach in any case, because everything else she wants to do from this point on will probably be easier with long data), or simply to -tsset month- to get the time variable set. But since we have established that my code may not be suitable for these data, I'm not sure where this leaves her.
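      A sketch of the two routes on a made-up three-month wide dataset. The N_* names come from this thread; month, sector and industry are hypothetical. Only one of the two declarations is needed in practice; -tsset, clear- is there only so both can run on the same data:

```stata
* Hypothetical wide layout: one row per month, one N_* variable per sector
clear
input double(N_Agric N_food N_Beer)
 0.5  -2.1   0.3
-1.4   0.7   1.2
 0.9  -0.2  -3.0
end
generate month = ym(2000, 1) + _n - 1
format month %tm

* Route 1: keep the data wide and declare only the time variable
tsset month
tsset, clear

* Route 2: reshape to long (one row per sector-month), then declare a panel
reshape long N_, i(month) j(sector) string
encode sector, generate(industry)
xtset industry month
```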

      Concerning missing values in the series, there is no automatic solution to this problem if it arises, and no one best way to deal with it. Strictly speaking, you cannot calculate the product of 12 things if only 11 or fewer of them exist. Several types of imputation of the missing values are available, and which would work best depends on the meaning of the numbers and the context, about which I have no information (and originally had incorrect assumptions), so I won't go there.
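      To make the strict reading concrete, a sketch that computes a 12-month product with lag operators on -tsset- data and also counts the non-missing values in each window (that count is where any imputation decision would start). The series and the names prod12 and n_ok are hypothetical. Unlike the -rangejoin- approach further down, a calendar month absent from the data is treated here as missing rather than skipped:

```stata
* Hypothetical monthly series, 18 months, with x missing in month 3
clear
set obs 18
generate month = ym(2010, 1) + _n - 1
format month %tm
generate double x = 0.01 * _n
replace x = . in 3
tsset month

* Strict rule: a 12-month product exists only if all 12 values exist.
* Stata arithmetic propagates missing values, so one missing x in the
* window, or a window running off the start of the series, gives missing.
local last = 12 - 1
generate double prod12 = 1
generate byte n_ok = 0   // number of non-missing values in each window
forvalues k = 0/`last' {
    replace prod12 = prod12 * (1 + L`k'.x)
    replace n_ok = n_ok + !missing(L`k'.x)
}
list month x prod12 n_ok, noobs
```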

      All of that said, I will say only briefly here what I have said frequently before on this Forum about logarithmic transformations of data series that include 0 or negative numbers: it can't be done, it shouldn't be done. The 1+ "fix" isn't a fix: it's a mutilation of the data and it should, in my view, never be used. The one thing that looks like, but really isn't, an exception to that rule is when we are calculating a compound return. But in that situation, it is actually the product of the 1+x's, not the x's themselves, that is the goal of the computation, and the 1+x is actually a meaningful entity (the ratio of one value to the preceding value), not just an arbitrary kludge to pretend one has overcome an inherent limitation of logarithms. So the use of log(1+x) is not just permissible but is the only sensible way to use logarithms in that problem. But I would also point out that in that situation it is both computationally more efficient and more transparent to just directly calculate the product along the lines of my code in #2 of this thread. The only circumstance that would guide me to do a running sum of log(1+x) in preference to directly calculating the product is if the time series are very long so that numerical issues in the running product might swamp the calculation. But, in practice, that usually isn't an issue.
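      A small check of that equivalence on a made-up return series, with every r > -1 so that ln(1 + r) is defined. The names r, cum_direct, cum_log and cumret are hypothetical. The direct running product of (1 + r) and exp() of the running sum of ln(1 + r) agree up to rounding error:

```stata
* Hypothetical return series, all values > -1
clear
input double r
 0.05
-0.02
 0.10
-0.30
 0.04
end

* Direct running product of (1 + r)
generate double cum_direct = 1 + r if _n == 1
replace cum_direct = cum_direct[_n-1] * (1 + r) if _n > 1

* Same quantity via logs; sum() inside -generate- is a running sum
generate double cum_log = exp(sum(ln(1 + r)))

* Cumulative compound return
generate double cumret = cum_direct - 1
assert reldif(cum_direct, cum_log) < 1e-12
list, noobs
```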

      Anyway, let's see if we can figure out how to help Sofia with her problem. In this circumstance, given that her numbers are not returns and values < -1 can occur, I think the best bet is to actually associate each observation with the observations in its 12-month window and then calculate the product. This code assumes that each value of dm occurs at most once. If there are gaps in the dm data (months for which no values are available), they are simply ignored, and the product is taken over however many values appear in the 12-month interval (not over the last 12 values). Also, if a month is represented in the data but one of the N_* variables has a missing value, the rolling product will be missing for any 12-month period that contains that month. Finally, as in #2, this code assumes that the 12-month rolling window means the current month and the 11 months preceding. If Sofia wants the 12 months preceding (excluding the current month), change -11 0 to -12 -1 in the code below. So something like this (untested, as no -dataex- has yet appeared):

      Code:
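      * rangejoin is from SSC (ssc install rangejoin) and also needs rangestat installed
      use dataset, clear
      isid dm, sort
      tempfile copy
      save `copy'
      
      * pair each month with every month from dm-11 to dm (-12 -1 excludes the current month)
      rangejoin dm -11 0 using `copy'
      
      foreach v of varlist N_Agric N_food N_Beer {
          * running product within each window, in date order
          by dm (dm_U), sort: gen double product_`v' = `v'_U if _n == 1
          by dm (dm_U), sort: replace product_`v' = product_`v'[_n-1]*`v'_U if _n > 1
      }
      * the full window product is in the LAST observation of each dm group
      by dm (dm_U): keep if _n == _N
      drop *_U

      Some morals of the story: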

      1. When asking for help with code, always show example data. Always use -dataex- to do that, and be sure that the data example shown represents the spectrum of actual data adequately.

      2. When responding to a request for help with code, don't make assumptions about the context and the possible values of the data involved unless they are substantiated by example data in the question. Almost always include some -assert- statements that verify that those assumptions are true in the real data.

      3. If the original question does not include example data, respond by requesting example data before writing code for imaginary data based on unverifiable assumptions.

      4. The laws of mathematics cannot be flouted simply because they are sometimes inconvenient. If you are tempted to work with log(delta+x) due to zero or negative values of x, do so only if delta+x, with that specific value of delta, is in its own right a meaningful construct in the context of the problem at hand. If it is just trying to get around the inherent limitations of logarithms, don't do it, ever.
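      In the spirit of morals 2 and 4, a sketch that turns the assumptions behind each piece of code into -assert- checks, run on made-up data (the values are invented; only the dm and N_* names come from this thread). On data like these, the second check flags both variables:

```stata
* Hypothetical example data in the shape discussed in this thread
clear
input int dm double(N_Agric N_food)
600  0.5 -2.1
601 -1.4  0.7
602  0.9 -0.2
end
format dm %tm

* Assumption of the -rangejoin- code: each month appears at most once
isid dm

* Assumption of the compound-return code in #2: no value below -1.
* -capture- records a failed -assert- instead of stopping the do-file.
local bad
foreach v of varlist N_* {
    capture assert `v' >= -1 if !missing(`v')
    if _rc {
        local bad `bad' `v'
    }
}
display "variables with values below -1: `bad'"
```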








      • #18
        For those interested, asrol has been updated to version 4.5.1 where the calculations of the geometric mean and the products have been improved. The Statalist post can be accessed here.
        Regards
        --------------------------------------------------
        Attaullah Shah, PhD.
        Professor of Finance, Institute of Management Sciences Peshawar, Pakistan
        FinTechProfessor.com
        https://asdocx.com
        Check out my asdoc program, which sends outputs to MS Word.
        For more flexibility, consider using asdocx which can send Stata outputs to MS Word, Excel, LaTeX, or HTML.

