Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Depreciating values to generate "stock-variables"

    Hi,

    my name is Christian, and I am currently trying to depreciate values by years, to generate a "stock-value" out of the "flow-values" meaning values which come up each year.
    This stock-value should be a multiplicated number of about 4 times (exactly: 1/0,23) of the first-appearing flow-value, and after that, be decreased each year by 15%. The flow-value should be added. If a year is missing than the decrease should be as if in this year the flow variable was 0, for the next observation available.

    Even though I tried to look for previous solutions, I couldn't find a applicable solution:
    http://www.stata.com/statalist/archi.../msg00220.html
    http://www.statalist.org/forums/foru...-of-a-variable


    The problem looks like this:

    Code:
    Right now:
      
    symbol year patent_amount (flow values) other variables...
    A 1985 3 ...
    A 1986 3 ...
    A 1987 5 ...
    A 1987 5 ...
    A 1988 6
    B 1990 1
    B 1992 2
    Goal
    symbol ayear patent_amount (flow values) patentstock (stock value) other variables...
    A 1985 3 =3*(1/0,23) ...
    A 1986 3 =patent_stock_last observed*0,85^1+3 ...
    A 1987 5 =patent_stock_last observed*0,85^1+5 ...
    A 1987 5 =patent_stock_last observed*0,85^1+5 ...
    A 1988 6 ...
    B 1990 1 =1*(1/0,23)
    B 1992 2 =patent_stock_last observed*0,85^2+2
    I tried to solve this problem by myself with this code: sort symbol ayear quietly by symbol: gen first_year = _n gen patentstock=0 replace patentstock=patent_amount*(1/0.23) if symbol != symbol[_n-1] & ayear != ayear[_n-1] & first_year ==1 replace patentstock = patentstock[_n-1]*0.85 + patent_amount if symbol == symbol[_n-1] & ayear-1 == ayear[_n-1] drop patent_amount drop first_year However, the code doesnt work if there are several years for the same symbol, and if there is a gap of years. And I cannot figure out how to solve these issues appropriately.

    I would be very happy about any suggestions on how to improve the current code

    Best regards,

    Christian

  • #2
    For a better insight: This is a sample of the file I am working on.
    Attached Files

    Comment


    • #3
      Your problem is incompletely specified, so it is premature to try to write code. When there are multiple observations for a given symbol in the same year, what is supposed to happen? Do we add up the flows for the year, and then apply the 0.85 factor once for the year and add in the total flow? Or do we assume that the different observations represent equally spaced events during the year and we apply a correspondingly increased factor and add in the flow at each observation? Or does some other rule apply? Similarly, if there are year gaps, then what are we to assume about what happened in the intervening years? If you answer these questions, it shouldn't be too difficult to write code.

      Comment


      • #4
        Hey,

        thanks for your reply

        >> When there are multiple observations for a given symbol in the same year, what is supposed to happen?

        Each symbol should have in a year the same stock-value. So for several observations in a year for a symbol the stock-variable should have the same value.

        >>Do we add up the flows for the year, and then apply the 0.85 factor once for the year and add in the total flow?

        The flows are already sums of the entries in the year. (If not, it may be that I made a mistake in the summing-process, I am pretty new to stata, therefore this could happen. I compounded the flows out of the application_ids, which are included in the attached sample). The flow of the year should not be decreased, just the previous stock result. So for year 1 of each symbol the stock should be = (1/0,23)*flow. For the next it should be (0,85)*the value of the previous year stock, plus the flow of the year. For the next years this process should be continued.

        >>Or do we assume that the different observations represent equally spaced events during the year and we apply a correspondingly increased factor and add in the flow at each observation? Or does some other rule apply?

        No, the flows are simplified as sums of the entries in the year, and happen as "event" on a yearly basis.


        >>Similarly, if there are year gaps, then what are we to assume about what happened in the intervening years?

        A year gap should be treated as if there is a flow of zero, thus if the flow in year 1=5 , the flow in year 2=0 and the flow in year3=2 it should be:

        stock year 1= (1/0,23)*5 (=21,74)
        stock year 3= stock year1*0,85*0,85+2 (=17,71)


        Comment


        • #5
          OK. The trick is to first simplify the data set so that we have one and only one observation for each year-symbol combination, and the year gaps are filled in with zeros for patent_amount. Then we can just apply the simple rules, and merge the results back to the original data.

          Code:
          clear*
          use patent_amount
          // VERIFY THAT patent_amount IS CONSTANT WITHIN SYMBOL-AYEAR COMBINATIONS
          by symbol ayear (patent_amount), sort: assert patent_amount[1] == patent_amount[_N]
          
          // COLLAPSE DATA TO ONE OBSERVATION PER SYMBOL-AYEAR
          collapse (first) patent_amount, by(symbol ayear)
          
          //    NOW FILL IN ANY GAPS IN YEARS
          //    SETTING patent_amount TO ZERO IN GAP YEARS
          encode symbol, gen(n_symbol) // CAN'T -xtset- STRING VARIABLE--SEE NOTE
          xtset n_symbol ayear
          gen byte original = 1
          tsfill
          replace patent_amount = 0 if missing(patent_amount)
          replace original = 0 if missing(original)
          
          // AND FILL IN symbol WITH CORRESPONDING VALUE FROM n_symbol
          by n_symbol (symbol), sort: replace symbol = symbol[_N]  // SEE NOTE
          drop n_symbol
          isid symbol ayear patent_amount
          
          //    NOW APPLY THE ROLES FOR THE STOCK AMOUNT
          by symbol (ayear), sort: gen patentstock = patent_amount/0.23 if _n == 1
          by symbol (ayear): replace patentstock = ///
               0.85*patentstock[_n-1] + patent_amount if _n > 1
          
          //    NOW MERGE THESE RESULTS BACK INTO THE ORIGINAL DATA SET
          merge 1:m symbol ayear patent_amount using patent_amount, assert(master match)
          assert (original == 0) == (_merge == 1)
          drop if original == 0
          drop _merge original
          NOTE: The part of the code that fills in the gaps is somewhat complicated because the variable symbol is string rather than numeric. So we have to first create a labeled numeric version of it so that we can -xtset- the data. Then, -tsfill- creates missing values for symbol in the expanded observations, and those have to be backfilled from the original observations with the same numeric version of the symbol variable.

          Comment


          • #6
            Thanks you for sharing! This is really helpful!

            Originally posted by Clyde Schechter View Post
            OK. The trick is to first simplify the data set so that we have one and only one observation for each year-symbol combination, and the year gaps are filled in with zeros for patent_amount. Then we can just apply the simple rules, and merge the results back to the original data.

            Code:
            clear*
            use patent_amount
            // VERIFY THAT patent_amount IS CONSTANT WITHIN SYMBOL-AYEAR COMBINATIONS
            by symbol ayear (patent_amount), sort: assert patent_amount[1] == patent_amount[_N]
            
            // COLLAPSE DATA TO ONE OBSERVATION PER SYMBOL-AYEAR
            collapse (first) patent_amount, by(symbol ayear)
            
            // NOW FILL IN ANY GAPS IN YEARS
            // SETTING patent_amount TO ZERO IN GAP YEARS
            encode symbol, gen(n_symbol) // CAN'T -xtset- STRING VARIABLE--SEE NOTE
            xtset n_symbol ayear
            gen byte original = 1
            tsfill
            replace patent_amount = 0 if missing(patent_amount)
            replace original = 0 if missing(original)
            
            // AND FILL IN symbol WITH CORRESPONDING VALUE FROM n_symbol
            by n_symbol (symbol), sort: replace symbol = symbol[_N] // SEE NOTE
            drop n_symbol
            isid symbol ayear patent_amount
            
            // NOW APPLY THE ROLES FOR THE STOCK AMOUNT
            by symbol (ayear), sort: gen patentstock = patent_amount/0.23 if _n == 1
            by symbol (ayear): replace patentstock = ///
            0.85*patentstock[_n-1] + patent_amount if _n > 1
            
            // NOW MERGE THESE RESULTS BACK INTO THE ORIGINAL DATA SET
            merge 1:m symbol ayear patent_amount using patent_amount, assert(master match)
            assert (original == 0) == (_merge == 1)
            drop if original == 0
            drop _merge original
            NOTE: The part of the code that fills in the gaps is somewhat complicated because the variable symbol is string rather than numeric. So we have to first create a labeled numeric version of it so that we can -xtset- the data. Then, -tsfill- creates missing values for symbol in the expanded observations, and those have to be backfilled from the original observations with the same numeric version of the symbol variable.

            Comment


            • #7
              Thanks for this great thread, I have a very similar problem but to our knowledge more difficult to code in STATA.
              The formula we want to code in STATA 13 on Windows is the following
              Click image for larger version

Name:	Unbenannt.PNG
Views:	1
Size:	7.0 KB
ID:	1479451


              The values for ß1 = 0.1 and ß2= 0.25, while e inidcates an exponential function. The values in brackets behind ß1 and ß2 indicate the value with which the ß-value is to be multiplied.
              The data we want to generate this variable from looks as follows and we want to generate the Patent stock in each Country.

              Code:
              ----------------------- copy starting from the next line -----------------------
              
              
              Code:
              * Example generated by -dataex-. To install: ssc install dataex
              clear
              input str224 Country float(year Patents)
              "AT" 1979  4.962845
              "AT" 1980  4.867535
              "AT" 1981  4.820282
              "AT" 1982  4.905275
              "AT" 1983  5.342334
              "AT" 1984  5.153292
              "AT" 1985   5.09375
              "AT" 1986  5.153292
              "AT" 1987  5.521461
              "AT" 1988  5.655992
              "AT" 1989  5.556828
              "AT" 1990  6.013715
              "AT" 1991  5.945421
              "AT" 1992  5.983936
              "AT" 1993  5.978886
              "AT" 1994  5.872118
              "AT" 1995   5.81413
              "AT" 1996  5.953243
              "AT" 1997  5.921578
              "AT" 1998  5.602119
              "AT" 1999  5.802118
              "AT" 2000  6.030685
              "AT" 2001  6.084499
              "AT" 2002  6.061457
              "AT" 2003  5.913503
              "AT" 2004  5.894403
              "AT" 2005  5.929589
              "AT" 2006  6.006353
              "AT" 2007  6.276643
              "AT" 2008  6.410175
              "AT" 2009  6.349139
              "AT" 2010  6.526495
              "AT" 2011  6.249975
              "AT" 2012  6.434546
              "AT" 2013  6.109248
              "AT" 2014  5.749393
              "AT" 2015  2.397895
              "AT" 2016         .
              "BE" 1979  3.610918
              "BE" 1980 4.4308167
              "BE" 1981 4.0430512
              "BE" 1982  3.713572
              "BE" 1983  3.912023
              "BE" 1984  4.406719
              "BE" 1985 4.7004805
              "BE" 1986  4.983607
              "BE" 1987  4.158883
              "BE" 1988  4.204693
              "BE" 1989 4.6634393
              "BE" 1990 4.3820267
              end
              ------------------ copy up to and including the previous line ------------------
              Our main issue is that we do not know how to efficiently code using the time indices and telling STATA to run the sigma sign for each period over the corresponding time values.

              We will keep trying and tell if we found a solution, but would be very appreciative for any hints on how to solve this issue efficiently.

              Thanks so much and best
              Tobi & Daniel

              Comment


              • #8
                The formula you show makes no mathematical sense. The right hand side contains a subscript j which neither appears on the left side, nor is it a bound to a summation index.

                Comment


                • #9
                  Dear Clyde,
                  thanks for the hint, indeed the subscript j does not seem to indicate anything relevant for the coding or summing, so the j can be thought inexistent.

                  Comment


                  • #10
                    Ignoring subscript j, there is the issue of applying an infinite summation to finite data. I will assume that the intent is to truncate the summation at the point where t-s is the earliest available information for country i. Also, it seems to me from a mathematical point of view that missing values of PAT really mess up this calculation--and your data set has some. The code I show will calculate a zero result for those particular observations, but more problematic, those terms are missing from the calculation of later observations for the same country. Basically once you have a missing value of Patents in a given year, the values of the sum for all subsequent years are incorrect (and I don't see any way to get around this.)

                    With that in mind, I think the following is correct, though if you have some previously worked results, you should check them against what I've done here.

                    Code:
                    //    NEED A NUMERIC COUNTRY VARIABLE
                    encode Country, gen(country)
                    xtset country year
                    
                    //    DESIRED SUMMATION IS S = 0 TO INFINITY; BUT FIND MAXIMUM INSTANTIATED
                    //    VALUE OF S IN THE DATA SET
                    by country, sort: gen n_years = _N
                    summ n_years, meanonly
                    local infinity = `r(max)' - 1
                    
                    //    PARAMETERS
                    local b1 = 0.1
                    local b2 = 0.25
                    
                    ///    RUN THE SUMMATION
                    gen double kstock = 0
                    forvalues s = 0/`infinity' {
                        by country (year), sort: replace kstock = kstock ///
                            + exp(-`b1'*`s')*(1-exp(-`b2'*(`s'+1)))*L`s'.Patents ///
                            if !missing(L`s'.Patents)
                    }

                    Comment


                    • #11
                      Dear Clyde,
                      thank you so much for the great solution, I did not already come up with such an efficient solution. Your code worked perfectly.
                      I manually checked correctness for the first periods, without any differences arising:

                      Code:
                      gen kstock_control = 0
                      by country (year): replace kstock_control = exp(-0.1*0)*(1-exp(-0.25*1))*Patents if _n==1
                      by country (year): replace kstock_control = (exp(-0.1*1)*(1-exp(-0.25*2))*Patents[1]) + (exp(-0.1*0)*(1-exp(-0.25*1))*Patents[2]) if _n==2
                      by country (year): replace kstock_control = (exp(-0.1*2)*(1-exp(-0.25*3))*Patents[1]) + (exp(-0.1*1)*(1-exp(-0.25*2))*Patents[2]) + (exp(-0.1*0)*(1-exp(-0.25*1))*Patents[3]) if _n==3
                      by country (year): replace kstock_control = (exp(-0.1*3)*(1-exp(-0.25*4))*Patents[1]) + (exp(-0.1*2)*(1-exp(-0.25*3))*Patents[2]) + (exp(-0.1*1)*(1-exp(-0.25*2))*Patents[3]) + (exp(-0.1*0)*(1-exp(-0.25*1))*Patents[4]) if _n==4
                      I have one question concerning the local infinity: Am I correct that this value is constantly at a certain value (in my dataset with 38 periods, at 37), indicating that in each loop 37 Lags are tried for inclusion in the summing ? I am afraid I haven`t captured something here, as you mentioned to truncate the summation at t=s, e.g. for t=1 as first datapoint only s=0 is relevant.

                      Concerning the concern on the missings I should probably clarify the context. The data above was already logged, while the stock generation will be done on original values. In case of patent counts we do not have a missing data problem as a value of 0 for a given country and year indicates a genuine zero, not a missing value. Thus, it is fine to include this 0 in the given period and all later periods for summation.

                      Thanks so much and best wishes
                      Tobi





                      Comment


                      • #12
                        I have one question concerning the local infinity: Am I correct that this value is constantly at a certain value (in my dataset with 38 periods, at 37), indicating that in each loop 37 Lags are tried for inclusion in the summing ? I am afraid I haven`t captured something here, as you mentioned to truncate the summation at t=s, e.g. for t=1 as first datapoint only s=0 is relevant.
                        You are correct that the local macro infinity has a fixed value (apparently 37 in your data). The actual truncation of the summation at t = s arises from the -if !missing(L`s'.Patents)- clause. If `s' exceeds the number of lags actually available for that particular country, L`s'.Patents will be missing value.

                        The local macro inifnity was just there to set some overall stopping point for the process. It is, in a sense, wasteful because on some of the countries, Stata will attempt to add more terms that don't exist, only to be thwarted by the -if !missing(L`s'.Patents)- clause. There are ways to program around this, but they make the code complicated. If this were a data set with millions of observation, the more complicated code might be worth while in terms of reduced execution time, but knowing that the data was about countries in recent years, I reasoned that the data set was small enough that the "wasted" time wouldn't be a material problem, so I went for the simpler code.

                        Comment

                        Working...
                        X