Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Using variable name to reflect changing values in subsequent formula

    Dear All,

    I have a general question, though I'm not sure how to phrase it. As such, let me use the following example. Let's say I have panel data, where each panel consists of some static variables and time series ones as well (pretty standard). Furthermore the panel is unbalanced. Now, let's also say that one of these variables, call it 'diff' varies from 13 and 29 between panels but is constant within a given panel. I would like to generate a new variable, call it 'sum' which is a rolling sum of another, time-series variable, call it 'var'. However, and here's the part that I'm curious about, is it possible to set the window of the rolling sum equal to the value of 'diff', which as stated above varies from one panel of the data to another?

    I've seen in a post that someone created a rolling, length-7 sum by first creating a cumulative sum and then creating a new variable equal to that cumulative sum less the 6th lag (sum[_n-6]). Is it possible in Stata to insert a variable name in place of "-x"? So it would, for instance, be sum - sum[_-'diff'] or alternatively sum - L'diff'.sum?

    I'm curious about this for a current problem I have but also in general, as it would be a powerful tool. Thanks in advance!

  • #2
    Code:
    * Generate some example data
    clear
    set seed 4026
    set obs 20
    gen id=_n
    gen diff=runiformint(13,29)
    expand 30
    gen var=uniform()
    bys id: gen time=_n
    * End of data generation
    * Commands to calculate sum
    sort id time
    xtset id time
    * Calculate cumulative sum by id
    by id: gen sum=sum(var)
    * Get ids to loop over them
    levelsof id, loc(ids)
    * Rolling sum to be filled
    gen rsum=.
    * Loop over ids
    foreach i in `ids' {
        * Get value of diff for id, use as window
        qui su diff if id==`i', meanonly
        glo window=r(mean)
        * Calculate the sum for t=diff
        replace rsum=sum if id==`i' & time==$window
        * Calculate the sum for t>diff
        replace rsum=sum-L${window}.sum if id==`i' & time>$window
    }
    Jorge Eduardo Pérez Pérez
    www.jorgeperezperez.com

    Comment


    • #3
      You can also combine Jorge's technique of looping over ids with tsegen (from SSC) to calculate statistics based on a rolling window. To install tsegen, type in Stata's command window

      Code:
      ssc install tsegen
      The following shamelessly borrows Jorge's sample data and calculates the sum, mean, and standard deviation by id over a rolling window that is specific to each id.

      Code:
      * Generate some example data
      clear
      set seed 4026
      set obs 20
      gen id=_n
      gen diff=runiformint(13,29)
      expand 30
      gen var=uniform()
      bys id: gen time=_n
      * End of data generation
      * Commands to calculate sum
      sort id time
      xtset id time
      
      * verify assumptions about the data
      by id: assert diff == diff[1]
      
      * Get ids to loop over them
      levelsof id, loc(ids)
      
      gen rsum  = .
      gen rmean = .
      gen rsd   = .
      foreach i in `ids' {
      
          qui su diff if id==`i', meanonly
          local window = r(mean)
          
          tsegen x = rowtotal(L(1/`window').var, `window') if id == `i'
          replace rsum = x if id == `i'
          drop x
          
          tsegen x = rowmean(L(1/`window').var, `window') if id == `i'
          replace rmean = x if id == `i'
          drop x
          
          tsegen x = rowsd(L(1/`window').var, `window') if id == `i'
          replace rsd = x if id == `i'
          drop x
          
      }

      Comment


      • #4
        Also note that if there are more ids than there are values for diff, you can loop over those instead. This shows both approach

        Code:
        * Generate some example data
        clear
        set seed 4026
        set obs 20
        gen id=_n
        gen diff=runiformint(13,29)
        expand 30
        gen var=uniform()
        bys id: gen time=_n
        * End of data generation
        * Commands to calculate sum
        sort id time
        xtset id time
        
        * verify assumptions about the data
        by id: assert diff == diff[1]
        
        * Get ids to loop over them
        levelsof id, loc(ids)
        
        gen rsum  = .
        gen rmean = .
        gen rsd   = .
        foreach i in `ids' {
        
            qui su diff if id==`i', meanonly
            local window = r(mean)
            
            tsegen x = rowtotal(L(1/`window').var, `window') if id == `i'
            replace rsum = x if id == `i'
            drop x
            
            tsegen x = rowmean(L(1/`window').var, `window') if id == `i'
            replace rmean = x if id == `i'
            drop x
            
            tsegen x = rowsd(L(1/`window').var, `window') if id == `i'
            replace rsd = x if id == `i'
            drop x
            
        }
        
        * Get ids to loop over them
        levelsof diff, loc(diffs)
        
        gen rsum2  = .
        gen rmean2 = .
        gen rsd2   = .
        
        foreach i in `diffs' {
        
            tsegen x = rowtotal(L(1/`i').var, `i') if diff == `i'
            replace rsum2 = x if diff == `i'
            drop x
            
            tsegen x = rowmean(L(1/`i').var, `i') if diff == `i'
            replace rmean2 = x if diff == `i'
            drop x
            
            tsegen x = rowsd(L(1/`i').var, `i') if diff == `i'
            replace rsd2 = x if diff == `i'
            drop x
            
        }

        Comment

        Working...
        X