
  • tsvarlist not filling gaps?

    Hi Statalist,

    I have a beginner's question.

    I use tsvarlist operators (L., F., S., etc.) quite extensively, but I have noticed a worrisome issue: despite the description of tsvarlists, these operators sometimes do not seem to handle gaps adequately.

    I wrote a little program to show the issue:

    Code:
    clear
    
    set obs 10
    
    set seed 567
    // create random integer variable
    generate u1 = runiformint(0, 100)
    
    // create time variable: 2000, 2001, ..., 2009
    generate year = 1999 + _n
    
    // create a gap after 2007 (2008 becomes 2009; the last year becomes missing)
    replace year = year[_n+1] if year > 2007
    
    // declare dataset as time series 
    tsset year
    
    // create rolling three-year sums (unscaled rolling averages) using time-series operators
    gen u1_ra = L.u1 + u1 + F.u1
    
    // create the same sums by explicitly subscripting the observation number
    sort year
    gen u1_ra_alt = u1[_n-1] + u1[_n] + u1[_n+1]
    which results in:


    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(u1 year u1_ra u1_ra_alt)
    59 2000   .   .
    69 2001 186 186
    58 2002 189 189
    62 2003 205 205
    85 2004 166 166
    19 2005 118 118
    14 2006 114 114
    81 2007   . 167
    72 2009   . 187
    34    .   .   .
    end
    As you can see, using the tsvarlist operators does not produce a value for the year 2007 (or 2009), whereas the manual calculation does. Why is that? What am I missing? Do I have to use tsfill after all?

    Thank you very much!

  • #2
    It is you, not Stata, that is handling the gap incorrectly. For year 2007, it is not possible to calculate the rolling average because there is no value of u1 for 2008. Stata appropriately notices this and returns a missing value for that year. Similarly, for 2009 there is no rolling average because Stata notices the absence of a value of u1 for 2008.
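    A minimal sketch of what happens at 2007, assuming the data are rebuilt from the u1 values in the -dataex- listing above (the row with a missing year is left out for simplicity). The variable name u1_lead is illustrative only. F.u1 asks for u1 in year + 1, and because any sum involving a missing value is missing in Stata, the rolling sum is missing as well.

```stata
* Rebuild the posted data from the -dataex- listing (row with missing year omitted)
clear
input float(u1 year)
59 2000
69 2001
58 2002
62 2003
85 2004
19 2005
14 2006
81 2007
72 2009
end

tsset year

* F.u1 means "u1 in year + 1": for 2007 it looks for 2008, which is absent
generate u1_lead = F.u1

* A sum with any missing term is missing, so 2007 (and 2009) get missing values
generate u1_ra = L.u1 + u1 + F.u1

list year u1 u1_lead u1_ra, noobs
```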

    Your alternate calculation is incorrect in producing results for those years because it includes the wrong years' data in the calculation. Your "alternate" rolling average for 2007 is based on 2006, 2007, and 2009, whereas a true rolling average should be based on 2006, 2007, and 2008. In fact, the time-series operators are provided instead of [_n-1] and [_n+1] specifically as a safety feature to prevent you from making that mistake.
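    To see the difference directly, the sketch below (again assuming the data rebuilt from the -dataex- listing, without the missing-year row) puts the row-based lead u1[_n+1] next to the time-based lead F.u1. The names next_obs and next_year are illustrative only. The two disagree only at 2007, where the next row is 2009 rather than 2008.

```stata
* Rebuild the posted data from the -dataex- listing (row with missing year omitted)
clear
input float(u1 year)
59 2000
69 2001
58 2002
62 2003
85 2004
19 2005
14 2006
81 2007
72 2009
end

* Make sure rows are in year order, then declare the time variable
sort year
tsset year

* Subscripts follow the row order of the data ...
generate next_obs = u1[_n+1]
* ... while F. follows the values of the time variable
generate next_year = F.u1

* Only 2007 is listed: its next row holds 2009's value, but year 2008 does not exist
list year u1 next_obs next_year if next_obs != next_year, noobs
```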

    If for your purposes you don't actually want a true rolling average, but just want to average each year's value of u1 with the values of the observations immediately before and after it in the dataset, regardless of which years those happen to be, then your alternate approach does that for you. But if the year values aren't consecutive, then it isn't a true rolling average, and there is no point in -tsset-ing the data at all.*

    -tsfill- will not change anything here. It will not magically create a value of u1 for year 2008; that value will still be missing, and Stata will handle it in the same way. All it will do in this instance is add an extra observation with no real information to your data set.
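    A short check of this point, assuming the same data rebuilt from the -dataex- listing (missing-year row omitted): -tsfill- adds an observation for 2008, but u1 is missing there, so the rolling sums for 2007 and 2009 remain missing.

```stata
* Rebuild the posted data from the -dataex- listing (row with missing year omitted)
clear
input float(u1 year)
59 2000
69 2001
58 2002
62 2003
85 2004
19 2005
14 2006
81 2007
72 2009
end

tsset year

* tsfill inserts an observation for the absent year 2008 ...
tsfill

* ... but u1 is missing there, so the sums around the gap are still missing
generate u1_ra = L.u1 + u1 + F.u1
list year u1 u1_ra if inrange(year, 2006, 2009), noobs
```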

    *Added: Well, you could generate a new sequential variable, say,
    Code:
    sort year
    gen int seq = _n
    and then -tsset seq-. You could then use the lag and lead operators with it to get the kind of pseudo-rolling average you are looking for. But this makes sense only if you consider year to be a purely ordinal variable and the irregular time gaps between consecutive observations are not relevant to your analysis.
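    A minimal sketch of the sequential-variable approach, assuming the data rebuilt from the -dataex- listing (missing-year row omitted); u1_pseudo is an illustrative name. Like the original post, it computes three-term sums; divide by 3 for averages. With seq declared as the time variable, L. and F. refer to the previous and next observations, so the result matches the subscripted calculation.

```stata
* Rebuild the posted data from the -dataex- listing (row with missing year omitted)
clear
input float(u1 year)
59 2000
69 2001
58 2002
62 2003
85 2004
19 2005
14 2006
81 2007
72 2009
end

* Number the observations 1, 2, 3, ... in year order, ignoring the gap
sort year
generate int seq = _n
tsset seq

* L. and F. now mean "previous/next observation", like [_n-1] and [_n+1]
generate u1_pseudo = L.u1 + u1 + F.u1
generate u1_ra_alt = u1[_n-1] + u1[_n] + u1[_n+1]

list year seq u1 u1_pseudo u1_ra_alt, noobs
```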
    Last edited by Clyde Schechter; 18 Nov 2021, 12:52.



    • #3
      Thanks, Clyde; the added part especially helps me out.

