tsvarlist not filling gaps?

Stefan Sliwa

Join Date: Jun 2019
Posts: 19

18 Nov 2021, 12:35

Hi Statalist,

I have a beginner's question.

I use tsvarlist operators (L., F., S., etc.) quite extensively, but I have noticed a worrying issue: despite what the description of tsvarlists says, these operators sometimes do not seem to handle gaps adequately.

I wrote a little program to show the issue:

Code:

clear

set obs 10

set seed 567
// create random integer variable
generate u1 = runiformint(0, 100)
gen year = .

// create time variable
local year 2000
forvalues n = 1/10{
    replace year = `year' if _n == `n'
    local year = `year' + 1
}

// create a gap after 2007
replace year = year[_n+1] if year > 2007

// declare dataset as time series 
tsset year

// create rolling averages using time-series varlist
gen u1_ra = l.u1 + u1 + F.u1

// creating rolling averages by manually specifying the observation number 
sort year
gen u1_ra_alt = u1[_n-1] + u1[_n] + u1[_n+1]

which results in:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float(u1 year u1_ra u1_ra_alt)
59 2000   .   .
69 2001 186 186
58 2002 189 189
62 2003 205 205
85 2004 166 166
19 2005 118 118
14 2006 114 114
81 2007   . 167
72 2009   . 187
34    .   .   .
end

As you can see, using tsvarlist does not produce a value for the year 2007. Why is that? What am I missing? Do I have to use tsfill after all?

Thank you very much!

Tags: Time Series, tsvarlist gaps

Clyde Schechter

Join Date: Apr 2014

Posts: 30111
#2

18 Nov 2021, 12:49

It is you, not Stata, that is handling the gap incorrectly. For year 2007, it is not possible to calculate the rolling average because there is no value of u1 for 2008. Stata appropriately notices this and returns a missing value for that year. Similarly for 2009, there is no rolling average because Stata notices the absence of a value of u1 for 2008.

Your alternate calculation is incorrect in producing results for those years because it is including the wrong years' data in the calculation. Your "alternate" rolling average for 2007 is based on 2006, 2007, and 2009, whereas a true rolling average should be based on 2006, 2007, and 2008. In fact, the use of the time series operators instead of [_n-1] and [_n+1] is provided specifically as a safety feature to prevent you from making that mistake.
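To make the difference concrete, here is a minimal sketch on a three-observation toy dataset (the values are taken from the original post; the variable names lead_ts and lead_sub are made up for illustration). It compares the lead operator, which looks for the observation whose year is exactly one later, with the subscript, which simply takes whatever observation is stored next.

```stata
clear
input year u1
2006 14
2007 81
2009 72
end
tsset year

// F.u1 is the value of u1 in the observation where year equals year + 1
gen lead_ts = F.u1

// u1[_n+1] is the value of u1 in the next observation, whatever its year
gen lead_sub = u1[_n+1]

list year u1 lead_ts lead_sub
```

For 2007, lead_ts should be missing because there is no 2008 observation, while lead_sub silently picks up the 2009 value.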

If for your purposes you don't actually want a true rolling average but you just want to average each year's value of u1 with that of the ones that precede and follow it in the dataset, regardless of what those years happen to be, then your alternate approach does that for you. But if the year values aren't consecutive, then it isn't a true rolling average and there is no point to even -tsset-ing the data.*

-tsfill- will not change anything here. -tsfill- will not magically create a value of u1 for year 2008. It will still be a missing value, and Stata will handle it in the same way. All it will do in this instance is add an extra observation with no real information to your data set.
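A small sketch of that point, using four of the values from the original post (the name u1_sum is made up for illustration, and it holds the three-term sum as in the original code rather than a sum divided by three):

```stata
clear
input year u1
2005 19
2006 14
2007 81
2009 72
end
tsset year

// tsfill adds an observation for 2008; u1 is missing in it
tsfill

// three-term sum with time-series operators
gen u1_sum = L.u1 + u1 + F.u1

list year u1 u1_sum
```

After -tsfill- there should be five observations, but u1_sum is still missing for 2007 and 2009, and also for the new 2008 row, because u1 for 2008 is itself missing. Only 2006 gets a value.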

*Added: Well, you could generate a new sequential variable, say,

Code:

sort year
gen int seq = _n

and then -tsset seq-. Then you could use the lag and lead operators with that to get the kind of pseudo-rolling average you are looking for. But this makes sense only if you consider year to be a purely ordinal variable and the irregular time gap between consecutive observations is not relevant to your analysis.
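Put together, a minimal sketch of that approach might look like the following. It uses four of the values from the original post; u1_sum_seq is a made-up name, and as in the original code it holds the three-term sum, not the sum divided by three.

```stata
clear
input year u1
2005 19
2006 14
2007 81
2009 72
end

// number the observations consecutively in year order
sort year
gen int seq = _n

// declare the gap-free counter, not year, as the time variable
tsset seq

// lag and lead now refer to the neighboring observations in the data
gen u1_sum_seq = L.u1 + u1 + F.u1

list year seq u1 u1_sum_seq
```

Here 2007 should get 14 + 81 + 72 = 167, which matches u1_ra_alt in the original listing, because seq has no gap even though year does.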

Last edited by Clyde Schechter; 18 Nov 2021, 12:52.
Stefan Sliwa

Join Date: Jun 2019

Posts: 19
#3

19 Nov 2021, 02:16

Thanks, Clyde. The added part especially helps me out.