Interpolation of unbalanced panel data

Eva Skrebes

Join Date: Dec 2018

Posts: 3
#1

Interpolation of unbalanced panel data

17 Dec 2018, 10:23

Dear all,

I am currently working with unbalanced panel data which I need to prepare for regressions. Since only yearly data is available, I have to interpolate the values for some variables, such as loans, to get quaterly data. The observations are aggregated to groups with a unique group-ID (newid) and the data is formated as a time series (tsset).

Since I have to deal with unbalanced panel data, some years in the time series are missing. I worry about missing years somewhere in the middle of the sequence, e.g. if 2004 and 2006 are available, but 2005 not. Therefore, my questions are:

1. After using tsspell, I have seen that some groups have several spells (up to 3). Is it possible to incorporate this information in the ipolate command, e.g. by creating new subgroup IDs? I thought that they could refer to one sequence before a spell occurs. How should I implement it technically?

2. Is there a common approach in such a case?

So far, I have been using:

by newid: ipolate loans dateq, generate (loans2)

Data example with one (three) spells within different groups:

Code:

* Example generated by -dataex-. To install: ssc install dataex clear input float(newid dateq) byte(_spell _seq _end) float loans2 395 176 1 1 0 . 395 177 1 2 0 . 395 178 1 3 0 . 395 179 1 4 0 15576 395 180 1 5 0 . . . . 396 204 3 1 0 . 396 205 3 2 0 . 396 206 3 3 0 . 396 207 3 4 0 605 396 208 3 5 0 . 396 209 3 6 0 . 396 210 3 7 0 . 396 211 3 8 0 511 . end format %tq dateq

I am using Stata 15.0

Thank you a lot in advance!
Eva
Tags: None
Nick Cox

Join Date: Mar 2014

Posts: 36053
#2

18 Dec 2018, 02:24

Your data example is not what dataex would ever produce. Some observations have 6 values, some have 1, one has 7. The whole point about dataex is to produce code that will run immediately and without error after copying and pasting into Stata.. If you mess up the output, you mess up someone else's input.

That said, the flavour of your problem seems fairly clear, although the problem is missing values, not lack of balance.

As you seem to want to map yearly values into quarterly values, various possibilities arise such as

1. assigning each quarter a constant for the year

2. Denton's method

Code:

ssc desc denton

3. interpolation of traditional kind.

On the last, I don't see that tsspell (SSC, as you are asked to explain) is of any use or relevance, but you have not explained how you used it to define spells.

Interpolation works best, to spell out the obvious, with very short spells of missing values in otherwise smoothly changing series. When the missings are 3/4 of the data points, it's a wild guessing game. Linear interpolation just happens to be the simplest method, and not necessarily the best. If I had an outcome that was necessarily positive or zero or positive I would consider interpolation on a transformed scale followed by back-transformation.
Comment
Eva Skrebes

Join Date: Dec 2018

Posts: 3
#3

18 Dec 2018, 08:17

Hello Nick,

thank you very much for you reply.

I am sorry for the example, I adjusted the dataex output because I thought it does not explain my issue very well. Next time I will post the original.

Indeed, the difficulty is that I have missing observations for

1) quarters which I want to interpolate and
2) whole years which I am not sure yet how to treat.

So, while one entity might have observations for each year between 2004-2014, others might have only observations for 2007-2008 (I assumed that it is what is meant by unbalanced data). Denton's method seems to be a good way to deal with the first issue. I will also definitely consider your last advice.

For the second issue, I used " tspell newid" (my group id). As a result, the _spell variable changes from "1" to "2", or to "3", whenever there is a time gap and a new time sequence within the same group id starts. Also, the length of the sequence ranges from 4 to 44 on a quaterly basis which seems to be the larger issue.

Here is another example for the newid 2927 which has several time gaps:

Code:

* Example generated by -dataex-. To install: ssc install dataex clear input float(newid dateq) double roaa byte(_spell _seq _end) 2927 180 . 1 1 0 2927 181 . 1 2 0 2927 182 . 1 3 0 2927 183 .0064 1 4 0 2927 184 . 1 5 0 2927 185 . 1 6 0 2927 186 . 1 7 0 2927 187 .0083 1 8 0 2927 188 . 1 9 0 2927 189 . 1 10 0 2927 190 . 1 11 0 2927 191 .0089 1 12 1 2927 196 . 2 1 0 2927 197 . 2 2 0 2927 198 . 2 3 0 2927 199 .0051 2 4 1 2927 204 . 3 1 0 2927 205 . 3 2 0 2927 206 . 3 3 0 2927 207 .0077 3 4 1 end format %tq dateq

Does it add any new information or should I stick to your previous recommendations?

Again, thank you a lot in advance.

Eva
Comment
Nick Cox

Join Date: Mar 2014

Posts: 36053
#4

18 Dec 2018, 08:51

Thanks for the extra detail, but I don't have anything to add (or subtract). I think your next questions are for economists, how to think about roaa (whatever that is).
Comment
Eva Skrebes

Join Date: Dec 2018

Posts: 3
#5

18 Dec 2018, 11:11

Ok, thank you, it helps.
Comment

Announcement

Interpolation of unbalanced panel data

Comment

Comment

Comment

Comment