Interpolating between two years of missing data

Saleh sharmah

Join Date: Aug 2022

Posts: 120
#1

03 Nov 2022, 06:56

I would like to interpolate missing values of v1 only in 2014 and 2015, by country. What is the Stata syntax for making interpolation conditional? I have used the code: interpolate year v1, gen(wanted) epolate by(country) but it also interpolates missing values in other years, which is not what I want; I only want it to interpolate 2014 and 2015.
Thanks in advance for your help.
Andrew Musau

Join Date: Oct 2014

Posts: 10207
#2

03 Nov 2022, 07:29

The linear interpolation command is ipolate.

Code:

h ipolate

For additional interpolation methods, see mipolate from SSC. One addition to what you did then gives you what you want:

Code:

gen realwanted = cond(inrange(year, 2014, 2015), wanted, v1)
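
Putting the two steps together, a minimal sketch of the whole sequence might look like the following. It assumes the variables are named v1, year, and country as in #1, and it keeps the epolate option only because the original attempt used it; drop that option if extrapolation beyond the observed range is not wanted.

```stata
* Sketch only: assumes variables v1, year, and country as described in #1.
* Step 1: interpolate (and, with epolate, extrapolate) v1 on year within country.
bysort country: ipolate v1 year, gen(wanted) epolate

* Step 2: keep the interpolated values only for 2014 and 2015;
* every other year keeps the original v1, missing values included.
gen realwanted = cond(inrange(year, 2014, 2015), wanted, v1)
```

Note that the values filled in for 2014 and 2015 are still calculated from whatever nonmissing values exist in the other years.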
Nick Cox

Join Date: Mar 2014

Posts: 35700
#3

03 Nov 2022, 10:01

Let's spell out a tension here. You may want only results for some interval but interpolation may need to use values from outside that interval to do the calculation. I think this is Andrew Musau's point explained differently.

In a simple case you have data for the years 2013, 2014, 2015, and 2016 and wish to interpolate 2014 and 2015 if a value is missing for both. This is more code than is really needed but it may seem fairly transparent.

Code:

egen value13 = total(cond(year == 2013, value, .)), by(country)
egen value16 = total(cond(year == 2016, value, .)), by(country)
clonevar wanted = value
replace wanted = value13 + (1/3) * (value16 - value13) if missing(wanted) & year == 2014
replace wanted = value13 + (2/3) * (value16 - value13) if missing(wanted) & year == 2015

If values are missing for one year but not both, do use another command.
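
One possibility for that other command, offered as a sketch rather than the only way to do it, is ipolate with an if qualifier, so that only the years 2013 to 2016 enter the calculation. The variable names value, year, and country follow the code above; wanted2 is a hypothetical name chosen here for illustration.

```stata
* Sketch only: assumes variables value, year, and country as in the code above.
* Interpolate linearly using only the observations for 2013-2016.
bysort country: ipolate value year if inrange(year, 2013, 2016), gen(wanted2)

* Observations outside 2013-2016 are left missing in wanted2,
* so copy the original values (missing or not) back for those years.
replace wanted2 = value if !inrange(year, 2013, 2016)
```

Without the epolate option nothing is extrapolated, so a missing 2014 or 2015 is filled only when there are nonmissing values on both sides of it within 2013 to 2016.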
Saleh sharmah

Join Date: Aug 2022

Posts: 120
#4

03 Nov 2022, 22:22

Thanks to both of you, Andrew and Nick, for your help. Nick's code is what I'm after and it worked well, so thanks, Nick, as always, for your quick and helpful response.
Nick Cox

Join Date: Mar 2014

Posts: 35700
#5

04 Nov 2022, 05:56

Glad it helped, but there has to be a warning too. It's easy to think that you improved your dataset with interpolation, but the other side is that

1. You don't have more information even if you have fewer missing values. In fact, any boost in degrees of freedom or model fit figures of merit is likely to be spurious.

2. The interpolation is almost always wrong and you don't usually have any kind of handle on how wrong. The real data are usually rougher than any interpolation.

3. Linear interpolation is local and cautious and there should always be scepticism about whether that is best.

You may not believe this but I was once asked whether entering each data point several times over was the solution for a small sample.

That should seem ridiculous to you but interpolation can have that flavour too.

Historically, interpolation's main role was for "reading between the lines" of tables of deterministic functions known to change (very) smoothly.

I am Jekyll and Hyde on this as I've found interpolation an engaging programming problem, but I am sceptical about it being over-used statistically.