
  • Interpolating between two years of missing data

    I would like to interpolate missing values of v1 only in 2014 and 2015, by country. What is the Stata syntax for restricting interpolation in this way? I tried the code: interpolate year v1, gen (wanted) epolate by (county) but it also interpolates missing values in other years, which is not what I want; I only want 2014 and 2015 interpolated.
    Thanks in advance for your help.

  • #2
    The linear interpolation command is ipolate.

    Code:
    h ipolate
    For additional interpolation methods, see mipolate from SSC. To get what you want, add one step to what you already did:

    Code:
    gen realwanted= cond(inrange(year, 2014, 2015), wanted, v1)
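    To make the two steps concrete, here is a minimal sketch on made-up data. The dataset is invented for illustration; the variable names country, year, v1, wanted and realwanted follow the thread, and the by prefix is used here with ipolate rather than the by() option in the original attempt.

    ```stata
    * Made-up data: one country with gaps in 2012 and in 2014-2015
    clear
    input str1 country year v1
    "A" 2011 10
    "A" 2012 .
    "A" 2013 16
    "A" 2014 .
    "A" 2015 .
    "A" 2016 25
    end

    * Step 1: linear interpolation of every gap, separately within each country
    bysort country (year): ipolate v1 year, gen(wanted)

    * Step 2: keep interpolated values only for 2014-2015; elsewhere keep v1 as observed
    gen realwanted = cond(inrange(year, 2014, 2015), wanted, v1)

    list, sepby(country)
    ```

    In this sketch the 2012 gap is filled in wanted but stays missing in realwanted, while 2014 and 2015 are filled from the 2013 and 2016 values.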



    • #3
      Let's spell out a tension here. You may want only results for some interval but interpolation may need to use values from outside that interval to do the calculation. I think this is Andrew Musau's point explained differently.

      In a simple case you have data for years 2013, 2014, 2015 and 2016 and wish to interpolate 2014 and 2015 if a value is missing for both. This is more code than is really needed but it may seem fairly transparent.

      Code:
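      * Copy each country's 2013 and 2016 values to all of its observations;
      * the missing option returns missing, not 0, if that year's value is absent
      egen value13 = total(cond(year == 2013, value, .)), by(country) missing
      egen value16 = total(cond(year == 2016, value, .)), by(country) missing
      
      clonevar wanted = value
      
      * 2014 and 2015 lie one third and two thirds of the way from 2013 to 2016
      replace wanted = value13 + (1/3) * (value16 - value13) if missing(wanted) & year == 2014
      replace wanted = value13 + (2/3) * (value16 - value13) if missing(wanted) & year == 2015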
      If values are missing for one year but not both, do use another command, as the weights above assume that 2013 and 2016 are the nearest known values.
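
      As a check on the arithmetic, the hand calculation can be compared with ipolate on a toy dataset. This is only a sketch: the data are invented, byhand and check are hypothetical variable names, and the two replace lines are folded into one using (year - 2013)/3 as the weight.

      ```stata
      * Made-up data: two countries, both missing 2014 and 2015
      clear
      input str1 country year value
      "A" 2013 16
      "A" 2014 .
      "A" 2015 .
      "A" 2016 25
      "B" 2013 4
      "B" 2014 .
      "B" 2015 .
      "B" 2016 10
      end

      * Spread the 2013 and 2016 values across each country's observations
      egen value13 = total(cond(year == 2013, value, .)), by(country) missing
      egen value16 = total(cond(year == 2016, value, .)), by(country) missing

      * Hand calculation: weight 1/3 in 2014 and 2/3 in 2015
      clonevar byhand = value
      replace byhand = value13 + (year - 2013)/3 * (value16 - value13) ///
          if missing(byhand) & inrange(year, 2014, 2015)

      * Same interpolation done by ipolate, within country
      bysort country (year): ipolate value year, gen(check)

      * The two should agree up to rounding
      assert reldif(byhand, check) < 1e-6
      ```

      The agreement holds only because both years are missing together; with just one of them missing, ipolate would use the nearer known value and the two results would differ.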



      • #4
        Thanks to both of you, Andrew and Nick, for your help. Nick’s code is what I’m after and it worked well, so thanks, Nick, as always, for your quick and helpful response.



        • #5
          Glad it helped, but there has to be a warning too. It's easy to think that you improved your dataset with interpolation, but the other side is that

          1. You don't have more information even if you have fewer missing values. In fact, any boost in degrees of freedom or model fit figures of merit is likely to be spurious.

          2. The interpolation is almost always wrong and you don't usually have any kind of handle on how wrong. The real data are usually rougher than any interpolation.

          3. Linear interpolation is local and cautious and there should always be scepticism about whether that is best.

          You may not believe this but I was once asked whether entering each data point several times over was the solution for a small sample.

          That should seem ridiculous to you but interpolation can have that flavour too.

          Historically, interpolation's main role was for "reading between the lines" of tables of deterministic functions known to change (very) smoothly.

          I am Jekyll and Hyde on this as I've found interpolation an engaging programming problem, but I am sceptical about it being over-used statistically.
