  • Comparing Interpolation Methods

    Hi,

    I am working on a large panel dataset covering 30 countries over a 25-year period, with several macroeconomic variables (e.g., GDP, Gini, GDP per capita, average educational attainment, life expectancy). My panel is unbalanced because several variables have missing values for various countries. Following the advice received on this forum and my own research on the subject, I have interpolated these missing values using three alternative methods: ipolate (with the epolate option), pchipolate and cipolate. I now have a balanced panel, which was the whole purpose of these manipulations. However, I would like to know whether there is a recommended way to check which interpolation method best fits my data. I am aware that theory can sometimes guide the choice of method; in this case, however, some variables have several missing values, particularly middle-class size, and there does not seem to be any theory on how to interpolate them. To give a bit of context, the next step in my analysis is to run panel regressions. Thank you in advance for the help!

  • #2
    The best method to use is whatever reproduces the true values! But naturally you don't have those. Some limited thoughts:

    0. Graphical examination is a good idea.

    1. You can always compare different methods and see if they agree. Looking at cases where methods give very different results may give you indications of which features are or are not being matched by different interpolation methods. Agreement is comforting but no guarantee: for example, a missing value that was really a one-off spike will not be matched by any interpolation method.

    2. You could simulate realistic data, replace some at random with missings and then compare interpolated with "known" values.

    3. You could apply the procedure in point 2 to the data you have.

    4. With variables such as yours, I would expect better results from transformation, interpolation on transformed scale, and then back-transformation. Transformation could be logarithmic for absolute positive values, logit for Gini, etc.

    5. The weakness and the strength of the methods you used is that they pay no attention to the information in other panels.

    6. Datasets with many interpolated values are inevitably different from the originals. For example, the extra degrees of freedom gained are essentially an illusion. I'd expect to have to discuss the merits of the method at length in any report.
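
    Points 2 to 4 can be combined into a small hold-out check: hide some values you do know, interpolate them back, and compare errors on the raw and the transformed scale. What follows is only a minimal sketch in Python rather than Stata, under stated assumptions: the series is simulated (steady 5% growth), the names interpolate_linear, interpolate_log and holdout_rmse are hypothetical, and plain linear interpolation stands in for whichever command you actually use.

```python
import math
import random


def interpolate_linear(xs, ys):
    """Fill interior None entries of ys by linear interpolation on xs.

    Assumes xs is sorted in ascending order. Gaps before the first or
    after the last known value are left as None (no extrapolation).
    """
    known = [(x, y) for x, y in zip(xs, ys) if y is not None]
    filled = list(ys)
    for i, (x, y) in enumerate(zip(xs, ys)):
        if y is not None:
            continue
        left = [(kx, ky) for kx, ky in known if kx < x]
        right = [(kx, ky) for kx, ky in known if kx > x]
        if left and right:
            (x0, y0), (x1, y1) = left[-1], right[0]
            filled[i] = y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    return filled


def interpolate_log(xs, ys):
    """Interpolate on the log scale, then back-transform (positive values only)."""
    logged = [None if y is None else math.log(y) for y in ys]
    return [None if v is None else math.exp(v)
            for v in interpolate_linear(xs, logged)]


def holdout_rmse(xs, ys, method, n_hide=5, seed=1):
    """Hide n_hide known interior values, interpolate, return RMSE on them."""
    rng = random.Random(seed)
    observed = [i for i, y in enumerate(ys) if y is not None]
    interior = observed[1:-1]  # keep both endpoints so nothing is extrapolated
    hidden = set(rng.sample(interior, n_hide))
    masked = [None if i in hidden else y for i, y in enumerate(ys)]
    filled = method(xs, masked)
    squared_errors = [(filled[i] - ys[i]) ** 2 for i in hidden]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Simulated example (an assumption, not real data): 25 years of 5% growth.
years = list(range(1990, 2015))
gdp = [100 * 1.05 ** (year - 1990) for year in years]

rmse_raw = holdout_rmse(years, gdp, interpolate_linear)
rmse_log = holdout_rmse(years, gdp, interpolate_log)
print("RMSE, raw scale:", rmse_raw)
print("RMSE, log scale:", rmse_log)
```

    Because this simulated series grows exponentially, interpolation on the log scale recovers the hidden values almost exactly while the raw scale does not. Real series will be less clear-cut, so it is worth repeating the check with several seeds and with the proportion of hidden values close to the proportion actually missing.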



    • #3
      Always very helpful; thank you very much, Nick. I had some of these points in mind, but number 4 is definitely a good idea and I will follow your advice.
