  • Interpolating missing values based on distance measures between pairs

    Dear Statalists,

    I am asking for advice on how, in general, to approach the following problem.

    I have a dyadic data set with country pairs like in the data example below. Var1 and var2 contain information for country1 and country2, respectively. However, for some countries the data is missing. Fortunately, I have a number of standardized distance measures between all countries (which is why the data here is presented in dyad form). Think of geographic distance or distance based on a vector of characteristics that the countries share or do not share.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str4 pair str2(country1 country2) float(var1 var2 dist1 dist2 dist3)
    "0102" "01" "02" 3 .  .75  -.3   0
    "0103" "01" "03" 3 4   .2   .8   1
    "0104" "01" "04" 3 7   .3   .4  .5
    "0203" "02" "03" . 4 -.23  -.9  .3
    "0204" "02" "04" . 7  -.2  -.2 -.4
    "0304" "03" "04" 4 7   .5 -.12  .5
    end

    In the end, I would like to interpolate the missing values (here missing for country "02") based on the existing data for var1 and var2 for the other countries ("01", "03", "04") weighted by the distance in each country pair ("0102", "0203" etc.). Ideally, the interpolation would make use of all three distance measures, assigning equal weights to each of them.
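
    To make the idea concrete, here is a rough sketch of what I have in mind, using the example data above. It is only an illustration under assumptions of my own: the three distances are averaged with equal weights, and because the standardized distances can be negative, the average is turned into a positive weight with exp(-distance). The names avgdist, w, num, den and var1_ip are made up for this sketch, and the weighting function is a placeholder rather than an established method.

    ```stata
    * Sketch only: the exp(-distance) weighting is an assumption
    * Equal-weight average of the three distance measures for each pair
    egen double avgdist = rowmean(dist1 dist2 dist3)
    * Standardized distances can be negative, so map them to positive weights
    generate double w = exp(-avgdist)

    * Mirror each dyad so every country appears as country1 once per partner
    tempfile dyads
    save `dyads'
    rename (country1 country2 var1 var2) (country2 country1 var2 var1)
    append using `dyads'

    * Weighted mean of the partners' observed values (var2), by target country
    bysort country1: egen double num = total(w * var2)
    bysort country1: egen double den = total(w * !missing(var2))
    * Keep observed values; fill in the weighted mean only where var1 is missing
    generate double var1_ip = cond(missing(var1), num / den, var1)
    ```

    After this, var1_ip is constant within country1, so something like collapse (first) var1_ip, by(country1) should give one value per country.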

    Does anyone here have experience implementing this kind of interpolation in pairs along more than one dimension? Are there any prewritten commands that could help me? As far as I understand, mipolate would not be adequate for interpolation along multiple dimensions and/or in pairs. Please correct me if I am wrong here.

    As always, thank you very much for your insights.

    Best,
    Milan

  • #2
    You'll increase your chances of a useful answer if you follow the FAQ on asking questions: provide Stata code in code delimiters, Stata output, and sample data using dataex. Try to simplify the code and example to what you need to illustrate your problem. You've asked a lot of questions in the last few months, so you should know this.

    There are folks on this list who work with geographic data who are better qualified than I am to deal with your question, but they have not chimed in (perhaps because the question is unclear) so let me offer a few suggestions.

    I'm unclear about what the three distances measure and what the variables mean. I understand a single distance between A and B, but not three distances.

    The first question is whether this problem has an analytical solution: given a set of dyadic distances, can you fill in a missing distance? That is, are the missing distances identified by the observed distances? If they are, then procedures for such calculations may exist. If they are not, then you need a penalty function to choose among alternative solutions to the problem.

    This has some similarity to multidimensional scaling and related techniques. You should check the multivariate statistics manual.

    If the sample is not too big, you might do this by iterating over values. Multiple imputation (MI) works fine if your model of interpolation makes sense. I don't know if it does here.
