Hi everyone,
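dataex sample data is included at the end of this post. I am trying to interpolate panel data using one of the three commands listed in the code blocks further down: one ipolate call and two mipolate calls (pchip and idw).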
I'm not sure which of these commands to use in the end. I'm also not sure which variable to use as my x variable. The ipolate help page says "interpolation requires that yvar be a function of xvar", and I think year is the closest thing I have to that, but my yvar "v2pesecsch" is just panel data for school enrollment rates. The ipolate syntax from the help page is shown with the other code blocks further down.
THE PROBLEM I'M HAVING:
Regardless of which line I use, the variable that it generates has made-up data even in the non-missing rows (1999-2010) which does NOT match the original data of the original variable for those rows. You can see this in the first handful of rows: ipolate1 and v2pesecsch have different recorded observations for the years 1999 to 2010 (and in my complete dataset, from 1960-2010). Yet ipolate1, ipolate2, and ipolate3 all record the same observations as each other in those rows. So I cannot really trust the interpolated data in the missing-data rows 2011-2020.
When I try to fill in the missing observations and actually interpolate, I get the error: "5,437 contradictions in 5,437 observations" which makes sense.
(I'm not sure what difference the "!" makes. As far as I understand, "!" means "not", and I WANT to fill in the missing observations, not the non-missing ones.)
When I looked at others' examples using ipolate and mipolate, I noticed that their generated variables have the same observations as their original variable everywhere except the missing sections, as it should be. I have been trying to follow said examples and I am not sure where I am going wrong. I would be much obliged to anyone who could point me in the right direction for how to have my interpolation variable match the original variable in the non-missing rows.
P.S. Sorry if you've already seen this. I had to repost it because I had some trouble with my previous post and forgot to include some information.
Code:
ipolate v2pesecsch year, gen(ipolate1) epolate
Code:
mipolate v2pesecsch year, gen(ipolate2) pchip epolate
Code:
mipolate v2pesecsch year, gen(ipolate3) idw epolate
Code:
ipolate yvar xvar [if] [in] , generate(newvar) [epolate]
Code:
assert v2pesecsch==ipolate1 if !mi(v2pesecsch)
5,437 contradictions in 5,437 observations
assert v2pesecsch==ipolate1 if mi(v2pesecsch)
4,741 contradictions in 4,741 observations
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input str32 country_name double(year v2pesecsch ipolate1 ipolate2 ipolate3)
"Afghanistan" 1999 23.074 61.86512962962961 61.86512962962964 61.865129629629614
"Afghanistan" 2000 23.618 63.61587155963303 63.615871559633064 63.615871559633035
"Afghanistan" 2001 24.271 64.51291666666668 64.51291666666665 64.5129166666667
"Afghanistan" 2002 24.924 65.55614814814813 65.55614814814814 65.55614814814817
"Afghanistan" 2003 25.576 66.59934259259259 66.59934259259263 66.59934259259259
"Afghanistan" 2004 26.229 67.6425277777778 67.64252777777783 67.64252777777781
"Afghanistan" 2005 26.882 68.92466972477065 68.92466972477061 68.92466972477061
"Afghanistan" 2006 28.21 69.67728703703705 69.67728703703703 69.67728703703705
"Afghanistan" 2007 29.538 70.66885185185188 70.66885185185181 70.66885185185184
"Afghanistan" 2008 30.866 71.66034259259256 71.6603425925926 71.66034259259258
"Afghanistan" 2009 32.194 72.65192592592594 72.65192592592592 72.65192592592592
"Afghanistan" 2010 33.522 73.85347706422019 73.85347706422021 73.8534770642202
"Afghanistan" 2011 . 75.05502820251422 75.24489396645133 71.9520302301615
"Afghanistan" 2012 . 76.25657934080846 76.76587050954683 69.9676005772284
"Afghanistan" 2013 . 77.45813047910269 78.35610057043426 68.40773784965263
"Afghanistan" 2014 . 78.65968161739693 79.95527802604123 67.1291844948158
"Afghanistan" 2015 . 79.86123275569116 81.50309675329525 66.04587261740855
"Afghanistan" 2016 . 81.0627838939854 82.9392506291239 65.10682636237689
"Afghanistan" 2017 . 82.26433503227963 84.20343353045476 64.27936957990778
"Afghanistan" 2018 . 83.46588617057387 85.23533933421537 63.541141519250935
"Afghanistan" 2019 . 84.6674373088681 85.9746619173333 62.87606305245505
"Afghanistan" 2020 . 85.86898844716234 86.36109515673611 62.2721250292932
end
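One thing worth checking, offered as a sketch rather than a confirmed diagnosis: none of the commands above tells Stata that the data are a panel, so every country is pooled together. As far as I understand it, ipolate then averages v2pesecsch over all observations that share the same year. That would explain why the generated values differ from the original even where it is non-missing, and why all three methods agree with each other in those rows. The sketch below assumes that country_name identifies each panel and that year is unique within it. The variable name ipolate_bycountry is made up for illustration.

```stata
* Assumption: country_name identifies the panel and year is unique within it.
* isid exits with an error if that combination does not uniquely identify rows.
isid country_name year

* Interpolate and extrapolate separately within each country.
* ipolate_bycountry is a hypothetical name for the new variable.
bysort country_name (year): ipolate v2pesecsch year, generate(ipolate_bycountry) epolate

* If the pooling was the problem, the new variable should now reproduce
* the observed values wherever the original is non-missing.
assert ipolate_bycountry == v2pesecsch if !missing(v2pesecsch)
```

If the final assert passes, the same by-country approach may be worth trying with the mipolate variants; check help mipolate for how it handles by-groups before relying on that.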

Now I'm just experimenting with different options of ipolate and mipolate. Thank you both so much for your help so far.