Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Missing vars interpolation

    How do i interpolate once for all variables. Also, my interpolation generates new variables, giving me an additional task of dropping the old variables. How do i make sure it just fills in the missing variables without having to create new ones? Or is there a way of deleting all old variables at once?

  • #2
    You don't give much description of the interpolation itself, and you don't even pretend to describe your data, let alone show example data. So nobody can give you assuredly correct code. But here's a general scheme that you can mark up to your data set:
    Code:
    foreach v of varlist list_of_variables_to_be_interpolated {
        rename `v' old_`v'
        sort the_x_variable_of_your_interpolation_process
        ipolate old_`v', gen(`v')
    }
    drop old_*
    Replace everything italicized in the above by references appropriate to your actual data.

    In the future, when showing data examples, please use the -dataex- command to do so. If you are running version 15.1 or a fully updated version 14.2, it is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    When asking for help with code, always show example data. When showing example data, always use -dataex-.

    Also, it is the norm in this community to use our real first and last names as our username. This promotes collegiality and professionalism. Unfortunately, you cannot edit your name on your profile once it has been created. So please click on CONTACT US in the lower right corner of this page, and send a message to the system administrator requesting a change of your username to your real given and surname. Thank you.


    Comment


    • #3
      Thank you Schechter for your response. I have sent a message to admin for name change. I have posted a sample table below from my dataset. Hope I did it the right way. The questions I have are these:
      1. I have variables from v5 to v250. I want to generate missing observations for each of them using linear interpolation. The command I use is: -ipolate v6 year, gen(newvar_6) epolate by( country_name )-. But this only works for one variable so I want to be able to generate missing observations for all variables with one command.
      2. Second issue is that this command generates a new variable (newvar_6) with complete data and i have to delete the old variable. With 250 variables to go, that could be a lot of work. How do I fill-in missing observations for all the variables at once? Also, how do I ensure that fill-in occurs within the existing columns/variables without generating new columns/variables?




      input str51 country_name int year double(v5 v6 v7 v8 v9 v10 v11)
      "Algeria" 1968 . 13744387 . . . . 14.32308799
      "Algeria" 1969 . 14144438 . . . . 15.16711296
      "Algeria" 1970 . 14550034 .415910751 .044023919 .171455865 .001206972 17.21017755
      "Algeria" 1971 . 14960109 .408484345 .032200788 .182954312 .00121803 16.99969357
      "Algeria" 1972 . 15377093 .373681974 .062259355 .15219232 .001560894 16.27246759
      "Algeria" 1973 . 15804428 .505749289 .172652121 .157939182 .006440822 16.51608947
      "Algeria" 1974 . 16247113 17.95169904 17.39715451 .326831673 .093312596 8.99254138
      "Algeria" 1975 . 16709099 14.70384052 13.96016841 .305644373 .281833917 9.691966938
      "Algeria" 1976 . 17190239 16.14181588 15.52537547 .174367169 .344854276 10.66747203
      "Algeria" 1977 . 17690184 16.58337912 16.01410417 .128553348 .30717658 10.37953348
      "Algeria" 1978 . 18212326 14.48423566 13.8689296 .065802368 .412632136 11.75426056
      "Algeria" 1979 . 18760761 31.4118311 30.10536078 .060584593 1.139756252 11.47035574
      "Algeria" 1980 . 19337715 27.12089643 26.20660334 .082746193 .736229282 10.55408935
      "Algeria" 1981 . 19943664 17.26392043 16.56552037 .078170586 .528240676 10.59490047
      "Algeria" 1982 . 20575701 13.53162569 12.77180257 .094925844 .459297107 11.1516422
      "Algeria" 1983 . 21228289 13.17777812 11.55147752 .062509797 1.431469113 11.80587996
      "Algeria" 1984 . 21893853 12.07795155 10.7831309 .057230959 1.115925107 13.82716047
      "Algeria" 1985 . 22565905 11.0994433 9.781130406 .046048409 1.226408002 13.69914188
      "Algeria" 1986 . 23241272 4.519330577 3.538442843 .031758007 .840246832 15.70853213
      "Algeria" 1987 . 23917897 6.32200612 5.570220701 .024724098 .619966758 14.88419564
      "Algeria" 1988 . 24591492 5.646471478 4.773624744 .049807738 .698073688 14.88895829
      "Algeria" 1989 . 25257672 9.40412335 8.339206091 .062896021 .866176873 12.32032899
      "Algeria" 1990 . 25912367 12.42212979 11.16337884 .045336765 1.08729027 11.37795243
      "Algeria" 1991 . 26554329 10.77246056 9.036099281 .050226412 1.507730904 11.44609737
      "Algeria" 1992 . 27181094 8.826766434 7.608035147 .046265866 1.02214052 12.31277714
      "Algeria" 1993 . 27786259 7.959176847 6.695492992 .030534572 1.119268816 13.13449375
      "Algeria" 1994 . 28362253 8.236763397 7.091847476 .02917126 .975045552 12.35938602
      "Algeria" 1995 . 28904298 9.464312175 8.073823518 .024518972 1.150820177 11.49787293
      "Algeria" 1996 . 29411415 12.39737023 10.87145466 .02646166 1.29791116 9.03892537
      "Algeria" 1997 . 29886839 11.29148593 9.358160814 .022214104 1.713115058 8.72045298
      "Algeria" 1998 . 30335732 6.566296914 4.974811813 .051624115 1.269874197 9.909318814
      "Algeria" 1999 . 30765613 9.017068822 7.548896126 .03024501 1.297199592 8.977644334
      "Algeria" 2000 . 31183660 15.30720329 12.89652779 .03891236 2.271801273 7.539511501
      "Algeria" 2001 . 31592153 14.12192924 11.20562605 .033450794 2.765978317 .
      "Algeria" 2002 . 31995046 14.06946845 11.78842653 .029818567 2.127664596 .
      "Algeria" 2003 . 32403514 16.30438692 14.20068109 .032850629 1.930656661 .
      "Algeria" 2004 . 32831096 18.19619724 16.24454436 .039476814 1.803555993 .
      "Algeria" 2005 . 33288437 24.39615783 21.35086755 .075769176 2.882430117 .
      "Algeria" 2006 . 33777915 26.53443332 22.98126193 .105374765 3.351376433 .
      "Algeria" 2007 . 34300076 25.01031844 21.67217864 .171640758 3.087813786 .
      "Algeria" 2008 . 34860715 27.04177096 23.20675508 .469418849 3.255761927 .
      "Algeria" 2009 . 35465760 18.95370072 15.27411995 .119414556 3.425652094 .
      "Algeria" 2010 . 36117637 19.99752275 17.65349925 .186153624 2.020676964 .
      "Algeria" 2011 . 36819558 23.58730344 20.35409525 .1812092 2.924984483 4.836015431
      "Algeria" 2012 . 37565847 22.98711408 18.77064063 .168338029 3.928399397 4.82097761
      "Algeria" 2013 . 38338562 21.32340209 17.36023569 .106189462 3.740573536 5.008916922
      "Algeria" 2014 . 39113313 19.29264485 15.70887184 .067243317 3.337767387 5.240150427
      "Algeria" 2015 . 39871528 11.99024429 9.129884212 .056196238 2.588637683 5.893805499
      "Algeria" 2016 . 40606052 . . . . 6.072532898
      "Algeria" 2017 . . . . . . .
      "Angola" 1968 . 6523791 . . . . .
      "Angola" 1969 . 6642632 . . . . .
      "Angola" 1970 . 6776381 . . . . .
      "Angola" 1971 . 6927269 . . . . .
      "Angola" 1972 . 7094834 . . . . .
      "Angola" 1973 . 7277960 . . . . .
      "Angola" 1974 . 7474338 . . . . .
      "Angola" 1975 . 7682479 . . . . .
      "Angola" 1976 . 7900997 . . . . .
      "Angola" 1977 . 8130988 . . . . .
      "Angola" 1978 . 8376147 . . . . .
      "Angola" 1979 . 8641521 . . . . .
      "Angola" 1980 . 8929900 . . . . .
      "Angola" 1981 . 9244507 . . . . .
      "Angola" 1982 . 9582156 . . . . .
      "Angola" 1983 . 9931562 . . . . .
      "Angola" 1984 . 10277321 . . . . .
      "Angola" 1985 . 10609042 21.27082057 20.48459116 0 .009910452 9.86950016
      "Angola" 1986 . 10921037 11.77362358 10.66944748 0 .009178245 10.38399982
      "Angola" 1987 . 11218268 24.01423497 22.90638292 0 .006470894 8.076000214
      "Angola" 1988 . 11513968 24.19162739 23.03712422 0 .00609954 9.836500168
      "Angola" 1989 . 11827237 25.74162611 24.91082834 0 .005254383 5.688333511
      "Angola" 1990 . 12171441 36.15274846 35.01595907 0 .019803036 5.118000031
      "Angola" 1991 . 12553446 18.22473989 17.29855685 0 .039462113 12.5
      "Angola" 1992 . 12968345 46.12594236 44.08177068 0 .051879064 3.846153846
      "Angola" 1993 . 13403734 42.56069927 40.3245359 0 .068762894 5.576208178
      "Angola" 1994 . 13841301 53.0143638 49.83042595 0 .079311125 4.950016124
      "Angola" 1995 . 14268994 51.01340071 47.66890876 0 .063083245 4.007465964
      "Angola" 1996 . 14682284 47.57884399 45.40119381 0 .049811668 3.445749994
      "Angola" 1997 . 15088981 42.79438497 40.7970414 0 .053443838 4.378391184
      "Angola" 1998 . 15504318 25.83629129 23.39847249 0 .032287485 6.318795907
      "Angola" 1999 . 15949766 49.559859 47.41042296 0 .033205434 3.222875779
      "Angola" 2000 . 16440924 61.95669792 60.45173822 0 .073617599 2.890163321
      "Angola" 2001 . 16983266 45.59173761 44.09783712 0 .088893381 3.873100814
      "Angola" 2002 . 17572649 38.35699397 37.09530272 0 .059715978 .
      "Angola" 2003 . 18203369 36.6540436 35.14547586 0 .056334454 .
      "Angola" 2004 . 18865716 45.19068005 44.31214412 0 .056626898 .
      "Angola" 2005 . 19552542 59.90822153 59.20695001 0 .052891242 .
      "Angola" 2006 . 20262399 54.13524188 53.5855615 0 .055415259 .
      "Angola" 2007 . 20997687 51.32109455 50.83206546 0 .050598299 .
      "Angola" 2008 . 21759420 58.05638921 57.59196922 0 .040577806 .
      "Angola" 2009 . 22549547 28.12215475 27.64902022 0 .033747768 .
      "Angola" 2010 . 23369131 39.03144574 38.61862673 0 .025844979 .
      "Angola" 2011 88.19999695 24218565 44.50494572 44.11706521 0 .036948809 .
      "Angola" 2012 . 25096150 41.55948199 41.15344771 0 .046592586 .
      "Angola" 2013 . 25998340 34.8920999 34.50816694 0 .043465571 .
      "Angola" 2014 . 26920466 27.53249446 27.15919145 0 .033401023 .
      "Angola" 2015 . 27859305 11.25141521 10.73397659 0 .023429827 .
      "Angola" 2016 . 28813463 . . . . .
      "Angola" 2017 . . . . . . .
      end

      Comment


      • #4
        Your desire to do all of this in a single command is not, to my knowledge, attainable. But you don't have to write out a separate command for each variable. This is easily done in a loop

        While it would be possible to write a few lines of code that do the interpolation in situ (i.e. without creating new variables), that is more work than dropping the originals and renaming the new ones to match them.

        This works in your example:
        Code:
        foreach v of varlist v5-v11 {
            rename `v' old_`v'
            ipolate old_`v' year, gen(`v') epolate
            drop old_`v'
        }
        For your full data set, you would replace v11 by v250.

        I cannot resist pointing out that very few things in the real world obey strict linear laws over long periods of time. So the use of linear interpolation in data with very long timespans is questionable, and when being done to a large number of variables it sounds ill-considered. Now, if you have very little missing data, the problem may well be minimal. But your variable v5, in your example, is 99% missing, so interpolation doesn't even produce results for it. v7-10 are about 25% missing, and 511 is nearly 50% missing. Filling in this much missing data by linear interpolation boggles my mind--are you sure this is a sensible approach to your problem? At the very least, you will be using data with downwardly biased variances, which means that if you use these as predictors in regression models, you will get upwardly biased regression coefficients. But more important, unless these variables are quite unusual, I think you are risking turning your data set into a fable

        Comment


        • #5
          A rider to Clyde's excellent advice: A crucial detail is that you must do the interpolations by(country) for them to make any sense.

          I'm quite positive about interpolation when it's the right thing to do, e.g. filling in small gaps when they exist and interpolating from irregular to regular grids when that's needed. But datasets that are largely interpolated are just artefactual. That especially applies to extrapolation, which for the method you're using is entirely dependent on the first two or the last two points in each panel.

          Comment


          • #6
            I should note that Johnny Tad did, in fact, include the -by(country)- option in his original -ipolate- command, and I neglected to included it in mine.

            Comment


            • #7
              Thank you Schechter. What technique would you rather recommend over linear interpolation under this circumstance? Any commands for it?
              Thank you.

              Comment


              • #8
                In my view, the first step in the treatment of missing data is to come to an understanding of how the data that are missing came to be missing in the first place. If you did not personally gather the data yourself (as I gather you did not in this instance) then you have to do careful reading of the documentation that the providers of your data have to understand it.

                When you have that in hand, then you can begin to think about whether the data are missing completely at random, at random, or not at random, and then start to think about appropriate techniques for dealing with it.

                Comment

                Working...
                X