Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Filling empty observations in panel

    One of the variables in my panel dataset has only 1 observation every 5 years, which limits my possibilities for analysis.
    Since it is reasonable to assume that this variable changes slowly over time, I would like to fill in the gaps following a function which is basically a straight line between the two values. How can I do this?
    Just to be clear, I have something like this:

    Year _ Variable
    1 ____ 2
    2 _____ .
    3 _____ .
    4 _____ .
    5 _____ 10

    And I want to fill the gaps with a linear function between the points as to have:

    Year _ Variable
    1 ____ 2
    2 ____ 4
    3 ____ 6
    4 ____ 8
    5 _____10

    Of course I have to repeat this for about 9000 observations so I need some sort of loop.

    Any ideas?
    Thanks in advance for your help.

    Giuseppe
    Last edited by gicanzo; 24 Jul 2014, 14:25.

  • #2
    Have you tried the -ipolate- command?

    Comment


    • #3
      Clyde is correct. ipolate does linear interpolation and there is no need for a loop. Other kinds of interpolation in one dimension are supported by commands from SSC: cipolate, csipolate, pchipolate, nnipolate.

      Comment


      • #4
        I am sorry but I do not understand how should I use it. In the help file I read that the syntax of -ipolate- is:

        ipolate yvar xvar [if] [in] , generate(newvar) [epolate]

        Which should be my y and x vars? I only have 1 variable of which i want to extrapolate values

        Comment


        • #5
          In the example you gave, the yvar is Variable and the xvar is Year. When you interpolate/extrapolate linearly, the xvar is the variable that the yvar is made to be a linear function of.

          Comment


          • #6
            That is what I did, but I do not get the results I want.
            My variable is called yos_25, so I wrote:

            > ipolate yos_25 year, generate(yos25)

            however, this generates a new variable that does not keep the values that I already have.
            As you can see in the screenshot, the values that i have on the left, are not maintained on the right.

            Comment


            • #7
              I can't replicate your problem. I created a data set that looks like your photo. You don't show the year variable, but for the purpose at hand, I just let "year" be the observation number. I ran the same command you gave and got:

              Code:
                  yos_25   year   yos25 
                    6.34      1    6.34 
                       .      2    6.46 
                       .      3    6.59 
                       .      4    6.71 
                       .      5    6.84 
                    6.96      6    6.96 
                       .      7    7.05 
                       .      8    7.14 
                       .      9    7.24 
                       .     10    7.33 
                    7.42     11    7.42 
                       .     12    7.43 
                       .     13    7.44 
                       .     14    7.46 
                       .     15    7.47 
                    7.48     16    7.48
              If I list the data using a format that shows more decimal places, there are some slight discrepancies out in the 6th or 7th place that are precision issues, but nothing like the discrepancies you encountered. Are you sure your year variable steps up in equal increments from one observation to the next? If that isn't the problem, I don't know what to tell you.

              Comment


              • #8

                I can't reproduce your problem. It seems that you have data like this (only part of your data is visible on the image: see from the display below how to produce a more readable version).

                Code:
                 
                . l
                
                     +-------------+
                     | year      y |
                     |-------------|
                  1. | 1990   6.34 |
                  2. | 1991      . |
                  3. | 1992      . |
                  4. | 1993      . |
                  5. | 1994      . |
                     |-------------|
                  6. | 1995   6.96 |
                  7. | 1996      . |
                  8. | 1997      . |
                  9. | 1998      . |
                 10. | 1999      . |
                     |-------------|
                 11. | 2000   7.42 |
                 12. | 2001      . |
                 13. | 2002      . |
                 14. | 2003      . |
                 15. | 2004      . |
                     |-------------|
                 16. | 2005   7.84 |
                     +-------------+
                
                . ipolate y year, gen(y2)
                
                . l
                
                     +-------------------------+
                     | year      y          y2 |
                     |-------------------------|
                  1. | 1990   6.34   6.3400002 |
                  2. | 1991      .   6.4640001 |
                  3. | 1992      .   6.5880001 |
                  4. | 1993      .   6.7120001 |
                  5. | 1994      .   6.8360001 |
                     |-------------------------|
                  6. | 1995   6.96        6.96 |
                  7. | 1996      .       7.052 |
                  8. | 1997      .   7.1440001 |
                  9. | 1998      .   7.2360001 |
                 10. | 1999      .   7.3280001 |
                     |-------------------------|
                 11. | 2000   7.42   7.4200001 |
                 12. | 2001      .   7.5040001 |
                 13. | 2002      .   7.5880001 |
                 14. | 2003      .   7.6720001 |
                 15. | 2004      .   7.7560001 |
                     |-------------------------|
                 16. | 2005   7.84   7.8400002 |
                     +-------------------------+
                
                . d
                
                Contains data
                  obs:            16                          
                 vars:             3                          
                 size:           256                          
                ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                              storage   display    value
                variable name   type    format     label      variable label
                ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                year            float   %9.0g                 
                y               float   %9.0g                 
                y2              double  %10.0g                
                ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                Sorted by:  
                     Note:  dataset has changed since last saved
                From the example above, note that small differences are possible because ipolate generates a double -- but that is minor. I see no reason for the problem you had; I have to guess that you used some other variable not reported to us.

                Comment


                • #9
                  This is exactly what I do, but the result I get is still the same. This is my original data:

                  Code:
                   +---------------+
                        | year   yos_25 |
                        |---------------|
                     1. | 1970     6.34 |
                     2. | 1971        . |
                     3. | 1972        . |
                     4. | 1973        . |
                     5. | 1974        . |
                        |---------------|
                     6. | 1975     6.96 |
                     7. | 1976        . |
                     8. | 1977        . |
                     9. | 1978        . |
                    10. | 1979        . |
                        |---------------|
                    11. | 1980     7.42 |
                    12. | 1981        . |
                    13. | 1982        . |
                    14. | 1983        . |
                    15. | 1984        . |
                        |---------------|
                    16. | 1985     7.84 |
                    17. | 1986        . |
                    18. | 1987        . |
                    19. | 1988        . |
                    20. | 1989        . |
                        |---------------|
                    21. | 1990     8.26 |
                    22. | 1991        . |
                    23. | 1992        . |
                    24. | 1993        . |
                    25. | 1994        . |
                        |---------------|
                    26. | 1995     8.65 |
                    27. | 1996        . |
                    28. | 1997        . |
                    29. | 1998        . |
                    30. | 1999        . |
                        |---------------|
                    31. | 2000     9.05 |
                    32. | 2001        . |
                    33. | 2002        . |
                    34. | 2003        . |
                    35. | 2004        . |
                        |---------------|
                    36. | 2005     9.46 |
                    37. | 2006        . |
                    38. | 2007        . |
                    39. | 2008        . |
                    40. | 2009        . |
                        |---------------|
                    41. | 2010     9.71 |
                    42. | 2011        . |
                    43. | 2012        . |
                    44. | 2013        . |
                        +---------------+
                  And this is what I get after 'ipolate yos_25 year, gen(yos25)' :

                  Code:
                        +---------------------------+
                        | year   yos_25       yos25 |
                        |---------------------------|
                     1. | 1970     6.34   6.5090545 |
                     2. | 1971        .    6.623685 |
                     3. | 1972        .   6.7383154 |
                     4. | 1973        .   6.8529459 |
                     5. | 1974        .   6.9675763 |
                        |---------------------------|
                     6. | 1975     6.96   7.0822068 |
                     7. | 1976        .   7.1849021 |
                     8. | 1977        .   7.2875974 |
                     9. | 1978        .   7.3902928 |
                    10. | 1979        .   7.4929881 |
                        |---------------------------|
                    11. | 1980     7.42   7.5956834 |
                    12. | 1981        .   7.6955436 |
                    13. | 1982        .   7.7954038 |
                    14. | 1983        .    7.895264 |
                    15. | 1984        .   7.9951242 |
                        |---------------------------|
                    16. | 1985     7.84   8.0949843 |
                    17. | 1986        .   8.1952551 |
                    18. | 1987        .   8.2955258 |
                    19. | 1988        .   8.3957966 |
                    20. | 1989        .   8.4960673 |
                        |---------------------------|
                    21. | 1990     8.26   8.5963381 |
                    22. | 1991        .   8.7246116 |
                    23. | 1992        .    8.852885 |
                    24. | 1993        .   8.9811585 |
                    25. | 1994        .    9.109432 |
                        |---------------------------|
                    26. | 1995     8.65   9.2377055 |
                    27. | 1996        .   9.3396312 |
                    28. | 1997        .    9.441557 |
                    29. | 1998        .   9.5434827 |
                    30. | 1999        .   9.6454084 |
                        |---------------------------|
                    31. | 2000     9.05   9.7473342 |
                    32. | 2001        .   9.8395775 |
                    33. | 2002        .   9.9318209 |
                    34. | 2003        .   10.024064 |
                    35. | 2004        .   10.116308 |
                        |---------------------------|
                    36. | 2005     9.46   10.208551 |
                    37. | 2006        .   10.257245 |
                    38. | 2007        .   10.305938 |
                    39. | 2008        .   10.354632 |
                    40. | 2009        .   10.403326 |
                        |---------------------------|
                    41. | 2010     9.71    10.45202 |
                    42. | 2011        .           . |
                    43. | 2012        .           . |
                    44. | 2013        .           . |
                        +---------------------------+
                  
                  .

                  Comment


                  • #10
                    Okay I solved the problem. Since I am working on panel, that command was calculating a cross-section average or something like that. I solved by calculating it by cross-section unit:

                    by Cid : ipolate yos_25 year, gen(yos25)

                    Problem solved.
                    Thanks a lot to you both!

                    Comment


                    • #11
                      Well, since neither Nick nor I could replicate your problem, and it certainly appears you are doing it correctly, there may be something wrong with your Stata installation.

                      Step 1: Update your Stata. Type -update all- in the command line. Then try it again. If that doesn't resolve it:

                      Step 2: Contact Stata technical support.

                      Comment


                      • #12
                        ipolate in Stata will indeed average across panels unless you instruct it otherwise. Note that ipolate is totally independent of any tsset or xtset, and that is a good idea. Interpolation problems often occur with data with irregular "time" variables, and indeed with quite different variables too.

                        Comment


                        • #13
                          Please how do i ipolate in a panel data. It is only the data in the first country that is maintained. The other countries loose their original datasets

                          Comment


                          • #14
                            To use -ipolate- to interpolate within countries, prefix the command with -by country: -.

                            Comment

                            Working...
                            X