Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Problem specifying conditions with ipolate command

    Hello,

    I have yearly panel data on 4 variables for all U.S. states, and the data are in wide format. There is some missing data, as some states failed to report information for a given variable-year, while other states missed large chunks of years. I am trying to linearly interpolate missing data, but only if a state has data for the year prior to and after the missing year (illustration below). Can someone help we with the code for specifying this IF condition?


    State V1_1980 V1_1981 V1_1982 V1_1983 V1_1984
    AK 1 Miss 7 14 19 <--- linearly interpolate
    AL 10 Miss Miss 40 56 <--- do not linearly interpolate


    Thank you for your help,
    Tom

  • #2
    ipolate will not work rowwise, as its help should imply, which is an example of a more general problem. Your wide layout (structure or format if you will) is not suitable for the majority of panel work in Stata.

    If you reshape, then there is still a problem with your strategy, as ipolate won't by itself obey conditions on lengths of gaps interpolated or not.

    There could be awkward work-arounds for that, but no matter, as averaging previous and following values -- for situations like yours where times are equally spaced -- will work to fill in missings if and only if the gap is of length 1.

    Code:
    clear 
    input str2 State V1_1980 V1_1981 V1_1982 V1_1983 V1_1984
    AK 1 . 7 14 19 
    AL 10 . . 40 56 
    end 
    
    reshape long V1_, i(State) j(Year) 
    rename V1_ V1 
    bysort State (Year) : replace V1 = (V1[_n-1] + V1[_n+1])/2 if missing(V1) 
    
    list, sepby(State) 
    
         +-------------------+
         | State   Year   V1 |
         |-------------------|
      1. |    AK   1980    1 |
      2. |    AK   1981    4 |
      3. |    AK   1982    7 |
      4. |    AK   1983   14 |
      5. |    AK   1984   19 |
         |-------------------|
      6. |    AL   1980   10 |
      7. |    AL   1981    . |
      8. |    AL   1982    . |
      9. |    AL   1983   40 |
     10. |    AL   1984   56 |
         +-------------------+
    Please note the use of dataex to give examples, as requested in FAQ Advice #12.

    Comment


    • #3
      Thank you, Nick!

      Tom

      Comment


      • #4
        Naturally, the mean isn't the only possible interpolant between two known values. The geometric mean, the square root of the product, might make as much or more sense for some variables.

        Comment

        Working...
        X