Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Fill missing values with means using mipolate

    Dear Statalists,

    I would like to fill missing observations in a panel. I found the mipolate command, that seems adequate. The issue now is that I would like to compute the average for missing values between two observed ones, and carry values backwards/forwards if one of those end/start points is missing. The mipolate command seems to do this, but only for the mid point in the interval it takes the mean. It replaces all other values by cascading forwards/backwards.
    So in the extract bellow, flow_ex in row 16. 17. and 18. should be =50.976. Is there a way to adjust the command?

    Thanks for your feedback,
    Yannik


    Code:
    by census: mipolate flow_combined1 year, generate(flow_extra2) nearest
    list year census flow_combined1 flow_extra2 if census==11 
    
                        
            +--------------------------------------+
            | year   census   flow_c~1   flow_ex~2 |
            |--------------------------------------|
        11. | 1979       11          0           0 |
        12. | 1978       11         10          10 |
        13. | 1977       11          0           0 |
        14. | 1976       11         10          10 |
        15. | 1975       11          0           0 |
            |--------------------------------------|
        16. | 1974       11          .           0 |
        17. | 1973       11          .   50.976059 |
        18. | 1972       11          .   101.95212 |
        19. | 1971       11   101.9521   101.95212 |
        20. | 1970       11     70.672   70.671997 |
            +--------------------------------------+



  • #2
    Yannick:
    although technically feasible, your approach is really hard to defend against any decent reviewer (not to say applied statistician).
    See, for more robust advice, -mi- related entry in Stata .pdf manual and the valuable website https://missingdata.lshtm.ac.uk/.
    Kind regards,
    Carlo
    (Stata 19.0)

    Comment


    • #3
      mipolate is from SSC, as you are asked to explain. You've applied the nearest option. For a gap with three missings (positions 16, 17, 18 in your example) that option implies

      value for 16 is copied from 15 (that's the nearest non-missing value)
      value for 17 is the average of 15 and 19 (equally far away)
      value for 18 is copied from 19 (that's the nearest non-missing value)

      Otherwise, nearest does what I think it should. I would say that, wouldn't I.

      You want to interpolate all missing values with the mean of the non-missing values on either side of the gap. That's a rule, but not one directly supported by mipolate.

      Nevertheless, you can do it:

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input float(year census flow_combined1)
      1979 11        0
      1978 11       10
      1977 11        0
      1976 11       10
      1975 11        0
      1974 11        .
      1973 11        .
      1972 11        .
      1971 11 101.9521
      1970 11   70.672
      end
      
      . mipolate flow_combined1 year, by(census) forward gen(f)
      
      . mipolate flow_combined1 year, by(census) backward gen(b)
      
      . gen wanted = (f + b) / 2
      
      . l
      
           +-------------------------------------------------------------+
           | year   census   flow_c~1           f           b     wanted |
           |-------------------------------------------------------------|
        1. | 1979       11          0           0           0          0 |
        2. | 1978       11         10          10          10         10 |
        3. | 1977       11          0           0           0          0 |
        4. | 1976       11         10          10          10         10 |
        5. | 1975       11          0           0           0          0 |
           |-------------------------------------------------------------|
        6. | 1974       11          .    101.9521           0   50.97605 |
        7. | 1973       11          .    101.9521           0   50.97605 |
        8. | 1972       11          .    101.9521           0   50.97605 |
        9. | 1971       11   101.9521    101.9521    101.9521   101.9521 |
       10. | 1970       11     70.672   70.671997   70.671997     70.672 |
           +-------------------------------------------------------------+
      Note that it needed some engineering to get your data example into the requested form. In future, please use dataex as requested (FAQ Advice #12).

      FWIW, I don't think much of this interpolation rule!





      Comment


      • #4
        Thanks a lot to both of you. This works perfectly. I will definitely give some more thought to the interpolation rule. I was basically trying different things and got hooked on this because I couldn't make stata do what I want.

        Comment

        Working...
        X