
  • Replace missing values with closest neighbor, panel data

    Dear Statalisters,

    I am a newbie to Stata and pretty sure that my problem can be solved quite easily, but I've been trying to solve it for hours without success.
    I have panel data that looks like this:

    country_code y1970 y2000 y2016
    a 10 . 20
    b 5 . 10
    c 10 3 4
    For some values, y2000 is missing. I want to replace the missing value with the closest neighbor value, but within one country, i.e., within a row. E.g., for country_code a, y2000 should be 20.
    I tried this:

    foreach y of varlist y1960-y2016 {
    bysort country_code: replace y2000 = `y' if y2000==.
    }


    It runs, but of course Stata always takes the earliest value available. In my example, it would use 10 instead of 20, although 2016 is closer to 2000 than 1970.
    I tried several things, but I just can't figure it out.
    It would be great if you could help me.
    Last edited by Lisandra Ilisei; 08 Jun 2017, 03:55.

  • #2
    Lisandra:
    welcome to the list.
    Two remarks about your post:
    - as for most statistical analyses carried out with Stata, the best layout for panel data regression is the -long- one (whereas your data are seemingly in -wide- format): see -help reshape- for converting from -wide- to -long- format;
    - before deciding to replace missing values with "plausible" values, you should first check whether their missingness is informative or not;
    - there's no compelling need to replace missing values in panel data analysis if one or more years are missing for a given -id-, as Stata can handle both balanced and unbalanced panel datasets with no problem.
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      Thank you a lot, Carlo, for your fast reply!
      I think I should explain my problem a bit more. Actually, I don't need the panel data as such. I am working with a different dataset (which is not a panel) and I only need the value for the year 2000 for every country from the panel data (to merge it with my actual dataset). All other values are irrelevant to me. But I need a value if the one for 2000 is missing. I am not free to decide what to do with a missing 2000 value, because choosing the value closest to 2000 is prescribed.
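
      To make the requirement concrete, here is a minimal sketch of "closest year to 2000, within a row" in the wide layout. It is only an illustration: the new variable name y2000_nn and the 40-year search window are assumptions, and when two years are equally distant the later year wins simply because it is tried first.

      ```stata
      * Sketch only: y2000_nn and the 40-year window are illustrative assumptions
      generate y2000_nn = y2000

      * Walk outward from 2000 one year at a time
      forvalues d = 1/40 {
          local after  = 2000 + `d'
          local before = 2000 - `d'
          * The later year is tried first, so ties go to the later year
          foreach yr in `after' `before' {
              * Skip years for which no variable exists in the dataset
              capture confirm numeric variable y`yr'
              if _rc == 0 {
                  replace y2000_nn = y`yr' if missing(y2000_nn)
              }
          }
      }
      ```

      With the three example rows, country a should get 20 and country b should get 10, both taken from y2016, while country c keeps its observed value of 3.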



      • #4
        Whether you call this panel data or not, I agree with Carlo. You would be better off with a long layout (structure, format). Also, that would allow you to treat this as an interpolation problem. As far as I know, the most general interpolation command in Stata is mipolate (SSC).

        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input str1 country_code byte(y1970 y2000 y2016)
        "a" 10 . 20
        "b"  5 . 10
        "c" 10 3  4
        end
        
        reshape long y, i(country_code) j(year)
        
        * install just once with the command below 
        ssc inst mipolate 
        
        mipolate y year, by(country) gen(linear)
        
        list, sepby(country)
        
             +----------------------------------+
             | countr~e   year    y      linear |
             |----------------------------------|
          1. |        a   1970   10          10 |
          2. |        a   2000    .   16.521739 |
          3. |        a   2016   20          20 |
             |----------------------------------|
          4. |        b   1970    5           5 |
          5. |        b   2000    .   8.2608696 |
          6. |        b   2016   10          10 |
             |----------------------------------|
          7. |        c   1970   10          10 |
          8. |        c   2000    3           3 |
          9. |        c   2016    4           4 |
             +----------------------------------+
        I used linear interpolation here, but looking at the help for mipolate or http://www.statalist.org/forums/foru...-interpolation shows that there is a nearest option (which I don't recommend for problems of this kind).
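
        If the closest-value rule is prescribed, as in #3, the nearest option could be used along the following lines. This is a sketch, not a recommendation: it assumes the data are already in the long layout produced by the reshape above, that mipolate is installed from SSC, and the variable name y_nearest is only illustrative. How equally distant neighbours are resolved is documented in the help for mipolate.

        ```stata
        * Assumes the long-layout data from the reshape above
        * y_nearest is an illustrative name, not part of the original example
        mipolate y year, by(country_code) gen(y_nearest) nearest

        * Keep one row per country for the year 2000, ready to merge elsewhere
        keep if year == 2000
        keep country_code y_nearest
        rename y_nearest y2000
        list
        ```

        For the example data, 2016 is 16 years from 2000 and 1970 is 30 years away, so countries a and b would take their 2016 values (20 and 10), and country c keeps its observed 3.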



        • #5
          Thank you a lot! You really helped me!
