Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Filling missing values in panel data / single time series

    Hey everyone,

    assume the following MWE from a panel data set:

    Code:
     input id price month year
            123 1 1 2000
            123 . 2 2000
            123 . 3 2000
            123 1 4 2000
            123 . 5 2000
            123 2 6 2000
            123 2 7 2000
            456 . 1 2000
            456 . 2 2000
            456 1.5 3 2000
            456 3 4 2000
            456 . 5 2000
            456 . 6 2000
            456 3 7 2000
            end
    I'd like to fill the missing values for my price variable between two equal occurrences. In this example, for instance, the price in 2000m2 and 2000m3 should be filled with the value 1, as this is the price from 2000m1 reoccurring in 2000m4. Note, however, that looking at ID 456 2000m1 and 2000m2 should stay missing as there is no preceding, non-missing observation for id = 456 before 2000m1 obiviously.

    I'd be really grateful for ideas.

    Thanks!

    Carlo

  • #2
    See mipolate (SSC) for various interpolation methods that include but go beyond that offered by ipolate.

    https://www.statalist.org/forums/for...-interpolation was the original announcement. Search the forum also for later references to mipolate.

    What you want is not offered directly by mipolate, but it is given in cases where forward and backward interpolation give the same result.

    Code:
     clear
     input id price month year
            123 1 1 2000
            123 . 2 2000
            123 . 3 2000
            123 1 4 2000
            123 . 5 2000
            123 2 6 2000
            123 2 7 2000
            456 . 1 2000
            456 . 2 2000
            456 1.5 3 2000
            456 3 4 2000
            456 . 5 2000
            456 . 6 2000
            456 3 7 2000
     end
     
     gen mdate = ym(year, month)
     format mdate %tm
     
     mipolate price mdate, by(id) gen(fprice)
     mipolate price mdate, by(id) gen(bprice) backward
     replace price = fprice if missing(price) & fprice==bprice
     
     l, sepby(id)
    
         +-------------------------------------------------------+
         |  id   price   month   year    mdate   fprice   bprice |
         |-------------------------------------------------------|
      1. | 123       1       1   2000   2000m1        1        1 |
      2. | 123       1       2   2000   2000m2        1        1 |
      3. | 123       1       3   2000   2000m3        1        1 |
      4. | 123       1       4   2000   2000m4        1        1 |
      5. | 123       .       5   2000   2000m5      1.5        2 |
      6. | 123       2       6   2000   2000m6        2        2 |
      7. | 123       2       7   2000   2000m7        2        2 |
         |-------------------------------------------------------|
      8. | 456       .       1   2000   2000m1        .      1.5 |
      9. | 456       .       2   2000   2000m2        .      1.5 |
     10. | 456     1.5       3   2000   2000m3      1.5      1.5 |
     11. | 456       3       4   2000   2000m4        3        3 |
     12. | 456       3       5   2000   2000m5        3        3 |
     13. | 456       3       6   2000   2000m6        3        3 |
     14. | 456       3       7   2000   2000m7        3        3 |
         +-------------------------------------------------------+
    As implied here, you'll need a monthly date variable for most related purposes in Stata.

    Comment


    • #3
      That seems to be indeed the solution to my problem, though I had the hope that there was some more genuine solution without mipolate. In any case many thanks for this great advice Nick!!

      Comment


      • #4
        For an approach from first principles, see https://www.stata.com/support/faqs/d...issing-values/

        Comment


        • #5
          Code:
           clear 
           input id price month year
                  123 1 1 2000
                  123 . 2 2000
                  123 . 3 2000
                  123 1 4 2000
                  123 . 5 2000
                  123 2 6 2000
                  123 2 7 2000
                  456 . 1 2000
                  456 . 2 2000
                  456 1.5 3 2000
                  456 3 4 2000
                  456 . 5 2000
                  456 . 6 2000
                  456 3 7 2000
           end
           
           gen mdate = ym(year, month) 
           format mdate %tm 
           
           gen fprice = price 
           bysort id (mdate) : replace fprice = fprice[_n - 1] if missing(fprice) 
           
           gsort id -mdate 
           gen bprice = price 
           by id: replace bprice = bprice[_n - 1] if missing(bprice)
           replace price = fprice if missing(price) & fprice==bprice 
           
           sort id mdate 
           list , sepby(id)

          Comment


          • #6
            You can also try my new fillmissing program that is available for download from my site.

            Code:
            * To download  
             net install fillmissing, from(http://fintechprofessor.com) replace 
            * Use the above data as example
            
            clear 
             input id price month year
                    123 1 1 2000
                    123 . 2 2000
                    123 . 3 2000
                    123 1 4 2000
                    123 . 5 2000
                    123 2 6 2000
                    123 2 7 2000
                    456 . 1 2000
                    456 . 2 2000
                    456 1.5 3 2000
                    456 3 4 2000
                    456 . 5 2000
                    456 . 6 2000
                    456 3 7 2000
             end
            
            gen mdate = ym(year, month) 
            format mdate %tm 
             
            bys id (mdate): fillmissing price, with(previous)
            To read more about fillmissing and its options, visit this page
            Regards
            --------------------------------------------------
            Attaullah Shah, PhD.
            Professor of Finance, Institute of Management Sciences Peshawar, Pakistan
            FinTechProfessor.com
            https://asdocx.com
            Check out my asdoc program, which sends outputs to MS Word.
            For more flexibility, consider using asdocx which can send Stata outputs to MS Word, Excel, LaTeX, or HTML.

            Comment


            • #7
              Hello,

              I have unbalanced panel data of individuals with education dummies. Some individuals increased their education level (e.g. graduated from university). In the following example, an individual changed his/her education level from "sec_ed" to "high_ed" in the year 2012. I need to fill in missing values for this individual in 2013 and 2016. In my dataset, there are many individuals with a similar case. Is there any Stata code for replacing missing values for this case? Any suggestions are very appreciated

              clear
              input float(hhid pid year head_hh gender age bmonth byear basic_ed sec_ed high_ed)
              2003 3 2010 0 1 20 1 1990 . 1 .
              2003 3 2011 0 1 21 1 1990 . 1 .
              2003 3 2012 0 1 22 1 1990 . . 1
              2003 3 2013 0 1 23 1 1990 . . .
              2003 3 2016 0 1 26 1 1990 . . .
              end


              Comment


              • #8
                assuming you want the "1" in high_ed of 2012 to be repeated in later years:
                Code:
                bys hhid pid (year): replace high_ed=high_ed[_n-1] if high_ed[_n-1]<. & high_ed==.
                but why do you have separate variables for basic/sec/high_ed? see
                Code:
                help fvvarlist
                added: my code may be more complicated than needed, but you provided so little information that I tried to cover everything that might be needed

                Comment


                • #9
                  See https://www.stata.com/support/faqs/d...issing-values/ for a longer discussion of Rich's method.

                  Comment


                  • #10
                    Originally posted by Rich Goldstein View Post
                    assuming you want the "1" in high_ed of 2012 to be repeated in later years:
                    Code:
                    bys hhid pid (year): replace high_ed=high_ed[_n-1] if high_ed[_n-1]<. & high_ed==.
                    but why do you have separate variables for basic/sec/high_ed? see
                    Code:
                    help fvvarlist
                    added: my code may be more complicated than needed, but you provided so little information that I tried to cover everything that might be needed
                    Dear Rich,
                    Thank you for the code! I used it and it is working. I will also consider using fvvarlist to make variables categorical

                    Comment


                    • #11
                      Originally posted by Nick Cox View Post
                      See https://www.stata.com/support/faqs/d...issing-values/ for a longer discussion of Rich's method.
                      Dear Nick,
                      Thank you for the link! I found it very useful

                      Comment

                      Working...
                      X