Filling missing values in panel data / single time series

Carlo Parafioriti

Join Date: Jun 2017

Posts: 14
#1

Filling missing values in panel data / single time series

31 Dec 2017, 04:17

Hey everyone,

assume the following MWE from a panel data set:

Code:

input id price month year 123 1 1 2000 123 . 2 2000 123 . 3 2000 123 1 4 2000 123 . 5 2000 123 2 6 2000 123 2 7 2000 456 . 1 2000 456 . 2 2000 456 1.5 3 2000 456 3 4 2000 456 . 5 2000 456 . 6 2000 456 3 7 2000 end

I'd like to fill the missing values for my price variable between two equal occurrences. In this example, for instance, the price in 2000m2 and 2000m3 should be filled with the value 1, as this is the price from 2000m1 reoccurring in 2000m4. Note, however, that looking at ID 456 2000m1 and 2000m2 should stay missing as there is no preceding, non-missing observation for id = 456 before 2000m1 obiviously.

I'd be really grateful for ideas.

Thanks!

Carlo
Tags: None

Nick Cox

Join Date: Mar 2014
Posts: 35468

31 Dec 2017, 05:16

See mipolate (SSC) for various interpolation methods that include but go beyond that offered by ipolate.

https://www.statalist.org/forums/for...-interpolation was the original announcement. Search the forum also for later references to mipolate.

What you want is not offered directly by mipolate, but it is given in cases where forward and backward interpolation give the same result.

Code:

 clear
 input id price month year
        123 1 1 2000
        123 . 2 2000
        123 . 3 2000
        123 1 4 2000
        123 . 5 2000
        123 2 6 2000
        123 2 7 2000
        456 . 1 2000
        456 . 2 2000
        456 1.5 3 2000
        456 3 4 2000
        456 . 5 2000
        456 . 6 2000
        456 3 7 2000
 end
 
 gen mdate = ym(year, month)
 format mdate %tm
 
 mipolate price mdate, by(id) gen(fprice)
 mipolate price mdate, by(id) gen(bprice) backward
 replace price = fprice if missing(price) & fprice==bprice
 
 l, sepby(id)

     +-------------------------------------------------------+
     |  id   price   month   year    mdate   fprice   bprice |
     |-------------------------------------------------------|
  1. | 123       1       1   2000   2000m1        1        1 |
  2. | 123       1       2   2000   2000m2        1        1 |
  3. | 123       1       3   2000   2000m3        1        1 |
  4. | 123       1       4   2000   2000m4        1        1 |
  5. | 123       .       5   2000   2000m5      1.5        2 |
  6. | 123       2       6   2000   2000m6        2        2 |
  7. | 123       2       7   2000   2000m7        2        2 |
     |-------------------------------------------------------|
  8. | 456       .       1   2000   2000m1        .      1.5 |
  9. | 456       .       2   2000   2000m2        .      1.5 |
 10. | 456     1.5       3   2000   2000m3      1.5      1.5 |
 11. | 456       3       4   2000   2000m4        3        3 |
 12. | 456       3       5   2000   2000m5        3        3 |
 13. | 456       3       6   2000   2000m6        3        3 |
 14. | 456       3       7   2000   2000m7        3        3 |
     +-------------------------------------------------------+

As implied here, you'll need a monthly date variable for most related purposes in Stata.

Comment

Carlo Parafioriti

Join Date: Jun 2017

Posts: 14
#3

31 Dec 2017, 07:49

That seems to be indeed the solution to my problem, though I had the hope that there was some more genuine solution without mipolate. In any case many thanks for this great advice Nick!!
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35468
#4

31 Dec 2017, 10:32

For an approach from first principles, see https://www.stata.com/support/faqs/d...issing-values/
Comment

Nick Cox

Join Date: Mar 2014
Posts: 35468

01 Jan 2018, 01:28

Code:

 clear 
 input id price month year
        123 1 1 2000
        123 . 2 2000
        123 . 3 2000
        123 1 4 2000
        123 . 5 2000
        123 2 6 2000
        123 2 7 2000
        456 . 1 2000
        456 . 2 2000
        456 1.5 3 2000
        456 3 4 2000
        456 . 5 2000
        456 . 6 2000
        456 3 7 2000
 end
 
 gen mdate = ym(year, month) 
 format mdate %tm 
 
 gen fprice = price 
 bysort id (mdate) : replace fprice = fprice[_n - 1] if missing(fprice) 
 
 gsort id -mdate 
 gen bprice = price 
 by id: replace bprice = bprice[_n - 1] if missing(bprice)
 replace price = fprice if missing(price) & fprice==bprice 
 
 sort id mdate 
 list , sepby(id)

Comment

Attaullah Shah

Join Date: Aug 2014

Posts: 1667
#6

09 Jan 2020, 01:02

You can also try my new fillmissing program that is available for download from my site.

Code:

* To download net install fillmissing, from(http://fintechprofessor.com) replace * Use the above data as example clear input id price month year 123 1 1 2000 123 . 2 2000 123 . 3 2000 123 1 4 2000 123 . 5 2000 123 2 6 2000 123 2 7 2000 456 . 1 2000 456 . 2 2000 456 1.5 3 2000 456 3 4 2000 456 . 5 2000 456 . 6 2000 456 3 7 2000 end gen mdate = ym(year, month) format mdate %tm bys id (mdate): fillmissing price, with(previous)

To read more about fillmissing and its options, visit this page

Regards
--------------------------------------------------
Attaullah Shah, PhD.
Professor of Finance, Institute of Management Sciences Peshawar, Pakistan
FinTechProfessor.com
https://asdocx.com
Check out my asdoc program, which sends outputs to MS Word.
For more flexibility, consider using asdocx which can send Stata outputs to MS Word, Excel, LaTeX, or HTML.
Comment
Golib Sanaev

Join Date: Apr 2017

Posts: 6
#7

30 Mar 2021, 08:42

Hello,

I have unbalanced panel data of individuals with education dummies. Some individuals increased their education level (e.g. graduated from university). In the following example, an individual changed his/her education level from "sec_ed" to "high_ed" in the year 2012. I need to fill in missing values for this individual in 2013 and 2016. In my dataset, there are many individuals with a similar case. Is there any Stata code for replacing missing values for this case? Any suggestions are very appreciated

clear
input float(hhid pid year head_hh gender age bmonth byear basic_ed sec_ed high_ed)
2003 3 2010 0 1 20 1 1990 . 1 .
2003 3 2011 0 1 21 1 1990 . 1 .
2003 3 2012 0 1 22 1 1990 . . 1
2003 3 2013 0 1 23 1 1990 . . .
2003 3 2016 0 1 26 1 1990 . . .
end
Comment
Rich Goldstein

Join Date: Mar 2014

Posts: 4441
#8

30 Mar 2021, 08:47

assuming you want the "1" in high_ed of 2012 to be repeated in later years:

Code:

bys hhid pid (year): replace high_ed=high_ed[_n-1] if high_ed[_n-1]<. & high_ed==.

but why do you have separate variables for basic/sec/high_ed? see

Code:

help fvvarlist

added: my code may be more complicated than needed, but you provided so little information that I tried to cover everything that might be needed
1 like
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35468
#9

30 Mar 2021, 08:53

See https://www.stata.com/support/faqs/d...issing-values/ for a longer discussion of Rich's method.
1 like
Comment
Golib Sanaev

Join Date: Apr 2017

Posts: 6
#10

30 Mar 2021, 09:44

Originally posted by Rich Goldstein View Post

assuming you want the "1" in high_ed of 2012 to be repeated in later years:

Code:

bys hhid pid (year): replace high_ed=high_ed[_n-1] if high_ed[_n-1]<. & high_ed==.

but why do you have separate variables for basic/sec/high_ed? see

Code:

help fvvarlist

added: my code may be more complicated than needed, but you provided so little information that I tried to cover everything that might be needed

Dear Rich,
Thank you for the code! I used it and it is working. I will also consider using fvvarlist to make variables categorical
Comment
Golib Sanaev

Join Date: Apr 2017

Posts: 6
#11

30 Mar 2021, 09:46

Originally posted by Nick Cox View Post

See https://www.stata.com/support/faqs/d...issing-values/ for a longer discussion of Rich's method.

Dear Nick,
Thank you for the link! I found it very useful
Comment

Announcement