filling in missing values for panel data

Sandra Bloem

Join Date: Jun 2020

Posts: 106
#1

filling in missing values for panel data

16 May 2021, 21:29

Hi there,

I have panel data in which the education variable has missing values. I must use the surrounding non-missing values to fill in the missing ones.

The education variable is categorical with values 1 to 3, with a higher value indicating better education (1 = low education, 2 = moderate education, 3 = high education).
Hence, when looking at education over time for each individual, the value at time t+1 must be equal to or higher than the value at time t.

Here is example data. The variable educ is my current data, and I need to obtain the variable neweduc.
In this dataset only 1 or 2 consecutive observations are missing, but in my actual dataset this differs widely, with about 20 non-missing observations in a row at max. Hence, I don't think using subscripts [_n+-t] is useful. Could anyone help me solve this? Thank you in advance (:

Code:

clear all input id time educ neweduc 1 1 1 1 1 2 2 2 1 3 . 2 1 4 2 2 1 5 . . 2 1 2 2 2 2 . 2 2 3 . 2 2 4 2 2 2 5 3 3 3 1 . 1 3 2 . 1 3 3 1 1 3 4 . 1 3 5 1 1 4 1 2 2 4 2 3 3 4 3 . 3 4 4 . 3 4 5 3 3 end list, sepby(id) +----------------------------+ | id time educ neweduc | |----------------------------| 1. | 1 1 1 1 | 2. | 1 2 2 2 | 3. | 1 3 . 2 | 4. | 1 4 2 2 | 5. | 1 5 . . | |----------------------------| 6. | 2 1 2 2 | 7. | 2 2 . 2 | 8. | 2 3 . 2 | 9. | 2 4 2 2 | 10. | 2 5 3 3 | |----------------------------| 11. | 3 1 . 1 | 12. | 3 2 . 1 | 13. | 3 3 1 1 | 14. | 3 4 . 1 | 15. | 3 5 1 1 | |----------------------------| 16. | 4 1 2 2 | 17. | 4 2 3 3 | 18. | 4 3 . 3 | 19. | 4 4 . 3 | 20. | 4 5 3 3 | +----------------------------+

Thus, per individual the following needs to happen.

- All observations before the last time educ = 1 need to be educ = 1 as well (can be seen for individual 3: the last observation with educ = 1 is at time = 5, so the observations at time=1,..,4 need to be educ = 1)

- All observations between the first and last observation with educ = 2 need to be educ = 2 as well (can be seen for individual 1 and 2: all observations between the first time educ = 2 and the last time educ = 2 should be educ = 2 as well)

- All observations after the first time educ = 3 need to be educ = 3 as well (can be seen for individual 4: at time = 2 is the first observation of educ = 3, which means that education must be educ = 3 for time = 2,...5)
Tags: None

Andrew Musau

Join Date: Oct 2014
Posts: 10219

17 May 2021, 02:16

Filling upwards works in your example

Code:

gen wanted= educ
gsort id -time
by id: replace wanted= wanted[_n-1] if missing(wanted)

But I presume that you are looking for a combination of up and down. It is not clear whether you want the earlier value of the later value to replace the missing values.

Code:

gsort id -time
by id: gen tofill=sum(!missing(educ))>=1
gen wanted= educ
bys id (time): replace wanted= wanted[_n-1] if missing(wanted) & tofill
gsort id -time
by id: replace wanted= wanted[_n-1] if missing(wanted)

Res.:

Code:

. l, sepby(id)

     +----------------------------------------------+
     | id   time   educ   neweduc   tofill   wanted |
     |----------------------------------------------|
  1. |  1      5      .         .        0        . |
  2. |  1      4      2         2        1        2 |
  3. |  1      3      .         2        1        2 |
  4. |  1      2      2         2        1        2 |
  5. |  1      1      1         1        1        1 |
     |----------------------------------------------|
  6. |  2      5      3         3        1        3 |
  7. |  2      4      2         2        1        2 |
  8. |  2      3      .         2        1        2 |
  9. |  2      2      .         2        1        2 |
 10. |  2      1      2         2        1        2 |
     |----------------------------------------------|
 11. |  3      5      1         1        1        1 |
 12. |  3      4      .         1        1        1 |
 13. |  3      3      1         1        1        1 |
 14. |  3      2      .         1        1        1 |
 15. |  3      1      .         1        1        1 |
     |----------------------------------------------|
 16. |  4      5      3         3        1        3 |
 17. |  4      4      .         3        1        3 |
 18. |  4      3      .         3        1        3 |
 19. |  4      2      3         3        1        3 |
 20. |  4      1      2         2        1        2 |
     +----------------------------------------------+

.

Last edited by Andrew Musau; 17 May 2021, 02:32.

Comment

Sandra Bloem

Join Date: Jun 2020

Posts: 106
#3

20 May 2021, 03:37

Andrew Musau

Oh wow, this definitely helped a ton. I finally solved it (: Thank you, have a blessed day.

For anyone looking at this with a similar issue, this linked gave some nice insights as well

https://www.ls3.soziologie.uni-muenc...tacommands.pdf

Or type in -useful stata analysis for longitudinal data- and you can find it as well.
Comment

Announcement

filling in missing values for panel data

Comment

Comment