Dear all,
I am working with an unbalanced panel dataset covering 1110 certified organisations over 7 years (2013 to 2019); see the output below. However, I have many missing observations because an organisation only enters the dataset once it is certified. For organisations certified in 2018, I have two years of data (2018 and 2019); for organisations certified in 2016, I have four years (2016, 2017, 2018, 2019). I therefore understand that the missing observations (organisation-years) are non-random.
I have searched the web and previous posts on Statalist thoroughly, and I often see recommendations to impute the missing data. However, my advisors asked me not to attempt to fill in the missing values.
Code:
xtset No Year

       panel variable:  No (unbalanced)
        time variable:  Year, 2013 to 2019, but with gaps
                delta:  1 unit
Code:
xtdes

      No:  1, 2, ..., 1110                                   n =       1110
    Year:  2013, 2014, ..., 2019                             T =          7
           Delta(Year) = 1 unit
           Span(Year)  = 7 periods
           (No*Year uniquely identifies each observation)

Distribution of T_i:   min      5%     25%       50%       75%     95%     max
                         1       1       1         2         3       5       7

     Freq.  Percent    Cum. |  Pattern
 ---------------------------+---------
      446     40.18   40.18 |  ......1
      262     23.60   63.78 |  .....11
       91      8.20   71.98 |  ....111
       71      6.40   78.38 |  ...1111
       33      2.97   81.35 |  ..11111
       28      2.52   83.87 |  ....1.1
       24      2.16   86.04 |  .111111
       24      2.16   88.20 |  1111111
       15      1.35   89.55 |  ...11.1
      116     10.45  100.00 | (other patterns)
 ---------------------------+---------
     1110    100.00         |  XXXXXXX
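The patterns above mix two things: years missing because an organisation was not yet certified (late entry), and years missing between observed years (for example `....1.1`). A minimal sketch of how the two might be separated is below. It assumes the data are already `xtset No Year` and that a missing organisation-year means the row is absent from the dataset. The variable names `first_year`, `last_year`, `n_obs`, `n_gaps`, `late_entry` and `one_per_org` are hypothetical, introduced only for illustration. This is a diagnostic, not a remedy for the non-random missingness.

```stata
* Sketch only: assumes the data are xtset on No and Year and that
* a missing organisation-year is an absent row, not a row of missing values.
* All new variable names below are hypothetical.

* First and last observed year per organisation
bysort No (Year): generate first_year = Year[1]
bysort No (Year): generate last_year  = Year[_N]

* Number of observed years per organisation
bysort No: generate n_obs = _N

* Years missing between the first and last observed year
generate n_gaps = (last_year - first_year + 1) - n_obs

* Organisations first observed after the first sample year (2013)
generate byte late_entry = first_year > 2013

* Tag one row per organisation so each is counted once
egen byte one_per_org = tag(No)

* How many organisations enter in each year, and how many have internal gaps
tabulate first_year if one_per_org
tabulate n_gaps if one_per_org
```

If `n_gaps` is zero for most organisations, the unbalancedness would mostly reflect the timing of certification rather than gaps after entry.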
If I understand correctly, I would not need to worry too much if my missing observations were random. However, since they are non-random, what can I do if I cannot impute the data?
Thank you so much in advance for your help!
I hope I have followed the posting guidelines correctly.
I am using Stata 16.1 on Mac.
