Dear all,
I am working with an unbalanced panel dataset covering 1110 certified organisations over 7 years (2013 to 2019) (see code below). However, I have many missing values because an organisation only enters the dataset once it is certified: for organisations certified in 2018 I have two years of data (2018 and 2019), and for organisations certified in 2016 I have four years of data (2016, 2017, 2018, 2019). I therefore understand that the missing observations (organisation-years) are non-random.
I searched the Web and previous posts on Statalist thoroughly, and I often see recommendations to impute the missing data. However, my advisors asked me not to attempt to fill in missing values.
If I understand correctly, I would not need to worry too much if my missing observations were random. However, since they are non-random, what can I do if I can't impute the data?
Thank you so much in advance for your help!
I hope I have followed the posting guidelines correctly.
I am using Stata 16.1 on Mac.
Code:
xtset No Year
       panel variable:  No (unbalanced)
        time variable:  Year, 2013 to 2019, but with gaps
                delta:  1 unit
Code:
xtdes

      No:  1, 2, ..., 1110                                   n =       1110
    Year:  2013, 2014, ..., 2019                             T =          7
           Delta(Year) = 1 unit
           Span(Year)  = 7 periods
           (No*Year uniquely identifies each observation)

Distribution of T_i:   min      5%     25%       50%       75%     95%     max
                         1       1       1         2         3       5       7

     Freq.  Percent    Cum. |  Pattern
 ---------------------------+---------
      446     40.18   40.18 |  ......1
      262     23.60   63.78 |  .....11
       91      8.20   71.98 |  ....111
       71      6.40   78.38 |  ...1111
       33      2.97   81.35 |  ..11111
       28      2.52   83.87 |  ....1.1
       24      2.16   86.04 |  .111111
       24      2.16   88.20 |  1111111
       15      1.35   89.55 |  ...11.1
      116     10.45  100.00 | (other patterns)
 ---------------------------+---------
     1110    100.00         |  XXXXXXX