Unbalanced panel data with non-random missing observations

Jeanne Roche

Join Date: Jul 2021

Posts: 14
#1

23 Sep 2021, 08:55

Dear all,

I am working on an unbalanced panel dataset covering 1110 certified organisations over 7 years (2013 to 2019) (see output below). However, I have many missing observations because an organisation only enters the dataset once it is certified: for organisations certified in 2018, I have two years of data (2018 and 2019); for organisations certified in 2016, I have four years (2016, 2017, 2018, 2019). I therefore understand that the missing observations (organisation-years) are non-random.

Code:

xtset No Year
       panel variable:  No (unbalanced)
        time variable:  Year, 2013 to 2019, but with gaps
                delta:  1 unit

Code:

xtdes

      No:  1, 2, ..., 1110                                   n =       1110
    Year:  2013, 2014, ..., 2019                             T =          7
           Delta(Year) = 1 unit
           Span(Year)  = 7 periods
           (No*Year uniquely identifies each observation)

Distribution of T_i:   min      5%     25%       50%       75%     95%     max
                         1       1       1         2         3       5       7

     Freq.  Percent    Cum. |  Pattern
 ---------------------------+---------
      446     40.18   40.18 |  ......1
      262     23.60   63.78 |  .....11
       91      8.20   71.98 |  ....111
       71      6.40   78.38 |  ...1111
       33      2.97   81.35 |  ..11111
       28      2.52   83.87 |  ....1.1
       24      2.16   86.04 |  .111111
       24      2.16   88.20 |  1111111
       15      1.35   89.55 |  ...11.1
      116     10.45  100.00 | (other patterns)
 ---------------------------+---------
     1110    100.00         |  XXXXXXX

I searched the web and previous posts on Statalist thoroughly, and I often see the recommendation to impute the missing data. However, my advisors asked me not to attempt to fill in missing values.
If I understand correctly, I would not need to worry too much if my missing observations were random. However, since they are non-random, what can I do if I cannot impute the data?

Thank you so much in advance for your help!
I hope I have followed the posting guidelines correctly.

I am using Stata 16.1 on Mac.

Last edited by Jeanne Roche; 23 Sep 2021, 08:58. Reason: I added the output of the xtset command for more clarity on the dataset
Tags: panel data
Maxence Morlet

Join Date: Mar 2021

Posts: 652
#2

23 Sep 2021, 09:08

Bonjour Jeanne,

Someone will probably provide a much better solution later. In the interest of time, and in case you are under any time constraint, here is my limited advice.

What you could try is to generate a dummy that takes the value one when a given variable is missing. Then run a logit regression of this dummy on a dummy equal to one in the case of certification, plus other explanatory variables if theory suggests there are other determinants of missingness. Is this feasible?

Then assess the significance of the odds ratios on the explanatory variables. If the odds ratios are significant, missingness is related to those variables, which would be problematic; you may have to mention it in a footnote to your findings as a potential source of bias. If they are not, it is less of a problem.
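The check described above can be sketched in Stata roughly as follows. This is only an illustration on simulated data, not the dataset from post #1: the variables big (a time-invariant organisation trait) and first (year of certification) are hypothetical, and tsfill, full is used here as one possible way to create the absent organisation-year rows so that a missingness dummy can be defined at all.

```stata
* Simulated stand-in for the real panel (all variables besides No and Year are hypothetical)
clear
set seed 12345
set obs 200
gen No = _n
gen byte big = runiform() < 0.5            // hypothetical time-invariant trait
gen int first = 2013 + floor(7*runiform()) // hypothetical certification year, 2013-2019
expand 7
bysort No: gen int Year = 2012 + _n        // years 2013-2019 for each organisation
drop if Year < first                       // organisations appear only once certified

* Rectangularise the panel so that absent organisation-years become rows
xtset No Year
tsfill, full

* Missingness dummy: rows added by tsfill have no value for first
gen byte miss = missing(first)

* Spread the time-invariant trait to the added rows (missing values sort last)
bysort No (big): replace big = big[1]

* Does the trait predict missingness? Odds ratios, SEs clustered by organisation
logit miss big, or vce(cluster No)
```

Because the rows added by tsfill carry no covariate values, only time-invariant characteristics (or ones that can be filled in from another source) can serve as predictors in a regression of this kind.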

Your situation might somewhat resemble survivorship bias in finance.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17854
#3

23 Sep 2021, 09:21

Jeanne:
please note that multiple imputation procedures were actually developed for data missing at random.
When the missingness mechanism is NMAR or MNAR (missing not at random), other methods are recommended (delta-adjustment) (see, for instance, Stef van Buuren et al.'s pivotal paper on this topic at https://pubmed.ncbi.nlm.nih.gov/10204197/).
That said, it remains to be seen whether performing the analysis as requested by your supervisor makes sense in your research field (panels with gaps are really common), or whether other approaches are suggested by previous papers on the topic you're dealing with.
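For readers unfamiliar with delta-adjustment, the general idea is a sensitivity analysis: impute under a missing-at-random model, shift the imputed values by a fixed amount delta to mimic a departure from that assumption, re-estimate, and repeat for several values of delta. A minimal sketch on simulated data follows; the variables x and y, the 30% missingness rate, and the value of delta are all hypothetical, and this is not taken from the cited paper.

```stata
* Simulated data with some outcome values removed (everything here is hypothetical)
clear
set seed 2021
set obs 500
gen x = rnormal()
gen y = 1 + 0.5*x + rnormal()
replace y = . if runiform() < 0.3     // set roughly 30% of y to missing
gen byte ymiss = missing(y)           // flag the values that will be imputed

* Step 1: ordinary multiple imputation under a missing-at-random model
mi set flong
mi register imputed y
mi register regular x ymiss
mi impute regress y x, add(10) rseed(2021)

* Step 2: shift only the imputed values by delta (not the observed ones)
local delta = -0.5                    // hypothetical shift; try several values
replace y = y + `delta' if _mi_m > 0 & ymiss == 1

* Step 3: re-estimate and compare with the delta = 0 results
mi estimate: regress y x
```

If the conclusions barely change across a plausible range of delta, the results are less sensitive to the non-random missingness; if they change a lot, that is worth reporting.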

Kind regards,
Carlo
(Stata 19.0)
Jeanne Roche

Join Date: Jul 2021

Posts: 14
#4

23 Sep 2021, 10:10

Dear Maxence and Carlo,

Thank you so much for your prompt responses! I started working on Maxence's point, and while I find that the "missingness" doesn't affect some of my (candidate) explanatory variables, it does affect others, so I need to find a solution.

I will have a look at delta-adjustment. However, I understand that this is also a multiple imputation solution, correct? How does it work if the missing values are non-random, given that I understood imputation to be designed for observations missing at random? Or did I understand it all wrong?

Thanks a lot for your help!
Best regards,
Jeanne
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17854
#5

23 Sep 2021, 12:31

Jeanne:
take a look at the paper and everything will be clearer.

Kind regards,
Carlo
(Stata 19.0)
