Problem of handling missing data

Avni Bhat

Join Date: Nov 2018

Posts: 1
#1

29 Nov 2018, 04:14

Hi everyone,
I am currently studying the impact of intellectual property rights on the Indian pharmaceutical industry. I have a panel data set (secondary data from CMIE) of 350 firms across 28 time periods. However, I am facing a big problem with missing data. Almost all the variables I need in the model (e.g., R&D = f(pat, exports, imported tech, etc.)) have between 10% and 30% of their values missing. How would you suggest I handle this before undertaking any analysis? Listwise deletion in Stata reduces the number of firms to 68, drastically reducing the sample size.

Is multiple imputation of data when all variables have some missing values a possibility in Stata?
Thank you in advance!
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#2

29 Nov 2018, 05:14

Avni:
welcome to this forum.
Multiple imputation, which can be a viable way to deal with missing data under certain missing-data mechanisms, is (quite) easily performed in Stata and is widely covered in the Stata .pdf manual (see the -mi- related entries).

Kind regards,
Carlo
(Stata 19.0)
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#3

29 Nov 2018, 07:55

Originally posted by Avni Bhat

...

Is multiple imputation of data when all variables have some missing values a possibility in Stata?
Thank you in advance!

Are you asking whether you can conduct multiple imputation when many variables have some missing values? The answer is yes. Furthermore, you can handle different types of data with the appropriate imputation method, e.g. logit for binary (mlogit or ologit for variables with more than two categories), regress for continuous, Poisson or negative binomial for count.

I'd go to the multiple imputation manual, read the intro, then go to the section called mi impute chained.
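
As a rough starting point, and not a recommendation for your particular data, a chained-equations run might look like the sketch below. The variable names (rd, pat, exports, imptech, firm_id, year) are made up, as are the number of imputations and the seed. Note that this imputation model treats rows as independent and does not itself account for the panel structure.

```stata
* Sketch only: rd, pat, exports, imptech, firm_id and year are hypothetical names
mi set mlong                                  // declare the mi storage style
mi register imputed rd pat exports imptech    // variables with missing values to impute
mi register regular firm_id year              // complete variables, not imputed

* Impute all four continuous variables by chained equations,
* using year dummies as a complete predictor (an assumption for illustration)
mi impute chained (regress) rd pat exports imptech = i.year, add(20) rseed(2018)

* Declare the panel structure and fit the analysis model on each imputed data set
mi xtset firm_id year
mi estimate: xtreg rd pat exports imptech, fe
```

Here add(20) creates 20 imputed data sets, and mi estimate fits the model on each one and combines the results using Rubin's rules.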

One caveat is that I'm not sure how well multiple imputation plays with panel data. If the data are monotone missing, the task is easier. One example of monotone missingness comes from randomized trials, where you frequently have multiple follow-up visits: if someone withdrew consent before time 2, then the observations for times 2, 3, 4, etc. will all be missing. Non-monotone missingness in panel data is harder. Since you have panel data, you may wish to read this Stata FAQ on imputation in clustered data in general.
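
To get a first look at how much is missing and whether the pattern is monotone, something like the following may help. It is only a sketch with made-up variable names (rd, pat, exports, imptech). In a long-format panel it checks nesting across variables; checking dropout over time would require the data in wide form.

```stata
* Sketch only: rd, pat, exports and imptech are hypothetical variable names
misstable summarize rd pat exports imptech   // counts of missing values per variable
misstable patterns rd pat exports imptech    // which combinations of variables are missing together
misstable nested rd pat exports imptech      // nesting rules; a single nested chain indicates a monotone pattern
```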

Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.
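
For instance, assuming hypothetical variable names, the following would produce a copy-and-paste example of the first 20 observations:

```stata
* Sketch only: firm_id, year, rd, pat, exports and imptech are hypothetical names
dataex firm_id year rd pat exports imptech in 1/20
```

Paste the output into your post between the code delimiters.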

When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.