Replacing missing values in a panel data

alessio lombini

Join Date: Dec 2020

Posts: 98
#1

Replacing missing values in a panel data

16 Dec 2020, 04:54

Hello,

I am running a difference-in-differences analysis on a panel data set covering 2001-2015. The units of observation are US counties. The variables in my dataset are industries (e.g., fishing, manufacturing, tourism), each measured both as GDP by county and as employment by county, so each industry enters my dataset twice. For certain combinations of industry j and county i, the values are missing for all years.
Example: Baldwin county may have complete data for all industries except fishing, which is missing in every year. Similarly, Sussex county may have values for all industries except mining extraction, which is missing in every year, and so on.
The missing values are not random: they are data suppressed for privacy reasons (the values were generally small, so it would have been possible to trace them back to individual firms).
I would like to ask:

1) how Stata handles missing values by default when I run a regression;
2) how I can replace these missing values in the most reliable way.

I previously tried "ipolate", but I read that the resulting estimates would not be reliable.

I hope someone can help. Thank you in advance.
Tags: None
Felix Bittmann

Join Date: Aug 2018

Posts: 722
#2

16 Dec 2020, 04:59

1.) Listwise deletion. If an observation has a missing value on any of the variables used in the regression, it is dropped from the estimation sample.
2.) Multiple imputation, see https://stats.idre.ucla.edu/stata/se...stata_pt1_new/ However, this will require some work, especially with panel data.
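
To see the listwise deletion from point 1 in action, here is a minimal sketch using the auto dataset that ships with Stata (not the county data from this thread). In that dataset rep78 has 5 missing values out of 74 observations, so the regression should use 69.

```stata
* Illustration only: the shipped auto dataset stands in for the county panel
sysuse auto, clear

* Report which of the listed variables contain missing values, and how many
misstable summarize price mpg rep78

* regress silently drops every observation with a missing value
* on any variable in the model (listwise deletion)
regress price mpg rep78

* Number of observations actually used in the estimation
display e(N)

* Number of observations in the dataset, for comparison
count
```

Comparing e(N) with the result of count is a quick way to check how many observations a model has dropped.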

Best wishes

Stata 18.0 MP | ORCID | Google Scholar
Comment
alessio lombini

Join Date: Dec 2020

Posts: 98
#3

16 Dec 2020, 06:47

Thank you very much, Felix. I have been reading the article and the method seems solid. My one concern is that the examples there are not for panel data. Do you know whether I need to add any particular specification for panel data?
Thank you again.
Comment
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17718
#4

16 Dec 2020, 11:41

Alessio:
as far as -mi- with panel datasets is concerned, see Example 4 under the -mi estimate- entry in the Stata .pdf manual.
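
As a rough sketch of what an -mi- workflow on a panel can look like (this is not the manual's example): the names id, year, y, x1 and x2 are hypothetical placeholders, and the mlong style, the -mi impute regress- model and the fixed-effects specification are assumptions chosen only for illustration. Note that this simple imputation model ignores the panel structure, which may not be appropriate for a real analysis.

```stata
* Hypothetical names: id (panel unit), year, y (has missing values), x1, x2 (complete)

* Declare the data as mi data; the mlong style is an arbitrary choice here
mi set mlong

* Declare the panel structure through mi so every imputation inherits it
mi xtset id year

* Tell mi which variables will be imputed and which are complete
mi register imputed y
mi register regular x1 x2

* Create 10 imputations of y from a linear regression on x1 and x2
* (assumption: this model does not account for the panel structure)
mi impute regress y x1 x2, add(10) rseed(12345)

* Fit the panel model in each imputation and combine the results
mi estimate: xtreg y x1 x2, fe
```

The imputation step has to come before -mi estimate-, which needs at least one imputation to work with.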

Kind regards,
Carlo
(Stata 19.0)
Comment
alessio lombini

Join Date: Dec 2020

Posts: 98
#5

22 Dec 2020, 04:17

Thank you very much, Carlo.
I have six outcome variables (GDP_oil_extraction, GDP_fishing, GDP_tourism, EMPL_oil_extraction, EMPL_fishing, EMPL_tourism), all of which contain some missing values.
This is my code:

mi xtset county_code year
mi register imputed GDP_oil_extraction GDP_fishing GDP_tourism EMPL_oil_extraction EMPL_fishing EMPL_tourism

* Then, for each of the six outcome variables, I run the following command:
mi estimate: xtreg outcome_variable auxiliary_variables

* I have also tried with:
mi impute mvn GDP_oil_extraction GDP_fishing GDP_tourism EMPL_oil_extraction EMPL_fishing EMPL_tourism = EMPL_total, add(10) rseed(53421)

If I am right, this should fill in the values of my outcome variables that were previously missing. However, after imputing I cannot see them in the Data Editor: the missing values are still there. Where can I find the imputed values?

Last edited by alessio lombini; 22 Dec 2020, 04:26.
Comment
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17718
#6

22 Dec 2020, 11:25

Alessio:
you should find the imputed values in the 10 imputed datasets that you created.
The original dataset will retain the missing values.
See use https://www.stata-press.com/data/r16/mjsps5
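
One possible way to look at the imputed values, sketched under the assumption that the -mi impute- command from #5 has already run successfully and that GDP_fishing is one of the registered imputed variables:

```stata
* Show the mi style in use and the number of imputations
mi query

* Summarize the mi data: registered variables and counts of incomplete observations
mi describe

* Compare the variable in the original data (m=0) and in the first imputation (m=1)
mi xeq 0 1: summarize GDP_fishing

* Replace the data in memory with the first completed dataset,
* so that it can be browsed in the Data Editor.
* Save the mi data beforehand: this discards the other imputations from memory.
mi extract 1, clear
browse county_code year GDP_fishing
```

How the imputations are stored depends on the mi style. In the wide style they are held in extra variables such as _1_GDP_fishing, while in the mlong and flong styles they are held in extra observations flagged by _mi_m greater than 0.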

Kind regards,
Carlo
(Stata 19.0)
Comment
