Deleting observations with missing values

Jordi Imbrechts

Join Date: Apr 2022

Posts: 44
#1

14 May 2022, 06:58

Dear all,

For my thesis I have to research the causes of layoffs for (big) Belgian firms. My data consists of unbalanced panel data, period 2011-2020. In the first part of my results section I analyse the summary statistics and bivariate analysis (t-test/ranksum test).
The screenshot gives a table of the summary statistics of my variables. Sorry for it being in Dutch, but my problem should be clear with the information underneath the screenshot.

(Variables; #observations; mean; median, min; max)

The number of observations varies too much across my variables: for example, 185 147 for Ln(Age) but only 106 419 for my 2 independent variables about productivity. Apparently my regression will use at most those 106 419 observations, so the roughly 80 000 extra observations could skew my sample significantly.
My promotor advised me to delete observations with a missing value.

After reading the following article ( https://www.stata.com/support/faqs/d...issing-values/ ), I do not quite understand the proposed solution. The article talks about missing values at the beginning and end, but in my case I do not quite understand how to fix the problem.
Originally I thought the following code (and doing this for all my variables) would be OK, but apparently it is not:

Code:

drop if DalingProductiviteit2J >= .

I understand that this is rather a dumb question, but apparently complex enough for me. I hope that the problem is clearly described. If not, please let me know.

Thanks in advance,
Jordi
Øyvind Snilsberg

Join Date: Oct 2021

Posts: 591
#2

14 May 2022, 07:34

perhaps,

Code:

egen unwanted = rowmiss(_all)
drop if unwanted
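
Note that rowmiss(_all) counts missing values across every variable in the dataset, including variables that never enter the analysis, so it can drop more observations than necessary. A minimal sketch of a narrower variant, assuming the shipped auto dataset as a stand-in for the thesis data (the names nmiss and complete are illustrative and do not come from this thread):

```stata
* Assumption: the auto dataset shipped with Stata stands in for the real data
sysuse auto, clear

* Count missing values per observation, but only across the variables of interest
egen nmiss = rowmiss(price mpg rep78)

* Flag complete cases instead of dropping them, so the original data stay intact
generate byte complete = (nmiss == 0)

* Descriptive and bivariate analyses can then be restricted to the same rows
summarize price mpg rep78 if complete
ranksum price if complete, by(foreign)
```

Flagging rather than dropping keeps the full dataset available, so the restriction can be revisited if the set of model variables changes.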

Jordi Imbrechts

Join Date: Apr 2022
Posts: 44

14 May 2022, 08:08

Thank you!

Right before I saw your comment I used the following command:

Code:

gen dummyMISSINGPROD = 0
replace dummyMISSINGPROD = 1 if !missing( DalingProductiviteit2J)
drop if dummyMISSINGPROD == 0

Just to be sure: if I drop all the observations with missing values like this right before I analyse my data with summary statistics, the t-test/Wilcoxon rank-sum test, ..., will Stata then only take into account those observations that have a value for DalingProductiviteit2J? And will my regression (xtlogit) then analyse the same number of observations as the Wilcoxon rank-sum test etc.?

EDIT:
My summary statistics look like the following after using the commands above:

Variable	Obs	Mean	Std. Dev.	Min	Max

Collectief~t	92,838	.1037075	.3048824	0	1
Productivi~5	93,552	781421.9	1112061	51460.25	4587718
DalingPro~1J	93,552	.4275483	.4947255	0	1
DalingPro~2J	93,552	.1803489	.3844798	0	1
onder_medi~d	93,552	.4900911	.4999045	0	1

ROA_w1	93,547	.0850092	.137789	-.8055618	.5625707
DalingROA1J	93,547	.5168097	.49972	0	1
DalingROA2J	93,546	.2380006	.4258617	0	1
onder_medi~A	93,547	.4397896	.4963641	0	1
GUO	93,539	.3199307	.4664519	0	1

DUO	93,539	.3215343	.4670678	0	1
Groepsbedr~f	93,552	.6413759	.4795991	0	1
StandAlone	93,539	.3585349	.4795728	0	1
Lnleeftijd~5	93,552	3.144766	.6948031	1.098612	4.143135
Lngrootte_w5	93,552	16.30382	1.455338	12.64215	19.04326

Schuldgraa~5	93,549	.5829031	.2734962	.0288165	1.101723
MVAratio_w5	88,454	.1833262	.2160198	.0009628	.8730854

Is it okay to proceed with the bivariate and multivariate analysis like this? Or should I drop every observation for which a missing value has been reported, so that eventually all the variables have exactly the same number of observations?

Last edited by Jordi Imbrechts; 14 May 2022, 08:14.


Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#4

14 May 2022, 11:05

Jordi:
not quite.
Stata's estimation commands will omit every observation with a missing value in at least one of the variables in the model.
In addition, dropping all the observations with missing values yourself in order to do a complete-case analysis, without diagnosing the mechanism underlying their missingness, is a (very) risky methodological approach, as you may (easily) end up with a biased subsample with a tenuous relationship to the original one.
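
One way to start diagnosing missingness before deleting anything is sketched below. It assumes the shipped nlsw88 dataset as a stand-in for the thesis data; the variable choice and the name miss_union are illustrative only.

```stata
* Assumption: nlsw88 (shipped with Stata) stands in for the real data
sysuse nlsw88, clear

* How many values are missing in each variable of interest?
misstable summarize union grade tenure hours

* Which combinations of variables tend to be missing together?
misstable patterns union grade tenure hours

* Does missingness in one variable vary with observed characteristics?
generate byte miss_union = missing(union)
tabulate collgrad miss_union, row
ttest wage, by(miss_union)
```

If observations with and without a missing value differ clearly on observed variables, that is evidence against the data being missing completely at random; the absence of such differences does not prove the opposite.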

Kind regards,
Carlo
(Stata 19.0)
Jordi Imbrechts

Join Date: Apr 2022

Posts: 44
#5

14 May 2022, 12:00

Carlo,

I used the following code instead:

Code:

egen unwanted = rowmiss(_all)
drop if unwanted

As a result, this is what my summary statistics look like:

DalingPROD1J & DalingROA1J are both dummy variables that have a missing value for the first observation (hence the difference in #obs)
DalingPROD2J & DalingROA2J are both dummy variables that have a missing value for the first 2 observations

Is this also a (very) risky methodological approach?

Kind regards,
Jordi
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#6

14 May 2022, 12:12

Jordi:
the risk is related to the reason why those data are missing.
If you're sure that they are missing completely at random (see the -mi- glossary), your resulting dataset will be a random subsample of your original one.
As far as I can tell from your screenshot (BTW: as per the FAQ, screenshots are not the recommended way to share your Stata code/results), you have a theoretical sample of 115,381 observations that is expected to lose >20,000 (83,390) due to missing values.
Obviously, the above holds assuming that all the reported variables will be used in your panel data regression.
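
An alternative to dropping observations is to let the regression define the sample and reuse it for the descriptive statistics. This is a minimal sketch, assuming the nlswork example dataset (downloaded by webuse, so internet access is needed) as a stand-in for the thesis data; the model specification and the name insample are illustrative only.

```stata
* Assumption: nlswork stands in for the real panel data
webuse nlswork, clear
xtset idcode year

* The estimation command silently omits observations with a missing value
* in any model variable
xtlogit union age grade not_smsa south, re

* e(sample) marks the observations actually used in the estimation
generate byte insample = e(sample)

* Restrict summary statistics and bivariate tests to the estimation sample
summarize union age grade not_smsa south if insample
ranksum age if insample, by(union)
```

With this approach every table is based on the same observations as the regression, and nothing is deleted from the dataset.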

Last edited by Carlo Lazzaro; 14 May 2022, 12:18.

Kind regards,
Carlo
(Stata 19.0)
Jordi Imbrechts

Join Date: Apr 2022

Posts: 44
#7

14 May 2022, 12:24

Carlo,

I do not completely understand your comment. My bad.
I exported all my data from the Bel-first database and generated some new variables in Stata. Before I used the "drop if unwanted" command, the missing values arose purely because, for example, one firm does not have any data for a given variable, while some other firms do not have any data for another variable. So yes, they were random.
By using "drop if unwanted" my unbalanced panel data changed to unbalanced panel data with gaps.

Is this methodology correct?

Kind regards, Jordi
Jordi Imbrechts

Join Date: Apr 2022

Posts: 44
#8

14 May 2022, 20:15

Is there any way to delete a post? I realize this was a relatively dumb question.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#9

15 May 2022, 03:36

Jordi:
you can delete a post within 1 hour from posting it.
That said, your question was not dumb at all if you felt the need to post it.

Kind regards,
Carlo
(Stata 19.0)
Jordi Imbrechts

Join Date: Apr 2022

Posts: 44
#10

15 May 2022, 06:43

Carlo,
That is true. Thanks anyway!