Should I delete observations if there are missing values for my dependent variable?

Donna Saha

Join Date: May 2017

Posts: 1
#1

08 May 2017, 05:05

Hi everyone,

1) My research question looks at the causal impact of maternal employment on the probability that adolescent children smoke. My model is the following:

Logit

Pi( prob of smoking of adolescent) = maternal employment dummy + controls

I only have cross-sectional survey data on young people, and my question concerns missing data. For some individuals the data on maternal employment and smoking are missing, but there is data on other control variables, e.g. income. In this case, should I drop all individuals who don't have data on maternal employment and smoking?

2) At the moment I haven't dropped these individuals. When I run the postestimation command "predict" to get predicted probabilities after a logit model, it produces predictions for 5780 observations, while only 1070 individuals have data on smoking. So it is making predictions for the missing values of smoking too. I am wondering whether this is happening because I haven't dropped the individuals I should have.

Any help is much appreciated.
Tags: data, logit, missing data, predict
Andrew Musau

Join Date: Oct 2014

Posts: 10213
#2

08 May 2017, 08:09

If your data is MCAR (or MAR and ignorable), then the missing data does not bias your coefficient estimates, and the ensuing predicted probabilities are also valid for those individuals with missing data on the outcome. That is why Stata does not restrict the predicted probabilities to individuals with nonmissing data on the outcome. However, it is not necessary to delete any observations. You can just specify the restriction yourself (for example, if you need to run some analyses that include both actual and predicted values):

Code:

predict prob if !missing(outcome), pr
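
To see the behaviour in isolation, here is a minimal sketch. It assumes Stata's shipped auto dataset as a stand-in for the survey data; the variables, the artificially introduced missing outcomes, and the names p_all, p_obs and p_est are illustrative only, not the original poster's.

```stata
* Illustration only: auto.dta stands in for the survey data
sysuse auto, clear

* Artificially set the outcome to missing for every fourth observation
replace foreign = . if mod(_n, 4) == 0

* logit silently drops observations with a missing outcome or covariate
logit foreign mpg weight

* Unrestricted: predicts wherever the covariates are nonmissing,
* including observations whose outcome is missing
predict p_all, pr

* Restricted to observations with an observed outcome
predict p_obs if !missing(foreign), pr

* Restricted to the observations actually used in estimation
predict p_est if e(sample), pr

* Compare how many predictions each version produced
count if !missing(p_all)
count if !missing(p_obs)
count if !missing(p_est)
```

The first count should exceed the other two, which is the pattern described in #1. The e(sample) version is the stricter restriction, since it also excludes observations with a missing covariate.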
Nick Cox

Join Date: Mar 2014

Posts: 35711
#3

08 May 2017, 08:26

What Andrew says is true and helpful, but there's a shallower truth: You can just ignore what you don't want.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#4

08 May 2017, 08:35

Donna:
just an aside to Andrew's helpful advice: even if your missingness is ignorable (and this definition implies that you have investigated whether the missingness mechanism in your data is informative or not), inference with missing data might be less efficient (mainly due to the reduced sample size), in that any observation with any missing value will be excluded from your -logit- regression.
Hence it's up to you (according to the most widespread methodological approach to this topic in your research field) to judge the trade-off between multiple imputation (if feasible) and listwise deletion (which, admittedly, does not always bite; see http://statisticalhorizons.com/listw...n-its-not-evil).
I do share Andrew's recommendation not to drop observations (something one often comes to regret), but instead to flag them and exclude them from your statistical procedure via the -if- qualifier.
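
A minimal sketch of the flag-and-exclude approach, again assuming Stata's shipped auto dataset in place of the survey data; the flag name complete and the artificially introduced missing values are hypothetical, for illustration only.

```stata
* Illustration only: auto.dta stands in for the survey data
sysuse auto, clear

* Artificially introduce missing values in the outcome and one covariate
replace foreign = . if mod(_n, 4) == 0
replace mpg = . in 1/3

* Inspect the extent of missingness before deciding how to handle it
misstable summarize foreign mpg weight

* Flag complete cases instead of dropping the incomplete ones;
* missing() returns 1 if any of its arguments is missing
generate byte complete = !missing(foreign, mpg, weight)

* Exclude flagged-out observations via the -if- qualifier
logit foreign mpg weight if complete
predict p if complete, pr
```

-logit- would perform the same listwise deletion on its own, so the flag does not change the estimates. Its use is to keep later commands on the same set of observations while leaving the full dataset intact.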

PS: Crossed in cyberspace with Nick's reply.

Kind regards,
Carlo
(Stata 19.0)