  • Should I delete observations if there are missing values for my dependent variable?

    Hi everyone,

    1) My research question is to look at the causal impact of maternal employment on the probability that their adolescent children smoke. My model is the following:

    Logit

    logit(Pi) = ln[Pi / (1 − Pi)] = b0 + b1 × maternal employment dummy + controls, where Pi is the probability that adolescent i smokes

    I only have cross-sectional survey data on young people, and my question is about missing data. For some individuals the data on maternal employment and smoking are missing, but there are data on other control variables, e.g. income. In this case, should I drop all individuals who don't have data on maternal employment and smoking?


    2) At the moment I haven't dropped these individuals. When I run the postestimation command -predict- to get predicted probabilities after a logit model, it produces predictions for 5780 observations, while only 1070 individuals have data on smoking. So it is also making predictions for the observations where smoking is missing, and I am wondering whether this happens because I have not dropped the individuals I should have.

    Any help is much appreciated.

  • #2
    If your data are MCAR (or MAR and ignorable), then the missing data do not bias your coefficient estimates, and the ensuing predicted probabilities are also valid for individuals with missing data on the outcome. That is why Stata does not restrict the predicted probabilities to individuals with nonmissing data on the outcome. However, it is not necessary to delete any observations; you can just specify the restriction yourself (for example, if you need to run some analyses that include both actual and predicted values):

    Code:
    predict prob if !missing(outcome), pr
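A minimal sketch of the difference, using the auto data that ships with Stata rather than the original survey (the variable foreign_m and the 10 blanked-out outcomes are hypothetical). Restricting on e(sample), Stata's marker of the estimation sample, is an alternative to the !missing(outcome) condition that also excludes observations dropped because of missing regressors:

```stata
* Sketch with Stata's shipped auto data; foreign_m is a hypothetical
* copy of foreign with the outcome set to missing in 10 observations
sysuse auto, clear
generate byte foreign_m = foreign
replace foreign_m = . in 1/10

* The 10 observations with a missing outcome are excluded from estimation
logit foreign_m mpg weight

* Without a qualifier, predict scores every observation with complete
* covariates, including the 10 with a missing outcome
predict p_all, pr

* e(sample) restricts predictions to the observations used in the fit
predict p_est if e(sample), pr

count if !missing(p_all)
count if !missing(p_est)
```

Here the first count reports all 74 observations and the second only the 64 used by -logit-, which mirrors the 5780 vs 1070 pattern in the question.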

    • #3
      What Andrew says is true and helpful, but there's a shallower truth: You can just ignore what you don't want.

      • #4
        Donna:
        Just an aside to Andrew's helpful advice: even if your missingness is ignorable (and this definition implies that you have investigated whether the missingness mechanism in your data is informative or not), inference with missing data may be less efficient, mainly because of the reduced sample size: any observation with any missing value will be excluded from your -logit- regression.
        Hence it's up to you (following the most widespread methodological approach on this topic in your research field) to judge the trade-off between multiple imputation (if feasible) and listwise deletion (which, admittedly, does not always bite; see http://statisticalhorizons.com/listw...n-its-not-evil).
        I share Andrew's recommendation not to drop observations (which is something you may often regret); instead, flag them and exclude them from your statistical procedure via the -if- qualifier.
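A minimal sketch of the flag-and-qualify approach, again using Stata's auto data (where rep78 has missing values) as a stand-in for the survey variables; the flag name complete is hypothetical. The t-test is only a rough check of whether complete and incomplete cases differ on an observed variable; it cannot show that the missingness is ignorable:

```stata
* Sketch with Stata's shipped auto data (rep78 has missing values);
* in the original setting, substitute the outcome and the model covariates
sysuse auto, clear

* Summarize missing values for the model variables
misstable summarize foreign mpg rep78

* Flag complete cases instead of dropping incomplete ones
generate byte complete = !missing(foreign, mpg, rep78)
tabulate complete

* Rough check: do complete and incomplete cases differ on weight?
ttest weight, by(complete)

* Estimate on complete cases only; incomplete observations stay in memory
logit foreign mpg rep78 if complete
```

Because nothing is dropped, the incomplete cases remain available later, for example for a multiple-imputation analysis.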

        PS: Crossed in cyberspace with Nick's reply.
        Kind regards,
        Carlo
        (Stata 19.0)
