
  • How to deal with missing observations in large data sets

    Hello everybody, have a lovely weekend.
    I have a question.
    I have a data set from a survey of living conditions of Bolivian households, with 38201 persons aged from 15 to over 60.
    I intend to fit a probit model for the probability of being a tobacco consumer (a smoker). There are 26694 smokers, with the dependent variable coded 1 = smokes, 0 = does not smoke, and independent variables such as age, wage, education level, gender, alcohol consumption, etc. All of these are categorical variables; some are dummies and others have several categories.
    However, when I run the probit regression, the reported number of obs is 26694. So far, so good. When I add a condition to this regression like "if extreme_poor==1" (extreme_poor is a dummy variable where 1 = the person has a low income, 5739 obs, and 0 = otherwise, 32440 obs), the regression output shows only 3389 obs.

    My tutors gave me a piece of advice: construct the dependent variable with 3 categories, 1 = consumes tobacco, 2 = does not consume tobacco, and 3 = never consumed tobacco. This last category should capture everyone who never answered the question about smoking, plus the missing values, plus any others.

    This advice did not help either.

    In my data set, I found that some of the independent variables have few responses, or their main category has few 1's and many 0's. I mean, some of the variables I use as independent variables have fewer observations than the dependent variable.

    Sorry for writing a lot, but I am trying to give the context for what I want to ask you.

    How do you report this in the data description section of a paper for a journal?

    Thank you in advance.

  • #2
    Juan:
    In all research fields, missing values are a painful issue, especially if the missing values wipe out around 90% of the theoretically available observations (admittedly, I do not think there's a sound methodological approach to justify/explain this situation in a paper and motivate why you decided to skip missing values management).
    It is relevant to highlight that if the missing mechanism is not at random, your estimates are at high risk of being biased.
    Moreover, there's increasing attention to missing values management among reviewers of technical journals.
    Tons of literature (with some excellent classics) covers how to deal with missing observations.
    You may want to take a look at the -mi- entries in the Stata .pdf manual.
    Kind regards,
    Carlo
    (Stata 18.0 SE)
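
    The shrinking number of observations discussed above is what complete-case (listwise) deletion looks like: an observation enters the estimation sample only if every model variable is non-missing and the "if" condition holds. The following is a minimal sketch in plain Python, not Stata code. The six toy records, the variable list, and the helper names complete_cases and missing_counts are all hypothetical and only illustrate the counting logic.

```python
# Toy illustration of listwise (complete-case) deletion.
# All records and helper names are hypothetical; None marks a missing value.
rows = [
    {"smoke": 1, "age": 34, "wage": 2, "extreme_poor": 1},
    {"smoke": 0, "age": 51, "wage": None, "extreme_poor": 1},
    {"smoke": None, "age": 19, "wage": 1, "extreme_poor": 0},
    {"smoke": 1, "age": 45, "wage": 3, "extreme_poor": 0},
    {"smoke": 0, "age": None, "wage": 1, "extreme_poor": 1},
    {"smoke": 0, "age": 27, "wage": 2, "extreme_poor": None},
]


def missing_counts(records, variables):
    """Count missing values per variable."""
    return {v: sum(r[v] is None for r in records) for v in variables}


def complete_cases(records, variables, condition=None):
    """Keep records with no missing value in `variables` that satisfy `condition`."""
    kept = []
    for r in records:
        if any(r[v] is None for v in variables):
            continue  # dropped: at least one model variable is missing
        if condition is not None and not condition(r):
            continue  # dropped: fails the subsample condition
        kept.append(r)
    return kept


model_vars = ["smoke", "age", "wage"]

# Sample used by a model on all records vs. on the extreme-poor subsample.
full_sample = complete_cases(rows, model_vars)
poor_sample = complete_cases(rows, model_vars, lambda r: r["extreme_poor"] == 1)

print(missing_counts(rows, model_vars + ["extreme_poor"]))
print(len(rows), len(full_sample), len(poor_sample))
```

    Tabulating missing values per variable, as missing_counts does here, shows which variables cost the most observations, and that is useful information for the data description of a paper.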



    • #3
      Dear Carlo,
      A new world from now on.
      A thousand thanks.
      Regards,
      Juan
