  • Reasons for dropping observations with missing values

    Hello, I'm in dire need of your help if you are reading this. I just made a profile today and have a project due Monday. The class is econometrics, and we are working with Stata. I have a general understanding of how to do the project, but definitely not an A+ understanding.

    My teacher gave us a sample project that a student completed back in 2012. The student uses data with 935 observations and then states: "after omitting observations for missing values, I had 663 valid observations."

    My question is: why would he omit observations, and how would he do it? I think the command for omitting them is "drop if mi()", but I don't understand what he would put inside the parentheses.
    He is working with a data set on wage, educ, etc. He's trying to figure out whether educ has any effect on wage.

    Is he dropping observations that are missing values for educ?

  • #2
    Well, you're kind of asking if we can help you read his mind. I don't think the Forum members are any more telepathic than you are.

    All kidding aside, one fairly high-probability guess is this: whenever you run a regression, only observations with non-missing values on all of the variables named in the regression model are used. Any others are omitted from the analysis. This happens automatically: you don't have to drop those observations explicitly. If you load the data, check that the observation count is 935, and then run the regression in question, my guess is that the N for the regression will be 663. (You'll see the N in the regression output.)
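
A minimal Python sketch of the listwise (complete-case) rule just described, assuming each observation is a dict and `None` stands in for a missing value. The variable names and toy data are hypothetical. In Stata, the explicit equivalent would be `drop if missing(wage, educ)`, since `missing()` (abbreviated `mi()`) accepts several arguments and returns 1 if any of them is missing.

```python
# Hypothetical toy data: None stands in for a missing value (Stata's ".").
observations = [
    {"wage": 12.5, "educ": 12, "exper": 5},
    {"wage": None, "educ": 16, "exper": 3},
    {"wage": 9.0, "educ": None, "exper": 10},
    {"wage": 15.0, "educ": 14, "exper": None},
    {"wage": 20.0, "educ": 18, "exper": 8},
]


def complete_cases(rows, variables):
    """Keep only rows with a non-missing value on every listed variable.

    This mirrors what a regression command does automatically: an
    observation missing on any model variable is left out of the estimation.
    """
    return [row for row in rows if all(row[v] is not None for v in variables)]


# Model wage on educ only: the row missing only 'exper' is still used.
used_simple = complete_cases(observations, ["wage", "educ"])
# Adding exper to the model drops one more observation.
used_full = complete_cases(observations, ["wage", "educ", "exper"])

print(len(observations), len(used_simple), len(used_full))  # 5 3 2
```

The point of the sketch is that the N depends on which variables enter the model, so a drop from 935 to 663 need not come from missing values in educ alone.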

    Now, that said, it is well known that if the values in the omitted observations are not missing completely at random, excluding them can bias the estimates of the model parameters. So taking the results of that analysis at face value is problematic. Dealing fully with missing data requires an understanding of the mechanism(s) underlying the missingness; after that, some statistical approach is usually warranted, such as multiple imputation if the data are missing at random (but not completely at random), or some sort of sensitivity analysis if they are missing not at random. But this may go beyond what would be expected in a basic econometrics class.
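
A small deterministic Python illustration of the bias point, built on assumptions of my own: hypothetical data where wage = 2*educ plus an error that is balanced at each education level, so OLS on the full data recovers a slope of exactly 2. Wage is then made missing for everyone earning above 28, a not-missing-at-random mechanism because missingness depends on the value of wage itself. OLS on the complete cases gives a noticeably flatter slope.

```python
def ols_slope(xs, ys):
    """Slope of a simple least-squares regression of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical data: wage = 2*educ + error, with errors of -3 and +3 at each
# education level, so the true slope of 2 is recovered exactly on full data.
educ = [10, 10, 12, 12, 14, 14, 16, 16]
wage = [2 * e + err for e, err in zip(educ, [-3, 3] * 4)]

# Not missing at random: anyone with wage above 28 does not report it.
reported = [w if w <= 28 else None for w in wage]

# Complete-case analysis keeps only the pairs where wage was reported.
cc = [(e, w) for e, w in zip(educ, reported) if w is not None]
cc_educ = [e for e, _ in cc]
cc_wage = [w for _, w in cc]

slope_full = ols_slope(educ, wage)
slope_cc = ols_slope(cc_educ, cc_wage)
print(round(slope_full, 3), round(slope_cc, 3))  # 2.0 1.357
```

For contrast, deleting observations without regard to wage (say, both observations with educ = 10) leaves the errors balanced and the slope at exactly 2.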

    By the way, if Steppen Wolfe is not your real name, it would be appreciated if you would follow the norm in this community by clicking on CONTACT US at the bottom and asking the administrator to change your username to your real first and last name. We like to keep it professional here.

    • #3
      As far as I know, Steppenwolf is the name of a Canadian/US rock band whose most famous song is "Born to Be Wild"; hence I share Clyde's suspicion about your real identity.
      That said, by uncritically ruling out missing values, your colleague did some window-dressing on her/his dataset and, in all likelihood (i.e., if the values were not missing completely at random, echoing Clyde's helpful advice), ended up analysing a selected sample.
      You may want to take a tour of the literature on this topic; a good first step may be the following textbook: http://eu.wiley.com/WileyCDA/WileyTi...471183865.html.
      Kind regards,
      Carlo
      (Stata 19.0)

      • #4
        Steppenwolf was a novel by Hermann Hesse.

        • #5
          My knowledge of Hermann Hesse's works is quite poor: Death and the Lover is probably the only novel of his I've read, back when I was in high school.
          Kind regards,
          Carlo
          (Stata 19.0)

          • #6
            I guess I'm a bit more culturally aligned with Nick than with Carlo: it was Hermann Hesse's novel I had in mind, although I vaguely recall having heard of the rock band, too.
