  • WHEN to delete missing/negative values?

    Hello,

    I was wondering when it is best to clean the data (i.e. delete missing or negative values) when performing different types of regression analysis.
    If I want to run a simple linear regression first and a multiple regression afterwards, should I delete all negative values of the variables I want to use (i.e. keep only VAR>=0) at the beginning, before running both regressions, or should I only delete the missing data for the variables used in THE particular regression?

    I would think that the first option is better, since the same number of observations will remain for each type of regression.
    Otherwise, the linear regression could be based on, for instance, 20,000 observations and the multiple regression on 14,000 observations...
    Can someone confirm this?

    Thanks in advance!

  • #2
    In general, missing values should not be deleted. If you need to have the same subsample for 2 regressions, you can do the "bigger" multiple regression first, then use "if e(sample)" for the smaller one.
    Best regards,

    Marcos
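
    A minimal sketch of the "if e(sample)" approach described above. It uses Stata's bundled auto dataset as a stand-in, so the variables (price, mpg, weight, rep78) are illustrative assumptions and not the ones from the original question:
    Code:
    sysuse auto, clear
    * "Bigger" model first: rep78 has some missing values, so not every car is used
    regress price mpg weight rep78
    * Smaller model restricted to the estimation sample of the model just fitted
    regress price mpg if e(sample)
    Both outputs should then report the same number of observations, and nothing has been deleted from the dataset.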


    • #3
      I try my best not to delete observations; I just tag the ones I deem "faulty" in another variable. That way, if I want to go back to those "faulty" ones in later steps of the analysis, it's easy.

      You don't need to delete observations with missing values to run regressions. If a variable enters the regression, observations with missing values for it are left out of that regression anyway.
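
      A rough sketch of tagging instead of deleting, again on the bundled auto dataset. The variable name faulty and the rule used to set it (a missing rep78) are hypothetical and only for illustration:
      Code:
      sysuse auto, clear
      * Hypothetical rule: tag cars whose repair record is missing
      generate byte faulty = missing(rep78)
      label variable faulty "1 = tagged as faulty, 0 = ok"
      * Tagged observations stay in the data but are left out of this model
      regress price mpg weight if !faulty
      * They are still there if you want to inspect them later
      list make price rep78 if faulty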

      As for the negative values, being negative is not in itself a reason for "cleaning" the data. There are obviously outcomes that can be negative. I don't know what you are working with, but perhaps more important than cleaning your data with some criteria you come up with is understanding the underlying reasons why the data came that way. Are the negative values indeed possible, just inconvenient? If so, I'd argue against deleting them at all. If negative values are impossible in your data, perhaps you should go back and check what happened to those observations - why were they assigned impossible values? Is it a faulty sensor? A poorly trained interviewer? How does this affect the rest of the data?
      Last edited by Igor Paploski; 12 Aug 2019, 10:43.


      • #4
        Hi,

        Thanks for your response.

        To put it simply: I performed a linear regression based on 30,000 observations. Then, a multiple regression where I only keep positive values of the added variables (for known reasons), using code:
        Code:
        keep if VAR1>=0
        keep if VAR2>=0
        and so on.

        Therefore, the output of the multiple regression shows 14,500 observations (after the removal of the negative values).

        Now I want to discuss and compare both regressions.

        My question is: won't the reduction in the number of observations in the multiple regression model give a distorted picture or introduce bias?
        In other words, should I remove these negative values before performing the linear regression as well, in order to have the same number of observations (14,500) in both regression analyses?
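
        One way to get the same observations in both models without "keep" is a single sample marker, sketched here on made-up toy data. The names y, x, var1, var2 and insample are placeholders, not the real variables. Note that Stata treats a numeric missing value (.) as larger than any number, so a condition like var1>=0 on its own is also true for missing values:
        Code:
        clear
        input y x var1 var2
        10 1  5  2
        12 2 -1  3
        15 3  4  .
        18 4  6  1
        21 5  2 -3
        25 6  7  4
        27 7  3  5
        30 8  8  6
        34 9  1  2
        35 10 9  7
        end
        * Marker is 1 only if both added variables are non-negative and nothing is missing
        generate byte insample = var1 >= 0 & var2 >= 0 & !missing(y, x, var1, var2)
        count if insample
        * Both regressions now use exactly the same observations
        regress y x if insample
        regress y x var1 var2 if insample
        Because nothing is dropped, the excluded observations remain available if the selection rule needs to be revisited.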


        • #5
          But, why do you want to delete negative values at all? Do you think they are coding errors? If they are legitimate values, you should keep them.
          -------------------------------------------
          Richard Williams, Notre Dame Dept of Sociology
          StataNow Version: 19.5 MP (2 processor)

          EMAIL: [email protected]
          WWW: https://www3.nd.edu/~rwilliam


          • #6
            Hi Richard,

            Because the negative values in the data refer to dates that fall outside my research period, I don't want to include them in my testing.
