  • How to treat missing values

    Hello everybody,

    I am currently studying a do-file and got a bit confused.

    For most of the variables, the author of the do-file recodes the negative values as missing (so they are all equal to ".").

    But for some of them he replaces the missing values with the value 99 and then drops all observations where the dummy for that category is equal to 1.

    For example:

    Code:
    replace sec=99 if sec==.
    tab sec, gen(sec_)
    
        Security |      Freq.     Percent        Cum.
    -------------+-----------------------------------
    [1] high sec |     46,640       10.53       10.53
     [2] med sec |    120,017       27.09       37.62
      [3] no sec |    161,136       36.37       73.99
              99 |    115,228       26.01      100.00
    -------------+-----------------------------------
           Total |    443,021      100.00
    And then he runs "drop if sec_4==1"

    What are the reasons for that? Why not just let the missing values stay as "."?

    Thanks in advance
    Last edited by Ahmad Wali; 10 Aug 2018, 12:41.

  • #2
    There is no obvious reason for this, at least not based on the information you have shared. In any case, I doubt you can get an answer to this question from anybody except the person who wrote that do-file.


    • #3
      Thank you very much for your reply.

      I would provide you with more information if I knew what the goals of those specific steps were.

      My main point is: I want to modify that do-file, but doing the steps described above (for some of the covariates) will drop observations that I am interested in...

      So maybe a more general question: if there is no obvious reason, can I leave those missing values as they are (equal to ".") and do my analysis (for example, run regress)?

      Sorry that I can't provide more information; I am not sure myself why those steps are taken or what information would help.


      • #4
        If you have an observation with a missing value for a variable, and you then do a regression (of any kind) that uses that variable, that observation will be omitted from the regression in any case. So, at least for the purposes of that regression, it doesn't matter whether you drop the observation or retain it with a missing value (but, I cannot overemphasize, a real missing value, not 99!!!!).
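
        A minimal sketch of this point, assuming Stata's shipped auto dataset (in which rep78 is missing for a few cars) stands in for the poster's data. The variable rep78_99 is hypothetical and exists only to show the contrast between a true missing value and a 99 code:

        ```stata
        * Assumption: auto.dta stands in for the real data; rep78 has some missing values.
        sysuse auto, clear
        count if missing(rep78)

        * With true missing values, regress excludes those observations on its own.
        regress price mpg rep78
        display e(N)

        * Hypothetical recode for contrast: 99 is an ordinary number to Stata,
        * so these observations now enter the estimation with rep78_99 = 99.
        generate rep78_99 = rep78
        replace rep78_99 = 99 if missing(rep78_99)
        regress price mpg rep78_99
        display e(N)
        ```

        The second e(N) should be larger than the first by the number of observations counted as missing, which is exactly why a 99 code must not be left in a variable that goes into a model.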

        Now, of course, it may be that if you drop an observation with a missing value for variable x, you will lose non-missing information from a different variable y, which could then bias the results of an analysis involving y but not x. If you have such situations then you should not drop these observations, and you should just leave them in with their missing values.

        In general, retaining an observation that has missing values is unlikely to have adverse effects on an analysis, other than perhaps the increased size of the data set slowing things down.
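
        As a rough illustration of retaining the observations, again assuming the auto dataset as a stand-in (the variable name insample is hypothetical): each command uses whatever observations are complete for its own variables, and e(sample) can flag an estimation sample if a common sample is needed later.

        ```stata
        * Assumption: auto.dta stands in for the real data.
        sysuse auto, clear

        * rep78 is not in this model, so observations with missing rep78 still contribute.
        regress price mpg
        display e(N)

        * Here observations with missing rep78 are excluded, for this command only.
        regress price mpg rep78
        display e(N)

        * Flag the estimation sample instead of dropping anything from the data.
        generate byte insample = e(sample)
        regress price mpg if insample
        ```

        Nothing is deleted from the dataset, so later analyses that do not involve the incomplete variable keep their full sample.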


        • #5
          That is exactly the case: I will lose non-missing information from a different variable y.

          But according to your first point, those non-missing values of variable y won't help me either as long as I am also using variable x (with missing information) in the regression, since that specific observation will be omitted anyway...?!


          • #6
            It would have been more straightforward just to say something like

            drop if missing(sec)

            Also, rather than recode to 99, the tab command could have been

            tab sec, missing

            But why drop in the first place? Most programs do listwise deletion, so cases with missing values won't appear in the analysis anyway.
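
            Put together, the two suggestions above might look like the following sketch. It assumes the auto dataset as a stand-in, with rep78 playing the role of sec; the drop line is commented out because dropping may not be wanted at all.

            ```stata
            * Assumption: auto.dta stands in for the real data; rep78 plays the role of sec.
            sysuse auto, clear

            * Show the missing category in the table without recoding it to 99.
            tabulate rep78, missing

            * Count the observations a drop would remove before deciding.
            count if missing(rep78)

            * Only if dropping really is intended:
            * drop if missing(rep78)
            ```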

            My guess is that this person was not a great Stata programmer. I would make my own decisions about how best to handle missing data.
            -------------------------------------------
            Richard Williams, Notre Dame Dept of Sociology
            StataNow Version: 19.5 MP (2 processor)

            WWW: https://www3.nd.edu/~rwilliam
