
  • Dealing with missing values

    Hello all

    I have panel data with a variable "Net weekly pay" (NETWK). It contains negative values because of how the data were collected: "-9" means that the question does not apply to the individual, and "-8" means no answer.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input double PERSID byte quarter long NETWK
    10101010101  6  -9
    10101010101  8  -9
    10101010101  9  -9
    10102020102  6  99
    10102020102  7  -9
    10102020102  8  -9
    10102020102  9  -9
    10102020102 10  44
    10104030101  8 346
    10104030101  9  -9
    10104030101 10  -9
    10104030101 11  -9
    10104030101 12 350
    10104030102  8  -9
    10104030102  9  -9
    10104030102 10  -9
    10104030102 11  -9
    10104030102 12  -9
    10203030101  7 190
    10203030101  8  -9
    10203030101  9  -9
    10203030101 10  -9
    10203030101 11 160
    10303030101  7  -9
    10303030101  8  -9
    10303030101  9  -9
    10303030101 10  -9
    10303030101 11  -9
    10304050102  8 438
    10501030101  5 416
    10501030101  6  -9
    10501030101  7  -9
    10501030101  8  -9
    10501030101  9 415
    10602020101  6  -9
    10602020101  7  -9
    10602020101  8  -9
    10602020101  9  -9
    10602020101 10  -9
    10603070101  7  -9
    10603070101  8  -9
    10603070101  9  -9
    10603070101 11  -9
    10604060102  8 635
    10604060102  9  -9
    10604060102 10  -9
    10604060102 11  -9
    10604060102 12  -8
    10604070101 11  -9
    10604070101 12  -9
    10604070103  8 115
    10604070103  9  -9
    10604070103 10  -9
    10604070103 11  -9
    10604070103 12  -8
    10701010101  5 314
    10701010101  6  -9
    10701010101  7  -9
    10701010101  8  -9
    10701010101  9 300
    10793010101  4  -9
    10793010101  5  -9
    10793010101  6  -9
    10793010101  7 383
    10794010101  4 485
    10794010101  5  -9
    10794010101  6  -9
    10794010101  7  -9
    10794010101  8 454
    10794010102  4 519
    10794010102  5  -9
    10794010102  6  -9
    10794010102  7  -9
    10794010102  8 219
    10801020103  5  -9
    10801020103  8  -9
    10801020103  9 395
    10801020104  8  -9
    10801020104  9 166
    10802020101  6 438
    10802020101  7  -9
    10802020101  8  -9
    10802020101  9  -9
    10802020101 10 438
    10802020102  6 254
    10802020102  7  -9
    10802020102  8  -9
    10802020102  9  -9
    10802020102 10 277
    10993020101  4  -9
    10993020101  6  -9
    10993020102  4  -9
    10993020102  5  -9
    10993020102  6  -9
    10993020102  7 923
    11001010101  5  -9
    11001010101  6  -9
    11001010101  7  -9
    11001010101  8  -9
    11001010101  9  -9
    end
    label values quarter quarter
    label def quarter 4 "Oct-Des 2019", modify
    label def quarter 5 "Jan-Mar 2020", modify
    label def quarter 6 "Apr-June 2020", modify
    label def quarter 7 "July-Sep 2020", modify
    label def quarter 8 "Oct-Des 2020", modify
    label def quarter 9 "Jan-Mar 2021", modify
    label def quarter 10 "Apr-June 2021", modify
    label def quarter 11 "July-Sep 2021", modify
    label def quarter 12 "Oct-Des 2021", modify
    label values NETWK NETWK5
    label def NETWK5 -9 "Does not apply", modify
    label def NETWK5 -8 "No answer", modify

    My question is:

    Should I recode "-9" as a missing value when analysing the data?

    I ran the regression both ways, with "-9" recoded to missing and with "-9" left as is. The results differ substantially, because about 48,000 of my roughly 70,000 observations have "-9" for net weekly pay.

    What is the best way to deal with this?


    Many thanks

  • #2
    Indeed. -9 should not be taken literally and should be recoded to missing. You can use replace or mvdecode.
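
    As a minimal sketch of the mvdecode route, the example below uses a few hypothetical rows that reuse the poster's variable names. Mapping -9 and -8 to the extended missing values .a and .b is an assumption made here so that the two reasons stay distinguishable; plain "." would work as well.

    ```stata
    * Hypothetical toy data using the poster's variable names
    clear
    input double PERSID byte quarter long NETWK
    10604060102  8 635
    10604060102  9  -9
    10604060102 10  -9
    10604060102 11  -9
    10604060102 12  -8
    10802020102  6 254
    10802020102 10 277
    end

    * Map each survey code to its own extended missing value
    mvdecode NETWK, mv(-9 = .a \ -8 = .b)

    * Keep the reason for missingness visible in tabulations
    label define NETWK5 .a "Does not apply" .b "No answer", modify
    label values NETWK NETWK5

    * Extended missing values are excluded from summaries and estimation
    summarize NETWK
    tabulate NETWK, missing
    ```

    Extended missing values behave like "." in summaries and regressions, while tabulate with the missing option still reports how many observations fall under each reason.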



    • #3
      Dalia:
      Nick's helpful guidance is easy to implement:
      Code:
      . sum
      
          Variable |        Obs        Mean    Std. dev.       Min        Max
      -------------+---------------------------------------------------------
            PERSID |        100    1.06e+10    2.96e+08   1.01e+10   1.10e+10
           quarter |        100        7.97    2.076589          4         12
             NETWK |        100          81    181.8413         -9        923
      
      . replace NETWK=. if NETWK==-9
      
      
      . sum
      
          Variable |        Obs        Mean    Std. dev.       Min        Max
      -------------+---------------------------------------------------------
            PERSID |        100    1.06e+10    2.96e+08   1.01e+10   1.10e+10
           quarter |        100        7.97    2.076589          4         12
             NETWK |         27    324.3333    204.1455         -8        923
      
      .
      Note that -8 ("No answer") is still present after this replace, as the minimum in the second summary shows, and should be recoded as well.
      That said, the main issue is diagnosing the missing values: are they informative or not?
      Basically, it's a matter of investigating whether they are missing completely at random (infrequent in panel data settings), missing at random (more likely), or missing not at random (panel attrition may well depend on the unobserved values).
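
      A first descriptive look at the missingness pattern might go along the following lines. This is only a sketch on hypothetical toy rows: the names miss_pay and share_miss are made up for illustration, and none of it is a formal test of the missingness mechanism.

      ```stata
      * Hypothetical toy panel using the poster's variable names
      clear
      input double PERSID byte quarter long NETWK
      10104030101  8 346
      10104030101  9  -9
      10104030101 10  -9
      10104030101 11  -9
      10104030101 12 350
      10604060102  8 635
      10604060102  9  -9
      10604060102 10  -9
      10604060102 11  -9
      10604060102 12  -8
      end

      * Recode both survey codes to system missing
      mvdecode NETWK, mv(-9 -8)

      * Flag rows where pay is missing (miss_pay is an illustrative name)
      generate byte miss_pay = missing(NETWK)

      * Does the share of missing pay vary by quarter?
      tabulate quarter miss_pay, row

      * Share of quarters with missing pay for each person
      bysort PERSID: egen share_miss = mean(miss_pay)
      tabulate share_miss
      ```

      If missingness clusters in particular quarters or individuals, that is a hint that it is unlikely to be completely at random.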
      Kind regards,
      Carlo
      (Stata 19.0)
