  • How to drop an entire household (of two) if at least one household member has a missing value on at least one explanatory variable

    Dear Statalist,

    Each household consists of a man and a woman (female==1). I want to drop every household in which at least one member has a missing value (.) on at least one of the explanatory variables. My variables below are the household identifier (hid), female, country of origin (country), spouse's country of origin (sp_country), years since migration (ysm), etc.
    There are many more variables on the right of ysm.
    How would I go about this?
    I would much appreciate suggestions.

    Best wishes.

    Nico

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long hid float(female country sp_country ysm)
    68391 0    7 1000 55
    68391 1 1000    7  0
    68392 0    . 1000  .
    68392 1 1000    .  0
    68403 0 1000 1000  0
    68403 1 1000 1000  0
    68404 0 1000 1000  0
    68404 1 1000 .     0
    68405 0 1000 1000  0
    68405 1 1000 1000  0
    end

  • #2
    Nico:
    do you mean something along the following lines?
    Code:
    . egen flag=rowmiss( female country sp_country ysm)
    
    . bysort hid: egen flag_2=mean( flag )
    
    . drop if flag_2>0
    (4 observations deleted)
    
    . list
    
         +-----------------------------------------------------------+
         |   hid   female   country   sp_cou~y   ysm   flag   flag_2 |
         |-----------------------------------------------------------|
      1. | 68391        1      1000          7     0      0        0 |
      2. | 68391        0         7       1000    55      0        0 |
      3. | 68403        0      1000       1000     0      0        0 |
      4. | 68403        1      1000       1000     0      0        0 |
      5. | 68405        0      1000       1000     0      0        0 |
         |-----------------------------------------------------------|
      6. | 68405        1      1000       1000     0      0        0 |
         +-----------------------------------------------------------+
    Kind regards,
    Carlo
    (Stata 19.0)
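
    For comparison, the same household-level drop can be sketched without -egen-. This is only an illustration, assuming the example data from #1 are in memory and that the four variables listed are the only ones to check; the variable name anymiss is made up for this sketch.

    ```stata
    * anymiss is 1 if any of the listed variables is missing in this row
    generate byte anymiss = missing(female, country, sp_country, ysm)

    * Sort within household so a row with anymiss == 1 (if any) comes last,
    * then drop the whole household when its last row is flagged
    bysort hid (anymiss): drop if anymiss[_N]
    ```

    With many variables the -egen, rowmiss()- route is likely more convenient, because it accepts a varlist rather than a comma-separated list of arguments.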


    • #3
      Code:
      ds hid, not
      local vbles `r(varlist)'
      egen int mcount = rowmiss(`vbles')
      by hid, sort: egen anybody_missing_anything = max(mcount > 0)
      drop if anybody_missing_anything

      That's how you do it. That said, bear in mind that omitting observations (or groups of observations) with missing data is not always a good idea; it can seriously bias the sample. So do think about it before proceeding.

      Added: Crossed with #2 which provides an approach that is similar in spirit, though different in detail.
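
      Given the caution about bias, it may help to see how many households would be lost before dropping anything. The following is a rough sketch, assuming it is run on the original data from #1 rather than after the code above; hh_tag is a made-up name.

      ```stata
      * Build the household-level flag as above, but do not drop yet
      ds hid, not
      local vbles `r(varlist)'
      egen int mcount = rowmiss(`vbles')
      by hid, sort: egen byte anybody_missing_anything = max(mcount > 0)

      * Tag one observation per household so each household is counted once
      egen byte hh_tag = tag(hid)
      tabulate anybody_missing_anything if hh_tag
      ```

      In the example data, two of the five households (68392 and 68404) are flagged, so the drop would remove 4 of the 10 observations.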


      • #4
        Dear Carlo and Clyde,

        thank you very much for your quick and accurate help. Yes, I was aware of egen with rowmiss(), but it cannot be combined with by, which is why I got stuck.
        At any rate, I will try both suggestions tomorrow. Both are exactly what I want, so thank you very much to both of you.
        I will certainly think very hard before dropping observations. Clyde, thanks for the reminder.

        Have a great day!

        Best to both of you.

        Nico
