  • Panel Data: Delete whole case if missing values on a certain variable

    Dear Stata users,

    I’ve got a panel data set and want to delete the whole person (id) from the dataset if he or she has a missing value on a certain variable (dose).

    The structure looks like

    id dose
    1 0,5
    1 0,3
    1 1
    1 .
    2 0,99
    2 .
    3 0,5
    3 0,33
    3 0,8
    3 0,89
    3 0,56

    In this example I would want to delete all data from id 1 and id 2 but keep all data from id 3.

    After searching the forum, I tried several commands, but without any success.

    I would be grateful for your help!
    Thanks, Kathrin

  • #2
    Code:
    clear
    input id dose
    1 0.5
    1 0.3
    1 1
    1 .
    2 0.99
    2 .
    3 0.5
    3 0.33
    3 0.8
    3 0.89
    3 0.56
    end
    
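    * miss is 1 when dose is nonmissing and 0 when it is missing
    bys id : gen miss = !missing(dose)
    * touse is 0 for every observation of an id with at least one missing dose
    bys id : egen touse = min(miss)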
    list, sepby(id)
    
    * now you restrict any command to only use the observations you want
    * your_command vars if touse, options
    
    * or
    * drop if touse == 0
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------


    • #3
      Dear Maarten,
      thank you very much! Your command works perfectly and is exactly what I was looking for!
      Greetings from Mainz!
      Kathrin


      • #4
        Code:
        bysort id (dose) : drop if missing(dose[_N])
        gets you what you ask for in one line. Stata sorts missing values after all nonmissing values, so if you sort on dose within id, any missing value ends up at the end of its panel. Hence we only need to check the last value, dose[_N].
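
        To see why checking the last value is enough, here is a minimal sketch on the example data from this thread. The variable name lastmissing is only an illustration and is not part of the original answer; it stores the result of the check so you can inspect it before dropping anything.

        ```stata
        clear
        input id dose
        1 0.5
        1 0.3
        1 1
        1 .
        2 0.99
        2 .
        3 0.5
        3 0.33
        3 0.8
        3 0.89
        3 0.56
        end

        * Sort dose within id; missing values sort after all nonmissing values,
        * so dose[_N] is missing exactly when the id has any missing dose
        bysort id (dose) : gen byte lastmissing = missing(dose[_N])
        list, sepby(id)

        * Dropping on the flag should have the same effect as the one-line command
        drop if lastmissing
        ```

        Note that sorting on dose changes the order of observations within each id, so re-sort afterwards if the original order matters (for example, by a time variable).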

        In several ways, however, Maarten's technique is better, as you might change your mind about identifiers with missing values.

        Another spin on his technique is

        Code:
        egen anymissing = max(missing(dose)), by(id)
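
        A short sketch of how that indicator might be used, again on the example data from this thread. The summarize call is just a placeholder for whatever command you actually want to run; it is an assumption for illustration.

        ```stata
        clear
        input id dose
        1 0.5
        1 0.3
        1 1
        1 .
        2 0.99
        2 .
        3 0.5
        3 0.33
        3 0.8
        3 0.89
        3 0.56
        end

        * anymissing is 1 for every observation of an id with at least one
        * missing dose, and 0 otherwise
        egen anymissing = max(missing(dose)), by(id)

        * Non-destructive: restrict a command to ids with complete dose data
        summarize dose if !anymissing

        * Destructive alternative: remove the incomplete ids for good
        * drop if anymissing
        ```

        Keeping the indicator rather than dropping right away leaves the data intact, so you can still change your mind about the ids with missing values later.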


        • #5
          Hi Nick,
          thanks a lot for these commands, too!
          Since I'm inexperienced with panel data analysis, I'll keep them in my 'good to know' file!
          regards, Kathrin
