
  • Deleting distinct observations considering one variable with zero or missing values

    Hi all,

    This data cleaning is driving me crazy, and I would appreciate your help. I have a panel dataset with many distinct firms. The reference period covers 5 years (2003-2007). Let NIP be a variable that measures the number of product innovations made by each firm. I want to delete the firms that made no product innovation during the whole period (NIP == 0), that have missing values in every year (NIP == .), or that show a combination of the two (NIP == 0 | NIP == . in every year). Let me give you an example.

    Let’s imagine that we have a sample of 3 firms with the following values (missing values are represented by “.”):

    year id NIP
    2003 1 0
    2004 1 0
    2005 1 0
    2006 1 0
    2007 1 0
    2003 2 3
    2004 2 0
    2005 2 0
    2006 2 0
    2007 2 0
    2003 3 0
    2004 3 0
    2005 3 0
    2006 3 .
    2007 3 .

    In this case, I would want to keep only the second firm in the database, because that company created three product innovations in 2003. The first company should be deleted, since it did not make any innovation. Finally, the third company should also be eliminated, since it made no innovation in the years with data and has missing values in the remaining years. So the sample should look like this:

    year id NIP
    2003 2 3
    2004 2 0
    2005 2 0
    2006 2 0
    2007 2 0

    Any idea? Thanks in advance!

    Best regards,
    Antonio

  • #2
    Code:
    bys id: egen NIPtotal = total(NIP)
    and then either
    Code:
    drop if NIPtotal ==0
    or keep all observations and restrict every analysis with ', if NIPtotal>0'
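A minimal, self-contained sketch of this approach on the example data from the question (the input block just re-enters that listing). It relies on egen's total() treating missing values as 0: that is why firm 3, which has only zeros and missing values, also ends up with NIPtotal == 0 and is dropped.

```stata
clear
* Re-enter the example panel from the question ("." = missing)
input int year byte id byte NIP
2003 1 0
2004 1 0
2005 1 0
2006 1 0
2007 1 0
2003 2 3
2004 2 0
2005 2 0
2006 2 0
2007 2 0
2003 3 0
2004 3 0
2005 3 0
2006 3 .
2007 3 .
end

* Total innovations per firm; total() counts missing NIP as 0
bysort id: egen NIPtotal = total(NIP)

* Drop firms with no product innovation over the whole period
drop if NIPtotal == 0
list, sepby(id)
```

Only the five observations of firm 2 should remain.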



    • #3
      There is no need for this to drive you crazy. Let's define 'bad' values:

      Code:
      gen byte bad = inlist(NIP, 0, .)
      If you don't know inlist(), this is just as easy:

      Code:
      gen byte bad = NIP == 0 | NIP == .
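One caveat worth checking, assuming the data could contain Stata's extended missing values (.a to .z): both NIP == . and inlist(NIP, 0, .) match only the system missing value, whereas missing(NIP) matches all of them. The tiny dataset and the variable names bad_sys and bad_all below are hypothetical, made up only to show the difference.

```stata
clear
* Hypothetical data: firm 1 has an extended missing value (.a)
input byte id byte NIP
1 0
1 .a
2 2
2 .
end

* Matches only the system missing value (.)
gen byte bad_sys = NIP == 0 | NIP == .
* missing() is true for . and for .a through .z
gen byte bad_all = NIP == 0 | missing(NIP)
list
```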
      Then what you want is just a twist on a previous question:

      Code:
      bysort id (bad) : drop if bad[1]
      That may seem less tricky if we write it like this:

      Code:
      bysort id (bad) : drop if bad[1] == 1
      If we sort on the new variable bad, then any zeros (good values) must be sorted to the beginning of each panel. So if the lowest value of bad is 1, there are no zeros, all values are bad, and we drop the panel.
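The same rule can be sketched another way, offered here as an illustrative alternative rather than as part of the answer above: a firm is dropped when the minimum of bad within the firm is 1, meaning every year is bad. The name allbad is hypothetical; the input block re-enters the example data from the question.

```stata
clear
input int year byte id byte NIP
2003 1 0
2004 1 0
2005 1 0
2006 1 0
2007 1 0
2003 2 3
2004 2 0
2005 2 0
2006 2 0
2007 2 0
2003 3 0
2004 3 0
2005 3 0
2006 3 .
2007 3 .
end

* Flag years with no innovation or a missing NIP
gen byte bad = inlist(NIP, 0, .)

* allbad is 1 only if every year of the firm is bad
bysort id: egen byte allbad = min(bad)
drop if allbad
list, sepby(id)
```

This keeps the within-firm flag as a variable, which can be handy for tabulating how many firms would be dropped before actually dropping them.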



      • #4
        Wow! Thanks, Nick. You are really brilliant at Stata.



        • #5
          Thanks. Jorrit's code works very well indeed in this case.
