Hi all,
This data cleaning is driving me crazy, and I would appreciate your help. I have a panel data set covering many firms over 5 years (2003-2007). Let NIP be a variable that measures the number of product innovations made by each firm in a given year. I want to delete those firms that did not make any product innovation during the whole period (NIP == 0 in every year), that have only missing values (NIP == . in every year), or that show a combination of the two (NIP == 0 | NIP == . in every year). Let me give you an example.
Let’s imagine that we have a sample of 3 firms with the following values (missing values are represented by “.”):
year id NIP
2003 1 0
2004 1 0
2005 1 0
2006 1 0
2007 1 0
2003 2 3
2004 2 0
2005 2 0
2006 2 0
2007 2 0
2003 3 0
2004 3 0
2005 3 0
2006 3 .
2007 3 .
In this case, I would want to keep only the second firm, because it created three product innovations in 2003. The first firm should be deleted because it did not make any innovation in any year. The third firm should also be dropped, since in every year it either made no innovation or has a missing value. So, the sample should look like this:
year id NIP
2003 2 3
2004 2 0
2005 2 0
2006 2 0
2007 2 0
Any idea? Thanks in advance!
Best regards,
Antonio
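One possible way to express the rule above in Stata is sketched below. It is only a sketch under stated assumptions: NIP is assumed to be a numeric variable that is never negative, and max_nip is a hypothetical helper variable introduced for illustration. The idea is to compute each firm's maximum NIP over the period and keep a firm only if that maximum is strictly positive. Because Stata treats missing values as larger than any number, the condition also has to rule out a missing maximum, which is what a firm with all-missing NIP would get.

```stata
* Rebuild the three-firm example from the post
clear
input year id NIP
2003 1 0
2004 1 0
2005 1 0
2006 1 0
2007 1 0
2003 2 3
2004 2 0
2005 2 0
2006 2 0
2007 2 0
2003 3 0
2004 3 0
2005 3 0
2006 3 .
2007 3 .
end

* max_nip is a hypothetical helper: the largest non-missing NIP per firm.
* egen's max() ignores missing values and returns missing only when
* every NIP of the firm is missing.
bysort id: egen max_nip = max(NIP)

* Keep firms with at least one positive NIP. The !missing() check is
* needed because a missing value compares as greater than 0 in Stata.
keep if max_nip > 0 & !missing(max_nip)
drop max_nip

list, sepby(id)
```

On the example data this should leave only the five observations of firm 2. If NIP were stored as a string, it would need to be converted first (for example with destring) before this approach applies.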