I have the following unbalanced panel data set.
This dataset contains both numeric and string variables. Variables in columns 2-6 are numeric and columns 7-8 are string.
I want to have unique observations under firmID-Year pair. But as can be seen, obs 2,3,4,5,7,8 are duplicates in terms of firmID-year pair.
I do not want to just drop the duplicate values. I want to aggregate the duplicate observations into a single observation. I tried to use collapse command, but it only considers numeric variables and not string variables. Hence my post here.
I would like to do the following:
1. For every numeric variable, I want to take a mean in case of numeric variables
2. In the case of string variables, I want to keep one of the two values (anyone of two works). If for a string variable, one value is missing and one is not, I want to keep the non-missing value.
So the aforementioned sample data should look like this-
Plase help. Thanks much!
This dataset contains both numeric and string variables. Variables in columns 2-6 are numeric and columns 7-8 are string.
I want to have unique observations under firmID-Year pair. But as can be seen, obs 2,3,4,5,7,8 are duplicates in terms of firmID-year pair.
ObsNo. | industryID | firmID | Year | Var1 | var2 | var3 | var4 |
1 | 1641 | 1 | 2006 | 12 | 9.3 | USD | QW |
2 | 1641 | 1 | 2008 | 9 | 21 | USD | DA |
3 | 1641 | 1 | 2008 | 11 | 23 | USD | CN |
4 | 3486 | 2 | 2009 | 7 | 99 | USD | |
5 | 3486 | 2 | 2009 | 5.7 | 45 | USD | |
6 | 3486 | 2 | 2003 | 34 | 21.21 | USD | YY |
7 | 1641 | 3 | 2004 | 0.9 | 0 | USD | RP |
8 | 1641 | 3 | 2004 | 55 | USD |
I do not want to just drop the duplicate values. I want to aggregate the duplicate observations into a single observation. I tried to use collapse command, but it only considers numeric variables and not string variables. Hence my post here.
I would like to do the following:
1. For every numeric variable, I want to take a mean in case of numeric variables
2. In the case of string variables, I want to keep one of the two values (anyone of two works). If for a string variable, one value is missing and one is not, I want to keep the non-missing value.
So the aforementioned sample data should look like this-
ObsNo. | industryID | firmID | Year | Var1 | var2 | var3 | var4 |
1 | 1641 | 1 | 2006 | 12 | 9.3 | USD | DA |
2 | 1641 | 1 | 2008 | 10 | 22 | USD | DA |
3 | 3486 | 2 | 2009 | 6.35 | 72 | USD | |
4 | 3486 | 2 | 2003 | 34 | 21.21 | USD | YY |
5 | 1641 | 3 | 2004 | 27.95 | 0 | USD | RP |
Plase help. Thanks much!
Comment