I have a panel dataset of firms. There are some cases where a parent and its subsidiary both report exactly the same values for things like sales, assets, etc. I was able to see these by using the duplicates command, specifically:
Code:
duplicates tag sales assets year, generate(dups)
If I sort by sales (or assets) and list the data, I can clearly see there are many pairs of firms that have the same values in the same year. I get something like this:
Company ID   Sales   Assets   Year   dups
1            6       25       1999   1
2            6       25       1999   1
3            10      100      1999   1
4            10      100      1999   1
1            3.5     45       2000   1
2            3.5     45       2000   1
3            1       50       2000   1
4            1       50       2000   1
To correct for double counting, I'd like to keep one firm in each pair and drop the other. I've seen code that creates an alternating dummy value within each year, which would work on the fictional list above. In my actual (more complicated) data it would not work: I would end up dropping some years for Company A and some years for Company B, when it would be better to drop one firm or the other entirely. That is, I need a dummy variable that equals 0 for one of the firms with duplicate data and 1 for the other. Or, in terms of the fictional data, I want to tell Stata that firm 1 has the same values as firm 2 in all years, and that firm 3 has the same values as firm 4 in all years.
I hope this is clear. Any help is appreciated.
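For concreteness, here is a minimal sketch of the kind of firm-level dummy I have in mind, run on the fictional data above. The variable names id, keeper, is_dup and drop_firm are made up for this example, and the rule "keep the firm with the smallest ID in each matching group" is only an assumption; any consistent rule (say, always keep the parent) would do. Note that it flags a firm in every year as soon as it matches a lower-ID firm in any one year, which may be too aggressive if two firms match only occasionally.

```stata
* Fictional example data; variable names are assumptions
clear
input id sales assets year
1 6   25  1999
2 6   25  1999
3 10  100 1999
4 10  100 1999
1 3.5 45  2000
2 3.5 45  2000
3 1   50  2000
4 1   50  2000
end

* Tag observations that share sales, assets and year with another firm
duplicates tag sales assets year, generate(dups)

* Within each matching group, record the smallest company id
bysort sales assets year: egen keeper = min(id)

* Observation-level flag: duplicated, and not the firm being kept
generate byte is_dup = dups > 0 & id != keeper

* Firm-level dummy: 1 in all years if the firm is ever flagged
bysort id: egen byte drop_firm = max(is_dup)

* Inspect before dropping anything
tabulate id drop_firm

drop if drop_firm == 1
```

On the fictional data this should set drop_firm to 1 for firms 2 and 4 and to 0 for firms 1 and 3. Observations with missing sales or assets in the same year would also be grouped together as duplicates, so those would need to be excluded before tagging.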