In my data (example from another post), I have multiple observations for the same population ("entity") and for different years ("year"), and these observations differ in the values of the other variables.
1. Using -egen, tag()- and then -egen, total()- to create "ndistinct", I can find out how many distinct values of "firm" each combination of "entity" and "year" (each group) has.
(I will have to do this not only for "firm" but also for "value" and other variables.)
2. Next, I would like to know how often each value of "ndistinct" occurs across all combinations of "entity" and "year".
(This would give me an idea of how much variation there is within each variable and help me decide how to combine them most sensibly.)
3. Finally, I need to make sure there is only one observation per combination of "entity" and "year", and therefore have to combine (merge, collapse, append...?) the values of the other variables. If there is a most frequent value, I want to choose that one; if there is not, the highest or the first; and in some other cases, the mean or the sum (to be defined for each variable).
I am really struggling to find a way to do this. I hope my example is clear.
Original data
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input byte entity int year str1 firm float value
1 2010 "A" 15
1 2010 "A" 8
1 2010 "B" 12
1 2011 "B" 25
1 2012 "B" 8
2 2010 "A" 7
2 2011 "A" 5
2 2011 "A" 12
2 2011 "C" 13
2 2012 "A" 19
2 2012 "B" 25
2 2011 "B" 14
2 2012 "C" 18
2 2012 "D" 16
end

sort entity year
list, sepby(entity year)
1. Question
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input byte entity int year str1 firm float value
1 2010 "A" 15
1 2010 "A" 8
1 2010 "B" 12
1 2011 "B" 25
1 2012 "B" 8
2 2010 "A" 7
2 2011 "A" 5
2 2011 "A" 12
2 2011 "C" 13
2 2012 "A" 19
2 2012 "B" 25
2 2011 "B" 14
2 2012 "C" 18
2 2012 "D" 16
end

egen tag = tag(firm entity year)
egen ndistinct = total(tag), by(entity year)

sort entity year
list, sepby(entity year)
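Since the same count is also needed for "value" and other variables, one way to do it, sketched here under the assumption that the example data above are in memory, is to wrap the same two -egen- calls in a loop over the variables. The names tag_* and nd_* are my own hypothetical choices. Note that -egen, tag()- never tags observations with missing values in its varlist unless the missing option is given, so missing values are not counted as a distinct value in this sketch.

```stata
* Example data as posted (generated by -dataex-)
clear
input byte entity int year str1 firm float value
1 2010 "A" 15
1 2010 "A" 8
1 2010 "B" 12
1 2011 "B" 25
1 2012 "B" 8
2 2010 "A" 7
2 2011 "A" 5
2 2011 "A" 12
2 2011 "C" 13
2 2012 "A" 19
2 2012 "B" 25
2 2011 "B" 14
2 2012 "C" 18
2 2012 "D" 16
end

* For each variable (hypothetical names):
*   tag_<var> = 1 for one observation per distinct value of <var> within entity-year
*   nd_<var>  = number of distinct values of <var> within entity-year
foreach v of varlist firm value {
    egen tag_`v' = tag(`v' entity year)
    egen nd_`v' = total(tag_`v'), by(entity year)
    drop tag_`v'
}

sort entity year
list, sepby(entity year)
```

Further variables can simply be added to the varlist of the loop.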
2. Question -> This is the table I would like to obtain
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input byte ndistinct freq
1 3
2 1
3 1
4 1
end

sort ndistinct freq
list, sepby(ndistinct freq)
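The desired table can probably be built directly from the "ndistinct" variable of question 1. A minimal sketch, assuming the example data and the question-1 code: because "ndistinct" is constant within each entity-year, each combination has to be counted only once, which a second -egen, tag()- flag (hypothetically named grouptag) takes care of. -tabulate- then shows the frequencies, and -contract- turns them into a dataset with the same layout as the table above. -contract- replaces the data in memory, so wrap it in -preserve-/-restore- if the original data are still needed.

```stata
* Example data as posted (generated by -dataex-)
clear
input byte entity int year str1 firm float value
1 2010 "A" 15
1 2010 "A" 8
1 2010 "B" 12
1 2011 "B" 25
1 2012 "B" 8
2 2010 "A" 7
2 2011 "A" 5
2 2011 "A" 12
2 2011 "C" 13
2 2012 "A" 19
2 2012 "B" 25
2 2011 "B" 14
2 2012 "C" 18
2 2012 "D" 16
end

* Question 1: number of distinct firms per entity-year
egen tag = tag(firm entity year)
egen ndistinct = total(tag), by(entity year)

* Flag one observation per entity-year so each combination counts once
egen grouptag = tag(entity year)

* Frequencies of ndistinct across entity-year combinations
tabulate ndistinct if grouptag

* Same counts as a dataset (replaces the data in memory)
keep if grouptag
contract ndistinct, freq(freq)
list, noobs
```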
3. Question?
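For question 3 (one observation per entity-year), -merge- and -append- are meant for combining different datasets, so one possible approach within a single dataset is to compute a group-level summary for each variable with -egen- and then keep one observation per entity-year. This is only a sketch; the rules are assumptions for illustration: for "firm" the most frequent value, with ties broken by the maxmode option of -egen, mode()- (the highest value, which for a string is the one that sorts last), and for "value" both the mean and the sum. The minmode option would pick the lowest value instead. A rule such as "take the first" would need an explicit sort order, which the example data do not define. The new variable names are hypothetical.

```stata
* Example data as posted (generated by -dataex-)
clear
input byte entity int year str1 firm float value
1 2010 "A" 15
1 2010 "A" 8
1 2010 "B" 12
1 2011 "B" 25
1 2012 "B" 8
2 2010 "A" 7
2 2011 "A" 5
2 2011 "A" 12
2 2011 "C" 13
2 2012 "A" 19
2 2012 "B" 25
2 2011 "B" 14
2 2012 "C" 18
2 2012 "D" 16
end

* firm: most frequent value within entity-year;
* if several values tie, maxmode takes the highest one
egen firm_mode = mode(firm), by(entity year) maxmode

* value: example rules, to be chosen per variable
egen value_mean = mean(value), by(entity year)
egen value_sum  = total(value), by(entity year)

* Keep exactly one observation per entity-year
bysort entity year: keep if _n == 1
keep entity year firm_mode value_mean value_sum
list, sepby(entity)
```

If only numeric summaries are needed, -collapse (mean) value_mean=value (sum) value_sum=value, by(entity year)- gives the same one-row-per-group result in a single step.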