Hi, my question is a slight twist on counting distinct occurrences in a group.
Here is some example code:
An observation in my dataset consists of a firmid-year-j-product combination.
I want to calculate the number of distinct products each firm sold to a non-USA country in 2003. This variable, numprod_nonUS_2003, should be added as a new column. So even observations not corresponding to 2003 or to the USA should have the value of the corresponding firm (meaning a collapse won't actually work).
Note also that if a firm doesn't sell, it is still recorded in the dataset with value = 0, so we need to count only observations with value > 0.
I managed to do this if I conditioned instead on USA sales, which is easier, since a firmid-year-product combination is unique if you fix the country. I did it by running:
(2003 is the second year in my data)
I don't know however how to adapt this to the non-USA case.
Here is some example code:
Code:
* Example generated by -dataex-. For more info, type help dataex clear input long firmid str6 product str4 j float year double value 5221303 "121190" "DNK" 2003 0 5221303 "40900" "USA" 2003 191556.5 5221303 "71029" "ESP" 2003 0 5221303 "71080" "DEU" 2003 80694.19 5221303 "71080" "ESP" 2003 66672.8 5221303 "71080" "FRA" 2003 386993.80 5221303 "71080" "ITA" 2003 0 5221303 "71151" "DEU" 2003 125894.75 5221303 "71151" "ESP" 2003 119350 5221303 "71159" "DEU" 2003 0 5221303 "71159" "ESP" 2003 0 5221303 "71159" "FIN" 2003 0 5221303 "71159" "SWE" 2003 0 5221303 "71190" "DEU" 2003 0 5221303 "71190" "ESP" 2003 0 5221303 "71190" "FRA" 2003 0 5221303 "71230" "ESP" 2003 0 5221303 "71230" "FRA" 2003 0 5221303 "71239" "ESP" 2003 35667.8 5221303 "71239" "FRA" 2003 49720.72 5221303 "80240" "FRA" 2003 0 5221303 "80240" "ITA" 2003 204699.9 5221303 "81120" "AUS" 2003 125420.1 5221303 "81120" "BEL" 2003 0 5221303 "81120" "CAN" 2003 25200 5221303 "81120" "CHE" 2003 0 5221303 "81120" "DEU" 2003 19493 5221303 "81120" "DNK" 2003 0 5221303 "81120" "FRA" 2003 719039.84 5221303 "81120" "GBR" 2003 26525 5221303 "81120" "ITA" 2003 148453.38 5221303 "81120" "NLD" 2003 92302.6 5221303 "81120" "NZL" 2003 19426 5221303 "81120" "USA" 2003 74740.14 5221303 "81190" "DEU" 2003 15364.5 5221303 "81190" "FRA" 2003 0 end
An observation in my dataset consists of a firmid-year-j-product combination.
I want to calculate the number of distinct products each firm sold to a non-USA country in 2003. This variable, numprod_nonUS_2003, should be added as a new column. So even observations not corresponding to 2003 or to the USA should have the value of the corresponding firm (meaning a collapse won't actually work).
Note also that if a firm doesn't sell, it is still recorded in the dataset with value = 0, so we need to count only observations with value > 0.
I managed to do this if I conditioned instead on USA sales, which is easier, since a firmid-year-product combination is unique if you fix the country. I did it by running:
(2003 is the second year in my data)
Code:
bysort firmid: egen numprod_US_2003 = total(j == "USA" & year == 2003 & value > 0) bysort firmid product j (year): replace numprod_US_2003 = numprod_US_2003[2]
Comment