Hi!
I want to identify duplicate observations relative to a reference year and would appreciate help with the code. Thanks in advance.
The dataset covers the period from 1995 to 2021 and comprises variables such as year, productid, firm, and profit.
My first task is to tag duplicated observations based on productid and firm, using 1995 as the reference year.
For instance, I want to identify firm/productid pairs that are observed in both 1996 and 1995, in both 1997 and 1995, in both 1998 and 1995, and so on.
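A minimal Stata sketch of the first step, assuming the posted data are in memory with the variable names from the -dataex- example (year, firmid, productid). The flag name in1995 is hypothetical. Within each firmid/productid group, -egen max()- of the indicator (year == 1995) equals 1 if that pair appears in 1995 at all, so the 1995 match is carried over to every other year of the same pair.

```stata
* Sketch: assumes variables year, firmid, productid as in the -dataex- example.
* in1995 is a hypothetical name: 1 if this observation's firmid/productid pair
* also appears in 1995, 0 otherwise (1995 observations are trivially 1).
bysort firmid productid: egen byte in1995 = max(year == 1995)

* Cross-tabulate years against the flag to see how many observations match 1995
tabulate year in1995
```

Observations from 1995 itself are always flagged 1; to count only later-year matches, one could additionally condition on year != 1995.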
The next step is to sum profit by year, separately for duplicated and non-duplicated observations. I would appreciate any suggestions or code, particularly for the first step. Thank you.
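For the second step, one possible sketch, assuming the posted data are in memory with the variables year, firmid, productid and profit from the -dataex- example. The flag name matched and the result name total_profit are hypothetical. It builds the 1995 match flag inside preserve/restore and collapses profit by year and flag, so the original data are left untouched afterwards.

```stata
* Sketch: assumes variables year, firmid, productid, profit as in the -dataex- example.
* matched and total_profit are hypothetical names chosen for illustration.
preserve

* matched = 1 if the firmid/productid pair also appears in 1995, else 0
bysort firmid productid: egen byte matched = max(year == 1995)

* Total profit per year, separately for matched (1) and unmatched (0) observations
collapse (sum) total_profit = profit, by(year matched)
list, sepby(year) noobs

* Return to the original observation-level data
restore
```

If the totals are needed for later use rather than just listing, the collapse could be written to a separate dataset with -save- before -restore-.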
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input int(year firmid) long productid float profit double firmname float base(1995)
1995 24  30379     1.374 24 1
1995 31 121190     3.775 31 1
1995 32  20230    163.08 32 1
1995 32  30339   533.095 32 1
1995 32  30375    23.922 32 1
1995 32  30378    42.707 32 1
1995 32  30379  7194.314 32 1
1995 32  30420   903.288 32 1
1995 32  30490  6294.968 32 1
1995 32  30613   133.337 32 1
1995 32  30749 25756.193 32 1
1995 32  30799   4760.38 32 1
1995 32  50790    34.526 32 1
1995 32  51000   266.258 32 1
1995 32  51199   534.187 32 1
1995 32  70320    23.989 32 1
1995 32 100590  2749.285 32 1
1995 32 121190    40.637 32 1
1995 32 121220    31.927 32 1
1995 32 130219    37.122 32 1
1995 32 150710  5657.403 32 1
1995 32 151219    335.85 32 1
1995 32 151529    17.505 32 1
1995 32 160300     9.631 32 1
1995 32 160420  6584.757 32 1
1995 32 170490   124.313 32 1
1995 32 180690    116.96 32 1
1995 32 190530    22.167 32 1
1995 32 200710    78.932 32 1
1995 32 200799    66.266 32 1
1995 32 210410   162.828 32 1
1995 32 210690       150 32 1
1995 32 230230  1018.736 32 1
1995 32 230400  20017.82 32 1
1995 32 230610   747.192 32 1
1995 32 230630  1926.334 32 1
1995 32 230990    62.717 32 1
1995 36  10111   610.556 36 1
1995 36  10119  1419.384 36 1
1995 36  10600  2229.082 36 1
1995 36  20120   185.975 36 1
1995 36  20130   746.141 36 1
1995 36  20210  48430.03 36 1
1995 36  20220 112788.95 36 1
1995 36  20230  35259.31 36 1
1995 36  20322     1.318 36 1
1995 36  20329     38.72 36 1
1995 36  20410     1.217 36 1
1995 36  20421   442.566 36 1
1995 36  20422   308.764 36 1
end