  • tag duplicates based on a reference year

    Hi!

    I aim to identify duplicate observations using a specified reference year and seek assistance with the coding. I appreciate your help in advance.
    The dataset covers the period from 1995 to 2021 and includes the variables year, productid, firmid, and profit.
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input int(year firmid) long productid float profit double firmname float base1995
    1995 24  30379     1.374 24 1
    1995 31 121190     3.775 31 1
    1995 32  20230    163.08 32 1
    1995 32  30339   533.095 32 1
    1995 32  30375    23.922 32 1
    1995 32  30378    42.707 32 1
    1995 32  30379  7194.314 32 1
    1995 32  30420   903.288 32 1
    1995 32  30490  6294.968 32 1
    1995 32  30613   133.337 32 1
    1995 32  30749 25756.193 32 1
    1995 32  30799   4760.38 32 1
    1995 32  50790    34.526 32 1
    1995 32  51000   266.258 32 1
    1995 32  51199   534.187 32 1
    1995 32  70320    23.989 32 1
    1995 32 100590  2749.285 32 1
    1995 32 121190    40.637 32 1
    1995 32 121220    31.927 32 1
    1995 32 130219    37.122 32 1
    1995 32 150710  5657.403 32 1
    1995 32 151219    335.85 32 1
    1995 32 151529    17.505 32 1
    1995 32 160300     9.631 32 1
    1995 32 160420  6584.757 32 1
    1995 32 170490   124.313 32 1
    1995 32 180690    116.96 32 1
    1995 32 190530    22.167 32 1
    1995 32 200710    78.932 32 1
    1995 32 200799    66.266 32 1
    1995 32 210410   162.828 32 1
    1995 32 210690       150 32 1
    1995 32 230230  1018.736 32 1
    1995 32 230400  20017.82 32 1
    1995 32 230610   747.192 32 1
    1995 32 230630  1926.334 32 1
    1995 32 230990    62.717 32 1
    1995 36  10111   610.556 36 1
    1995 36  10119  1419.384 36 1
    1995 36  10600  2229.082 36 1
    1995 36  20120   185.975 36 1
    1995 36  20130   746.141 36 1
    1995 36  20210  48430.03 36 1
    1995 36  20220 112788.95 36 1
    1995 36  20230  35259.31 36 1
    1995 36  20322     1.318 36 1
    1995 36  20329     38.72 36 1
    1995 36  20410     1.217 36 1
    1995 36  20421   442.566 36 1
    1995 36  20422   308.764 36 1
    end
    My first task is to tag duplicated observations, defined by productid and firmid, relative to the reference year 1995.
    For instance, I want to identify observations in 1996 that also appear in 1995, then 1997 and 1995, 1998 and 1995, and so on, matching on firmid and productid.

    The second step is to sum the profit of the duplicated and non-duplicated observations by year. I would appreciate any suggestions or coding ideas, particularly for the first step. Thank you.

  • #2
    Try something like this:

    Code:
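    * dups_`y' counts the other observations sharing the same productid-firmid
    * among rows from 1995 or year `y'; rows from other years are set to missing.
    forvalues y = 1995(1)1998 {
        duplicates tag productid firmid if year == 1995 | year == `y', gen(dups_`y')
    }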
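
    For the second step (summing profit by year for matched and unmatched observations), here is a minimal sketch of one possible alternative to the loop. It assumes each firmid-productid pair appears at most once per year; the variable name in1995 is made up for illustration and is not in the original data.

    ```stata
    * Stop with an error if a firmid-productid pair repeats within a year
    isid firmid productid year

    * in1995 = 1 if the pair has an observation in 1995, 0 otherwise
    bysort firmid productid: egen byte in1995 = max(year == 1995)

    * Total profit by year, split by whether the pair also exists in 1995
    preserve
    collapse (sum) profit, by(year in1995)
    list, sepby(year)
    restore
    ```

    Under that uniqueness assumption, an observation from 1996 onward with in1995 == 1 is one whose firmid-productid pair also occurs in 1995, so a single flag can stand in for the separate per-year tag variables.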



    • #3
      Originally posted by Luca Calianno
      Try something like this:

      Code:
      forvalues y = 1995(1)1998 {
          duplicates tag productid firmid if year == 1995 | year == `y', gen(dups_`y')
      }
      The code works fine!

      Thank you.
