Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Dropping dupicates

    Hi Statlist,

    I have the following dataset:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(idproduct idfirm) str60 molecule str18 prd long(sales_mnf_qtr_03_2004_lc_1 sales_mnf_qtr_03_2005_lc_1)
    13953  808 "INSULIN ASPART"                           "NOVOLOG"             5716246 11806618
    13953  808 "INSULIN ASPART"                           "NOVOLOG"              378780   462974
    13953  808 "INSULIN ASPART"                           "NOVOLOG"                   0        0
    13953  808 "INSULIN ASPART"                           "NOVOLOG FLEXPEN"      350482   814804
    13953  808 "INSULIN ASPART"                           "NOVOLOG FLEXPEN"           0        0
     9361  650 "INSULIN LISPRO"                           "HUMALOG"            12486040 12802176
     9361  650 "INSULIN LISPRO"                           "HUMALOG"                   0        0
     9361  650 "INSULIN LISPRO"                           "HUMALOG"              140184   218968
     9361  650 "INSULIN LISPRO"                           "HUMALOG"                   0        0
     9361  650 "INSULIN LISPRO"                           "HUMALOG"              105021     2775
     9361  650 "INSULIN LISPRO"                           "HUMALOG KWIKPEN"           0        0
     9361  650 "INSULIN LISPRO"                           "HUMALOG KWIKPEN"           0        0
     9361  650 "INSULIN LISPRO"                           "HUMALOG KWIKPEN"           0        0
     9361  650 "INSULIN LISPRO"                           "HUMALOG PEN"               0        0
     9361  650 "INSULIN LISPRO"                           "HUMALOG PEN"         1024321  1138835
    10085  650 "INSULIN HUMAN BASE"                       "HUMULIN R"           2949776  2646117
    10085  650 "INSULIN HUMAN BASE"                       "HUMULIN R"                 0        0
    10085  650 "INSULIN HUMAN BASE"                       "HUMULIN R"            255795   281910
    10084  808 "INSULIN HUMAN BASE"                       "NOVOLIN R"          12227499 16251168
    10084  808 "INSULIN HUMAN BASE"                       "NOVOLIN R"               334        0
    10084  808 "INSULIN HUMAN BASE"                       "NOVOLIN R"            169321   206652
    10084  808 "INSULIN HUMAN BASE"                       "VELOSULIN HUMAN BR"      225        0
    10084  808 "INSULIN HUMAN BASE"                       "NOVOLIN R PREFILLE"       38        0
    10084  808 "INSULIN HUMAN BASE"                       "VELOSULIN RDNA BR"      8164        0
    10084  808 "INSULIN HUMAN BASE"                       "NOVOLIN R INNOLET"     67812   122810
    10102 1035 "INSULIN HUMAN BASE"                       "AFREZZA"                   0        0
    10102 1035 "INSULIN HUMAN BASE"                       "AFREZZA"                   0        0
    10102 1035 "INSULIN HUMAN BASE"                       "AFREZZA"                   0        0
    10102 1035 "INSULIN HUMAN BASE"                       "AFREZZA"                   0        0
    10083  623 "INSULIN HUMAN BASE"                       "HUMAN INSULIN R"           0        0
     7279  893 "INSULIN HUMAN BASE"                       "EXUBERA"                   0        0
     7279  893 "INSULIN HUMAN BASE"                       "EXUBERA"                   0        0
     7279  893 "INSULIN HUMAN BASE"                       "EXUBERA"                   0        0
     7279  893 "INSULIN HUMAN BASE"                       "EXUBERA"                   0        0
     7279  893 "INSULIN HUMAN BASE"                       "EXUBERA"                   0        0
     1371 1035 "INSULIN GLULISINE"                        "APIDRA"                    0        0
     1371 1035 "INSULIN GLULISINE"                        "APIDRA"                    0        0
     1371 1035 "INSULIN GLULISINE"                        "APIDRA SOLOSTAR"           0        0
     1371 1035 "INSULIN GLULISINE"                        "APIDRA SOLOSTAR"           0        0
     9374  650 "INSULIN HUMAN ISOPHANE"                   "HUMULIN N"           3186476  2576408
     9374  650 "INSULIN HUMAN ISOPHANE"                   "HUMULIN N"                 0        0
     9374  650 "INSULIN HUMAN ISOPHANE"                   "HUMULIN N"                 0       29
     9374  650 "INSULIN HUMAN ISOPHANE"                   "HUMULIN N KWIKPEN"         0        0
     9374  650 "INSULIN HUMAN ISOPHANE"                   "HUMULIN N PEN"        215488   182256
    10088  808 "INSULIN HUMAN ISOPHANE"                   "NOVOLIN N"           9745781 11011991
    10088  808 "INSULIN HUMAN ISOPHANE"                   "NOVOLIN N"              7657      245
    10088  808 "INSULIN HUMAN ISOPHANE"                   "NOVOLIN N"            175447   178033
    10088  808 "INSULIN HUMAN ISOPHANE"                   "NOVOLIN N PREFILLE"     9979        0
    10088  808 "INSULIN HUMAN ISOPHANE"                   "NOVOLIN N INNOLET"     79745   110283
    10083  623 "INSULIN HUMAN BASE"                       "HUMAN INSULIN N"           0        0
     9373  650 "INSULIN HUMAN BASE"                       "HUMULIN 70/30"       2876163  2341280
     9373  650 "INSULIN HUMAN BASE"                       "HUMULIN 70/30"             0        0
     9373  650 "INSULIN HUMAN BASE"                       "HUMULIN 70/30 KWIK"        0        0
     9373  650 "INSULIN HUMAN BASE"                       "HUMULIN 50/50"        133302   121662
     9373  650 "INSULIN HUMAN BASE"                       "HUMULIN 70/30 PEN"    218726   219137
      340  808 "INSULIN HUMAN BASE"                       "NOVOLIN 70/30"       9322483 10960938
      340  808 "INSULIN HUMAN BASE"                       "NOVOLIN 70/30"           793        0
      340  808 "INSULIN HUMAN BASE"                       "NOVOLIN 70/30"        320526   283196
      340  808 "INSULIN HUMAN BASE"                       "NOVOLIN 70/30 PREF"      193        0
      340  808 "INSULIN HUMAN BASE"                       "NOVOLIN 70/30 INNO"   158889   175371
     9373  650 "INSULIN HUMAN ISOPHANE"                   "HUMULIN 70/30"       2876163  2341280
     9373  650 "INSULIN HUMAN ISOPHANE"                   "HUMULIN 70/30"             0        0
     9373  650 "INSULIN HUMAN ISOPHANE"                   "HUMULIN 70/30 KWIK"        0        0
     9373  650 "INSULIN HUMAN ISOPHANE"                   "HUMULIN 50/50"        133302   121662
     9373  650 "INSULIN HUMAN ISOPHANE"                   "HUMULIN 70/30 PEN"    218726   219137
      340  808 "INSULIN HUMAN ISOPHANE"                   "NOVOLIN 70/30"       9322483 10960938
      340  808 "INSULIN HUMAN ISOPHANE"                   "NOVOLIN 70/30"           793        0
      340  808 "INSULIN HUMAN ISOPHANE"                   "NOVOLIN 70/30"        320526   283196
      340  808 "INSULIN HUMAN ISOPHANE"                   "NOVOLIN 70/30 PREF"      193        0
      340  808 "INSULIN HUMAN ISOPHANE"                   "NOVOLIN 70/30 INNO"   158889   175371
    13950  808 "INSULIN ASPART"                           "NOVOLOG FLXPEN MIX"   391156   588174
    13950  808 "INSULIN ASPART"                           "NOVOLOG FLXPEN MIX"        0        0
    13950  808 "INSULIN ASPART"                           "NOVOLOG MIX 70/30"    644404  1854937
    13950  808 "INSULIN ASPART"                           "NOVOLOG MIX 70/30"     92941    74692
    13950  808 "INSULIN ASPART PROTAMINE CRYSTALLINE"     "NOVOLOG FLXPEN MIX"   391156   588174
    13950  808 "INSULIN ASPART PROTAMINE CRYSTALLINE"     "NOVOLOG FLXPEN MIX"        0        0
    13950  808 "INSULIN ASPART PROTAMINE CRYSTALLINE"     "NOVOLOG MIX 70/30"    644404  1854937
    13950  808 "INSULIN ASPART PROTAMINE CRYSTALLINE"     "NOVOLOG MIX 70/30"     92941    74692
     9362  650 "INSULIN LISPRO"                           "HUMALOG KWIKPEN 75"        0        0
     9362  650 "INSULIN LISPRO"                           "HUMALOG KWIKPEN 50"        0        0
     9362  650 "INSULIN LISPRO"                           "HUMALOG MIX 75/25"   3794216  3545529
     9362  650 "INSULIN LISPRO"                           "HUMALOG MIX 50/50"         0        0
     9362  650 "INSULIN LISPRO"                           "HUMALOG PEN MIX 75"        0        0
     9362  650 "INSULIN LISPRO"                           "HUMALOG PEN MIX 75"   728715   682682
     9362  650 "INSULIN LISPRO"                           "HUMALOG PEN MIX 50"        0        0
     9362  650 "INSULIN LISPRO PROTAMINE"                 "HUMALOG KWIKPEN 75"        0        0
     9362  650 "INSULIN LISPRO PROTAMINE"                 "HUMALOG KWIKPEN 50"        0        0
     9362  650 "INSULIN LISPRO PROTAMINE"                 "HUMALOG MIX 75/25"   3794216  3545529
     9362  650 "INSULIN LISPRO PROTAMINE"                 "HUMALOG MIX 50/50"         0        0
     9362  650 "INSULIN LISPRO PROTAMINE"                 "HUMALOG PEN MIX 75"        0        0
     9362  650 "INSULIN LISPRO PROTAMINE"                 "HUMALOG PEN MIX 75"   728715   682682
     9362  650 "INSULIN LISPRO PROTAMINE"                 "HUMALOG PEN MIX 50"        0        0
     9372  650 "INSULIN HUMAN ZINC SUSPENSION (COMPOUND)" "HUMULIN L"            514437   443044
    10087  808 "INSULIN HUMAN ZINC SUSPENSION (COMPOUND)" "NOVOLIN L"              7880     3140
    10819 1035 "INSULIN GLARGINE"                         "LANTUS"             22466987 34505845
    10819 1035 "INSULIN GLARGINE"                         "LANTUS"                    0        0
    10819 1035 "INSULIN GLARGINE"                         "LANTUS"                    0   208184
    10819 1035 "INSULIN GLARGINE"                         "LANTUS SOLOSTAR"           0        0
    10819 1035 "INSULIN GLARGINE"                         "LANTUS SOLOSTAR"           0        0
    10819 1035 "INSULIN GLARGINE"                         "TOUJEO SOLOSTAR"           0        0
    end

    What I am trying to do is to check if there are observations (rows) that are identical one to the other and drop them, since I have noticed that the same sales are repeated for all molecules. So for instance if I have:

    Code:
     13953 808 "INSULIN ASPART"                           "NOVOLOG"             5716246 11806618
     13953 808 "INSULIN ASPART"                           "NOVOLOG"             5716246 11806618
    I would like to delete one of the two.

    Now I have seen the suggestion of using con function to do it but I don't think it will work in my case since I have a panel structure.
    So I applied the following code:
    Code:
    duplicates drop idprod sales*,force
    Do you think it does the job correctly?

    Thank you,

    Federico
    Last edited by Federico Nutarelli; 26 Jan 2019, 03:05.

  • #2
    Federico:
    you may want to try something along tyhe following lines:
    Code:
    . egen flag=group( idproduct sales_mnf_qtr_03_2004_lc_1 sales_mnf_qtr_03_2005_lc_1)
    
    . bysort flag: keep if _n==1
    Kind regards,
    Carlo
    (Stata 18.0 SE)

    Comment


    • #3
      Thank you very much Professor Lazzaro, I will try it for sure. Anyway also in order to learn the "duplicates command" properly I would like to ask if
      Code:
       
       duplicates drop idprod sales*,force
      does the same job as your code.

      Thank you,

      Federico

      Comment


      • #4
        Federico:
        yes, it does.
        PS: please call me Carlo, as all on (and many more off) the list do. Thanks.
        Kind regards,
        Carlo
        (Stata 18.0 SE)

        Comment


        • #5
          Ok it will be a pleasure and thank you again Carlo!

          Comment

          Working...
          X