Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Issues with merging-- but no duplicates in using dataset

    Hi everyone,

    I'm trying to do a m:1 merge, but I get the error message
    "variable origin_stationcode does not uniquely identify observations in the using data
    r(459);"
    I have double checked my using dataset to ensure there are no duplicates. Here is my code:

    Code:
    ** cleaning duplicates out
        duplicates list, nolabel sepby(stn_code)
        duplicates tag stn_code, gen (dup_stn_code) // 3 copies, looks fine
        duplicates drop
        duplicates report // all good now
        
    
    ** gen 2 stn_code vars for origin and dest stationcodes for merge 
        gen origin_stationcode = stn_code
        gen dest_stationcode = stn_code
    
    ** dropping missing stn_code (missing MRT/LRT from excel, to ensure no dups)    
    drop if stn_code == .
    I then saved as a new dataset, and used the following merging command:

    Code:
    merge m:1 origin_stationcode <using dataset path and name here>
    I tried with and without the
    Code:
    , force
    option

    Help please!

    This is my using dataset; it should be a unique list of public transport stations that I extracted from analysis in QGIS:
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str3 stn_type str32 stn_desc long stn_code float origin_stationcode
    "MRT" "YIO CHU KANG MRT STATION"        1    1
    "MRT" "ANG MO KIO MRT STATION"          2    2
    "MRT" "BRADDELL MRT STATION"            4    4
    "MRT" "TOA PAYOH MRT STATION"           5    5
    "MRT" "NOVENA MRT STATION"              6    6
    "MRT" "NEWTON MRT STATION"              7    7
    "MRT" "DHOBY GHAUT MRT STATION"        10   10
    "MRT" "TIONG BAHRU MRT STATION"        16   16
    "MRT" "REDHILL MRT STATION"            17   17
    "MRT" "QUEENSTOWN MRT STATION"         18   18
    "MRT" "COMMONWEALTH MRT STATION"       19   19
    "MRT" "CLEMENTI MRT STATION"           21   21
    "MRT" "YISHUN MRT STATION"             22   22
    "MRT" "KHATIB MRT STATION"             23   23
    "MRT" "LAKESIDE MRT STATION"           26   26
    "MRT" "BUKIT GOMBAK MRT STATION"       29   29
    "MRT" "CHOA CHU KANG MRT STATION"      30   30
    "MRT" "BUGIS MRT STATION"              31   31
    "MRT" "ALJUNIED MRT STATION"           34   34
    "MRT" "EUNOS MRT STATION"              36   36
    "MRT" "KEMBANGAN MRT STATION"          37   37
    "MRT" "SIMEI MRT STATION"              40   40
    "MRT" "PASIR RIS MRT STATION"          42   42
    "MRT" "SEMBAWANG MRT STATION"          43   43
    "MRT" "ADMIRALTY MRT STATION"          44   44
    "MRT" "WOODLANDS MRT STATION"          45   45
    "MRT" "MARSILING MRT STATION"          46   46
    "MRT" "YEW TEE MRT STATION"            48   48
    "LRT" "CHOA CHU KANG LRT STATION"      49   49
    "LRT" "SOUTH VIEW LRT STATION"         50   50
    "LRT" "KEAT HONG LRT STATION"          51   51
    "LRT" "BUKIT PANJANG LRT DEPOT"        54   54
    "LRT" "PETIR LRT STATION"              55   55
    "LRT" "PENDING LRT STATION"            56   56
    "LRT" "BANGKIT LRT STATION"            57   57
    "LRT" "FAJAR LRT STATION"              58   58
    "LRT" "SEGAR LRT STATION"              59   59
    "LRT" "JELAPANG LRT STATION"           60   60
    "LRT" "SENJA LRT STATION"              61   61
    "MRT" "PIONEER MRT STATION"            66   66
    "MRT" "PUNGGOL MRT STATION"           101  101
    "MRT" "SENGKANG MRT STATION"          102  102
    "MRT" "BUANGKOK MRT STATION"          103  103
    "MRT" "HOUGANG MRT STATION"           104  104
    "MRT" "SERANGOON MRT STATION"         106  106
    "MRT" "POTONG PASIR MRT STATION"      108  108
    "MRT" "BOON KENG MRT STATION"         109  109
    "MRT" "FARRER PARK MRT STATION"       110  110
    "MRT" "LITTLE INDIA MRT STATION"      111  111
    "MRT" "DHOBY GHAUT MRT STATION"       112  112
    "MRT" "OUTRAM PARK MRT STATION"       115  115
    "LRT" "COMPASSVALE LRT STATION"       117  117
    "LRT" "RUMBIA LRT STATION"            118  118
    "LRT" "BAKAU LRT STATION"             119  119
    "LRT" "KANGKAR LRT STATION"           120  120
    "LRT" "RANGGUNG LRT STATION"          121  121
    "LRT" "CHENG LIM LRT STATION"         122  122
    "LRT" "FARMWAY LRT STATION"           123  123
    "LRT" "KUPANG LRT STATION"            124  124
    "LRT" "FERNVALE LRT STATION"          126  126
    "LRT" "LAYAR LRT STATION"             127  127
    "LRT" "TONGKANG LRT STATION"          128  128
    "LRT" "RENJONG LRT STATION"           129  129
    "LRT" "MERIDIAN LRT STATION"          131  131
    "LRT" "CORAL EDGE LRT STATION"        132  132
    "LRT" "RIVIERA LRT STATION"           133  133
    "LRT" "KADALOOR LRT STATION"          134  134
    "LRT" "DAMAI LRT STATION"             136  136
    "LRT" "SAM KEE LRT STATION"           137  137
    "LRT" "NIBONG LRT STATION"            141  141
    "LRT" "SUMANG LRT STATION"            142  142
    "LRT" "SOO TECK LRT STATION"          143  143
    "MRT" "SERANGOON MRT STATION"         202  202
    "MRT" "LORONG CHUAN MRT STATION"      203  203
    "MRT" "DAKOTA MRT STATION"            209  209
    "MRT" "CALDECOTT MRT STATION"         216  216
    "MRT" "FARRER ROAD MRT STATION"       218  218
    "MRT" "ONE-NORTH MRT STATION"         221  221
    "MRT" "BUGIS MRT STATION"             301  301
    "MRT" "BUKIT PANJANG MRT STATION"     306  306
    "MRT" "CASHEW MRT STATION"            307  307
    "MRT" "HILLVIEW MRT STATION"          308  308
    "MRT" "BEAUTY WORLD MRT STATION"      309  309
    "MRT" "KING ALBERT PARK MRT STATION"  310  310
    "MRT" "TAN KAH KEE MRT STATION"       312  312
    "MRT" "STEVENS MRT STATION"           314  314
    "MRT" "NEWTON MRT STATION"            315  315
    "MRT" "LITTLE INDIA MRT STATION"      316  316
    "MRT" "ROCHOR MRT STATION"            317  317
    "MRT" "JALAN BESAR MRT STATION"       320  320
    "MRT" "UBI MRT STATION"               325  325
    "MRT" "KAKI BUKIT MRT STATION"        326  326
    "MRT" "BEDOK NORTH MRT STATION"       327  327
    "MRT" "BEDOK RESERVOIR MRT STATION"   328  328
    "MRT" "TAMPINES WEST MRT STATION"     329  329
    "MRT" "TAMPINES MRT STATION"          330  330
    "BUS" "BUGIS STN EXIT A"             1059 1059
    "BUS" "QUEEN ST TER"                 1109 1109
    "BUS" "OPP BUGIS STN EXIT C"         1112 1112
    "BUS" "BUGIS STN EXIT B"             1113 1113
    end
    And this is my master dataset; it contains passengers (some observed multiple times):

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float id double entry_datetime float origin_stationcode str40 origin_stationdesc
      1 1852813083000  46 "Marsiling"      
      2 1883675373000   8 "Orchard NSEW"    
      3 1885734629000  46 "Marsiling"      
      4 1884236792000  43 "Sembawang"      
      5 1853134375000  43 "Sembawang"      
      6 1851843392000  48 "Yew Tee"        
      7 1882523150000 119 "Bakau"          
      8 1882429804000  46 "Marsiling"      
      9 1853169379000 222 "Kent Ridge"      
     10 1851925582000 226 "Telok Blangah"  
     11 1883914848000  45 "Woodlands NSEW"  
     12 1884497225000 208 "Paya Lebar CCL"  
     13 1854043083000  12 "Raffles Place"  
     14 1850754582000  31 "Bugis NSEW"      
     15 1882789793000   6 "Novena"          
     16 1883288064000   3 "Bishan NSEW"    
     17 1883982219000   4 "Braddell"        
     18 1851334826000  30 "Choa Chu Kang"  
     19 1882393712000  27 "Boon Lay"        
     20 1854343697000  23 "Khatib"          
     21 1883197050000  46 "Marsiling"      
     22 1852701780000  47 "Kranji"          
     23 1882259234000 105 "Kovan"          
     24 1885537237000 202 "Serangoon CCL"  
     25 1853049310000  43 "Sembawang"      
     26 1851092353000   3 "Bishan NSEW"    
     27 1885535563000  22 "Yishun"          
     28 1853573965000   4 "Braddell"        
     29 1883563169000  45 "Woodlands NSEW"  
     30 1853434045000  66 "Pioneer"        
     31 1883888211000  67 ""                
     32 1852223559000   7 "Newton NSEW"    
     33 1884374511000  11 "City Hall"      
     34 1852141891000   6 "Novena"          
     35 1852793990000  68 ""                
     36 1853041360000  43 "Sembawang"      
     37 1884974157000  36 "Eunos"          
     38 1883989392000  44 "Admiralty"      
     39 1851879277000  12 "Raffles Place"  
     40 1884799068000   6 "Novena"          
     41 1885827597000 208 "Paya Lebar CCL"  
     42 1882371683000 116 "HarbourFront NEL"
     43 1853393904000   2 "Ang Mo Kio"      
     44 1885059887000 208 "Paya Lebar CCL"  
     45 1851145808000  44 "Admiralty"      
     46 1851427017000   8 "Orchard NSEW"    
     47 1882594217000  44 "Admiralty"      
     48 1851919690000  16 "Tiong Bahru"    
     49 1851611114000  24 "Jurong East"    
     50 1853874690000  23 "Khatib"          
     51 1884101150000  23 "Khatib"          
     52 1853747111000   5 "Toa Payoh"      
     53 1882636347000  22 "Yishun"          
     54 1851014747000   5 "Toa Payoh"      
     55 1885658749000  23 "Khatib"          
     56 1884001180000   2 "Ang Mo Kio"      
     57 1851019177000  12 "Raffles Place"  
     58 1882098761000  52 "Teck Whye"      
     59 1884620239000   6 "Novena"          
     60 1883720791000  24 "Jurong East"    
     61 1854386646000 207 "MacPherson CCL"  
     62 1882681065000   2 "Ang Mo Kio"      
     63 1851921679000  26 "Lakeside"        
     64 1883158289000 206 "Tai Seng"        
     65 1852057224000 333 "Expo DTL"        
     66 1882899308000 204 "Bishan CCL"      
     67 1854299436000 310 "King Albert Park"
     68 1851835218000  24 "Jurong East"    
     69 1853859176000   2 "Ang Mo Kio"      
     70 1884984468000   8 "Orchard NSEW"    
     71 1885010703000  30 "Choa Chu Kang"  
     72 1884881850000  43 "Sembawang"      
     73 1884032425000  29 "Bukit Gombak"    
     74 1854254096000  51 "Keat Hong"      
     75 1882307807000  45 "Woodlands NSEW"  
     76 1883116686000  44 "Admiralty"      
     77 1885028023000  44 "Admiralty"      
     78 1884930265000   2 "Ang Mo Kio"      
     79 1884846648000  45 "Woodlands NSEW"  
     80 1852666392000 114 "Chinatown NEL"  
     81 1853328619000   8 "Orchard NSEW"    
     82 1882103791000   6 "Novena"          
     83 1885532456000  26 "Lakeside"        
     84 1885724142000 204 "Bishan CCL"      
     85 1851064178000  46 "Marsiling"      
     86 1852306371000   5 "Toa Payoh"      
     87 1883058924000   4 "Braddell"        
     88 1852841076000   6 "Novena"          
     89 1850752361000 112 "Dhoby Ghaut NEL"
     90 1852132694000   8 "Orchard NSEW"    
     91 1884024765000   6 "Novena"          
     92 1851668383000  43 "Sembawang"      
     93 1883146018000  11 "City Hall"      
     94 1851758242000  22 "Yishun"          
     95 1882167869000  53 "Phoenix"        
     96 1851323194000 204 "Bishan CCL"      
     97 1850714406000 204 "Bishan CCL"      
     98 1853532079000  47 "Kranji"          
     99 1854343668000  43 "Sembawang"      
    100 1851145683000 209 "Dakota"          
    end
    format %tcDay_Mon_DD_CCYY_HH:MM:SS entry_datetime
    Last edited by Mich Pan; 04 Jul 2023, 21:16.

  • #2
    When you are dropping duplicates, you probably want to do

    Code:
    drop if dup_stn_code
    instead of duplicates drop, which will only drop observations which are entirely duplicates (duplicates on all variables), instead of duplicates only on stn_code.
    Last edited by Hemanshu Kumar; 05 Jul 2023, 03:16.

    Comment


    • #3
      Hi Hemanshu, wouldn't
      Code:
      drop if dup_stn_code
      drop not just the copy, but also the copied-from observation? for example,

      Duplicates in terms of all variables

      +-------------------------------------------------------------+
      | group: obs: stn_type stn_desc stn_code |
      |-------------------------------------------------------------|
      | 1 750 BUS BLK 644 43709 |
      | 1 1435 BUS BLK 644 43709 |
      |-------------------------------------------------------------|
      | 2 1273 BUS CHENG LIM STN EXIT B 67421 |
      | 2 2105 BUS CHENG LIM STN EXIT B 67421 |
      |-------------------------------------------------------------|
      | 3 87 MRT CALDECOTT MRT STATION 216 |
      | 3 97 MRT CALDECOTT MRT STATION 216 |
      +-------------------------------------------------------------+


      Code:
      .         drop if dup_stn_code 
      (10 observations deleted)
      
      . browse if stn_code == 67421
      yields an empty data browser-- even my original data entry is deleted. I want to keep that.

      Comment


      • #4
        Yes, mea culpa. You could do
        Code:
        duplicates drop stn_code, force
        or you could do

        Code:
        bysort stn_code: drop if dup_stn_code & _n != 1

        Comment


        • #5
          No sweat, thanks Hemanshu.

          Code:
           
           bysort stn_code: drop if dup_stn_code & _n != 1
          worked for me-- it dropped 5 observations, while

          Code:
          duplicates drop
          only dropped 3 observations, and the subsequent
          Code:
          duplicates report
          indicated that there were no more duplicates.


          Strange, but in other words-- it worked! Ha! I can now successfully merge the datasets together. Thank you!

          Comment

          Working...
          X