Hi everyone,
I recently posted a post about how to compute a variable in minutes, in which Clyde Schechter gives me beautiful advices. He contributes to improve also my understanding of the -merge- command, and the mortal sin -merge m:m-. Thank you so much Professor Schechter.
I corrected my code as recommended by him:
But I still encounter problems, and I do not know how to proceed. My master dataset does not contain any duplicated variables or values:
However, when I do the -merge- process between my masters and using dataset, I get a lot of duplicated values.
Here's the code I used:
Finally, here are my master dataset:
And here my "using" dataset:
The merged result is:
I now have 257,138 observations, instead of my initial 222,744.
Could anyone give me a solution please?
I don't know if this is useful, but my master dataset aggregates each individual's journey into a variable.
For example, instead of detailing each stage for each journey, the master dataset aggregates the number of stages per journey into the variable N_ETAPAS_POR_VIAJE.
On the other hand, the using dataset details each stage of each trip for each individual. For example, an observation for stage 1 of trip 1, an observation for stage 2 of trip 1, etc. It is represented by the variable ID_ETAPA.
Thank you very much for the help provided.
Best,
Michael
I recently posted a post about how to compute a variable in minutes, in which Clyde Schechter gives me beautiful advices. He contributes to improve also my understanding of the -merge- command, and the mortal sin -merge m:m-. Thank you so much Professor Schechter.
I corrected my code as recommended by him:
Originally posted by Clyde Schechter
View Post
Code:
cd ${maindir}/04_pt_modes_cleaned/data/clean_edm use clean_edm_trips_costs_pt_taxis_privatefuelcosts_together.dta, clear . duplicates drop Duplicates in terms of all variables (0 observations are duplicates)
Here's the code I used:
Code:
*2. Merge with master data clean_edm_trips_costs_pt_taxis_privatefuelcosts_togetherd.dta cd ${maindir}/04_pt_modes_cleaned/data/clean_edm use clean_edm_trips_costs_pt_taxis_privatefuelcosts_togetherd.dta, clear // merge the EESTACIONA variable. EESTACIONA tells us where people have parked. merge 1:m ID_HOGAR ID_IND ID_VIAJE using ${data}/Microdata/etapas_edm_2018_parking_data.dta, keepusing(EESTACIONA workplace_parking free_parking ser_tariffs public_tariffs)
Code:
* Example generated by -dataex-. For more info, type help dataex clear input str9 individ_ID long ID_HOGAR str12 viaje_ID float date byte N_ETAPAS_POR_VIAJE "1734079_3" 1734079 "1734079_3_1" 21223 3 "1734079_2" 1734079 "1734079_2_2" 21223 1 "1734079_4" 1734079 "1734079_4_2" 21223 1 "1734079_1" 1734079 "1734079_1_2" 21223 1 "1734079_3" 1734079 "1734079_3_2" 21223 3 "1734079_2" 1734079 "1734079_2_1" 21223 1 "1734079_4" 1734079 "1734079_4_1" 21223 1 "1734079_1" 1734079 "1734079_1_1" 21223 1 "2463005_1" 2463005 "2463005_1_1" 21227 1 "1953550_1" 1953550 "1953550_1_8" 21227 1 "245520_3" 245520 "245520_3_5" 21227 1 "522412_1" 522412 "522412_1_1" 21227 1 "1493515_2" 1493515 "1493515_2_2" 21227 5 "1777111_4" 1777111 "1777111_4_3" 21227 1 "1107787_1" 1107787 "1107787_1_2" 21227 1 "2010611_3" 2010611 "2010611_3_2" 21227 1 "1093214_2" 1093214 "1093214_2_2" 21227 1 "1093214_2" 1093214 "1093214_2_4" 21227 1 "506289_2" 506289 "506289_2_1" 21227 1 "1687815_3" 1687815 "1687815_3_1" 21227 1 "527805_3" 527805 "527805_3_2" 21227 3 "1957996_1" 1957996 "1957996_1_5" 21227 1 "1953550_1" 1953550 "1953550_1_1" 21227 1 "1953550_2" 1953550 "1953550_2_1" 21227 1 "1575546_1" 1575546 "1575546_1_3" 21227 1 "548892_1" 548892 "548892_1_3" 21227 1 "1777111_4" 1777111 "1777111_4_2" 21227 1 "2428229_1" 2428229 "2428229_1_7" 21227 1 "2009099_1" 2009099 "2009099_1_3" 21227 1 "1668877_2" 1668877 "1668877_2_2" 21227 2 "1318545_2" 1318545 "1318545_2_4" 21227 1 "1874598_1" 1874598 "1874598_1_1" 21227 1 "1874598_2" 1874598 "1874598_2_1" 21227 1 "1953550_1" 1953550 "1953550_1_5" 21227 1 "1576271_3" 1576271 "1576271_3_3" 21227 1 "556947_3" 556947 "556947_3_4" 21227 1 "2009099_2" 2009099 "2009099_2_1" 21227 1 "527805_1" 527805 "527805_1_1" 21227 1 "2024382_2" 2024382 "2024382_2_2" 21227 1 "2461844_3" 2461844 "2461844_3_2" 21227 1 "546676_4" 546676 "546676_4_1" 21227 1 "653680_4" 653680 "653680_4_2" 21227 1 "1732362_2" 1732362 "1732362_2_3" 21227 2 "2009350_2" 2009350 "2009350_2_1" 21227 1 "2023802_1" 2023802 "2023802_1_1" 21227 2 "491803_1" 491803 "491803_1_1" 21227 1 "546676_2" 546676 "546676_2_2" 21227 1 "2010146_2" 2010146 "2010146_2_5" 21227 1 "2010146_2" 2010146 "2010146_2_1" 21227 1 "556947_3" 556947 "556947_3_1" 21227 1 "270583_2" 270583 "270583_2_1" 21227 1 "653680_4" 653680 "653680_4_1" 21227 1 "2010146_5" 2010146 "2010146_5_2" 21227 1 "2173351_3" 2173351 "2173351_3_2" 21227 1 "2023802_2" 2023802 "2023802_2_1" 21227 1 "1957996_2" 1957996 "1957996_2_3" 21227 1 "1318545_2" 1318545 "1318545_2_1" 21227 1 "1397823_1" 1397823 "1397823_1_1" 21227 1 "2010146_4" 2010146 "2010146_4_2" 21227 1 "1093214_2" 1093214 "1093214_2_1" 21227 1 "30408_1" 30408 "30408_1_1" 21227 4 "2461418_2" 2461418 "2461418_2_3" 21227 1 "245520_3" 245520 "245520_3_1" 21227 1 "246831_2" 246831 "246831_2_1" 21227 4 "245520_1" 245520 "245520_1_3" 21227 1 "1557100_1" 1557100 "1557100_1_3" 21227 1 "2009099_2" 2009099 "2009099_2_4" 21227 1 "2023802_3" 2023802 "2023802_3_2" 21227 1 "1107787_2" 1107787 "1107787_2_2" 21227 2 "1576271_3" 1576271 "1576271_3_1" 21227 1 "1387349_4" 1387349 "1387349_4_2" 21227 3 "1953550_2" 1953550 "1953550_2_3" 21227 1 "2462103_1" 2462103 "2462103_1_3" 21227 2 "2009099_3" 2009099 "2009099_3_1" 21227 2 "246831_3" 246831 "246831_3_1" 21227 1 "721775_1" 721775 "721775_1_2" 21227 1 "1588143_2" 1588143 "1588143_2_1" 21227 1 "1614160_2" 1614160 "1614160_2_1" 21227 1 "1668877_3" 1668877 "1668877_3_5" 21227 1 "547785_1" 547785 "547785_1_1" 21227 1 "2010146_4" 2010146 "2010146_4_1" 21227 1 "2428229_1" 2428229 "2428229_1_1" 21227 1 "1687563_2" 1687563 "1687563_2_1" 21227 1 "522412_1" 522412 "522412_1_2" 21227 1 "68123_1" 68123 "68123_1_2" 21227 1 "1162081_2" 1162081 "1162081_2_1" 21227 1 "1777111_1" 1777111 "1777111_1_1" 21227 1 "175062_2" 175062 "175062_2_1" 21227 1 "1162081_3" 1162081 "1162081_3_2" 21227 2 "246831_3" 246831 "246831_3_4" 21227 3 "2009099_1" 2009099 "2009099_1_2" 21227 1 "246831_1" 246831 "246831_1_7" 21227 1 "1794675_2" 1794675 "1794675_2_2" 21227 1 "1499476_1" 1499476 "1499476_1_2" 21227 1 "721775_3" 721775 "721775_3_2" 21227 1 "1176130_3" 1176130 "1176130_3_1" 21227 1 "721775_1" 721775 "721775_1_1" 21227 1 "1397769_2" 1397769 "1397769_2_2" 21227 2 "1668877_3" 1668877 "1668877_3_1" 21227 1 "1765717_2" 1765717 "1765717_2_3" 21227 1 end format %td date
Code:
* Example generated by -dataex-. For more info, type help dataex clear input str9 individ_ID str14 etapa_ID long ID_HOGAR byte ID_ETAPA "1000154_1" "1000154_1_1_1" 1000154 1 "1000154_1" "1000154_1_2_1" 1000154 1 "1000154_2" "1000154_2_1_1" 1000154 1 "1000154_2" "1000154_2_1_2" 1000154 2 "1000154_2" "1000154_2_2_1" 1000154 1 "1000154_2" "1000154_2_2_2" 1000154 2 "1000289_1" "1000289_1_1_1" 1000289 1 "1000289_1" "1000289_1_1_2" 1000289 2 "1000289_1" "1000289_1_1_3" 1000289 3 "1000289_1" "1000289_1_2_1" 1000289 1 "1000289_1" "1000289_1_2_2" 1000289 2 "1000289_1" "1000289_1_2_3" 1000289 3 "1000289_3" "1000289_3_1_1" 1000289 1 "1000289_3" "1000289_3_2_1" 1000289 1 "1000289_3" "1000289_3_3_1" 1000289 1 "1000289_3" "1000289_3_4_1" 1000289 1 "1000370_1" "1000370_1_1_1" 1000370 1 "1000370_1" "1000370_1_2_1" 1000370 1 "1000370_1" "1000370_1_3_1" 1000370 1 "1000370_1" "1000370_1_4_1" 1000370 1 "1000370_2" "1000370_2_1_1" 1000370 1 "1000370_2" "1000370_2_2_1" 1000370 1 "1000457_1" "1000457_1_1_1" 1000457 1 "1000457_1" "1000457_1_2_1" 1000457 1 "1000457_2" "1000457_2_1_1" 1000457 1 "1000457_2" "1000457_2_2_1" 1000457 1 "1000457_4" "1000457_4_1_1" 1000457 1 "1000457_4" "1000457_4_1_2" 1000457 2 "1000457_4" "1000457_4_2_1" 1000457 1 "1000457_4" "1000457_4_2_2" 1000457 2 "1000660_1" "1000660_1_1_1" 1000660 1 "1000660_1" "1000660_1_1_2" 1000660 2 "1000660_1" "1000660_1_2_1" 1000660 1 "1000660_1" "1000660_1_2_2" 1000660 2 "1001299_1" "1001299_1_1_1" 1001299 1 "1001299_1" "1001299_1_2_1" 1001299 1 "1001530_1" "1001530_1_1_1" 1001530 1 "1001530_1" "1001530_1_2_1" 1001530 1 "1001530_1" "1001530_1_3_1" 1001530 1 "1001530_1" "1001530_1_4_1" 1001530 1 "1001728_1" "1001728_1_1_1" 1001728 1 "1001728_1" "1001728_1_2_1" 1001728 1 "1001728_2" "1001728_2_1_1" 1001728 1 "1001728_2" "1001728_2_2_1" 1001728 1 "1001942_1" "1001942_1_1_1" 1001942 1 "1001942_1" "1001942_1_2_1" 1001942 1 "1001942_2" "1001942_2_1_1" 1001942 1 "1001942_2" "1001942_2_2_1" 1001942 1 "1002132_1" "1002132_1_1_1" 1002132 1 "1002132_1" "1002132_1_1_2" 1002132 2 "1002132_1" "1002132_1_2_1" 1002132 1 "1002132_1" "1002132_1_2_2" 1002132 2 "1002132_2" "1002132_2_1_1" 1002132 1 "1002132_2" "1002132_2_1_2" 1002132 2 "1002132_2" "1002132_2_1_3" 1002132 3 "1002132_2" "1002132_2_2_1" 1002132 1 "1002132_2" "1002132_2_2_2" 1002132 2 "1002132_2" "1002132_2_2_3" 1002132 3 "1002179_1" "1002179_1_1_1" 1002179 1 "1002179_1" "1002179_1_2_1" 1002179 1 "1002291_1" "1002291_1_1_1" 1002291 1 "1002291_1" "1002291_1_2_1" 1002291 1 "1002291_1" "1002291_1_3_1" 1002291 1 "1002291_1" "1002291_1_4_1" 1002291 1 "1002291_2" "1002291_2_1_1" 1002291 1 "1002291_2" "1002291_2_2_1" 1002291 1 "1002315_1" "1002315_1_1_1" 1002315 1 "1002315_1" "1002315_1_2_1" 1002315 1 "1002400_1" "1002400_1_1_1" 1002400 1 "1002400_1" "1002400_1_2_1" 1002400 1 "1002400_1" "1002400_1_3_1" 1002400 1 "1002400_1" "1002400_1_4_1" 1002400 1 "1002400_1" "1002400_1_5_1" 1002400 1 "1002400_1" "1002400_1_6_1" 1002400 1 "1002400_2" "1002400_2_1_1" 1002400 1 "1002400_2" "1002400_2_2_1" 1002400 1 "1002400_2" "1002400_2_3_1" 1002400 1 "1002400_2" "1002400_2_4_1" 1002400 1 "1002400_3" "1002400_3_1_1" 1002400 1 "1002400_3" "1002400_3_1_2" 1002400 2 "1002400_3" "1002400_3_2_1" 1002400 1 "1002400_3" "1002400_3_2_2" 1002400 2 "1002741_1" "1002741_1_1_1" 1002741 1 "1002741_2" "1002741_2_1_1" 1002741 1 "1002741_3" "1002741_3_1_1" 1002741 1 "1002741_3" "1002741_3_2_1" 1002741 1 "1002741_4" "1002741_4_1_1" 1002741 1 "1002741_4" "1002741_4_2_1" 1002741 1 "1002741_5" "1002741_5_1_1" 1002741 1 "1002741_5" "1002741_5_1_2" 1002741 2 "1002741_5" "1002741_5_2_1" 1002741 1 "1002741_5" "1002741_5_2_2" 1002741 2 "1002741_6" "1002741_6_1_1" 1002741 1 "1002809_1" "1002809_1_1_1" 1002809 1 "1002809_1" "1002809_1_2_1" 1002809 1 "1002809_1" "1002809_1_3_1" 1002809 1 "1002809_2" "1002809_2_1_1" 1002809 1 "1002809_2" "1002809_2_2_1" 1002809 1 "1002809_2" "1002809_2_3_1" 1002809 1 "1002956_2" "1002956_2_1_1" 1002956 1 end
Code:
* Example generated by -dataex-. For more info, type help dataex clear input str12 viaje_ID byte ID_IND long ID_HOGAR float date byte N_ETAPAS_POR_VIAJE str4(VORIHORAINI VDESHORAFIN) "189_1_1" 1 189 21299 1 "0800" "0830" "189_1_2" 1 189 21299 1 "1800" "1830" "244_1_1" 1 244 21300 1 "0800" "0820" "244_1_2" 1 244 21300 1 "1800" "1820" "244_2_1" 2 244 21300 1 "1113" "1115" "244_2_2" 2 244 21300 1 "1218" "1220" "324_1_1" 1 324 21299 1 "1117" "1120" "324_1_2" 1 324 21299 1 "1312" "1315" "414_1_1" 1 414 21300 1 "1059" "1100" "414_1_2" 1 414 21300 1 "1319" "1320" "414_2_1" 2 414 21300 1 "1156" "1200" "414_2_2" 2 414 21300 1 "1256" "1300" "414_3_1" 3 414 21300 1 "0841" "0845" "414_3_2" 3 414 21300 1 "1711" "1715" "414_3_3" 3 414 21300 1 "1900" "1920" "581_1_1" 1 581 21299 1 "0700" "0705" "581_2_1" 2 581 21299 1 "0800" "0830" "581_2_2" 2 581 21299 1 "1500" "1530" "581_3_1" 3 581 21299 1 "0856" "0900" "581_3_2" 3 581 21299 1 "1611" "1615" "581_4_1" 4 581 21299 1 "0830" "0900" "581_4_2" 4 581 21299 1 "1600" "1615" "742_1_1" 1 742 21299 1 "0700" "0800" "742_1_2" 1 742 21299 1 "1600" "1645" "742_2_1" 2 742 21299 2 "0800" "0930" "742_2_2" 2 742 21299 2 "1700" "1820" "759_1_1" 1 759 21300 1 "0740" "0800" "759_1_2" 1 759 21300 1 "0800" "0815" "759_1_3" 1 759 21300 1 "1433" "1445" "759_1_4" 1 759 21300 1 "1445" "1510" "759_2_1" 2 759 21300 1 "0645" "0800" "759_2_2" 2 759 21300 1 "1800" "1930" "759_3_1" 3 759 21300 1 "0730" "0815" "759_3_2" 3 759 21300 1 "1415" "1500" "759_4_1" 4 759 21300 1 "0745" "0800" "759_4_2" 4 759 21300 1 "1445" "1500" "914_1_1" 1 914 21299 1 "1100" "1115" "914_1_2" 1 914 21299 1 "1300" "1320" "914_2_1" 2 914 21299 1 "0800" "0815" "914_2_2" 2 914 21299 1 "1800" "1820" "1096_1_1" 1 1096 21300 2 "0740" "0800" "1096_1_2" 1 1096 21300 3 "1500" "1600" "1096_2_1" 2 1096 21300 1 "0857" "0900" "1096_2_2" 2 1096 21300 1 "0907" "0915" "1096_2_3" 2 1096 21300 1 "1159" "1200" "1096_2_4" 2 1096 21300 1 "1349" "1400" "1096_2_5" 2 1096 21300 1 "1406" "1410" "1096_3_1" 3 1096 21300 1 "0851" "0900" "1096_3_2" 3 1096 21300 1 "1400" "1410" "1479_1_1" 1 1479 21300 1 "0812" "0815" "1479_1_2" 1 1479 21300 1 "0815" "0900" end format %td date
Could anyone give me a solution please?
I don't know if this is useful, but my master dataset aggregates each individual's journey into a variable.
For example, instead of detailing each stage for each journey, the master dataset aggregates the number of stages per journey into the variable N_ETAPAS_POR_VIAJE.
On the other hand, the using dataset details each stage of each trip for each individual. For example, an observation for stage 1 of trip 1, an observation for stage 2 of trip 1, etc. It is represented by the variable ID_ETAPA.
Thank you very much for the help provided.
Best,
Michael
Comment