Hi everyone,
I want to merge two datasets. I tried, but it does not give me the right number of observations that I should.
My master data file has 222,744 observations. When I do the -merge- process, I obtain 257,138 observations, and that's not what I want.
I explain: On one hand, my master dataset contains a variable that aggregates each individual's steps. For example, if an individual has taken the metro for the fisrt step of her trip, and then car for the second step of the trip, -N_ETAPAS_POR_VIAJE- should be equal to 2.
One the other hand, my using file has each step very detailed in a variable called -ID_ETAPA-. I give you examples of my master dataset, and my using dataset.
Master dataset:
using dataset:
Could anyone provide me a solution to that please?
Thank you in advance.
Michael
I want to merge two datasets. I tried, but it does not give me the right number of observations that I should.
My master data file has 222,744 observations. When I do the -merge- process, I obtain 257,138 observations, and that's not what I want.
I explain: On one hand, my master dataset contains a variable that aggregates each individual's steps. For example, if an individual has taken the metro for the fisrt step of her trip, and then car for the second step of the trip, -N_ETAPAS_POR_VIAJE- should be equal to 2.
One the other hand, my using file has each step very detailed in a variable called -ID_ETAPA-. I give you examples of my master dataset, and my using dataset.
Master dataset:
Code:
* Example generated by -dataex-. For more info, type help dataex clear input long ID_HOGAR byte(ID_IND ID_VIAJE) float date byte N_ETAPAS_POR_VIAJE 1000154 1 1 21328 1 1000154 1 2 21328 1 1000154 2 1 21328 2 1000154 2 2 21328 2 1000289 1 1 21325 3 1000289 1 2 21325 3 1000289 3 1 21325 1 1000289 3 2 21325 1 1000289 3 3 21325 1 1000289 3 4 21325 1 1000370 1 1 21325 1 1000370 1 2 21325 1 1000370 1 3 21325 1 1000370 1 4 21325 1 1000370 2 1 21325 1 1000370 2 2 21325 1 1000457 1 1 21325 1 1000457 1 2 21325 1 1000457 2 1 21325 1 1000457 2 2 21325 1 1000457 4 1 21325 2 1000457 4 2 21325 2 1000660 1 1 21325 2 1000660 1 2 21325 2 1001299 1 1 21264 1 1001299 1 2 21264 1 1001530 1 1 21333 1 1001530 1 2 21333 1 1001530 1 3 21333 1 1001530 1 4 21333 1 1001728 1 1 21298 1 1001728 1 2 21298 1 1001728 2 1 21298 1 1001728 2 2 21298 1 1001942 1 1 21286 1 1001942 1 2 21286 1 1001942 2 1 21286 1 1001942 2 2 21286 1 1002132 1 1 21285 2 1002132 1 2 21285 2 1002132 2 1 21285 3 1002132 2 2 21285 3 1002179 1 1 21313 1 1002179 1 2 21313 1 1002291 1 1 21314 1 1002291 1 2 21314 1 1002291 1 3 21314 1 1002291 1 4 21314 1 1002291 2 1 21314 1 1002291 2 2 21314 1 1002315 1 1 21314 1 1002315 1 2 21314 1 1002400 1 1 21314 1 1002400 1 2 21314 1 1002400 1 3 21314 1 1002400 1 4 21314 1 1002400 1 5 21314 1 1002400 1 6 21314 1 1002400 2 1 21314 1 1002400 2 2 21314 1 1002400 2 3 21314 1 1002400 2 4 21314 1 1002400 3 1 21314 2 1002400 3 2 21314 2 1002741 1 1 21237 1 1002741 2 1 21237 1 1002741 3 1 21237 1 1002741 3 2 21237 1 1002741 4 1 21237 1 1002741 4 2 21237 1 1002741 5 1 21237 2 1002741 5 2 21237 2 1002741 6 1 21237 1 1002809 1 1 21235 1 1002809 1 2 21235 1 1002809 1 3 21235 1 1002809 2 1 21235 1 1002809 2 2 21235 1 1002809 2 3 21235 1 1002956 2 1 21250 2 1002956 2 2 21250 2 1003400 1 1 21257 2 1003400 1 2 21257 2 1003400 2 1 21257 1 1003400 2 2 21257 1 1003589 1 1 21262 1 1003589 1 2 21262 1 1003589 2 1 21262 1 1003589 2 2 21262 1 1003589 2 3 21262 1 1003589 2 4 21262 1 1003589 2 5 21262 1 1003589 2 6 21262 1 1003589 2 7 21262 1 1003589 2 8 21262 1 1003988 1 1 21263 1 1003988 1 2 21263 1 1003988 2 1 21263 1 1003988 2 2 21263 1 1003988 2 3 21263 1 end format %td date
using dataset:
Code:
* Example generated by -dataex-. For more info, type help dataex clear input str9 individ_ID long ID_HOGAR byte(ID_IND ID_VIAJE ID_ETAPA) "189_1" 189 1 1 1 "189_1" 189 1 2 1 "244_1" 244 1 1 1 "244_1" 244 1 2 1 "244_2" 244 2 1 1 "244_2" 244 2 2 1 "324_1" 324 1 1 1 "324_1" 324 1 2 1 "414_1" 414 1 1 1 "414_1" 414 1 2 1 "414_2" 414 2 1 1 "414_2" 414 2 2 1 "414_3" 414 3 1 1 "414_3" 414 3 2 1 "414_3" 414 3 3 1 "581_1" 581 1 1 1 "581_2" 581 2 1 1 "581_2" 581 2 2 1 "581_3" 581 3 1 1 "581_3" 581 3 2 1 "581_4" 581 4 1 1 "581_4" 581 4 2 1 "742_1" 742 1 1 1 "742_1" 742 1 2 1 "742_2" 742 2 1 1 "742_2" 742 2 1 2 "742_2" 742 2 2 1 "742_2" 742 2 2 2 "759_1" 759 1 1 1 "759_1" 759 1 2 1 "759_1" 759 1 3 1 "759_1" 759 1 4 1 "759_2" 759 2 1 1 "759_2" 759 2 2 1 "759_3" 759 3 1 1 "759_3" 759 3 2 1 "759_4" 759 4 1 1 "759_4" 759 4 2 1 "914_1" 914 1 1 1 "914_1" 914 1 2 1 "914_2" 914 2 1 1 "914_2" 914 2 2 1 "1096_1" 1096 1 1 1 "1096_1" 1096 1 1 2 "1096_1" 1096 1 2 1 "1096_1" 1096 1 2 2 "1096_1" 1096 1 2 3 "1096_2" 1096 2 1 1 "1096_2" 1096 2 2 1 "1096_2" 1096 2 3 1 "1096_2" 1096 2 4 1 "1096_2" 1096 2 5 1 "1096_3" 1096 3 1 1 "1096_3" 1096 3 2 1 "1479_1" 1479 1 1 1 "1479_1" 1479 1 2 1 "1479_1" 1479 1 3 1 "1479_1" 1479 1 4 1 "1479_2" 1479 2 1 1 "1479_2" 1479 2 2 1 "1479_3" 1479 3 1 1 "1479_3" 1479 3 2 1 "1508_1" 1508 1 1 1 "1508_1" 1508 1 2 1 "1508_2" 1508 2 1 1 "1508_2" 1508 2 2 1 "1710_1" 1710 1 1 1 "1710_1" 1710 1 2 1 "1710_1" 1710 1 3 1 "1710_2" 1710 2 1 1 "1710_2" 1710 2 2 1 "1710_3" 1710 3 1 1 "1710_3" 1710 3 2 1 "1710_3" 1710 3 3 1 "1710_4" 1710 4 1 1 "1710_4" 1710 4 2 1 "2275_2" 2275 2 1 1 "2275_2" 2275 2 1 2 "2275_3" 2275 3 1 1 "2275_3" 2275 3 1 2 "2322_1" 2322 1 1 1 "2322_3" 2322 3 1 1 "2322_5" 2322 5 1 1 "2332_1" 2332 1 1 1 "2332_1" 2332 1 1 2 "2332_1" 2332 1 2 1 "2332_1" 2332 1 4 1 "2332_1" 2332 1 5 1 "2502_2" 2502 2 1 1 "2502_2" 2502 2 2 1 "2502_3" 2502 3 1 1 "2502_3" 2502 3 2 1 "2882_2" 2882 2 1 1 "3216_1" 3216 1 1 1 "3216_1" 3216 1 1 2 "3216_1" 3216 1 1 3 "3216_1" 3216 1 2 1 "3216_1" 3216 1 2 2 "3216_1" 3216 1 2 3 "3216_1" 3216 1 3 1 end
Thank you in advance.
Michael
Comment