Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Problem in merging twodatasets : Master file with aggregated var. of interest, using file with individual observations for this var.

    Hi everyone,

    I want to merge two datasets. I tried, but it does not give me the right number of observations that I should.

    My master data file has 222,744 observations. When I do the -merge- process, I obtain 257,138 observations, and that's not what I want.

    I explain: On one hand, my master dataset contains a variable that aggregates each individual's steps. For example, if an individual has taken the metro for the fisrt step of her trip, and then car for the second step of the trip, -N_ETAPAS_POR_VIAJE- should be equal to 2.
    One the other hand, my using file has each step very detailed in a variable called -ID_ETAPA-. I give you examples of my master dataset, and my using dataset.

    Master dataset:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input long ID_HOGAR byte(ID_IND ID_VIAJE) float date byte N_ETAPAS_POR_VIAJE
    
    1000154 1 1 21328 1
    1000154 1 2 21328 1
    1000154 2 1 21328 2
    1000154 2 2 21328 2
    1000289 1 1 21325 3
    1000289 1 2 21325 3
    1000289 3 1 21325 1
    1000289 3 2 21325 1
    1000289 3 3 21325 1
    1000289 3 4 21325 1
    1000370 1 1 21325 1
    1000370 1 2 21325 1
    1000370 1 3 21325 1
    1000370 1 4 21325 1
    1000370 2 1 21325 1
    1000370 2 2 21325 1
    1000457 1 1 21325 1
    1000457 1 2 21325 1
    1000457 2 1 21325 1
    1000457 2 2 21325 1
    1000457 4 1 21325 2
    1000457 4 2 21325 2
    1000660 1 1 21325 2
    1000660 1 2 21325 2
    1001299 1 1 21264 1
    1001299 1 2 21264 1
    1001530 1 1 21333 1
    1001530 1 2 21333 1
    1001530 1 3 21333 1
    1001530 1 4 21333 1
    1001728 1 1 21298 1
    1001728 1 2 21298 1
    1001728 2 1 21298 1
    1001728 2 2 21298 1
    1001942 1 1 21286 1
    1001942 1 2 21286 1
    1001942 2 1 21286 1
    1001942 2 2 21286 1
    1002132 1 1 21285 2
    1002132 1 2 21285 2
    1002132 2 1 21285 3
    1002132 2 2 21285 3
    1002179 1 1 21313 1
    1002179 1 2 21313 1
    1002291 1 1 21314 1
    1002291 1 2 21314 1
    1002291 1 3 21314 1
    1002291 1 4 21314 1
    1002291 2 1 21314 1
    1002291 2 2 21314 1
    1002315 1 1 21314 1
    1002315 1 2 21314 1
    1002400 1 1 21314 1
    1002400 1 2 21314 1
    1002400 1 3 21314 1
    1002400 1 4 21314 1
    1002400 1 5 21314 1
    1002400 1 6 21314 1
    1002400 2 1 21314 1
    1002400 2 2 21314 1
    1002400 2 3 21314 1
    1002400 2 4 21314 1
    1002400 3 1 21314 2
    1002400 3 2 21314 2
    1002741 1 1 21237 1
    1002741 2 1 21237 1
    1002741 3 1 21237 1
    1002741 3 2 21237 1
    1002741 4 1 21237 1
    1002741 4 2 21237 1
    1002741 5 1 21237 2
    1002741 5 2 21237 2
    1002741 6 1 21237 1
    1002809 1 1 21235 1
    1002809 1 2 21235 1
    1002809 1 3 21235 1
    1002809 2 1 21235 1
    1002809 2 2 21235 1
    1002809 2 3 21235 1
    1002956 2 1 21250 2
    1002956 2 2 21250 2
    1003400 1 1 21257 2
    1003400 1 2 21257 2
    1003400 2 1 21257 1
    1003400 2 2 21257 1
    1003589 1 1 21262 1
    1003589 1 2 21262 1
    1003589 2 1 21262 1
    1003589 2 2 21262 1
    1003589 2 3 21262 1
    1003589 2 4 21262 1
    1003589 2 5 21262 1
    1003589 2 6 21262 1
    1003589 2 7 21262 1
    1003589 2 8 21262 1
    1003988 1 1 21263 1
    1003988 1 2 21263 1
    1003988 2 1 21263 1
    1003988 2 2 21263 1
    1003988 2 3 21263 1
    end
    format %td date

    using dataset:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str9 individ_ID long ID_HOGAR byte(ID_IND ID_VIAJE ID_ETAPA)
    "189_1"   189 1 1 1
    "189_1"   189 1 2 1
    "244_1"   244 1 1 1
    "244_1"   244 1 2 1
    "244_2"   244 2 1 1
    "244_2"   244 2 2 1
    "324_1"   324 1 1 1
    "324_1"   324 1 2 1
    "414_1"   414 1 1 1
    "414_1"   414 1 2 1
    "414_2"   414 2 1 1
    "414_2"   414 2 2 1
    "414_3"   414 3 1 1
    "414_3"   414 3 2 1
    "414_3"   414 3 3 1
    "581_1"   581 1 1 1
    "581_2"   581 2 1 1
    "581_2"   581 2 2 1
    "581_3"   581 3 1 1
    "581_3"   581 3 2 1
    "581_4"   581 4 1 1
    "581_4"   581 4 2 1
    "742_1"   742 1 1 1
    "742_1"   742 1 2 1
    "742_2"   742 2 1 1
    "742_2"   742 2 1 2
    "742_2"   742 2 2 1
    "742_2"   742 2 2 2
    "759_1"   759 1 1 1
    "759_1"   759 1 2 1
    "759_1"   759 1 3 1
    "759_1"   759 1 4 1
    "759_2"   759 2 1 1
    "759_2"   759 2 2 1
    "759_3"   759 3 1 1
    "759_3"   759 3 2 1
    "759_4"   759 4 1 1
    "759_4"   759 4 2 1
    "914_1"   914 1 1 1
    "914_1"   914 1 2 1
    "914_2"   914 2 1 1
    "914_2"   914 2 2 1
    "1096_1" 1096 1 1 1
    "1096_1" 1096 1 1 2
    "1096_1" 1096 1 2 1
    "1096_1" 1096 1 2 2
    "1096_1" 1096 1 2 3
    "1096_2" 1096 2 1 1
    "1096_2" 1096 2 2 1
    "1096_2" 1096 2 3 1
    "1096_2" 1096 2 4 1
    "1096_2" 1096 2 5 1
    "1096_3" 1096 3 1 1
    "1096_3" 1096 3 2 1
    "1479_1" 1479 1 1 1
    "1479_1" 1479 1 2 1
    "1479_1" 1479 1 3 1
    "1479_1" 1479 1 4 1
    "1479_2" 1479 2 1 1
    "1479_2" 1479 2 2 1
    "1479_3" 1479 3 1 1
    "1479_3" 1479 3 2 1
    "1508_1" 1508 1 1 1
    "1508_1" 1508 1 2 1
    "1508_2" 1508 2 1 1
    "1508_2" 1508 2 2 1
    "1710_1" 1710 1 1 1
    "1710_1" 1710 1 2 1
    "1710_1" 1710 1 3 1
    "1710_2" 1710 2 1 1
    "1710_2" 1710 2 2 1
    "1710_3" 1710 3 1 1
    "1710_3" 1710 3 2 1
    "1710_3" 1710 3 3 1
    "1710_4" 1710 4 1 1
    "1710_4" 1710 4 2 1
    "2275_2" 2275 2 1 1
    "2275_2" 2275 2 1 2
    "2275_3" 2275 3 1 1
    "2275_3" 2275 3 1 2
    "2322_1" 2322 1 1 1
    "2322_3" 2322 3 1 1
    "2322_5" 2322 5 1 1
    "2332_1" 2332 1 1 1
    "2332_1" 2332 1 1 2
    "2332_1" 2332 1 2 1
    "2332_1" 2332 1 4 1
    "2332_1" 2332 1 5 1
    "2502_2" 2502 2 1 1
    "2502_2" 2502 2 2 1
    "2502_3" 2502 3 1 1
    "2502_3" 2502 3 2 1
    "2882_2" 2882 2 1 1
    "3216_1" 3216 1 1 1
    "3216_1" 3216 1 1 2
    "3216_1" 3216 1 1 3
    "3216_1" 3216 1 2 1
    "3216_1" 3216 1 2 2
    "3216_1" 3216 1 2 3
    "3216_1" 3216 1 3 1
    end
    Could anyone provide me a solution to that please?
    Thank you in advance.

    Michael

  • #2
    Well, looking at your data, I would say that the -merge- would go like this:
    Code:
    use master, clear
    merge 1:m ID_HOGAR ID_IND ID_VIAJE using using
    But I also see in your example data that there are no matches at all. In fact the values of ID_HOGAR, ID_IND, and ID_VIAJE aren't even similar numbers of digits in the two data sets. So if you are not getting appropriate -merge- results, I suspect that the problem lies in the data. Certainly if you -merge- the two example data sets you will get 200 observations, because nothing in either set matches anything in the other. I suspect that is happening, perhaps only partially, in your real data.

    Comment


    • #3
      Hi Clyde Schechter:

      Thank you for your suggestions and help!
      I will carefully look at the data.

      The -merge- you provided gives me the same observations as before, so I suppose that this is a problem with my dataset.
      I will check it carefully!

      Again, thank you so much.
      Best,

      Michael

      Comment

      Working...
      X