Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Merging survey data same variables in two separate files with different structure

    Hello Members,

    I would like to seek your help on problem encountered in attempting to merge file containing variables for the same a unit of observation (household).

    I am trying to use household ID (hh) and other household identifiers (ae, astrata, type) which together should allow to identifying a household uniquely. As shown below the variables are described the same across the two files but have different no. observations, so cannot be matched one-to-one basis to merge files.

    Could you suggest any trick to overcome the problem?

    Thanks,
    Mohamud

    FILE 1

    Code:
     
    strata byte %8.0g strata_id
    strata_id
    ea int %8.0g EA ID
    block byte %8.0g Block ID
    hh byte %8.0g Household ID
    type byte %8.0g type_hh Urban/Rural/IDP or Nomad
    astrata byte %8.0g lastrata Analytical strata
    weight float %9.0g Household weight
    hhh_edu byte %8.0g edu_level_broad
    Education Level of Household Head
    hhh_age byte %8.0g Age of Household Head
    hhh_gender byte %8.0g hhm_gender
    Gender of Household Head
    . sum strata ea block hh type astrata weight hhh_edu hhh_age hhh_gender
    Variable Obs Mean Std. Dev. Min Max
    strata 6,092 34.24606 14.62617 1 57
    ea 6,092 209.0635 114.3199 1 413
    block 6,092 31.29711 8.778698 1 69
    hh 6,092 9.810407 8.249635 1 60
    type 6,092 1.584865 .9465371 1 4
    astrata 6,092 7.245732 4.145188 1 15
    weight 6,092 318.3864 656.6932 .1215384 10705.87
    hhh_edu 4,279 1.332554 1.987206 0 6
    hhh_age 6,091 37.79182 11.93012 16 90
    hhh_gender 6,092 .5041037 .5000242 0 1
    FILE 2

    Code:
     
    describe strata ea block hh astrata
    storage display value
    variable name type format label variable label
    strata int %16.0g Strata
    ea int %16.0g EA ID
    block byte %16.0g Block ID
    hh byte %16.0g Household ID
    astrata byte %16.0g astrata Analytical Strata
    . sum strata ea block hh astrata
    Variable Obs Mean Std. Dev. Min Max
    strata 474,025 256.171 180.3539 101 1303
    ea 474,025 171.3944 97.98881 1 341
    block 139,430 19.75147 7.056427 1 44
    hh 474,025 6.567827 3.51177 1 19
    astrata 474,025 13.83234 5.129155 3 22

  • #2
    It appears to me that your first dataset is data on each household, 1 observation per household, while your second dataset contains observations of characteristics within each household, several observations per household. If that is the case, then something like the following will perhaps start you on your way.
    Code:
    use dataset1
    merge 1:m strata ea block hh using dataset2
    followed by careful review of the output from the merge command to see if there were mismatches: households with no details, or details whose household was not found.

    The key is that "1:m" tells merge that dataset1 will have 1 observation for each household, while dataset2 will have multiple observations for each household. You tried "1:1" which tells merge that both datasets have a single observation for each household.

    The documentation for the merge command in the Stata Data Management Reference Manual PDF is useful background to making effective use of the merge command.

    Comment


    • #3
      Thank you William- that's very helpful William.

      Comment


      • #4
        Mohamud Hussein - One more thought has occurred to me.

        My experience with survey data is that I would expect your enumeration area ID (ea), block ID (block), and household ID (hh) together to be sufficient to identify your households. So I suggest you try the following.
        Code:
        use dataset1
        isid ea block hh
        isid block hh
        This will test if ea block and hh together are distinct - only one observation for any combination. Here's sample output on made up data to demonstrate what I expect the output to look like.
        Code:
        . isid ea block hh
        
        . isid block hh
        variables block hh do not uniquely identify the observations
        r(459);
        From the second isid we are told that block and hh together (without including ea) do not uniquely identify observations. From the first isid we see nothing - and lack of an error message means that ea block and hh together do uniquely identify the observations, and no other variable (like strata or astrata or type) is necessary.

        If you see similar results, that means my code in post #2 can instead be
        Code:
        use dataset1
        merge 1:m ea block hh using dataset2
        Now, if you do that, you should know that since astrata is in both dataset1 and dataset2, when the merge is run, the observations that matched on ea block and hh will retain the value of astrata that was in dataset1, and will ignore the value from dataset2. Which is what you want, I'm sure. But if you ever want to use merge to update (replace values in) one dataset with values from another, it's back to the merge documentation to understand your options.

        I hope you find this more helpful than confusing.

        Comment


        • #5
          Many thanks William. Your suggestions are very useful and clear. I appreciate it.

          Comment

          Working...
          X