Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Merging/appending individual data (family members) with household data

    Hi all,

    Glad to see all the spam posts gone- thank you admins!

    I am trying to merge/append two datasets with the same households (identified by hhcode) with individual data. The individuals belong to the different households and are identified by hhcode but also by their individual number Personnalnumber.

    Master data:
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int hhcode long(Personnalnumber medexp)
    1005 100501 .
    1014 101401 .
    1025 102501 .
    1025 102502 .
    1031 103101 .
    1031 103102 .
    1031 103103 .
    1031 103104 .
    1033 103301 .
    1033 103302 .
    1035 103501 .
    1039 103901 .
    1039 103902 .
    1039 103903 .
    1052 105201 .
    1053 105301 .
    1057 105701 .
    1058 105801 .
    1058 105802 .
    1058 105803 .
    1058 105804 .
    1061 106101 .
    1061 106102 .
    1063 106301 .
    
    end
    Data to append:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int hhcode long medexp
    1005  300
    1014  200
    1025 2000
    1031 4000
    1033  650
    1035 1000
    1039 2150
    1052    0
    1053  200
    1057  500
    1058    0
    1061    0
    1063 1000
    end
    I basically want to fill in the missing values in the master data (medexp: medical expenditures) and duplicate the observations if there are many members in the family (accounted for with the Personnalnumber variable).

    I think the -merge- command is useless here since I'm not adding any new variables. However, if I use the -append- command, Stata properly appends the data but it just "adds" the observations at the end of the dataset, does not fill in the missing values.

    I don't know how to tell Stata do fill in those missing values and to duplicate observations to the multiple members of the family. I think it would be a mix between merging and appending. Is there any way to do that?


    On another note: I apologize because I've been posting questions every couple of days recently...
    Last edited by Candice Yandam; 02 May 2017, 08:57.

  • #2
    The -merge- command has an -update- option that does this.

    Code:
    use dataset1, clear
    merge 1:1 hhid using dataset2, update
    Do read the -merge- section of the manual to see exactly how -update- works, and how it differs from -replace-, another -merge- option that is sometimes needed instead.

    Comment


    • #3
      Clyde-

      Thanks for your answer. I checked out the -merge- command manual. Indeed, the -update- option is what would seem to be more appropriate here and the one that worked.

      This is the code I ran:

      Code:
      merge m:1 hhcode using /Users/candiceyandam/Desktop/basetomerge.dta, update
      I'd advise others (that had the same issue as me) to check for duplicates in the master or the using data (depending if using m:1 or 1:m) as a precaution because I actually had a coupe of missing values in the hhcode variable I wasn't aware of, that were seen as duplicates by Stata so it gave me an error message! Dropped the duplicates and it's all good.

      Thanks again Clyde, much appreciated

      Comment


      • #4
        Hi guys,

        right now I'm trying to merge / append multiple datasets, but unfortunately I am having some difficulties.

        So far I appended two datasets with the same variables: One dataset contains information about Thailand and the other one about Vietnam.

        Next step, I merged this appended dataset (Thailand+Vietnam) with another subdataset for Thailand which has new variables. Since the subdataset has multiple observations for the keyvariable, I used the 1:m merge. Now I want to fill the missing values for Vietnam with a second subdataset which has multiple observations and the same variables as the other subdataset (Thailand). For that, I tried the merge-update command, but I only get the following error message:

        "variable ques_id does not uniquely identify observations in the master data"

        So far I've been using these Codes:


        use datasetThailand, clear

        append using datasetVietnam

        merge 1:m ques_id using subdataThailand
        gen merge1=_merge
        drop _merge

        merge 1:m ques_id using subdataVietnam, update

        Can you please help me to merge these datasets?

        Thanks in advance.

        Comment


        • #5
          Without seeing example data it is difficult to be certain, but I think you need to append subdata_Thailand and subdata_Vietnam first, and then -merge 1:m- using that.

          Comment


          • #6
            Thank you Clyde! It actually worked with the order that you suggested.

            Comment

            Working...
            X