Merging/appending individual data (family members) with household data

Candice Yandam

Join Date: Feb 2017

Posts: 40
#1

Merging/appending individual data (family members) with household data

02 May 2017, 08:36

Hi all,

Glad to see all the spam posts gone- thank you admins!

I am trying to merge/append two datasets with the same households (identified by hhcode) with individual data. The individuals belong to the different households and are identified by hhcode but also by their individual number Personnalnumber.

Master data:

Code:

* Example generated by -dataex-. To install: ssc install dataex clear input int hhcode long(Personnalnumber medexp) 1005 100501 . 1014 101401 . 1025 102501 . 1025 102502 . 1031 103101 . 1031 103102 . 1031 103103 . 1031 103104 . 1033 103301 . 1033 103302 . 1035 103501 . 1039 103901 . 1039 103902 . 1039 103903 . 1052 105201 . 1053 105301 . 1057 105701 . 1058 105801 . 1058 105802 . 1058 105803 . 1058 105804 . 1061 106101 . 1061 106102 . 1063 106301 . end

Data to append:

Code:

* Example generated by -dataex-. To install: ssc install dataex clear input int hhcode long medexp 1005 300 1014 200 1025 2000 1031 4000 1033 650 1035 1000 1039 2150 1052 0 1053 200 1057 500 1058 0 1061 0 1063 1000 end

I basically want to fill in the missing values in the master data (medexp: medical expenditures) and duplicate the observations if there are many members in the family (accounted for with the Personnalnumber variable).

I think the -merge- command is useless here since I'm not adding any new variables. However, if I use the -append- command, Stata properly appends the data but it just "adds" the observations at the end of the dataset, does not fill in the missing values.

I don't know how to tell Stata do fill in those missing values and to duplicate observations to the multiple members of the family. I think it would be a mix between merging and appending. Is there any way to do that?

On another note: I apologize because I've been posting questions every couple of days recently...

Last edited by Candice Yandam; 02 May 2017, 08:57.
Tags: None
Clyde Schechter

Join Date: Apr 2014

Posts: 30059
#2

02 May 2017, 09:16

The -merge- command has an -update- option that does this.

Code:

use dataset1, clear merge 1:1 hhid using dataset2, update

Do read the -merge- section of the manual to see exactly how -update- works, and how it differs from -replace-, another -merge- option that is sometimes needed instead.
Comment
Candice Yandam

Join Date: Feb 2017

Posts: 40
#3

02 May 2017, 09:54

Clyde-

Thanks for your answer. I checked out the -merge- command manual. Indeed, the -update- option is what would seem to be more appropriate here and the one that worked.

This is the code I ran:

Code:

merge m:1 hhcode using /Users/candiceyandam/Desktop/basetomerge.dta, update

I'd advise others (that had the same issue as me) to check for duplicates in the master or the using data (depending if using m:1 or 1:m) as a precaution because I actually had a coupe of missing values in the hhcode variable I wasn't aware of, that were seen as duplicates by Stata so it gave me an error message! Dropped the duplicates and it's all good.

Thanks again Clyde, much appreciated
Comment
Ngoc Le

Join Date: Aug 2019

Posts: 2
#4

18 Aug 2019, 09:37

Hi guys,

right now I'm trying to merge / append multiple datasets, but unfortunately I am having some difficulties.

So far I appended two datasets with the same variables: One dataset contains information about Thailand and the other one about Vietnam.

Next step, I merged this appended dataset (Thailand+Vietnam) with another subdataset for Thailand which has new variables. Since the subdataset has multiple observations for the keyvariable, I used the 1:m merge. Now I want to fill the missing values for Vietnam with a second subdataset which has multiple observations and the same variables as the other subdataset (Thailand). For that, I tried the merge-update command, but I only get the following error message:

"variable ques_id does not uniquely identify observations in the master data"

So far I've been using these Codes:

use datasetThailand, clear

append using datasetVietnam

merge 1:m ques_id using subdataThailand
gen merge1=_merge
drop _merge

merge 1:m ques_id using subdataVietnam, update

Can you please help me to merge these datasets?

Thanks in advance.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30059
#5

18 Aug 2019, 11:41

Without seeing example data it is difficult to be certain, but I think you need to append subdata_Thailand and subdata_Vietnam first, and then -merge 1:m- using that.
Comment
Ngoc Le

Join Date: Aug 2019

Posts: 2
#6

20 Aug 2019, 02:53

Thank you Clyde! It actually worked with the order that you suggested.
Comment

Announcement

Merging/appending individual data (family members) with household data

Comment

Comment

Comment

Comment

Comment