methodological question: matching/imputation based on two datasets

Marta Baltruszewicz

Join Date: Feb 2019

Posts: 39
#1

21 Feb 2021, 10:47

Hello,

I am struggling to find the right method for what I want to do with two household surveys. I have two datasets:
1) X dataset with socio-econ info (A1) and Z info
2) Y dataset with socio-econ info (A2)

Dataset Y does not have the Z info, and this is what I want to impute based on dataset X. The imputation/matching will be based on the socio-econ info (A1 and A2). Which method is best? I looked into multiple imputation (MI) under the missing-at-random (MAR) assumption, where mixed-method multiple imputation is used, but that approach assumes you are imputing missing values within the SAME population. I'm not sure I can use it with my data.

If my example is too abstract, consider this: I have two household survey datasets. X has expenditures on food, clothing, and household fuels, but Y does not, so I need to impute this information. I can do this because both datasets contain information on income, household size, appliance ownership, etc. So if the marginal distributions of these socio-econ characteristics are similar in X and Y, I can then impute the expenditure data.
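To make the idea concrete, here is a minimal sketch of one possible approach, nearest-neighbour donor ("hot-deck") matching, written in plain Python rather than Stata. All data, variable names (`income`, `hhsize`, `food`), and function names are made up for illustration; this is not a recommendation of a specific method, only a picture of what "impute Z in Y from X using the common variables" could mean.

```python
import statistics

# Hypothetical donor survey X: common covariates plus observed food spending.
survey_x = [
    {"income": 20000, "hhsize": 1, "food": 40.0},
    {"income": 35000, "hhsize": 2, "food": 65.0},
    {"income": 50000, "hhsize": 4, "food": 110.0},
]
# Hypothetical recipient survey Y: same covariates, no expenditure data.
survey_y = [
    {"income": 21000, "hhsize": 1},
    {"income": 48000, "hhsize": 4},
]


def pooled_sd(rows, keys):
    """Standard deviation of each matching variable over all households."""
    # Fall back to 1.0 if a variable is constant, to avoid dividing by zero.
    return {k: statistics.pstdev([r[k] for r in rows]) or 1.0 for k in keys}


def impute_from_nearest_donor(donors, recipients, keys, target):
    """Copy `target` from the closest donor household to each recipient."""
    sd = pooled_sd(donors + recipients, keys)
    completed = []
    for rec in recipients:
        # Squared Euclidean distance on standardised matching variables.
        donor = min(
            donors,
            key=lambda d: sum(((d[k] - rec[k]) / sd[k]) ** 2 for k in keys),
        )
        completed.append({**rec, target: donor[target]})
    return completed


imputed_y = impute_from_nearest_donor(
    survey_x, survey_y, keys=("income", "hhsize"), target="food"
)
for row in imputed_y:
    print(row)
```

Note that this sketch ignores survey weights and treats the imputed value as if it were observed, so it says nothing about imputation uncertainty.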

I would greatly appreciate any help. Even just naming methods or tools that are available in Stata would be super helpful!

Cheers,

Marta
Marta Baltruszewicz

Join Date: Feb 2019

Posts: 39
#2

22 Feb 2021, 04:24

Anyone? I would really appreciate some help.
Rich Goldstein

Join Date: Mar 2014

Posts: 4546
#3

22 Feb 2021, 05:22

As far as I know, the easiest way to do this is to make the 2 datasets into one; however, what you describe in #1 is not clear enough for me to say whether you want -merge- or -append- (or even -joinby- or -cross-).
Marta Baltruszewicz

Join Date: Feb 2019

Posts: 39
#4

22 Feb 2021, 05:49

Hi Rich, thank you for answering.
This is not about merging/appending two datasets. I have two different household surveys, and both are representative of the UK. I want to estimate household expenditures in survey Y by using expenditures in survey X.

To make it very simple, I could calculate the average expenditure in survey X and apply it to survey Y using the variables common to both datasets. For example, both X and Y have information about income and household size. I could calculate the average expenditure on food in survey X for different income groups and household sizes, and then assign that average to each household in survey Y with the same level of income and household size. BUT! This is too simplistic and not a robust enough method.

I'm struggling to find the correct tool for it. Would psmatch2, the mi commands, or smmatch be good? The problem is that I'm not using exactly the same populations. Household surveys X and Y are each representative of the UK (when using weights), but they were conducted in different years (1 year apart), and I cannot expect that the same households were interviewed. I hope this explanation helps a bit to clarify my issue.
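For reference, the simple cell-average idea described above could be sketched as follows. This is plain Python rather than Stata, and the data, the income bands, and the names (`income_band`, `weight`, `food`) are all hypothetical; it only illustrates the baseline I consider too simplistic.

```python
from collections import defaultdict

# Hypothetical survey X: covariates, survey weight, and food expenditure.
survey_x = [
    {"income": 20000, "hhsize": 1, "weight": 1.0, "food": 40.0},
    {"income": 22000, "hhsize": 1, "weight": 3.0, "food": 50.0},
    {"income": 50000, "hhsize": 4, "weight": 2.0, "food": 110.0},
]
# Hypothetical survey Y: same covariates, no expenditure data.
survey_y = [
    {"income": 21000, "hhsize": 1},
    {"income": 60000, "hhsize": 4},
    {"income": 30000, "hhsize": 2},
]


def income_band(income):
    """Assumed, arbitrary income bands used to define the cells."""
    if income < 25000:
        return "low"
    if income < 45000:
        return "middle"
    return "high"


def cell_of(household):
    return (income_band(household["income"]), household["hhsize"])


def weighted_cell_means(donors, target):
    """Weighted mean of `target` within each income-band x size cell."""
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for h in donors:
        weighted_sum[cell_of(h)] += h["weight"] * h[target]
        weight_total[cell_of(h)] += h["weight"]
    return {c: weighted_sum[c] / weight_total[c] for c in weighted_sum}


means = weighted_cell_means(survey_x, "food")
# Households in a cell with no donors get None instead of a guess.
imputed_y = [{**h, "food": means.get(cell_of(h))} for h in survey_y]
for row in imputed_y:
    print(row)
```

The sketch also shows two weaknesses of this baseline: every household in a cell receives the same value, and cells with no households in X (the third household here) cannot be imputed at all.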
David Radwin

Join Date: Mar 2014

Posts: 370
#5

24 Feb 2021, 19:22

These citations won't give you the recipe in Stata, but they might help you out:

Franklin, C. H. (1989). Estimation across data sets: two-stage auxiliary instrumental variables estimation (2SAIV). Political Analysis, 1, 1-23.

Gelman, A., King, G., & Liu, C. (1998). Not asked and not answered: Multiple imputation for multiple surveys. Journal of the American Statistical Association, 93(443), 846-857.

David Radwin
Senior Researcher, California Competes
californiacompetes.org
Pronouns: He/Him
Marta Baltruszewicz

Join Date: Feb 2019

Posts: 39
#6

25 Feb 2021, 04:32

Hi David, thank you for those tips! This might help.
Julia Brros

Join Date: Dec 2022

Posts: 1
#7

07 Dec 2022, 08:15

Hi, did you find out the best way of doing it?