  • methodological question: matching/imputation based on two datasets

    Hello,

    I struggle to find the right method for what I want to do using two household surveys. I have two datasets:
    1) X dataset with socio-econ info (A1) and Z info
    2) Y dataset with socio-econ info (A2)

    The Y dataset does not have Z info, and this is what I want to impute based on the X dataset. The imputation/matching will be based on the socio-econ info (A1 and A2). Which method is best? I looked into multiple imputation (MI) under the missing-at-random (MAR) assumption, where they use mixed-method multiple imputations, but this method assumes that you impute missing values from the SAME population. I'm not sure I can use this method with my data.


    If my example is too abstract, consider this: I have two household survey datasets. X has expenditures on food, clothing, and house fuels, but Y does not, so I need to impute this information. I can do this because both datasets contain information on income, household size, appliance ownership, etc. So if the marginal distributions of these socio-econ characteristics are similar in X and Y, I can then impute the expenditure data.
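To make the "similar marginal distributions" condition concrete, here is a minimal sketch of one way it could be checked: compare the weighted category shares of each common variable across the two surveys. The variable names (`income_band`, `hh_size`, `weight`) and the toy records are assumptions for illustration only, not taken from either survey.

```python
from collections import defaultdict

def weighted_shares(rows, var, weight="weight"):
    """Return the survey-weighted share of each category of `var`."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[var]] += row[weight]
    grand_total = sum(totals.values())
    return {cat: w / grand_total for cat, w in totals.items()}

def max_share_gap(rows_x, rows_y, var):
    """Largest absolute difference in weighted shares between two surveys."""
    shares_x = weighted_shares(rows_x, var)
    shares_y = weighted_shares(rows_y, var)
    categories = set(shares_x) | set(shares_y)
    return max(abs(shares_x.get(c, 0.0) - shares_y.get(c, 0.0))
               for c in categories)

# Hypothetical toy records standing in for surveys X and Y.
survey_x = [
    {"income_band": "low", "hh_size": 1, "weight": 100.0},
    {"income_band": "low", "hh_size": 2, "weight": 100.0},
    {"income_band": "high", "hh_size": 2, "weight": 200.0},
]
survey_y = [
    {"income_band": "low", "hh_size": 1, "weight": 150.0},
    {"income_band": "high", "hh_size": 2, "weight": 150.0},
    {"income_band": "high", "hh_size": 3, "weight": 100.0},
]

gap_income = max_share_gap(survey_x, survey_y, "income_band")
gap_size = max_share_gap(survey_x, survey_y, "hh_size")
print(gap_income, gap_size)
```

A small gap for every common variable would support treating the two samples as comparable on those characteristics. It says nothing about whether their joint distribution matches.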

    I would greatly appreciate any help. Even naming a method or tools that are available in Stata would be super helpful!

    Cheers,

    Marta

  • #2
    Anyone? I would really appreciate some help



    • #3
      As far as I know, the easiest way to do this is to combine the two datasets into one; however, what you describe in #1 is not clear enough for me to say whether you want -merge- or -append- (or even -joinby- or -cross-).



      • #4
        Hi Rich, thank you for answering.
        This is not about merging/appending two datasets. I have two different household surveys, and both are representative of the UK. I want to estimate household expenditures in survey Y by using expenditures in survey X. To make it very simple, I could calculate the average expenditure in survey X and apply it to survey Y using variables common to both datasets. For example, in both X and Y I have information about income and household size. I could calculate the average expenditure on food in survey X for different income groups and household sizes, and then assign that average to each household in survey Y with the same level of income and household size. BUT! This is too simplistic and not a robust enough method. I'm struggling to find the correct tool for it. Would psmatch2, the mi command or smmatch be good? The problem is that I'm not using exactly the same populations. Surveys X and Y are each representative of the UK (by using weights), but they were conducted in different years (1 year apart) and I cannot expect that the same households were interviewed. I hope this explanation helps a bit in understanding my issue.
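For reference, the simple cell-average approach described above can be sketched as follows. This is only an illustration of that baseline, not a recommended method; the variable names (`income_band`, `hh_size`, `food`, `weight`) and the toy records are hypothetical.

```python
from collections import defaultdict

def cell_means(donors, keys, target, weight="weight"):
    """Weighted mean of `target` in the donor survey for each cell of `keys`."""
    weighted_sums = defaultdict(float)
    weight_totals = defaultdict(float)
    for row in donors:
        cell = tuple(row[k] for k in keys)
        weighted_sums[cell] += row[weight] * row[target]
        weight_totals[cell] += row[weight]
    return {cell: weighted_sums[cell] / weight_totals[cell]
            for cell in weighted_sums}

def impute_from_cells(recipients, means, keys, target):
    """Copy each recipient row and attach the donor cell mean (None if no donor)."""
    imputed = []
    for row in recipients:
        cell = tuple(row[k] for k in keys)
        new_row = dict(row)
        new_row[target] = means.get(cell)
        imputed.append(new_row)
    return imputed

# Hypothetical toy data: survey X observes food expenditure, survey Y does not.
survey_x = [
    {"income_band": "low", "hh_size": 1, "food": 40.0, "weight": 1.0},
    {"income_band": "low", "hh_size": 1, "food": 60.0, "weight": 3.0},
    {"income_band": "high", "hh_size": 2, "food": 100.0, "weight": 2.0},
]
survey_y = [
    {"income_band": "low", "hh_size": 1, "weight": 2.0},
    {"income_band": "high", "hh_size": 2, "weight": 1.0},
    {"income_band": "high", "hh_size": 3, "weight": 1.0},
]

keys = ["income_band", "hh_size"]
means = cell_means(survey_x, keys, "food")
imputed_y = impute_from_cells(survey_y, means, keys, "food")
for row in imputed_y:
    print(row)
```

Every Y household in the same cell receives the same value, so the imputed expenditures vary less than real ones would, and cells with no donor in X stay missing. That is part of why this approach is too simplistic.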



        • #5
          These citations won't give you the recipe in Stata, but they might help you out:

          Franklin, C. H. (1989). Estimation across data sets: two-stage auxiliary instrumental variables estimation (2SAIV). Political Analysis, 1, 1-23.

          Gelman, A., King, G., & Liu, C. (1998). Not asked and not answered: Multiple imputation for multiple surveys. Journal of the American Statistical Association, 93(443), 846-857.
          David Radwin
          Senior Researcher, California Competes
          californiacompetes.org
          Pronouns: He/Him



          • #6
            Hi David, thank you for those tips! This might help.



            • #7

              Hi, did you find out the best way of doing it?
