  • Data Management

    Dear all,

    I don't know how to combine two cross-sectional datasets.
    Here is my case:

    I have household survey data: 6 files for 6 years.
    I want to build panel data from the 6 cross-sectional files, but the problem is that the observations are not repeated across the 6 years; only a small proportion of the sample can be built into panel data. Furthermore, the questionnaires are not exactly the same across years.

    In detail, the first file, Y2014, looks like this (DummyVar marks a household that was interviewed in the previous survey):
    MemberID HouseholdID v1 v2 v3 v4 DummyVar HouseholdID2013
    123451 12345 ... ... ... ... 0 -
    123452 12345 ... ... ... ... 0 -
    123461 12346 ... ... ... ... 1 12346001
    123462 12346 ... ... ... ... 1 12346001
    123463 12346 ... ... ... ... 1 12346001
    123464 12346 ... ... ... ... 1 12346001

    123471 12346 ... ... ... ... 1 12347001
    123472 12346 ... ... ... ... 1 12347001


    The next file, Y2013, looks like this (again, DummyVar marks a household that was interviewed in the previous survey):
    MemberID HouseholdID v1 v2 v3 v4 v5 v6 DummyVar HouseholdID2012
    102010001 10201000 ... ... ... ... 0 -
    102010002 10201000 ... ... ... ... 0 -
    123460011 12346001 ... ... ... ... 1 12346001
    123460012 12346001 ... ... ... ... 1 12346001
    123460013 12346001 ... ... ... ... 1 12346001

    123470011 12347001 ... ... ... ... 0 -
    123470012 12347001 ... ... ... ... 0 -

    Similarly, I have files going back to 2006.
    In the example above, only the household in bold was interviewed in both years, so only that household can be built into panel data.

    One further question: suppose a household had 3 members in Y2013 but 4 members in Y2014 because of a newborn child, for example. In other cases, the number of members in the panel falls for some reason. Because member IDs are simply assigned in order, a member does not keep his/her own MemberID across years, so I cannot tell who is missing or who has been added. Would pooling the cross-sectional data be the best solution? And if so, how can I do it?

    Thank you so much

    Best regards

  • #2
    You didn't get a quick answer. You'd have a better chance if you provided your code (using code delimiters), Stata output, and example data (using dataex). See the FAQ on asking questions.

    I (and I suspect others) have difficulty knowing what you really want. If you simply want to stack the yearly data, you can put a year identifier into each data set, and then just append them - look up append in the manuals.
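A minimal sketch of that approach, assuming each yearly file is a Stata dataset with the variable names shown in the question. The two tiny files are built inline with -input- purely for illustration (the values are taken loosely from the listing above), and saved as tempfiles so the example is self-contained; with the real data one would instead -use- each year's file, add the year variable, and save it.

```stata
* Sketch only: two toy yearly files built inline (illustrative values).
* Each file gets its own year identifier before stacking.
clear
input long MemberID long HouseholdID byte DummyVar long HouseholdID2013
123451 12345 0 .
123452 12345 0 .
123461 12346 1 12346001
end
generate int year = 2014
tempfile y2014
save `y2014'

clear
input long MemberID long HouseholdID byte DummyVar long HouseholdID2012
102010001 10201000 0 .
123460011 12346001 1 12346001
end
generate int year = 2013
tempfile y2013
save `y2013'

* Stack the files; a variable present in only one year's file
* (here HouseholdID2013 or HouseholdID2012) is missing for the other year's rows
use `y2013', clear
append using `y2014'
sort year HouseholdID MemberID
list, sepby(year)
```

Variables that appear in only one year's questionnaire simply become missing for the other years, which is one way to live with questionnaires that are not exactly the same.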

    It looks like you have panel data: multiple households and members observed in multiple years. So it is almost certain that the long format (what you'd want for a panel analysis) is the best way to go. Once the data are set up as one long data set, you can answer your questions. You should be able to use generate or egen by groups to answer them (changes in family size, etc.).
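For example, here is a sketch of the by-group approach on member-year data that have already been appended. It assumes (as the listing suggests, though the question does not say so outright) that HouseholdID2013 in the 2014 file holds the household's 2013 HouseholdID when DummyVar is 1 and is missing otherwise; panelhh and inpanel are hypothetical names introduced only for this illustration, and the values are toy data.

```stata
* Sketch only: appended member-year data with toy values.
* Assumption: HouseholdID2013 = the household's 2013 ID when linked, else missing.
clear
input long MemberID long HouseholdID int year long HouseholdID2013
123451    12345    2014 .
123452    12345    2014 .
123460011 12346001 2013 .
123460012 12346001 2013 .
123460013 12346001 2013 .
123461    12346    2014 12346001
123462    12346    2014 12346001
123463    12346    2014 12346001
123464    12346    2014 12346001
end

* Household size in each year: number of members in each household-year group
bysort year HouseholdID: egen hhsize = count(MemberID)

* Hypothetical cross-year household key: use the 2013 ID where a link exists
generate long panelhh = cond(!missing(HouseholdID2013), HouseholdID2013, HouseholdID)

* Flag households observed in more than one survey year
bysort panelhh (year): generate byte inpanel = year[1] != year[_N]
list panelhh year MemberID hhsize inpanel, sepby(panelhh)
```

With six waves, the links would have to be chained year by year (2014 to 2013, 2013 to 2012, and so on) so that the household key stays constant along a household's whole history; the sketch covers only one link.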

    You're going to have to tell us what the unit of observation will be for your analysis. Is the observation the family-year or is it the member-year (where multiple members can be within a family) or what? With a clearer idea of what you want, we can be of more assistance.
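If the household-year turns out to be the unit, one possible sketch is to collapse member-year rows to household-year rows and then -xtset- the result. This assumes a household key that is constant across years (called panelhh here, a hypothetical name, for example built from the HouseholdID2013 links) and annual waves; the toy values are illustrative only. A member-year panel would additionally need member IDs that persist across years, which the question says the data lack.

```stata
* Sketch only: member-year rows with a hypothetical constant household key.
clear
input long panelhh int year long MemberID
12346001 2013 123460011
12346001 2013 123460012
12346001 2013 123460013
12346001 2014 123461
12346001 2014 123462
12346001 2014 123463
12346001 2014 123464
end

* Household-year as the unit: one row per household per survey year
collapse (count) hhsize = MemberID, by(panelhh year)

* Declare the panel (assumes annual waves, so the default delta of 1 applies)
xtset panelhh year

* Change in household size from the previous wave via the difference operator
generate dsize = D.hhsize
list
```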



    • #3
      Thank you, Phil

      The -append- syntax may help me out.



      • #4
        That being so, you just need to type:


        Code:
        . help append
        Then look at the examples there; they are worth reading, and you can test whether the command works in your case.
        Best regards,

        Marcos
