
  • Data transformation: including answers from spouse in the model

    Dear Stata users,

    I am using a cross-sectional dataset (SHARE) and I want to include items answered by a person's spouse in my model. However, respondents only answer questions about themselves, not about their spouse. I would therefore like to transform the data so that I can use the spouse's answers and test whether they influence the individual's outcome.

    For example:
    y = explanatory variables + answer from spouse + controls + e



    mergeid = person identifier
    hhid5 = household identifier
    ph005_ = variable of interest


    In my screenshot I have highlighted an example. The household id (hhid5) is the same for these individuals, but the person identifier (mergeid) is not.

    First, to create a spouse identifier, I tried things such as:

    gen spouse_id = cond(hhid5 == hhid5[_n+1], mergeid[_n+1], .)

    without any success.

    In other words, how do I transform the data so that the answer to ph005_ given by person A can be used in the regression for person B (who is married to A)?

    I hope I am clear in my description.
    Thanks in advance.
    Last edited by Coenraad Smit; 11 Jan 2016, 09:40.

  • #2
    Dear Coenraad,

    I think I know what you want to do. However, it is not clear how you can tell who answered the survey. For example, other household surveys use 1 for "husband", 2 for "wife", 3 for "children", 4 for "house worker", etc. In such a survey, you can tell whether the wife is answering. Once you figure that out, you can use a simple function called substr().

    A good example would be:

    Code:
    gen wife_id = substr(household_id, 1, 2)
    What you are doing here is telling Stata that you want to extract the part of the household identifier that identifies women. The first argument inside the parentheses is the variable you are extracting the information from (it has to be a string variable). The second argument is the position in the household identifier where the part you want starts. The third argument is how many characters to extract.
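    To illustrate how the arguments work, here is a minimal sketch on a made-up identifier. The variable names household_id and person_no and the value "AB-000001-01" are hypothetical and not taken from SHARE; a negative start position in substr() counts from the end of the string.

```stata
* Hypothetical one-observation dataset, for illustration only
clear
input str12 household_id
"AB-000001-01"
end

* substr(string, start position, number of characters):
* 2 characters starting at position 1
gen wife_id = substr(household_id, 1, 2)

* A negative start position counts from the end: the last 2 characters
gen person_no = substr(household_id, -2, 2)

list
```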

    I hope this helps.

    Cheers and good luck,

    Jorge



    • #3
      Dear Jorge,

      Thanks for your advice. However, I don't think that is what I mean.

      In my screenshot I have highlighted an example. The household id (hhid5) is the same for these individuals, but the person identifier (mergeid) is not. I am not interested in differences between men and women; I merely want to take the answers from person A and use them for person B, and take the answers given by person B and use them for person A (given that they are married, in which case their household id, hhid5, is identical).

      All the best,
      Coenraad



      • #4
        You're making the assumption that two people in the same household are married. Do you know from the survey documentation that this is true?
        How are you planning to set up your regression? Will both respondents be included in your sample with their own responses and their spouse's responses?

        You can create a spouse id variable as follows:
        Code:
        * Assumes at most two respondents per hhid5 and that the two are spouses
        bysort hhid5: gen spouse_id=mergeid[_n+1] if _n==1
        bysort hhid5: replace spouse_id=mergeid[_n-1] if _n==2
        This makes the assumption that there are never more than two individuals within a given hhid5. It also assumes that whenever there are two individuals, they are married. If either of these assumptions is untrue, you'll need to provide more information about your data.
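        The first assumption can be checked directly before relying on the code. A minimal sketch, using made-up toy values for hhid5 and mergeid (hypothetical, not real SHARE records) and a hypothetical helper variable hh_size; whether two people in a household are actually married still has to come from the survey documentation or a relationship variable.

```stata
* Hypothetical toy data: one two-person household, one single-person household
clear
input str9 hhid5 str12 mergeid
"AB-000001" "AB-000001-01"
"AB-000001" "AB-000001-02"
"AB-000002" "AB-000002-01"
end

* Number of respondents in each household (_N is the by-group size)
bysort hhid5: gen hh_size = _N
tabulate hh_size

* Stops with an error if any household has more than two respondents
assert hh_size <= 2
```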

        Depending on how you want to set up your final models, creating a spouse id variable may not be a necessary step. I would probably just copy over the spouse responses directly.
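        Copying the responses directly could look like the sketch below, under the same assumption that each hhid5 contains exactly two respondents who are spouses. The toy data and the new variable name spouse_ph005 are hypothetical; only hhid5, mergeid and ph005_ come from the original question.

```stata
* Hypothetical toy data: one couple and one single respondent
clear
input str9 hhid5 str12 mergeid ph005_
"AB-000001" "AB-000001-01" 1
"AB-000001" "AB-000001-02" 5
"AB-000002" "AB-000002-01" 3
end

* Within each household (sorted by mergeid), observation _N - _n + 1 is
* the other member when _N == 2: obs 1 gets obs 2's answer and vice versa.
* Households with one (or more than two) respondents are left missing.
bysort hhid5 (mergeid): gen spouse_ph005 = ph005_[_N - _n + 1] if _N == 2

list, sepby(hhid5)
```

        Sorting on mergeid within hhid5 makes the order within each household deterministic. The same line can be repeated, or wrapped in a foreach loop, for each spouse item needed in the regression.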

        Also, please read the FAQ, which is linked in the navigation bar at the top of the screen. Attaching pictures of data is not the best way to share your data with the group. The FAQ has useful information on how to present data examples and offers some valuable advice on how to ask questions in a way that makes it easy for members of the forum to help you.
