Data transformation: including answers from spouse in the model

Coenraad Smit

Join Date: Nov 2015

Posts: 6
#1


11 Jan 2016, 09:01

Dear Stata users,

I am using a cross-sectional database (SHARE) and I want to include items answered by a person's spouse in my model. However, respondents only answer questions about themselves, not about their spouse. I would therefore like to transform the data so that I can use the spouse's items and see whether the answers provided by the spouse influence the individual.

For example as in:
y = explanatory variables + answer from spouse + controls + e

mergeid = person identifier
hhid5 = household identifier
ph005_ = variable of interest

In my screenshot I have highlighted an example. The household id (hhid5) is the same for these individuals, but the person identifier (mergeid) is not.

First, in order to create a spouse identifier I tried things such as:

gen spouse_id = cond(hhid5 == hhid5[_n+1], mergeid[_n+1], .)

but without any success.

In other words, how do I transform the data so that the answer to ph005_ given by person A can be used in the regression for person B (who is married to A)?

I hope I am clear in my description.
Thanks in advance.

1 Photo

Last edited by Coenraad Smit; 11 Jan 2016, 09:40.
Jorge L. Guzman

Join Date: Mar 2015

Posts: 50
#2

11 Jan 2016, 09:12

Dear Coenraad,

I think I know what you want to do. However, it is not clear how you can tell who answered the survey. For example, other household surveys use 1 for "husband", 2 for "wife", 3 for "children", 4 for "house worker", etc. With such a code, you can tell whether the wife is answering the survey. Once you figure that out, you can use a simple function called "substr".

A good example would be:

Code:

gen wife_id = substr(household_id, 1, 2)

What you are doing here is telling Stata that you want to extract the code that identifies women from the household identifier. The first argument inside the parentheses is the name of the variable you are extracting the information from (it has to be a string variable). The second argument is a number that indicates the position (in the household identifier) of the first character of the id. The third argument is how many characters it should extract.
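To illustrate how the arguments work, here is a minimal sketch on a made-up string (the value "0203" is hypothetical and not taken from the SHARE data):

```stata
* substr(s, start, length) returns "length" characters of s,
* beginning at character position "start"
display substr("0203", 1, 2)
display substr("0203", 3, 2)
```

The first line displays 02 and the second displays 03.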

I hope this helps.

Cheers and good luck,

Jorge
Coenraad Smit

Join Date: Nov 2015

Posts: 6
#3

11 Jan 2016, 09:37

Dear Jorge,

Thanks for your advice. However, I think it is not what I mean.

In my screenshot I have highlighted an example. The household id (hhid5) is the same for these individuals, but the person identifier (mergeid) is not. I am not interested in differences between men and women. I merely want to take the answers from person A and use them for person B, and take the answers given by person B and use them for person A (given that they are married, in which case their household id, hhid5, is identical).

All the best,
Coenraad
Sarah Edgington

Join Date: Apr 2014

Posts: 284
#4

11 Jan 2016, 12:53

You're making the assumption that two people in the same household are married. Do you know from the survey documentation that this is true?
How are you planning on setting up your regression? Will both respondents be included in your sample with their own responses and their spouse's responses?

You can create a spouse id variable as follows:

Code:

bysort hhid5: gen spouse_id=mergeid[_n+1] if _n==1
bysort hhid5: replace spouse_id=mergeid[_n-1] if _n==2

This makes the assumption that there are never more than two individuals within a given hhid5. It also assumes that when there are two individuals, they are always married. If either of these assumptions is untrue, you'll need to provide more information about your data.
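The first assumption can be checked before going further. A minimal sketch (n_in_hh is a made-up variable name; whether two people in a household are actually married still has to be confirmed from the survey documentation, which this check cannot do):

```stata
* Count the respondents in each household
bysort hhid5: gen n_in_hh = _N

* Inspect the distribution of household sizes
tabulate n_in_hh

* Stops with an error if any household has more than two respondents
assert n_in_hh <= 2
```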

Depending on how you want to set up your final models, creating a spouse id variable may not be a necessary step. I would probably just copy over the spouse responses directly.
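Copying the responses directly could look something like the sketch below. It relies on the same assumptions as above (at most two respondents per hhid5, and two respondents in a household are a couple). The name sp_ph005 is made up for illustration.

```stata
* Sort by mergeid within household so the order is reproducible.
* In a two-person household, 3 - _n points at the other person's row:
*   _n == 1 -> row 2, _n == 2 -> row 1
* Respondents without a second household member get missing.
bysort hhid5 (mergeid): gen sp_ph005 = ph005_[3 - _n] if _N == 2

* The spouse's answer can then enter the model as a regressor, e.g.
* regress y x1 x2 sp_ph005   (y, x1 and x2 are placeholders)
```

With this setup both partners stay in the sample, each with their own ph005_ and their partner's answer in sp_ph005.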

Also, please read the FAQ, which is linked in the navigation bar at the top of the screen. Attaching pictures of data is not the best way to share your data with the group. The FAQ has useful information on how to present data examples and offers some valuable advice on how to ask questions in a way that makes it easy for members of the forum to help you.