Creating variables for husband and wife using data for respondent and their partner

William Lisowski replied

02 Sep 2020, 07:31
Had you read the documentation for reshape as Clyde suggested, you would learn that in your output the material I have colored blue and red

Code:

. reshape long @hgsex @mrcurr @edhigh1, i(id p_id wave) j(_j) string
(note: j = g p_ q_)
(note: ghgsex not found)
(note: gedhigh1 not found)

Data                               wide   ->   long
-----------------------------------------------------------------------------
Number of obs.                    78919   ->  236757
Number of variables                 474   ->     471
j variable (3 values)                     ->   _j
xij variables:
                 ghgsex p_hgsex q_hgsex   ->   hgsex
              gmrcurr p_mrcurr q_mrcurr   ->   mrcurr
           gedhigh1 p_edhigh1 q_edhigh1   ->   edhigh1
-----------------------------------------------------------------------------

include the appearance of gmrcurr which indicates that you have a variable in your data that ends with mrcurr that you have not previously mentioned in this topic, that Clyde overlooked in your post #7, and for which you did not provide example data that reproduces your problem, as Clyde requested in post #8.

The result is that the reshape command tried to create observations not only for the prefixes p_ and q_ but also for the prefix g, and (since ghgsex and gedhigh1 do not exist) filled hgsex and edhigh1 with missing values for _j=="g".
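A minimal sketch that reproduces this behaviour on made-up data (the two couple-waves and the values of gmrcurr are invented for illustration; only the variable names come from this thread):

```stata
* Toy data: two couple-waves plus a stray variable gmrcurr (values are made up)
clear
input id p_id wave hgsex p_hgsex mrcurr p_mrcurr edhigh1 p_edhigh1 gmrcurr
108 109  1 1 2 1 1 9 9 1
110 163 12 2 1 2 2 1 3 2
end

rename (hgsex mrcurr edhigh1) q_=
reshape long @hgsex @mrcurr @edhigh1, i(id p_id wave) j(_j) string

* Three rows per couple-wave instead of two
tabulate _j

* The g rows carry mrcurr (taken from gmrcurr) but no hgsex or edhigh1
list id wave _j hgsex mrcurr edhigh1 if _j == "g"
count if _j == "g" & missing(hgsex)
```

Those rows with missing hgsex are what the later -reshape wide- with j(hgsex) refuses to accept.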

For now, my advice is to add

Code:

rename gmrcurr gmrcurrX

before your reshape command so that only p_mrcurr and q_mrcurr will be reshaped.
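A minimal sketch of this fix on made-up data (the observations are invented; the variable names are the ones in this thread). Listing the variables that end in each stub with -ds- before reshaping is one way, among others, to spot strays such as gmrcurr.

```stata
* Toy data: two couple-waves plus the stray variable gmrcurr (values are made up)
clear
input id p_id wave hgsex p_hgsex mrcurr p_mrcurr edhigh1 p_edhigh1 gmrcurr
108 109  1 1 2 1 1 9 9 1
110 163 12 2 1 2 2 1 3 2
end

rename (hgsex mrcurr edhigh1) q_=

* List every variable ending in a stub; gmrcurr appears next to p_mrcurr and q_mrcurr
ds *hgsex *mrcurr *edhigh1

* Take the stray variable out of the pattern, then reshape
rename gmrcurr gmrcurrX
reshape long @hgsex @mrcurr @edhigh1, i(id p_id wave) j(_j) string

* Only the p_ and q_ rows exist now, and hgsex is filled in for all of them
tabulate _j
assert inlist(_j, "p_", "q_")
assert !missing(hgsex)
```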

Chris Boulis replied

02 Sep 2020, 04:43

Hi Clyde Schechter. Thank you for your explanation and advice. The code still does not run (output below states hgsex contains missing values even though I included this in the first line of code). I'm not sure what is wrong. I will read the PDF on -reshape-.

The code drops the variable _j, which I believe holds the q_ and p_ prefixes of hgsex, mrcurr and edhigh1, and this affects other code in this file that uses those variables. Can we avoid dropping _j, or is there another approach to changing respondent/partner variables to male partner/female partner variables, or is this the best way?

Also, if the code did run, what would the newly created variables for "male partner education level" and "female partner education level" be called? Thanks for your help. Kind regards, Chris

Code:

. drop if missing(mrcurr, p_mrcurr, edhigh1, p_edhigh1, hgsex, p_hgsex)
(197,226 observations deleted)

. rename (hgsex mrcurr edhigh1) q_=

. reshape long @hgsex @mrcurr @edhigh1, i(id p_id wave) j(_j) string
(note: j = g p_ q_)
(note: ghgsex not found)
(note: gedhigh1 not found)

Data                               wide   ->   long
-----------------------------------------------------------------------------
Number of obs.                    78919   ->  236757
Number of variables                 474   ->     471
j variable (3 values)                     ->   _j
xij variables:
                 ghgsex p_hgsex q_hgsex   ->   hgsex
              gmrcurr p_mrcurr q_mrcurr   ->   mrcurr
           gedhigh1 p_edhigh1 q_edhigh1   ->   edhigh1
-----------------------------------------------------------------------------

. drop _j

. reshape wide mrcurr edhigh1, i(id p_id wave) j(hgsex)
variable hgsex contains missing values
r(498);


Clyde Schechter replied

01 Sep 2020, 11:59
"I'm not sure what the issue is, in part because I don't fully understand what your code is doing."

Code:

rename (hgsex mrcurr edhigh1) q_=
reshape long @hgsex @mrcurr @edhigh1, i(id p_id) j(_j) string
drop _j
reshape wide mrcurr edhigh1, i(id p_id) j(hgsex)

The -rename- command is preparation for the -reshape- command. Please read the PDF manual section on the -reshape- command: it will prove indispensable to you in managing this kind of data. Anyway, the names of the variables being reshaped have to have parallel structure. Yours don't because one member of the pair is prefixed with p_ and the other has no prefix. So the rename command makes them similar by giving a q_ prefix to the ones with no prefix. The choice of q_ was arbitrary and is of no importance. In fact, later, the p_ and q_ are both dropped in the -drop _j- command.
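To see what the -rename- step does in isolation, here is a small sketch on made-up data (the single observation is invented; the variable names are those used in this thread). In the new-name pattern, = stands for the original name, so q_= puts q_ in front of each listed variable.

```stata
* One invented couple, with respondent variables unprefixed and partner variables prefixed p_
clear
input id p_id hgsex p_hgsex mrcurr p_mrcurr edhigh1 p_edhigh1
108 109 1 2 1 1 9 9
end

* Give the respondent's variables a q_ prefix so each pair reads q_<name> / p_<name>
rename (hgsex mrcurr edhigh1) q_=

* Every stub now has both a q_ member and a p_ member
describe *hgsex *mrcurr *edhigh1, simple
confirm variable q_hgsex q_mrcurr q_edhigh1 p_hgsex p_mrcurr p_edhigh1
```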

"I noticed there were some missing data for the partner (p_mrcurr & p_edhigh1). I noticed that some respondents change partners over time. If you think either or both of these issues will cause a problem can we include code to exclude couples with missing data and only use the data for couples where the id remains the same?"

Just add this line of code at the beginning

Code:

drop if missing(mrcurr, p_mrcurr, edhigh1, p_edhigh1)

Since each observation is a single partnership, there is no clear reason to eliminate those where the respondents change partners over time, at least for present purposes. It may be that for later analysis you will need to do that, but it isn't a problem for the immediate situation.
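A short sketch of what the -drop if missing()- line does, using three rows from the sample posted in this thread. The column order (id p_id wave hgsex p_hgsex mrcurr p_mrcurr edhigh1 p_edhigh1) is an assumption, since the sample was posted without its header.

```stata
* Three waves for one couple; wave 5 has missing partner data
clear
input id p_id wave hgsex p_hgsex mrcurr p_mrcurr edhigh1 p_edhigh1
108 109 4 1 2 1 1 9 9
108 109 5 1 2 1 . 9 .
108 109 6 1 2 1 1 9 9
end

* missing() with several arguments is true when any one of them is missing
count if missing(mrcurr, p_mrcurr, edhigh1, p_edhigh1)
drop if missing(mrcurr, p_mrcurr, edhigh1, p_edhigh1)

* Wave 5 is gone; waves 4 and 6 remain
list id p_id wave
```

Running the -count- first shows how many observations the -drop- will remove before committing to it.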

Chris Boulis replied

01 Sep 2020, 04:54

Hi Clyde Schechter. I'm not sure what the issue is, in part because I don't fully understand what your code is doing. I notice that your line of code to rename variables only includes the variables for the respondent (sex, marital status and education level) but not the partner (which are the same but contain the prefix "p_"). That said, you may already understand as I noted it in #1.

Further, in reviewing a larger sample of my data (20,000 observations), I noticed there were some missing data for the partner (p_mrcurr & p_edhigh1). I noticed that some respondents change partners over time. If you think either or both of these issues will cause a problem can we include code to exclude couples with missing data and only use the data for couples where the id remains the same? Can the code deal with increasing levels of education in couples? Finally, to ensure only couples are included, could we not add an if statement such as?

Code:

.... if inlist(mrcurr, 1, 2) & inlist(p_mrcurr, 1, 2)

In case it helps, here's another sample.

Code:

108  109  1 1 2 1 1 9 9
108  109  2 1 2 1 1 9 9
108  109  3 1 2 1 1 9 9
108  109  4 1 2 1 1 9 9
108  109  5 1 2 1 . 9 .
108  109  6 1 2 1 1 9 9
108  109  7 1 2 1 1 9 9
110 163 12 2 1 2 2 1 3
110 163 13 2 1 2 2 1 3
110 163 14 2 1 1 1 1 3
110 163 15 2 1 1 1 1 3
110 163 16 2 1 1 1 1 3
110 163 17 2 1 1 1 1 3
110 163 18 2 1 1 1 1 3
114  115 10 1 2 1 1 5 5
114  115 11 1 2 1 1 5 5
114  115 12 1 2 1 1 5 5
114  115 13 1 2 1 1 5 5
114  115 14 1 2 1 1 5 5
114  115 15 1 2 1 1 5 5
114  115 16 1 2 1 1 5 5
118  119  1 2 1 1 1 4 3
118  119  2 2 1 1 1 4 3
118  119  3 2 1 1 1 4 3
118  119  4 2 1 1 1 4 1
118  119  5 2 1 1 1 4 1
118  119  6 2 1 1 1 4 1
118  119  7 2 1 1 1 4 1
118  119  8 2 1 1 1 4 1
118  119  9 2 1 1 1 4 1
118  119 10 2 1 1 1 4 1
118  119 11 2 1 1 1 4 1
118  119 12 2 1 1 1 4 1
118  119 13 2 1 1 1 4 1
118  119 14 2 1 1 1 4 1
118  119 15 2 1 1 1 4 1
118  119 16 2 1 1 1 4 1
118  119 17 2 1 1 1 4 1
123  124 10 2 1 1 1 5 9
123  124 11 2 1 1 1 5 9
123  124 12 2 1 1 1 5 9
123  124 13 2 1 1 1 5 9
123  124 14 2 1 1 1 5 9
123  124 15 2 1 1 1 5 9
123  124 16 2 1 1 1 5 9
123  124 17 2 1 1 1 5 9
123  124 18 2 1 1 1 5 9
125 185 12 2 1 2 2 9 5
125 185 13 2 1 2 2 9 5
125 185 14 2 1 2 2 9 5
126 142 15 2 1 2 2 9 9
126 142 16 2 1 2 2 9 9
126 142 17 2 1 2 2 9 9
126 142 18 2 1 2 2 9 9
end


Clyde Schechter replied

30 Aug 2020, 11:07
Please use -dataex- to post some example data that reproduces this problem. It does not occur with the examples you have shown so far.

That said, the likely cause is one of two things: either there are some couples where hgsex or p_hgsex is missing in the original data, or you have some "couples" that consist of only a single person.
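One way to check both possibilities before reshaping, sketched on made-up data. The three rows are invented, and treating a missing p_id as the sign of a one-person "couple" is an assumption about how the data are coded.

```stata
* Invented rows: one complete couple, one with p_hgsex missing, one with no partner at all
clear
input id p_id wave hgsex p_hgsex
108 109  1 1 2
110 163 12 2 .
126   . 15 2 .
end

* Couples where either partner's sex is missing
count if missing(hgsex, p_hgsex)
list id p_id wave if missing(hgsex, p_hgsex)

* Observations with no partner identifier at all (assumed marker of a single person)
count if missing(p_id)
```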

Chris Boulis replied

29 Aug 2020, 22:30

Hi Clyde Schechter. I noted I had panel data at end of #5 but maybe it wasn't clear. I have multiple waves of panel data and have id p_id and wave variables that uniquely identify the observations. Could you please explain what your code is doing?

I am after two variables, one for education level of the male partner (e.g. husband) and one for educ level of the female partner (e.g. wife)? I would like to compare the effects of couples with similar levels of education with couples with very different levels of education to determine if gender has a significant effect on relationship dissolution. e.g. effect of male partner's level of educ = female partner's level of educ compared to that when male partner's educ > female's educ or when the female partner's educ > male partner's educ.

Given your comments I ran the following:

Code:

rename (hgsex mrcurr edhigh1) q_=
reshape long @hgsex @mrcurr @edhigh1, i(id p_id wave) j(_j) string
drop _j
reshape wide mrcurr edhigh1, i(id p_id wave) j(hgsex)

And received the following from Stata:

Code:

. rename (hgsex mrcurr edhigh1) q_=

. reshape long @hgsex @mrcurr @edhigh1, i(id p_id wave) j(_j) string
(note: j = g p_ q_)
(note: ghgsex not found)
(note: gedhigh1 not found)

Data                               wide   ->   long
-----------------------------------------------------------------------------
Number of obs.                   276145   ->  828435
Number of variables                 466   ->     463
j variable (3 values)                     ->   _j
xij variables:
                 ghgsex p_hgsex q_hgsex   ->   hgsex
              gmrcurr p_mrcurr q_mrcurr   ->   mrcurr
           gedhigh1 p_edhigh1 q_edhigh1   ->   edhigh1
-----------------------------------------------------------------------------

. drop _j

. reshape wide mrcurr edhigh1, i(id p_id wave) j(hgsex)
variable hgsex contains missing values
r(498);

Last edited by Chris Boulis; 29 Aug 2020, 22:34.


Clyde Schechter replied

29 Aug 2020, 21:04
So the message is telling you that you have multiple observations with the same id and p_id. Now you are referring to panel data, which you did not mention before. So I guess that the multiple observations for the same couple refer to different time periods. So perhaps you have a date variable, or a wave number or something like that which, combined with id and p_id uniquely identify the observations in your data. You need to include that variable along with id and p_id in the -i()- option of both -reshape- commands.
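A small sketch for checking which variables identify the observations before choosing the -i()- list. The three rows are made up; wave is the name of the wave variable mentioned elsewhere in this thread.

```stata
* Invented panel: couple 108/109 appears in two waves
clear
input id p_id wave
108 109  1
108 109  2
110 163 12
end

* id and p_id repeat across waves, so they do not identify observations on their own
duplicates report id p_id
capture isid id p_id
display "return code from isid id p_id: " _rc

* Adding wave makes the combination unique; isid is silent when that holds
isid id p_id wave
```

If the last -isid- also fails on the real data, some other variable is needed in -i()-, or there are true duplicate observations to sort out first.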

Chris Boulis replied

29 Aug 2020, 20:19

Thank you Clyde Schechter. I ran the code and Stata provided the following output:

Code:

. rename (hgsex mrcurr edhigh1) q_=

. reshape long @hgsex @mrcurr @edhigh1, i(id p_id) j(_j) string
(note: j = g p_ q_)

variable id does not uniquely identify the observations
    Your data are currently wide.  You are performing a reshape long.  You specified i(id p_id) and j(_j).  In the
    current wide form, variable id p_id should uniquely identify the observations.  Remember this picture:

         long                                wide
        +---------------+                   +------------------+
        | i   j   a   b |                   | i   a1 a2  b1 b2 |
        |---------------| <--- reshape ---> |------------------|
        | 1   1   1   2 |                   | 1   1   3   2  4 |
        | 1   2   3   4 |                   | 2   5   7   6  8 |
        | 2   1   5   6 |                   +------------------+
        | 2   2   7   8 |
        +---------------+
    Type reshape error for a list of the problem observations.
r(9);

I tried running it again, and received this output

Code:

. rename (hgsex mrcurr edhigh1) q_=
variable hgsex not found

p.s. My panel data is in long format. Stata v.15.1.


Clyde Schechter replied

29 Aug 2020, 19:20

Code:

rename (hgsex mrcurr edhigh1) q_=
reshape long @hgsex @mrcurr @edhigh1, i(id p_id) j(_j) string
drop _j
reshape wide mrcurr edhigh1, i(id p_id) j(hgsex)


Chris Boulis replied

29 Aug 2020, 18:49
Hi Clyde Schechter. Thank you for your quick response. Sorry I wasn't sure if I needed to include the id. I have respondent id (id) and partner id (p_id). And male == 1, female == 2 (in #1). Do you mind showing how I integrate id & p_id into the code please? Kind regards, Chris

Clyde Schechter replied

29 Aug 2020, 18:34
Code:

gen long pair_id = _n
rename (hgsex mrcurr edhigh1) q_=
reshape long @hgsex @mrcurr @edhigh1, i(pair_id) j(_j) string
drop _j
reshape wide mrcurr edhigh1, i(pair_id) j(hgsex)

You don't say whether 1 = Male and 2 = Female or the other way around. But, at the end of this code you will have variables mrcurr1 mrcurr2 edhigh11 and edhigh12 showing the values of mrcurr and edhigh1 for sex 1 and sex 2, respectively, in each pair.

Also, in your real data you may already have a variable that identifies the pairs. If so, you don't need to create the variable pair_id. You can just use whatever your existing variable is wherever you see pair_id.
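To make the naming of the resulting variables concrete, here is the same sequence run on two made-up couples (the values are invented, and sex is simply coded 1 and 2 without assuming which is which):

```stata
* Two invented couples: in the first the respondent is sex 1, in the second sex 2
clear
input hgsex p_hgsex mrcurr p_mrcurr edhigh1 p_edhigh1
1 2 1 1 9 9
2 1 2 2 1 3
end

gen long pair_id = _n
rename (hgsex mrcurr edhigh1) q_=
reshape long @hgsex @mrcurr @edhigh1, i(pair_id) j(_j) string
drop _j
reshape wide mrcurr edhigh1, i(pair_id) j(hgsex)

* The suffix is the value of hgsex: edhigh11 is edhigh1 for sex 1, edhigh12 for sex 2
list pair_id mrcurr1 mrcurr2 edhigh11 edhigh12
```

In the second couple the respondent is sex 2 with edhigh1 = 1 and the partner is sex 1 with edhigh1 = 3, so that row ends up with edhigh11 = 3 and edhigh12 = 1, regardless of who was the respondent.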