Creating variables for husband and wife using data for respondent and their partner

Chris Boulis

Join Date: Feb 2019

Posts: 368
#1

29 Aug 2020, 17:29

Dear Statalist.

I would like help generating a variable, say level of education "educ", for the male partner and the female partner in a union (either married or de facto). The sample below, from a panel dataset, includes data for the respondent (hgsex mrcurr edhigh1) and their partner (p_hgsex p_mrcurr p_edhigh1). Coding: hgsex == 1 (male), == 2 (female); mrcurr == 1 (married), == 2 (de facto).

Help appreciated.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte(hgsex p_hgsex mrcurr p_mrcurr edhigh1 p_edhigh1)
1 2 1 1 5 9
2 1 2 2 8 9
2 1 2 2 8 9
2 1 1 1 5 5
2 1 2 2 5 5
2 1 2 2 9 9
1 2 1 1 9 9
1 2 1 1 9 9
2 1 1 1 1 3
2 1 1 1 1 3
2 1 2 2 9 4
2 1 2 2 9 4
1 2 1 1 5 5
1 2 1 1 5 5
1 2 1 1 5 8
1 2 1 1 5 5
1 2 1 1 5 8
1 2 1 1 5 5
1 2 1 1 5 8
1 2 1 1 3 3
1 2 1 1 3 3
1 2 2 2 3 4
1 2 1 1 3 3
1 2 1 1 3 3
1 2 2 2 3 8
1 2 1 1 3 3
1 2 1 1 3 3
1 2 2 2 8 8
2 1 1 1 4 3
2 1 1 1 4 1
2 1 1 1 4 1
2 1 1 1 4 3
2 1 1 1 4 1
end

Last edited by Chris Boulis; 29 Aug 2020, 17:39.
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#2

29 Aug 2020, 18:34

Code:

gen long pair_id = _n
rename (hgsex mrcurr edhigh1) q_=
reshape long @hgsex @mrcurr @edhigh1, i(pair_id) j(_j) string
drop _j
reshape wide mrcurr edhigh1, i(pair_id) j(hgsex)

You don't say whether 1 = Male and 2 = Female or the other way around. But, at the end of this code you will have variables mrcurr1 mrcurr2 edhigh11 and edhigh12 showing the values of mrcurr and edhigh1 for sex 1 and sex 2, respectively, in each pair.

Also, in your real data you may already have a variable that identifies the pairs. If so, you don't need to create the variable pair_id. You can just use whatever your existing variable is wherever you see pair_id.
Chris Boulis

Join Date: Feb 2019

Posts: 368
#3

29 Aug 2020, 18:49

Hi Clyde Schechter. Thank you for your quick response. Sorry, I wasn't sure if I needed to include the id. I have respondent id (id) and partner id (p_id). And male == 1, female == 2 (in #1). Do you mind showing how I integrate id & p_id into the code please? Kind regards, Chris

Clyde Schechter

Join Date: Apr 2014
Posts: 30100
#4

29 Aug 2020, 19:20

Code:

rename (hgsex mrcurr edhigh1) q_=
reshape long @hgsex @mrcurr @edhigh1, i(id p_id) j(_j) string
drop _j
reshape wide mrcurr edhigh1, i(id p_id) j(hgsex)

Chris Boulis

Join Date: Feb 2019
Posts: 368
#5

29 Aug 2020, 20:19

Thank you Clyde Schechter. I ran the code and Stata provided the following output:

Code:

. rename (hgsex mrcurr edhigh1) q_=

. reshape long @hgsex @mrcurr @edhigh1, i(id p_id) j(_j) string
(note: j = g p_ q_)

variable id does not uniquely identify the observations
    Your data are currently wide.  You are performing a reshape long.  You specified i(id p_id) and j(_j).  In the
    current wide form, variable id p_id should uniquely identify the observations.  Remember this picture:

         long                                wide
        +---------------+                   +------------------+
        | i   j   a   b |                   | i   a1 a2  b1 b2 |
        |---------------| <--- reshape ---> |------------------|
        | 1   1   1   2 |                   | 1   1   3   2  4 |
        | 1   2   3   4 |                   | 2   5   7   6  8 |
        | 2   1   5   6 |                   +------------------+
        | 2   2   7   8 |
        +---------------+
    Type reshape error for a list of the problem observations.
r(9);

I tried running it again, and received this output

Code:

. rename (hgsex mrcurr edhigh1) q_=
variable hgsex not found

p.s. My panel data is in long format. Stata v.15.1.

Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#6

29 Aug 2020, 21:04

So the message is telling you that you have multiple observations with the same id and p_id. Now you are referring to panel data, which you did not mention before. So I guess that the multiple observations for the same couple refer to different time periods. So perhaps you have a date variable, or a wave number or something like that which, combined with id and p_id uniquely identify the observations in your data. You need to include that variable along with id and p_id in the -i()- option of both -reshape- commands.
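
One way to check which variables identify the observations before reshaping is -isid-. The following is only a sketch on three made-up rows patterned on the thread's data, and it assumes the wave identifier is called wave, which may not match the real variable name.

```stata
* Sketch on made-up rows; assumes the panel wave variable is named wave.
clear
input int(id p_id wave)
108 109 1
108 109 2
110 163 12
end

* id and p_id alone repeat across waves, so this reports an error
capture noisily isid id p_id

* adding wave makes each observation unique; -isid- is then silent
isid id p_id wave
```

If the second -isid- also fails on the real data, -duplicates report id p_id wave- shows how many surplus observations there are.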
Chris Boulis

Join Date: Feb 2019

Posts: 368
#7

29 Aug 2020, 22:30

Hi Clyde Schechter. I noted I had panel data at end of #5 but maybe it wasn't clear. I have multiple waves of panel data and have id p_id and wave variables that uniquely identify the observations. Could you please explain what your code is doing?

I am after two variables, one for education level of the male partner (e.g. husband) and one for educ level of the female partner (e.g. wife)? I would like to compare the effects of couples with similar levels of education with couples with very different levels of education to determine if gender has a significant effect on relationship dissolution. e.g. effect of male partner's level of educ = female partner's level of educ compared to that when male partner's educ > female's educ or when the female partner's educ > male partner's educ.

Given your comments I ran the following:

Code:

rename (hgsex mrcurr edhigh1) q_=
reshape long @hgsex @mrcurr @edhigh1, i(id p_id wave) j(_j) string
drop _j
reshape wide mrcurr edhigh1, i(id p_id wave) j(hgsex)

And received the following from Stata:

Code:

. rename (hgsex mrcurr edhigh1) q_=

. reshape long @hgsex @mrcurr @edhigh1, i(id p_id wave) j(_j) string
(note: j = g p_ q_)
(note: ghgsex not found)
(note: gedhigh1 not found)

Data                               wide   ->   long
-----------------------------------------------------------------------------
Number of obs.                   276145   ->  828435
Number of variables                 466   ->     463
j variable (3 values)                     ->   _j
xij variables:
                 ghgsex p_hgsex q_hgsex   ->   hgsex
              gmrcurr p_mrcurr q_mrcurr   ->   mrcurr
           gedhigh1 p_edhigh1 q_edhigh1   ->   edhigh1
-----------------------------------------------------------------------------

. drop _j

. reshape wide mrcurr edhigh1, i(id p_id wave) j(hgsex)
variable hgsex contains missing values
r(498);

Last edited by Chris Boulis; 29 Aug 2020, 22:34.
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#8

30 Aug 2020, 11:07

Please use -dataex- to post some example data that reproduces this problem. It does not occur with the examples you have shown so far.

That said, the likely causes are either that there are some couples where hgsex or p_hgsex is missing in the original data, or that you have some "couples" that consist of only a single person.

Chris Boulis

Join Date: Feb 2019
Posts: 368
#9

01 Sep 2020, 04:54

Hi Clyde Schechter. I'm not sure what the issue is, in part because I don't fully understand what your code is doing. I notice that your line of code to rename variables only includes the variables for the respondent (sex, marital status and education level) but not the partner (which are the same but carry the prefix "p_"). That said, you may already understand this, as I noted it in #1.

Further, in reviewing a larger sample of my data (20,000 observations), I noticed there were some missing data for the partner (p_mrcurr & p_edhigh1). I noticed that some respondents change partners over time. If you think either or both of these issues will cause a problem, can we include code to exclude couples with missing data and only use the data for couples where the id remains the same? Can the code deal with increasing levels of education in couples? Finally, to ensure only couples are included, could we not add an if qualifier such as the following?

Code:

.... if inlist(mrcurr, 1, 2) & inlist(p_mrcurr, 1, 2)

In case it helps, here's another sample.

Code:

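* Example generated by -dataex-. To install: ssc install dataex
clear
input int(id p_id wave) byte(hgsex p_hgsex mrcurr p_mrcurr edhigh1 p_edhigh1)
108  109  1 1 2 1 1 9 9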
108  109  2 1 2 1 1 9 9
108  109  3 1 2 1 1 9 9
108  109  4 1 2 1 1 9 9
108  109  5 1 2 1 . 9 .
108  109  6 1 2 1 1 9 9
108  109  7 1 2 1 1 9 9
110 163 12 2 1 2 2 1 3
110 163 13 2 1 2 2 1 3
110 163 14 2 1 1 1 1 3
110 163 15 2 1 1 1 1 3
110 163 16 2 1 1 1 1 3
110 163 17 2 1 1 1 1 3
110 163 18 2 1 1 1 1 3
114  115 10 1 2 1 1 5 5
114  115 11 1 2 1 1 5 5
114  115 12 1 2 1 1 5 5
114  115 13 1 2 1 1 5 5
114  115 14 1 2 1 1 5 5
114  115 15 1 2 1 1 5 5
114  115 16 1 2 1 1 5 5
118  119  1 2 1 1 1 4 3
118  119  2 2 1 1 1 4 3
118  119  3 2 1 1 1 4 3
118  119  4 2 1 1 1 4 1
118  119  5 2 1 1 1 4 1
118  119  6 2 1 1 1 4 1
118  119  7 2 1 1 1 4 1
118  119  8 2 1 1 1 4 1
118  119  9 2 1 1 1 4 1
118  119 10 2 1 1 1 4 1
118  119 11 2 1 1 1 4 1
118  119 12 2 1 1 1 4 1
118  119 13 2 1 1 1 4 1
118  119 14 2 1 1 1 4 1
118  119 15 2 1 1 1 4 1
118  119 16 2 1 1 1 4 1
118  119 17 2 1 1 1 4 1
123  124 10 2 1 1 1 5 9
123  124 11 2 1 1 1 5 9
123  124 12 2 1 1 1 5 9
123  124 13 2 1 1 1 5 9
123  124 14 2 1 1 1 5 9
123  124 15 2 1 1 1 5 9
123  124 16 2 1 1 1 5 9
123  124 17 2 1 1 1 5 9
123  124 18 2 1 1 1 5 9
125 185 12 2 1 2 2 9 5
125 185 13 2 1 2 2 9 5
125 185 14 2 1 2 2 9 5
126 142 15 2 1 2 2 9 9
126 142 16 2 1 2 2 9 9
126 142 17 2 1 2 2 9 9
126 142 18 2 1 2 2 9 9
end

Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#10

01 Sep 2020, 11:59

I'm not sure what the issue is, in part because I don't fully understand what your code is doing.

Code:

rename (hgsex mrcurr edhigh1) q_=
reshape long @hgsex @mrcurr @edhigh1, i(id p_id) j(_j) string
drop _j
reshape wide mrcurr edhigh1, i(id p_id) j(hgsex)

The -rename- command is preparation for the -reshape- command. Please read the PDF manual section on the -reshape- command: it will prove indispensable to you in managing this kind of data. Anyway, the names of the variables being reshaped have to have parallel structure. Yours don't because one member of the pair is prefixed with p_ and the other has no prefix. So the rename command makes them similar by giving a q_ prefix to the ones with no prefix. The choice of q_ was arbitrary and is of no importance. In fact, later, the p_ and q_ are both dropped in the -drop _j- command.

I noticed there were some missing data for the partner (p_mrcurr & p_edhigh1). I noticed that some respondents change partners over time. If you think either or both of these issues will cause a problem can we include code to exclude couples with missing data and only use the data for couples where the id remains the same?

Just add this line of code at the beginning:

Code:

drop if missing(mrcurr, p_mrcurr, edhigh1, p_edhigh1)

Since each observation is a single partnership, there is no clear reason to eliminate those where the respondents change partners over time, at least for present purposes. It may be that for later analysis you will need to do that, but it isn't a problem for the immediate situation.
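
To make the sequence concrete, here is a minimal sketch that runs the four commands on four rows taken from the example data in this thread. It assumes wave belongs in -i()- (as discussed in #6) and that 1 = male, 2 = female (as stated in #3). The final names educ_male and educ_female are hypothetical, chosen only for illustration.

```stata
clear
input int(id p_id wave) byte(hgsex p_hgsex mrcurr p_mrcurr edhigh1 p_edhigh1)
108 109 1 1 2 1 1 9 9
108 109 2 1 2 1 1 9 9
118 119 3 2 1 1 1 4 3
118 119 4 2 1 1 1 4 1
end

* give the respondent's variables a prefix parallel to the partner's p_
rename (hgsex mrcurr edhigh1) q_=

* one row per person per couple-wave; _j records the prefix (p_ or q_)
reshape long @hgsex @mrcurr @edhigh1, i(id p_id wave) j(_j) string
drop _j

* back to one row per couple-wave, now indexed by sex rather than by role
reshape wide mrcurr edhigh1, i(id p_id wave) j(hgsex)

* hypothetical names; assumes 1 = male and 2 = female
rename (edhigh11 edhigh12) (educ_male educ_female)
list id p_id wave educ_male educ_female, sepby(id)
```

For couple 118/119 the respondent is female, so her edhigh1 value ends up in educ_female and the partner's p_edhigh1 value in educ_male, whichever of the two answered the survey.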

Chris Boulis

Join Date: Feb 2019
Posts: 368

#11

02 Sep 2020, 04:43

Hi Clyde Schechter. Thank you for your explanation and advice. The code still does not run (output below states hgsex contains missing values even though I included this in the first line of code). I'm not sure what is wrong. I will read the PDF on -reshape-.

The code drops the variable _j, which I believe includes the q_ and p_ versions of hgsex, mrcurr and edhigh1; this affects other code in this file that uses those variables. Can we not drop _j, or is there another approach to changing respondent/partner variables to male partner/female partner variables, or is this the best way?

Also, if the code did run, what would the newly created variables for "male partner education level" and "female partner education level" be called? Thanks for your help. Kind regards, Chris

Code:

. drop if missing(mrcurr, p_mrcurr, edhigh1, p_edhigh1, hgsex, p_hgsex)
(197,226 observations deleted)

. rename (hgsex mrcurr edhigh1) q_=

. reshape long @hgsex @mrcurr @edhigh1, i(id p_id wave) j(_j) string
(note: j = g p_ q_)
(note: ghgsex not found)
(note: gedhigh1 not found)

Data                               wide   ->   long
-----------------------------------------------------------------------------
Number of obs.                    78919   ->  236757
Number of variables                 474   ->     471
j variable (3 values)                     ->   _j
xij variables:
                 ghgsex p_hgsex q_hgsex   ->   hgsex
              gmrcurr p_mrcurr q_mrcurr   ->   mrcurr
           gedhigh1 p_edhigh1 q_edhigh1   ->   edhigh1
-----------------------------------------------------------------------------

. drop _j

. reshape wide mrcurr edhigh1, i(id p_id wave) j(hgsex)
variable hgsex contains missing values
r(498);

William Lisowski

Join Date: Dec 2014

Posts: 10150
#12

02 Sep 2020, 07:31

Had you read the documentation for reshape as Clyde suggested, you would learn that in your output the material I have colored blue and red

Code:

. reshape long @hgsex @mrcurr @edhigh1, i(id p_id wave) j(_j) string
(note: j = g p_ q_)
(note: ghgsex not found)
(note: gedhigh1 not found)

Data                               wide   ->   long
-----------------------------------------------------------------------------
Number of obs.                    78919   ->  236757
Number of variables                 474   ->     471
j variable (3 values)                     ->   _j
xij variables:
                 ghgsex p_hgsex q_hgsex   ->   hgsex
              gmrcurr p_mrcurr q_mrcurr   ->   mrcurr
           gedhigh1 p_edhigh1 q_edhigh1   ->   edhigh1
-----------------------------------------------------------------------------

include the appearance of gmrcurr which indicates that you have a variable in your data that ends with mrcurr that you have not previously mentioned in this topic, that Clyde overlooked in your post #7, and for which you did not provide example data that reproduces your problem, as Clyde requested in post #8.

The result is that the reshape command tried to create observations not only for the prefixes p_ and q_ but also for the prefix g, and (since ghgsex and gedhigh1 do not exist) filled hgsex and edhigh1 with missing values for _j=="g".

For now, my advice is to add

Code:

rename gmrcurr gmrcurrX

before your reshape command so that only p_mrcurr and q_mrcurr will be reshaped.
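
A quick way to spot such stray variables before reshaping is to list every variable whose name ends in one of the stubs. The sketch below uses two made-up rows; the gmrcurr values are invented purely so the example runs, and only the variable names come from this thread.

```stata
* Made-up rows; gmrcurr values are invented for illustration only.
clear
input byte(hgsex p_hgsex mrcurr p_mrcurr gmrcurr edhigh1 p_edhigh1)
1 2 1 1 1 5 9
2 1 2 2 2 8 9
end

* every variable whose name ends in one of the three stubs;
* anything beyond the unprefixed and p_ versions is a stray match
ds *hgsex *mrcurr *edhigh1

* renaming the stray variable takes it out of the stub pattern
rename gmrcurr gmrcurrX
ds *hgsex *mrcurr *edhigh1
```

After the rename, the second -ds- should list only the unprefixed and p_ versions, which is what -reshape long- is meant to pick up once the unprefixed ones are given the q_ prefix.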

Chris Boulis

Join Date: Feb 2019
Posts: 368

#13

02 Sep 2020, 20:49

Hi William Lisowski. Thank you very much for your reply. I am reading the material about -reshape- as recommended by Clyde. Yes I can see what you are referring to. My apologies I didn't realise that I had previously -group-ed mrcurr and edhigh1 (named gmrcurr, gedhigh), they are not needed and have been dropped. I did not -group- hgsex and cannot find ghgsex in my data.

New Stata output states that

There are observations within i(id p_id wave) with the same value of j(hgsex)

Could this be because the values of wave == 1 & wave == 2 are the same as hgsex == 1 (male) & hgsex == 2 (female)? That said, I notice that Clyde's code specifies that "_j" would take on "string" values so this may not be an issue. (I understand "_j" refers to hgsex mrcurr edhigh1).

Code:

. drop if missing(mrcurr, p_mrcurr, edhigh1, p_edhigh1)
(197,226 observations deleted)

. rename (hgsex mrcurr edhigh1) q_=

. reshape long @hgsex @mrcurr @edhigh1, i(id p_id wave) j(_j) string
(note: j = p_ q_)

Data                               wide   ->   long
-----------------------------------------------------------------------------
Number of obs.                    78919   ->  157838
Number of variables                 393   ->     391
j variable (2 values)                     ->   _j
xij variables:
                        p_hgsex q_hgsex   ->   hgsex
                      p_mrcurr q_mrcurr   ->   mrcurr
                    p_edhigh1 q_edhigh1   ->   edhigh1
-----------------------------------------------------------------------------

. drop _j

. reshape wide mrcurr edhigh1, i(id p_id wave) j(hgsex)
(note: j = 1 2)
values of variable hgsex not unique within id p_id wave
    Your data are currently long.  You are performing a reshape wide.  You specified i(id p_id
    wave) and j(hgsex).  There are observations within i(id p_id wave) with the same value of
    j(hgsex).  In the long data, variables i() and j() together must uniquely identify the
    observations.

         long                                wide
        +---------------+                   +------------------+
        | i   j   a   b |                   | i   a1 a2  b1 b2 |
        |---------------| <--- reshape ---> |------------------|
        | 1   1   1   2 |                   | 1   1   3   2  4 |
        | 1   2   3   4 |                   | 2   5   7   6  8 |
        | 2   1   5   6 |                   +------------------+
        | 2   2   7   8 |
        +---------------+
    Type reshape error for a list of the problem variables.
r(9);

My understanding of the code: Line (1) renames the three variables without a prefix, giving them the prefix q_ to balance the partner equivalents prefixed "p_". Line (2) converts the data from wide to long by sex, placing p_ and q_ as a prefix to each varname because @ is placed as a prefix. Line (3) drops the prefixes from the variables to end up with hgsex mrcurr edhigh1 (which I believe means I lose the partner equivalents p_hgsex p_mrcurr p_edhigh1). Line (4) converts back from long to wide by id, p_id & wave, by sex.

As noted in #7 I have multiple waves of panel data (I did not note that it is in long form as I thought it would be assumed).

Stata v.15.1

Last edited by Chris Boulis; 02 Sep 2020, 20:55.

Clyde Schechter

Join Date: Apr 2014
Posts: 30100

#14

02 Sep 2020, 22:14

When I combine the data you gave in #9 with the code in #13, it runs without any error messages:

Code:

. * Example generated by -dataex-. To install: ssc install dataex
. clear

. input int(id p_id wave) byte(hgsex p_hgsex mrcurr p_mrcurr edhigh1 p_edhigh1)

           id      p_id      wave     hgsex   p_hgsex    mrcurr  p_mrcurr   edhigh1  p_edhi~1
  1. 108  109  1 1 2 1 1 9 9
  2. 108  109  2 1 2 1 1 9 9
  3. 108  109  3 1 2 1 1 9 9
  4. 108  109  4 1 2 1 1 9 9
  5. 108  109  5 1 2 1 . 9 .
  6. 108  109  6 1 2 1 1 9 9
  7. 108  109  7 1 2 1 1 9 9
  8. 110 163 12 2 1 2 2 1 3
  9. 110 163 13 2 1 2 2 1 3
 10. 110 163 14 2 1 1 1 1 3
 11. 110 163 15 2 1 1 1 1 3
 12. 110 163 16 2 1 1 1 1 3
 13. 110 163 17 2 1 1 1 1 3
 14. 110 163 18 2 1 1 1 1 3
 15. 114  115 10 1 2 1 1 5 5
 16. 114  115 11 1 2 1 1 5 5
 17. 114  115 12 1 2 1 1 5 5
 18. 114  115 13 1 2 1 1 5 5
 19. 114  115 14 1 2 1 1 5 5
 20. 114  115 15 1 2 1 1 5 5
 21. 114  115 16 1 2 1 1 5 5
 22. 118  119  1 2 1 1 1 4 3
 23. 118  119  2 2 1 1 1 4 3
 24. 118  119  3 2 1 1 1 4 3
 25. 118  119  4 2 1 1 1 4 1
 26. 118  119  5 2 1 1 1 4 1
 27. 118  119  6 2 1 1 1 4 1
 28. 118  119  7 2 1 1 1 4 1
 29. 118  119  8 2 1 1 1 4 1
 30. 118  119  9 2 1 1 1 4 1
 31. 118  119 10 2 1 1 1 4 1
 32. 118  119 11 2 1 1 1 4 1
 33. 118  119 12 2 1 1 1 4 1
 34. 118  119 13 2 1 1 1 4 1
 35. 118  119 14 2 1 1 1 4 1
 36. 118  119 15 2 1 1 1 4 1
 37. 118  119 16 2 1 1 1 4 1
 38. 118  119 17 2 1 1 1 4 1
 39. 123  124 10 2 1 1 1 5 9
 40. 123  124 11 2 1 1 1 5 9
 41. 123  124 12 2 1 1 1 5 9
 42. 123  124 13 2 1 1 1 5 9
 43. 123  124 14 2 1 1 1 5 9
 44. 123  124 15 2 1 1 1 5 9
 45. 123  124 16 2 1 1 1 5 9
 46. 123  124 17 2 1 1 1 5 9
 47. 123  124 18 2 1 1 1 5 9
 48. 125 185 12 2 1 2 2 9 5
 49. 125 185 13 2 1 2 2 9 5
 50. 125 185 14 2 1 2 2 9 5
 51. 126 142 15 2 1 2 2 9 9
 52. 126 142 16 2 1 2 2 9 9
 53. 126 142 17 2 1 2 2 9 9
 54. 126 142 18 2 1 2 2 9 9
 55. end

.
. drop if missing(mrcurr, p_mrcurr, edhigh1, p_edhigh1)
(1 observation deleted)

. rename (hgsex mrcurr edhigh1) q_=

. reshape long @hgsex @mrcurr @edhigh1, i(id p_id wave) j(_j) string
(note: j = p_ q_)

Data                               wide   ->   long
-----------------------------------------------------------------------------
Number of obs.                       53   ->     106
Number of variables                   9   ->       7
j variable (2 values)                     ->   _j
xij variables:
                        p_hgsex q_hgsex   ->   hgsex
                      p_mrcurr q_mrcurr   ->   mrcurr
                    p_edhigh1 q_edhigh1   ->   edhigh1
-----------------------------------------------------------------------------

. drop _j

. reshape wide mrcurr edhigh1, i(id p_id wave) j(hgsex)
(note: j = 1 2)

Data                               long   ->   wide
-----------------------------------------------------------------------------
Number of obs.                      106   ->      53
Number of variables                   6   ->       7
j variable (2 values)             hgsex   ->   (dropped)
xij variables:
                                 mrcurr   ->   mrcurr1 mrcurr2
                                edhigh1   ->   edhigh11 edhigh12
-----------------------------------------------------------------------------

.

The error message that you are getting says that there are situations where the same combination of id, p_id, wave, and hgsex occurs more than once in the data. This does not happen in your example. In order for it to happen, it seems to me either that your data includes some same-sex couples, or you have a situation where the same person is in relationships with two or more different partners during the same wave. If these are true not just of your data but are true facts about the survey participants, then there is a way to modify the code to accommodate that. But my worry is that, in fact, these are data errors, not facts about the survey population. So before I try to offer you different code, you should go through your data and identify any situations like this. Then find out what the ground truth for these people is. If it's correct data, post back with a data example that includes some of these. If you find out that these are data errors, then fix your data set and we're done with this.

Here's how you can identify these potential problems in the data as it stands before you run the code above:

Code:

//  IDENTIFY SAME SEX COUPLES IN THE DATA
list if hgsex == p_hgsex

//  IDENTIFY "POLYGAMOUS" COUPLINGS
keep id p_id wave
rename id q_id
gen long couple = _n
reshape long @id, i(couple) j(_j) string
by id wave, sort: gen occurrences = _N
reshape wide @id  @occurrences, i(couple) j(_j) string
list if max(p_occurrences, q_occurrences) > 1

Last edited by Clyde Schechter; 02 Sep 2020, 22:16.

Chris Boulis

Join Date: Feb 2019
Posts: 368

#15

03 Sep 2020, 18:36

Hi Clyde Schechter. Thank you for your great help. There do appear to be some same-sex couples in my data set. Here's a snapshot of the output from the code in #14 (I wasn't able to get to the top to include the column headings):

Code:

        |------------------------------------------------------|
274750. | 274750      .      11819   1700376          1     17 |
274751. | 274751      .      11819   1700377          1     17 |
274752. | 274752      .      11781   1700379          1     18 |
274753. | 274753      .      11819   1700379          1     17 |
274754. | 274754      .      11781   1700382          1     18 |
        |------------------------------------------------------|
274755. | 274755      .      11819   1700382          1     17 |
274756. | 274756      .      11819   1700383          1     17 |
274757. | 274757      .      11819   1700385          1     17 |
274758. | 274758      .      11781   1700387          1     18 |
274759. | 274759      .      11819   1700387          1     17 |
        |------------------------------------------------------|
274760. | 274760      .      11781   1700388          1     18 |
274761. | 274761      .      11819   1700388          1     17 |
274762. | 274762      .      11819   1700389          1     17 |
274763. | 274763      .      11781   1700389          1     18 |
274764. | 274764      .      11781   1700390          1     18 |
        |------------------------------------------------------|
274765. | 274765      .      11819   1700390          1     17 |
274766. | 274766      .      11781   1700395          1     18 |
274767. | 274767      .      11819   1700395          1     17 |
274768. | 274768      .      11781   1700397          1     18 |
274769. | 274769      .      11819   1700397          1     17 |
        |------------------------------------------------------|

Here is a sample of same sex couples from my data:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input long(id p_id) byte(wave hgsex p_hgsex mrcurr p_mrcurr edhigh1 p_edhigh1)
1449 1364 13 2 2 2 2 2 1
1449 1364 14 2 2 2 2 2 1
1449 1364 15 2 2 2 2 2 1
1449 1364 16 2 2 2 2 2 1
1449 1364 17 2 2 2 2 2 1
1449 1364 18 2 2 2 2 2 1
1273  911 10 2 2 2 2 4 3
1273  911 11 2 2 2 2 4 3
1273  911 12 2 2 2 2 4 3
1273  911 13 2 2 2 2 4 3
1273  911 14 2 2 2 2 4 3
1273  911 15 2 2 2 2 4 3
1273  911 16 2 2 2 2 4 3
1273  911 17 2 2 2 2 4 3
1273  911 18 2 2 2 2 2 3
1720  603  6 2 2 2 2 5 4
1720  603  7 2 2 2 2 5 4
1106  625  6 2 2 2 2 5 5
1106  625  7 2 2 2 2 5 5
1106  625  8 2 2 2 2 5 5
1106  625  9 2 2 2 2 5 5
1106 1156 11 2 2 2 2 5 5
1106 1156 12 2 2 2 2 5 5
1106 1156 13 2 2 2 2 5 5
1454  539 10 1 1 2 2 5 4
1454  539 11 1 1 2 2 5 4
1454  539 12 1 1 2 2 5 4
1454  539 13 1 1 2 2 5 4
1454  539 14 1 1 2 2 5 4
1454  539 15 1 1 2 2 5 4
1454  539 16 1 1 2 2 5 4
1454  539 17 1 1 2 2 5 4
1454  539 18 1 1 1 1 5 4
1654 1358 10 2 2 2 2 8 8
1654 1358 11 2 2 2 2 8 8
1654 1358 12 2 2 2 2 8 8
1654 1358 13 2 2 2 2 8 8
1654 1358 14 2 2 2 2 8 8
1654 1358 15 2 2 2 2 8 8
1654 1358 16 2 2 2 2 8 8
1654 1358 17 2 2 2 2 8 8
end

I could potentially just exclude this small group from the analysis.

Last edited by Chris Boulis; 03 Sep 2020, 18:39.
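
If exclusion is the route taken, a sketch of how it might look follows. It uses four rows copied from the examples in #9 and #15, and it assumes that dropping same-sex couple-waves is acceptable for the analysis, which is a substantive decision the thread has not settled.

```stata
clear
input long(id p_id) byte(wave hgsex p_hgsex mrcurr p_mrcurr edhigh1 p_edhigh1)
1449 1364 13 2 2 2 2 2 1
1454  539 10 1 1 2 2 5 4
 118  119  1 2 1 1 1 4 3
 108  109  1 1 2 1 1 9 9
end

* count, then exclude, couple-waves where both partners report the same sex
count if hgsex == p_hgsex
drop if hgsex == p_hgsex

rename (hgsex mrcurr edhigh1) q_=
reshape long @hgsex @mrcurr @edhigh1, i(id p_id wave) j(_j) string

* the condition the later -reshape wide- with j(hgsex) relies on
isid id p_id wave hgsex
```

If -isid- still reports an error on the full data after the drop, something other than same-sex couples is producing repeated values of hgsex within id p_id wave.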
