Creating variables for husband and wife using data for respondent and their partner

Chris Boulis started a topic Creating variables for husband and wife using data for respondent and their partner

29 Aug 2020, 17:29
Dear Statalist.

I would like help generating a variable, say level of education "educ", for the male partner and the female partner in a union (either married or de facto). The sample below, from a panel dataset, includes data for the respondent (hgsex mrcurr edhigh1) and their partner (p_hgsex p_mrcurr p_edhigh1). Coding: hgsex == 1 (male), == 2 (female); mrcurr == 1 (married), == 2 (de facto).

Help appreciated.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte(hgsex p_hgsex mrcurr p_mrcurr edhigh1 p_edhigh1)
1 2 1 1 5 9
2 1 2 2 8 9
2 1 2 2 8 9
2 1 1 1 5 5
2 1 2 2 5 5
2 1 2 2 9 9
1 2 1 1 9 9
1 2 1 1 9 9
2 1 1 1 1 3
2 1 1 1 1 3
2 1 2 2 9 4
2 1 2 2 9 4
1 2 1 1 5 5
1 2 1 1 5 5
1 2 1 1 5 8
1 2 1 1 5 5
1 2 1 1 5 8
1 2 1 1 5 5
1 2 1 1 5 8
1 2 1 1 3 3
1 2 1 1 3 3
1 2 2 2 3 4
1 2 1 1 3 3
1 2 1 1 3 3
1 2 2 2 3 8
1 2 1 1 3 3
1 2 1 1 3 3
1 2 2 2 8 8
2 1 1 1 4 3
2 1 1 1 4 1
2 1 1 1 4 1
2 1 1 1 4 3
2 1 1 1 4 1
end
Last edited by Chris Boulis; 29 Aug 2020, 17:39.
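
For a single variable, one direct route that avoids -reshape- is to pick the respondent's or the partner's value according to the respondent's sex. The following is only a minimal sketch on three rows from the example above; the names opp_sex, educ_m and educ_f are hypothetical, and it assumes only opposite-sex couples with both sexes recorded should receive values.

```stata
* Three rows in the layout of the -dataex- example above
clear
input byte(hgsex p_hgsex mrcurr p_mrcurr edhigh1 p_edhigh1)
1 2 1 1 5 9
2 1 2 2 8 9
2 1 1 1 4 3
end

* Flag opposite-sex couples where both sexes are observed (hypothetical name)
gen byte opp_sex = (hgsex != p_hgsex) & !missing(hgsex, p_hgsex)

* educ_m / educ_f (hypothetical names): education of the male / female partner.
* If the respondent is male, his own value is the male value; otherwise the partner's is.
gen byte educ_m = cond(hgsex == 1, edhigh1, p_edhigh1) if opp_sex
gen byte educ_f = cond(hgsex == 2, edhigh1, p_edhigh1) if opp_sex

list, noobs
```

This is convenient for one or two variables; with many variables the -reshape- approach discussed in the replies scales better.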
Chris Boulis replied

12 Sep 2021, 20:54
UPDATE:

I came across the same issue again and appear to have solved it by replacing

Code:

drop if missing(hgsex, p_hgsex)

which is shown in the first line of code (in #25), with

Code:

drop if missing(hgsex)

placed just after the line of code "drop _j".
Last edited by Chris Boulis; 12 Sep 2021, 20:58.
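
The effect of moving the drop can be sketched on two toy rows (values invented for illustration, one of them with the partner's sex missing). Dropping on missing(hgsex) after -reshape long- removes only the partner's empty row, so the respondent's information survives, whereas dropping on missing(hgsex, p_hgsex) up front would remove the whole couple-wave.

```stata
* Toy data: the second couple-wave has no partner information
clear
input long(id p_id) byte(wave hgsex p_hgsex edhigh1 p_edhigh1)
106 842 11 2 1 5 5
108 109  5 1 . 9 .
end

rename (hgsex edhigh1) q_=
reshape long @hgsex @edhigh1, i(id p_id wave) j(_j) string
drop _j

* The fix from the update: remove person-rows with unknown sex at this point,
* otherwise -reshape wide- stops with "variable hgsex contains missing values"
drop if missing(hgsex)

reshape wide edhigh1, i(id p_id wave) j(hgsex)
list, noobs
```

In this sketch couple 108/109 is kept with edhigh11 filled in and edhigh12 missing.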

Chris Boulis replied

05 Mar 2021, 23:41

Hi Clyde Schechter. I'm having a problem with the -reshape- code shown in #23, which I am now applying to a different problem from the one in this thread and with many additional variables. Is there a limit on the number of variables? After re-reading the thread and links, I'm not sure why I receive the following error:

Code:

. reshape wide `to_fix', i(id p_id wave) j(hgsex)
variable hgsex contains missing values
r(498);

(The large number of variables makes this quite a long post.) I note a couple of potential issues: (1) the high number of missing values for the gender variable, and (2) there appears to be an issue in the "reshape long" line, where "edhi" is added to the p_ and q_ prefixes.

Code:

. drop if missing(hgsex, p_hgsex)
(199,591 observations deleted) [I checked the result from the analysis pertaining to the code in the post, and found dropping hgsex and p_hgsex resulted in (187,863 observations deleted) - which seems very high].
      
. local to_fix hgage religb relat relimp mrcurr ordfnum mrn esbrd edhigh1 edsstyp cety01 tchave tchad tcr tcyng /// 
> fmlwop hhsos anengf ancobn gh1 lssupvl pdsad losat lsclubn lstrust lshrvol lsvol lsnwmc lefrd lefnw lefni /// 
> hwassei hwassef hwfini hwfinf hwtbani hwtbanf hwsupei hwsupef hwcaini hwcainf hweqini hweqinf hwtrusi hwtrusf hwinsui hwinsuf //
. 
. rename (hgsex `to_fix') q_=

. local newvars: subinstr local to_fix " " " @", all

. local newvars @`newvars'

. reshape long @hgsex `newvars', i(id p_id wave) j(_j) string
(note: j = p_ p_edhi q_ q_edhi)
(note: p_edhihgsex not found)
(note: p_edhihgage not found)
(note: p_edhireligb not found)
(note: p_edhirelat not found)
(note: p_edhirelimp not found)
(note: p_edhimrcurr not found)
(note: p_edhiordfnum not found)
(note: p_edhimrn not found)
(note: p_edhiesbrd not found)
(note: p_edhiedhigh1 not found)
(note: p_edhiedsstyp not found)
(note: p_edhicety01 not found)
(note: p_edhitchave not found)
(note: p_edhitchad not found)
(note: p_edhitcr not found)
(note: p_edhitcyng not found)
(note: p_edhifmlwop not found)
(note: p_edhihhsos not found)
(note: p_edhianengf not found)
(note: p_edhiancobn not found)
(note: p_edhilssupvl not found)
(note: p_edhipdsad not found)
(note: p_edhilosat not found)
(note: p_edhilsclubn not found)
(note: p_edhilstrust not found)
(note: p_edhilshrvol not found)
(note: p_edhilsvol not found)
(note: p_edhilsnwmc not found)
(note: p_edhilefrd not found)
(note: p_edhilefnw not found)
(note: p_edhilefni not found)
(note: p_edhihwassei not found)
(note: p_edhihwassef not found)
(note: p_edhihwfini not found)
(note: p_edhihwfinf not found)
(note: p_edhihwtbani not found)
(note: p_edhihwtbanf not found)
(note: p_edhihwsupei not found)
(note: p_edhihwsupef not found)
(note: p_edhihwcaini not found)
(note: p_edhihwcainf not found)
(note: p_edhihweqini not found)
(note: p_edhihweqinf not found)
(note: p_edhihwtrusi not found)
(note: p_edhihwtrusf not found)
(note: p_edhihwinsui not found)
(note: p_edhihwinsuf not found)
(note: q_edhihgsex not found)
(note: q_edhihgage not found)
(note: q_edhireligb not found)
(note: q_edhirelat not found)
(note: q_edhirelimp not found)
(note: q_edhimrcurr not found)
(note: q_edhiordfnum not found)
(note: q_edhimrn not found)
(note: q_edhiesbrd not found)
(note: q_edhiedhigh1 not found)
(note: q_edhiedsstyp not found)
(note: q_edhicety01 not found)
(note: q_edhitchave not found)
(note: q_edhitchad not found)
(note: q_edhitcr not found)
(note: q_edhitcyng not found)
(note: q_edhifmlwop not found)
(note: q_edhihhsos not found)
(note: q_edhianengf not found)
(note: q_edhiancobn not found)
(note: q_edhilssupvl not found)
(note: q_edhipdsad not found)
(note: q_edhilosat not found)
(note: q_edhilsclubn not found)
(note: q_edhilstrust not found)
(note: q_edhilshrvol not found)
(note: q_edhilsvol not found)
(note: q_edhilsnwmc not found)
(note: q_edhilefrd not found)
(note: q_edhilefnw not found)
(note: q_edhilefni not found)
(note: q_edhihwassei not found)
(note: q_edhihwassef not found)
(note: q_edhihwfini not found)
(note: q_edhihwfinf not found)
(note: q_edhihwtbani not found)
(note: q_edhihwtbanf not found)
(note: q_edhihwsupei not found)
(note: q_edhihwsupef not found)
(note: q_edhihwcaini not found)
(note: q_edhihwcainf not found)
(note: q_edhihweqini not found)
(note: q_edhihweqinf not found)
(note: q_edhihwtrusi not found)
(note: q_edhihwtrusf not found)
(note: q_edhihwinsui not found)
(note: q_edhihwinsuf not found)

Data                               wide   ->   long
-----------------------------------------------------------------------------
Number of obs.                    92984   ->  371936
Number of variables                 479   ->     432
j variable (4 values)                     ->   _j
xij variables:
    p_hgsex p_edhihgsex ... q_edhihgsex   ->   hgsex
    p_hgage p_edhihgage ... q_edhihgage   ->   hgage
 p_religb p_edhireligb ... q_edhireligb   ->   religb
    p_relat p_edhirelat ... q_edhirelat   ->   relat
 p_relimp p_edhirelimp ... q_edhirelimp   ->   relimp
 p_mrcurr p_edhimrcurr ... q_edhimrcurr   ->   mrcurr
p_ordfnum p_edhiordfnum ... q_edhiordfnum ->   ordfnum
          p_mrn p_edhimrn ... q_edhimrn   ->   mrn
    p_esbrd p_edhiesbrd ... q_edhiesbrd   ->   esbrd
p_edhigh1 p_edhiedhigh1 ... q_edhiedhigh1 ->   edhigh1
p_edsstyp p_edhiedsstyp ... q_edhiedsstyp ->   edsstyp
 p_cety01 p_edhicety01 ... q_edhicety01   ->   cety01
 p_tchave p_edhitchave ... q_edhitchave   ->   tchave
    p_tchad p_edhitchad ... q_edhitchad   ->   tchad
          p_tcr p_edhitcr ... q_edhitcr   ->   tcr
    p_tcyng p_edhitcyng ... q_edhitcyng   ->   tcyng
 p_fmlwop p_edhifmlwop ... q_edhifmlwop   ->   fmlwop
    p_hhsos p_edhihhsos ... q_edhihhsos   ->   hhsos
 p_anengf p_edhianengf ... q_edhianengf   ->   anengf
 p_ancobn p_edhiancobn ... q_edhiancobn   ->   ancobn
          p_gh1 p_edhigh1 ... q_edhigh1   ->   gh1
p_lssupvl p_edhilssupvl ... q_edhilssupvl ->   lssupvl
    p_pdsad p_edhipdsad ... q_edhipdsad   ->   pdsad
    p_losat p_edhilosat ... q_edhilosat   ->   losat
p_lsclubn p_edhilsclubn ... q_edhilsclubn ->   lsclubn
p_lstrust p_edhilstrust ... q_edhilstrust ->   lstrust
p_lshrvol p_edhilshrvol ... q_edhilshrvol ->   lshrvol
    p_lsvol p_edhilsvol ... q_edhilsvol   ->   lsvol
 p_lsnwmc p_edhilsnwmc ... q_edhilsnwmc   ->   lsnwmc
    p_lefrd p_edhilefrd ... q_edhilefrd   ->   lefrd
    p_lefnw p_edhilefnw ... q_edhilefnw   ->   lefnw
    p_lefni p_edhilefni ... q_edhilefni   ->   lefni
p_hwassei p_edhihwassei ... q_edhihwassei ->   hwassei
p_hwassef p_edhihwassef ... q_edhihwassef ->   hwassef
 p_hwfini p_edhihwfini ... q_edhihwfini   ->   hwfini
 p_hwfinf p_edhihwfinf ... q_edhihwfinf   ->   hwfinf
p_hwtbani p_edhihwtbani ... q_edhihwtbani ->   hwtbani
p_hwtbanf p_edhihwtbanf ... q_edhihwtbanf ->   hwtbanf
p_hwsupei p_edhihwsupei ... q_edhihwsupei ->   hwsupei
p_hwsupef p_edhihwsupef ... q_edhihwsupef ->   hwsupef
p_hwcaini p_edhihwcaini ... q_edhihwcaini ->   hwcaini
p_hwcainf p_edhihwcainf ... q_edhihwcainf ->   hwcainf
p_hweqini p_edhihweqini ... q_edhihweqini ->   hweqini
p_hweqinf p_edhihweqinf ... q_edhihweqinf ->   hweqinf
p_hwtrusi p_edhihwtrusi ... q_edhihwtrusi ->   hwtrusi
p_hwtrusf p_edhihwtrusf ... q_edhihwtrusf ->   hwtrusf
p_hwinsui p_edhihwinsui ... q_edhihwinsui ->   hwinsui
p_hwinsuf p_edhihwinsuf ... q_edhihwinsuf ->   hwinsuf
-----------------------------------------------------------------------------

. drop _j

. reshape wide `to_fix', i(id p_id wave) j(hgsex)
variable hgsex contains missing values
r(498);

Could you kindly assist with this please.

Last edited by Chris Boulis; 06 Mar 2021, 00:00.
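
The output above suggests where the extra j values come from: the list in `to_fix' contains gh1, and the stub @gh1 also matches p_edhigh1 and q_edhigh1 (prefix "p_edhi" or "q_edhi" followed by "gh1"). That yields four j values instead of two, and the two extra rows per couple-wave have missing hgsex. The sketch below reproduces this on one toy row (values invented) and shows one possible way around it, renaming the colliding variable first; the name gh_1 is hypothetical.

```stata
clear
input long(id p_id) byte(wave hgsex p_hgsex edhigh1 p_edhigh1 gh1 p_gh1)
106 842 11 2 1 5 5 3 2
end
rename (hgsex edhigh1 gh1) q_=

* Colliding version: @gh1 picks up p_edhigh1 and q_edhigh1 as well
preserve
reshape long @hgsex @edhigh1 @gh1, i(id p_id wave) j(_j) string
tabulate _j, missing
count if missing(hgsex)      // the spurious rows have no sex recorded
restore

* Collision-free version: give gh1 a name that is not the tail of another variable
rename (p_gh1 q_gh1) (p_gh_1 q_gh_1)
reshape long @hgsex @edhigh1 @gh_1, i(id p_id wave) j(_j) string
count if missing(hgsex)
```

Checking the "(note: j = ...)" line after -reshape long- is a quick way to spot this kind of collision: with a respondent/partner layout it should list only p_ and q_.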


Chris Boulis replied

08 Sep 2020, 21:20

Hi Clyde Schechter. Thank you for your efforts. I ran your code, which worked but I found the same issue, that is, only keeping data from five waves. But then I took a closer look at my code and thought there may be an issue with my line for dropping missings, which included all my variables, including "religb relimp relat". I then dropped all except "hgsex p_hgsex" and the code ran and I have observations for all waves. I now realise that it was likely the inclusion of "religb relimp relat" (in the code to drop missings) that was the cause of my issue.

I have been able to reproduce your table in #23

Code:

. list id p_id wave mrcurr1 edhigh11 esbrd1 religb1 relimp1 relat1 mrcurr2 edhigh12 esbrd2 religb2 relimp2 relat2 in 1/40, nolabel noobs sepby(id p_id)
 +-----------------------------------------------------------------------------------------------------------------------------------------------+
  |  id   p_id    wave   mrcurr1   edhigh11   esbrd1   religb1   relimp1   relat1   mrcurr2   edhigh12   esbrd2   religb2   relimp2   relat2 |
  |-----------------------------------------------------------------------------------------------------------------------------------------------|
  | 100   1063     12         2          3        1         .         .        .         2          1        1         .         .        . |
  | 100   1063     13         2          3        1         .         .        .         2          1        1         .         .        . |
  | 100   1063     14         1          3        1      2330         1        1         1          1        1      7000         5        2 |
  | 100   1063     15         1          3        1         .         .        .         1          1        1         .         .        . |
  | 100   1063     16         1          3        1         .         .        .         1          1        3         .         .        . |
  | 100   1063     17         1          3        1         .         .        .         1          1        1         .         .        . |
  | 100   1063     18         1          3        1      7000         1        1         1          1        1      7000         0        2 |
  +-----------------------------------------------------------------------------------------------------------------------------------------------+
  | 106    842     10         .          .        .         .         .        .         2          8        1         .         .        . |
  | 106    842     11         2          5        1         .         .        .         2          5        1         .         .        . |
  | 106    842     12         2          5        1         .         .        .         2          5        3         .         .        . |
  | 106    842     13         2          5        1         .         .        .         2          5        1         .         .        . |
  | 106    842     14         2          5        1      7000         0        1         2          5        1      7000         0        1 |
  | 106    842     15         1          5        1         .         .        .         1          5        1         .         .        . |
  | 106    842     16         1          5        1         .         .        .         1          5        1         .         .        . |
  | 106    842     17         1          5        1         .         .        .         1          5        3         .         .        . |
  | 106    842     18         1          5        1      7000         0        1         1          5        2      7000         0        1 |
  |-----------------------------------------------------------------------------------------------------------------------------------------------|
  | 108    109      1         1          9        3         .         .        .         1          9        3         .         .        . |
  | 108    109      2         1          9        3         .         .        .         1          9        3         .         .        . |
  | 108    109      3         1          9        3         .         .        .         1          9        3         .         .        . |
  | 108    109      4         1          9        3      2330         5        6         1          9        3      2330         6        3 |
  | 108    109      5         1          9        3         .         .        .         .          .        .         .         .        . |
  | 108    109      6         1          9        3         .         .        .         1          9        3         .         .        . |
  | 108    109      7         1          9        3         .         .        .         1          9        3      2010         5        4 |
  |-----------------------------------------------------------------------------------------------------------------------------------------------|

Thank you for your patience and help. I really appreciate it. Kind regards, Chris

Last edited by Chris Boulis; 08 Sep 2020, 21:38.
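
Why the longer variable list in the drop removed most waves can be seen from how missing() behaves: it returns 1 if any of its arguments is missing. A minimal sketch on three toy rows in which religb is recorded in only one wave:

```stata
clear
input long(id p_id) byte(wave hgsex p_hgsex) int(religb p_religb)
106 842 13 2 1    .    .
106 842 14 2 1 7000 7000
106 842 15 2 1    .    .
end

* Only the sex variables: nothing would be dropped
count if missing(hgsex, p_hgsex)

* Adding a periodically surveyed variable: every wave without it would be dropped
count if missing(hgsex, p_hgsex, religb, p_religb)
```

Running -count- with the same condition before a -drop- is a cheap way to see how many observations a given list would remove.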


Clyde Schechter replied

08 Sep 2020, 18:27

Thanks for the example data. But I still don't see the problem. Here is my code (slightly spruced up to partially automate the handling of a larger number of variables):

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input long(id p_id) byte(wave hgsex p_hgsex mrcurr p_mrcurr edhigh1 p_edhigh1 esbrd p_esbrd) int(religb p_religb) byte(relimp p_relimp relat p_relat)
106 842 11 2 1 2 2 5 5 1 1    .    . . . . .
106 842 12 2 1 2 2 5 5 3 1    .    . . . . .
106 842 13 2 1 2 2 5 5 1 1    .    . . . . .
106 842 14 2 1 2 2 5 5 1 1 7000 7000 0 0 1 1
106 842 15 2 1 1 1 5 5 1 1    .    . . . . .
106 842 16 2 1 1 1 5 5 1 1    .    . . . . .
106 842 17 2 1 1 1 5 5 3 1    .    . . . . .
106 842 18 2 1 1 1 5 5 2 1 7000 7000 0 0 1 1
108  109  1 1 2 1 1 9 9 3 3    .    . . . . .
108  109  2 1 2 1 1 9 9 3 3    .    . . . . .
108  109  3 1 2 1 1 9 9 3 3    .    . . . . .
108  109  4 1 2 1 1 9 9 3 3 2330 2330 5 6 6 3
108  109  5 1 2 1 . 9 . 3 .    .    . . . . .
108  109  6 1 2 1 1 9 9 3 3    .    . . . . .
108  109  7 1 2 1 1 9 9 3 3    . 2010 . 5 . 4
100 1063 12 2 1 2 2 1 3 1 1    .    . . . . .
100 1063 13 2 1 2 2 1 3 1 1    .    . . . . .
100 1063 14 2 1 1 1 1 3 1 1 7000 2330 5 1 2 1
100 1063 15 2 1 1 1 1 3 1 1    .    . . . . .
100 1063 16 2 1 1 1 1 3 3 1    .    . . . . .
100 1063 17 2 1 1 1 1 3 1 1    .    . . . . .
100 1063 18 2 1 1 1 1 3 1 1 7000 7000 0 1 2 1
end

local to_fix mrcurr edhigh1 esbrd religb relimp relat
rename (hgsex `to_fix') q_=
local widenames: subinstr local to_fix " " " @", all
local widenames @`widenames'

reshape long @hgsex `widenames', i(id p_id wave) j(_j) string
drop _j
reshape wide `to_fix', i(id p_id wave) j(hgsex)

list, noobs sepby(id p_id)

And what you get is:

Code:

  +-----------------------------------------------------------------------------------------------------------------------------------------+
  |  id   p_id   wave   mrcurr1   edhigh11   esbrd1   religb1   relimp1   relat1   mrcurr2   edhigh12   esbrd2   religb2   relimp2   relat2 |
  |-----------------------------------------------------------------------------------------------------------------------------------------|
  | 100   1063     12         2          3        1         .         .        .         2          1        1         .         .        . |
  | 100   1063     13         2          3        1         .         .        .         2          1        1         .         .        . |
  | 100   1063     14         1          3        1      2330         1        1         1          1        1      7000         5        2 |
  | 100   1063     15         1          3        1         .         .        .         1          1        1         .         .        . |
  | 100   1063     16         1          3        1         .         .        .         1          1        3         .         .        . |
  | 100   1063     17         1          3        1         .         .        .         1          1        1         .         .        . |
  | 100   1063     18         1          3        1      7000         1        1         1          1        1      7000         0        2 |
  |-----------------------------------------------------------------------------------------------------------------------------------------|
  | 106    842     11         2          5        1         .         .        .         2          5        1         .         .        . |
  | 106    842     12         2          5        1         .         .        .         2          5        3         .         .        . |
  | 106    842     13         2          5        1         .         .        .         2          5        1         .         .        . |
  | 106    842     14         2          5        1      7000         0        1         2          5        1      7000         0        1 |
  | 106    842     15         1          5        1         .         .        .         1          5        1         .         .        . |
  | 106    842     16         1          5        1         .         .        .         1          5        1         .         .        . |
  | 106    842     17         1          5        1         .         .        .         1          5        3         .         .        . |
  | 106    842     18         1          5        1      7000         0        1         1          5        2      7000         0        1 |
  |-----------------------------------------------------------------------------------------------------------------------------------------|
  | 108    109      1         1          9        3         .         .        .         1          9        3         .         .        . |
  | 108    109      2         1          9        3         .         .        .         1          9        3         .         .        . |
  | 108    109      3         1          9        3         .         .        .         1          9        3         .         .        . |
  | 108    109      4         1          9        3      2330         5        6         1          9        3      2330         6        3 |
  | 108    109      5         1          9        3         .         .        .         .          .        .         .         .        . |
  | 108    109      6         1          9        3         .         .        .         1          9        3         .         .        . |
  | 108    109      7         1          9        3         .         .        .         1          9        3      2010         5        4 |
  +-----------------------------------------------------------------------------------------------------------------------------------------+

What is unsatisfactory about that result? Why do you feel the need to separately treat those variables that are in every wave from those that aren't? What purpose would that serve? It seems to me that pretty much anything you want to do from here will, if properly coded, work equally well for both kinds of variables. What is it you want to do that won't work well with this as the starting point?
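
For readers unfamiliar with the macro lines in the code above, this short sketch only displays what the local macro widenames ends up containing; it needs no data.

```stata
local to_fix mrcurr edhigh1 esbrd religb relimp relat

* Put " @" in place of every blank, then prefix the first name with "@"
local widenames: subinstr local to_fix " " " @", all
local widenames @`widenames'

display "`widenames'"
```

The result is the stub list @mrcurr @edhigh1 @esbrd @religb @relimp @relat, in which @ marks where the p_ or q_ prefix sits in each wide variable name.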


Chris Boulis replied

08 Sep 2020, 17:08
Hi Clyde Schechter. I amended your code to include a number of other variables from my panel data for which I want male partner/female partner variables (rather than respondent/partner variables). Some appear in all 18 waves, while others appear in only 5 waves (they are surveyed periodically). After I ran -reshape- I ended up with five waves for all the variables, so I concluded that it must only work with the waves common to all variables, which is 5 waves. To ensure I have all the data for all waves in which the variables exist, I believe I need two -reshapes-: one for variables in all waves, and another for variables present only in some waves. I -reshape- using "hgsex" as I want the male partner/female partner versions of each variable, but "hgsex" (the j variable) is dropped at the end of the first -reshape-. How can I get it back to run the second -reshape-?

Your original code is in #10. Below is a sample. (Note the last three variables are only in some waves):

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input long(id p_id) byte(wave hgsex p_hgsex mrcurr p_mrcurr edhigh1 p_edhigh1 esbrd p_esbrd) int(religb p_religb) byte(relimp p_relimp relat p_relat)
106 842 11 2 1 2 2 5 5 1 1    .    . . . . .
106 842 12 2 1 2 2 5 5 3 1    .    . . . . .
106 842 13 2 1 2 2 5 5 1 1    .    . . . . .
106 842 14 2 1 2 2 5 5 1 1 7000 7000 0 0 1 1
106 842 15 2 1 1 1 5 5 1 1    .    . . . . .
106 842 16 2 1 1 1 5 5 1 1    .    . . . . .
106 842 17 2 1 1 1 5 5 3 1    .    . . . . .
106 842 18 2 1 1 1 5 5 2 1 7000 7000 0 0 1 1
108  109  1 1 2 1 1 9 9 3 3    .    . . . . .
108  109  2 1 2 1 1 9 9 3 3    .    . . . . .
108  109  3 1 2 1 1 9 9 3 3    .    . . . . .
108  109  4 1 2 1 1 9 9 3 3 2330 2330 5 6 6 3
108  109  5 1 2 1 . 9 . 3 .    .    . . . . .
108  109  6 1 2 1 1 9 9 3 3    .    . . . . .
108  109  7 1 2 1 1 9 9 3 3    . 2010 . 5 . 4
100 1063 12 2 1 2 2 1 3 1 1    .    . . . . .
100 1063 13 2 1 2 2 1 3 1 1    .    . . . . .
100 1063 14 2 1 1 1 1 3 1 1 7000 2330 5 1 2 1
100 1063 15 2 1 1 1 1 3 1 1    .    . . . . .
100 1063 16 2 1 1 1 1 3 3 1    .    . . . . .
100 1063 17 2 1 1 1 1 3 1 1    .    . . . . .
100 1063 18 2 1 1 1 1 3 1 1 7000 7000 0 1 2 1
end

To clarify, all my variables are identified as either respondent (no prefix to variable name) or their partner (prefix "p_" added to variable name), e.g. mrcurr is respondent marital status, p_mrcurr is their partner's marital status.

Using Cox regression analysis, I want to test whether gender has an effect, which is difficult to achieve in the current structure. Using hgsex and p_hgsex, I want to convert these respondent/partner variables to male partner/female partner variables, e.g. convert mrcurr and p_mrcurr to male partner marital status (e.g. mrcurr1) and female partner marital status (e.g. mrcurr2).

Your help is appreciated.
Last edited by Chris Boulis; 08 Sep 2020, 17:26.
Clyde Schechter replied

08 Sep 2020, 10:03
I don't understand what you are describing. Please post back with example data that illustrates the problem. And also show what you want the end result to look like, as I don't understand that either.
Chris Boulis replied

08 Sep 2020, 01:37
Hi Clyde Schechter. I encountered an issue when using -reshape-, as not all variables in my list are present in all waves. Some are only present in some waves (e.g. every four years) and as such, the -reshape- only occurred for those waves (not all waves). I thought I could address this issue by running two separate sets of code using -reshape-: one with variables in all waves and another with those variables only in some waves. To do this, I need the variable "hgsex" for both, but this was dropped in the first -reshape- (see code in #17).

I understand I need to use -reshape long- and that to obtain all original information (as per code in #17), I believe I would use

Code:

reshape long hgsex mrcurr edhigh1, i(id p_id wave) j(_j) string

but how would I code this so that I only obtain "hgsex"? Your help is very much appreciated.
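
If the respondent's own sex is wanted after the final -reshape wide-, one possibility (a sketch only, not taken from the code in #17) is to copy it into a separate variable before renaming. Because the copy is constant within id, p_id and wave, -reshape- carries it through in both directions. The name resp_sex is hypothetical and the two rows are toy data.

```stata
clear
input long(id p_id) byte(wave hgsex p_hgsex mrcurr p_mrcurr)
106 842 11 2 1 2 2
108 109  1 1 2 1 1
end

* Keep a copy of the respondent's sex (hypothetical name)
gen byte resp_sex = hgsex

rename (hgsex mrcurr) q_=
reshape long @hgsex @mrcurr, i(id p_id wave) j(_j) string
drop _j
reshape wide mrcurr, i(id p_id wave) j(hgsex)

* resp_sex is still present alongside mrcurr1 (male) and mrcurr2 (female)
list, noobs
```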
Chris Boulis replied

04 Sep 2020, 19:21
Hi Clyde Schechter. Thank you for explaining. I understand that the information is not lost it is just organised differently. The idea behind my question was that if I wanted to create other 'male partner'/'female partner' variables from existing respondent/partner variables in my data, I couldn't because the code dropped 'hgage'. However, I now understand that I can address this by expanding the list of variables I want to -reshape- in the code you provided.

UPDATE: I was successful in converting a number of other variables for which I wanted to create male/female variables in place of respondent/partner. THANK YOU very much for your help. Kind regards, Chris
Clyde Schechter replied

04 Sep 2020, 12:19
To clarify, "edhigh11" is male partner education level and "edhigh12" is female partner education level. So the values "1" (male) and "2" (female) of hgsex are transferred to the end of the variables?

Correct.

The main thing I am unclear about is that in the -reshape- process I lose the "j variable", which in this case is "hgsex" and the variables (mrcurr p_mrcurr edhigh1 p_edhigh1) as these are 'converted' using "hgsex" and replaced with edhigh11 (male partner educ level) and edhigh12 (female partner educ level) and mrcurr1 (male partner marital status) and mrcurr2 (female partner marital status). Is there a way to retain the original variables after creating the new variables?

So, you are referring to the action of -reshape wide- here; -reshape long- does the opposite.
To answer your question: it doesn't make sense to do that. All of the information contained in the original variables remains: it's just organized differently. The hgsex variable's information is coded as a 1 or 2 on the end of the names of the other variables. The other variables are all still there: it's just that instead of the data being spread over two different observations, one for the male and one for the female, we now have one single observation that includes the same information with two variables for each of the original ones, one for the male and one for the female. It wouldn't make sense to have the original variables in this single observation: which version of the original variables would you keep, the ones from the male partner or the ones from the female partner? And these "original" variables would be completely redundant as they would just be exact copies of either the new male or female variables.

If at some other point in the analysis it is better to go back to the long layout with one variable per person, you can use -reshape long- to do that. -reshape-, in both directions, completely conserves information. It just re-arranges it, but nothing is lost, and no new information is created.
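
The round trip can be checked on a toy long dataset (values invented): reshape to wide, reshape back, and compare with the saved original. The sketch is meant to be run as one do-file, since the tempfile name is a local macro.

```stata
clear
input long(id p_id) byte(wave hgsex mrcurr edhigh1)
106 842 11 1 2 5
106 842 11 2 2 8
108 109  1 1 1 9
108 109  1 2 1 3
end
sort id p_id wave hgsex

* Save the long layout for later comparison
tempfile before
save `before'

reshape wide mrcurr edhigh1, i(id p_id wave) j(hgsex)
list, noobs

* With no arguments, -reshape long- undoes the previous -reshape wide-
reshape long
sort id p_id wave hgsex

* -cf- exits with an error if any value differs from the saved copy
cf _all using `before'
```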

Chris Boulis replied

04 Sep 2020, 01:19

Thank you Clyde Schechter. Yes, good points. After excluding this small group, the code worked. Thank you very much.

Code:

. drop if missing(mrcurr, p_mrcurr, edhigh1, p_edhigh1)
(197,161 observations deleted)

. rename (hgsex mrcurr edhigh1) q_=

. reshape long @hgsex @mrcurr @edhigh1, i(id p_id wave) j(_j) string
(note: j = p_ q_)

Data                               wide   ->   long
-----------------------------------------------------------------------------
Number of obs.                    78011   ->  156022
Number of variables                 462   ->     460
j variable (2 values)                     ->   _j
xij variables:
                        p_hgsex q_hgsex   ->   hgsex
                      p_mrcurr q_mrcurr   ->   mrcurr
                    p_edhigh1 q_edhigh1   ->   edhigh1
-----------------------------------------------------------------------------

. drop _j 

. reshape wide mrcurr edhigh1, i(id p_id wave) j(hgsex)
(note: j = 1 2)

Data                               long   ->   wide
-----------------------------------------------------------------------------
Number of obs.                   156022   ->   78011
Number of variables                 459   ->     459
j variable (2 values)             hgsex   ->   (dropped)
xij variables:
                                 mrcurr   ->   mrcurr1 mrcurr2
                                edhigh1   ->   edhigh11 edhigh12
-----------------------------------------------------------------------------

To clarify, "edhigh11" is male partner education level and "edhigh12" is female partner education level. So the values "1" (male) and "2" (female) of hgsex are transferred to the end of the variables?

I'm still reading (and understanding) the -reshape- command, which appears very powerful as it appears that (in the one set of code) I can include any other variable for which I want to create male and female variants. The main thing I am unclear about is that in the -reshape- process I lose the "j variable", which in this case is "hgsex" and the variables (mrcurr p_mrcurr edhigh1 p_edhigh1) as these are 'converted' using "hgsex" and replaced with edhigh11 (male partner educ level) and edhigh12 (female partner educ level) and mrcurr1 (male partner marital status) and mrcurr2 (female partner marital status). Is there a way to retain the original variables after creating the new variables?

Again, thank you so much for your help. I really appreciate it. Kind regards, Chris


Clyde Schechter replied

03 Sep 2020, 21:25
I could potentially just exclude this small group from the analysis.

That may be a reasonable approach. It really depends on your research questions. If the nature of what you're looking at is such that same-sex couples might well respond differently from opposite-sex couples, then excluding the small group of same-sex couples is the best approach, as they are too few to do a separate analysis. On the other hand, if the question is one where you would expect same-sex couples to respond the same way as opposite sex couples, then excluding them isn't such a great idea.

I think the thing to think about is what you said in #1: "I would like help to generate a variable, say level of education "educ" for the male partner and female partner in a union (either married or de facto)." Your question seems to be specifically about couples with one male and one female partner. If that was accurate, then excluding the same-sex couples from this analysis is the way to go.
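
A minimal sketch of how such couples could be flagged and excluded before reshaping, using three toy rows (two taken from the same-sex sample posted in this thread). The variable name same_sex is hypothetical. Same-sex couples matter for the -reshape wide ..., j(hgsex)- step because both partners then share one value of hgsex within the same id, p_id and wave, so hgsex no longer distinguishes the two rows.

```stata
clear
input long(id p_id) byte(wave hgsex p_hgsex)
1449 1364 13 2 2
1454  539 10 1 1
 106  842 11 2 1
end

* 1 = same-sex couple, 0 = opposite-sex couple, missing if either sex is unknown
gen byte same_sex = (hgsex == p_hgsex) if !missing(hgsex, p_hgsex)
tabulate same_sex, missing

* Exclude same-sex couples from this particular analysis
drop if same_sex == 1
list, noobs
```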

Chris Boulis replied

03 Sep 2020, 18:36

Hi Clyde Schechter. Thank you for your great help. There do appear to be some same-sex couples in my data set. Here's a snapshot of the output from the code in #14 (I wasn't able to scroll to the top to include the column headings):

Code:

        |------------------------------------------------------|
274750. | 274750      .      11819   1700376          1     17 |
274751. | 274751      .      11819   1700377          1     17 |
274752. | 274752      .      11781   1700379          1     18 |
274753. | 274753      .      11819   1700379          1     17 |
274754. | 274754      .      11781   1700382          1     18 |
        |------------------------------------------------------|
274755. | 274755      .      11819   1700382          1     17 |
274756. | 274756      .      11819   1700383          1     17 |
274757. | 274757      .      11819   1700385          1     17 |
274758. | 274758      .      11781   1700387          1     18 |
274759. | 274759      .      11819   1700387          1     17 |
        |------------------------------------------------------|
274760. | 274760      .      11781   1700388          1     18 |
274761. | 274761      .      11819   1700388          1     17 |
274762. | 274762      .      11819   1700389          1     17 |
274763. | 274763      .      11781   1700389          1     18 |
274764. | 274764      .      11781   1700390          1     18 |
        |------------------------------------------------------|
274765. | 274765      .      11819   1700390          1     17 |
274766. | 274766      .      11781   1700395          1     18 |
274767. | 274767      .      11819   1700395          1     17 |
274768. | 274768      .      11781   1700397          1     18 |
274769. | 274769      .      11819   1700397          1     17 |
        |------------------------------------------------------|

Here is a sample of same-sex couples from my data:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input long(id p_id) byte(wave hgsex p_hgsex mrcurr p_mrcurr edhigh1 p_edhigh1)
1449 1364 13 2 2 2 2 2 1
1449 1364 14 2 2 2 2 2 1
1449 1364 15 2 2 2 2 2 1
1449 1364 16 2 2 2 2 2 1
1449 1364 17 2 2 2 2 2 1
1449 1364 18 2 2 2 2 2 1
1273  911 10 2 2 2 2 4 3
1273  911 11 2 2 2 2 4 3
1273  911 12 2 2 2 2 4 3
1273  911 13 2 2 2 2 4 3
1273  911 14 2 2 2 2 4 3
1273  911 15 2 2 2 2 4 3
1273  911 16 2 2 2 2 4 3
1273  911 17 2 2 2 2 4 3
1273  911 18 2 2 2 2 2 3
1720  603  6 2 2 2 2 5 4
1720  603  7 2 2 2 2 5 4
1106  625  6 2 2 2 2 5 5
1106  625  7 2 2 2 2 5 5
1106  625  8 2 2 2 2 5 5
1106  625  9 2 2 2 2 5 5
1106 1156 11 2 2 2 2 5 5
1106 1156 12 2 2 2 2 5 5
1106 1156 13 2 2 2 2 5 5
1454  539 10 1 1 2 2 5 4
1454  539 11 1 1 2 2 5 4
1454  539 12 1 1 2 2 5 4
1454  539 13 1 1 2 2 5 4
1454  539 14 1 1 2 2 5 4
1454  539 15 1 1 2 2 5 4
1454  539 16 1 1 2 2 5 4
1454  539 17 1 1 2 2 5 4
1454  539 18 1 1 1 1 5 4
1654 1358 10 2 2 2 2 8 8
1654 1358 11 2 2 2 2 8 8
1654 1358 12 2 2 2 2 8 8
1654 1358 13 2 2 2 2 8 8
1654 1358 14 2 2 2 2 8 8
1654 1358 15 2 2 2 2 8 8
1654 1358 16 2 2 2 2 8 8
1654 1358 17 2 2 2 2 8 8
end

I could potentially just exclude this small group from the analysis.

Last edited by Chris Boulis; 03 Sep 2020, 18:39.


Clyde Schechter replied

02 Sep 2020, 22:14

When I combine the data you gave in #9 with the code in #13, it runs without any error messages:

Code:

. * Example generated by -dataex-. To install: ssc install dataex
. clear

. input int(id p_id wave) byte(hgsex p_hgsex mrcurr p_mrcurr edhigh1 p_edhigh1)

           id      p_id      wave     hgsex   p_hgsex    mrcurr  p_mrcurr   edhigh1  p_edhi~1
  1. 108  109  1 1 2 1 1 9 9
  2. 108  109  2 1 2 1 1 9 9
  3. 108  109  3 1 2 1 1 9 9
  4. 108  109  4 1 2 1 1 9 9
  5. 108  109  5 1 2 1 . 9 .
  6. 108  109  6 1 2 1 1 9 9
  7. 108  109  7 1 2 1 1 9 9
  8. 110 163 12 2 1 2 2 1 3
  9. 110 163 13 2 1 2 2 1 3
 10. 110 163 14 2 1 1 1 1 3
 11. 110 163 15 2 1 1 1 1 3
 12. 110 163 16 2 1 1 1 1 3
 13. 110 163 17 2 1 1 1 1 3
 14. 110 163 18 2 1 1 1 1 3
 15. 114  115 10 1 2 1 1 5 5
 16. 114  115 11 1 2 1 1 5 5
 17. 114  115 12 1 2 1 1 5 5
 18. 114  115 13 1 2 1 1 5 5
 19. 114  115 14 1 2 1 1 5 5
 20. 114  115 15 1 2 1 1 5 5
 21. 114  115 16 1 2 1 1 5 5
 22. 118  119  1 2 1 1 1 4 3
 23. 118  119  2 2 1 1 1 4 3
 24. 118  119  3 2 1 1 1 4 3
 25. 118  119  4 2 1 1 1 4 1
 26. 118  119  5 2 1 1 1 4 1
 27. 118  119  6 2 1 1 1 4 1
 28. 118  119  7 2 1 1 1 4 1
 29. 118  119  8 2 1 1 1 4 1
 30. 118  119  9 2 1 1 1 4 1
 31. 118  119 10 2 1 1 1 4 1
 32. 118  119 11 2 1 1 1 4 1
 33. 118  119 12 2 1 1 1 4 1
 34. 118  119 13 2 1 1 1 4 1
 35. 118  119 14 2 1 1 1 4 1
 36. 118  119 15 2 1 1 1 4 1
 37. 118  119 16 2 1 1 1 4 1
 38. 118  119 17 2 1 1 1 4 1
 39. 123  124 10 2 1 1 1 5 9
 40. 123  124 11 2 1 1 1 5 9
 41. 123  124 12 2 1 1 1 5 9
 42. 123  124 13 2 1 1 1 5 9
 43. 123  124 14 2 1 1 1 5 9
 44. 123  124 15 2 1 1 1 5 9
 45. 123  124 16 2 1 1 1 5 9
 46. 123  124 17 2 1 1 1 5 9
 47. 123  124 18 2 1 1 1 5 9
 48. 125 185 12 2 1 2 2 9 5
 49. 125 185 13 2 1 2 2 9 5
 50. 125 185 14 2 1 2 2 9 5
 51. 126 142 15 2 1 2 2 9 9
 52. 126 142 16 2 1 2 2 9 9
 53. 126 142 17 2 1 2 2 9 9
 54. 126 142 18 2 1 2 2 9 9
 55. end

.
. drop if missing(mrcurr, p_mrcurr, edhigh1, p_edhigh1)
(1 observation deleted)

. rename (hgsex mrcurr edhigh1) q_=

. reshape long @hgsex @mrcurr @edhigh1, i(id p_id wave) j(_j) string
(note: j = p_ q_)

Data                               wide   ->   long
-----------------------------------------------------------------------------
Number of obs.                       53   ->     106
Number of variables                   9   ->       7
j variable (2 values)                     ->   _j
xij variables:
                        p_hgsex q_hgsex   ->   hgsex
                      p_mrcurr q_mrcurr   ->   mrcurr
                    p_edhigh1 q_edhigh1   ->   edhigh1
-----------------------------------------------------------------------------

. drop _j

. reshape wide mrcurr edhigh1, i(id p_id wave) j(hgsex)
(note: j = 1 2)

Data                               long   ->   wide
-----------------------------------------------------------------------------
Number of obs.                      106   ->      53
Number of variables                   6   ->       7
j variable (2 values)             hgsex   ->   (dropped)
xij variables:
                                 mrcurr   ->   mrcurr1 mrcurr2
                                edhigh1   ->   edhigh11 edhigh12
-----------------------------------------------------------------------------

.

The error message that you are getting says that there are situations where the same combination of id, p_id, wave, and hgsex occurs more than once in the data. This does not happen in your example. In order for it to happen, it seems to me either that your data includes some same-sex couples, or you have a situation where the same person is in relationships with two or more different partners during the same wave. If these are true not just of your data but are true facts about the survey participants, then there is a way to modify the code to accommodate that. But my worry is that, in fact, these are data errors, not facts about the survey population. So before I try to offer you different code, you should go through your data and identify any situations like this. Then find out what the ground truth for these people is. If it's correct data, post back with a data example that includes some of these. If you find out that these are data errors, then fix your data set and we're done with this.

Here's how you can identify these potential problems in the data as it stands before you run the code above:

Code:

//  IDENTIFY SAME SEX COUPLES IN THE DATA
list if hgsex == p_hgsex

//  IDENTIFY "POLYGAMOUS" COUPLINGS
keep id p_id wave
rename id q_id
gen long couple = _n
reshape long @id, i(couple) j(_j) string
by id wave, sort: gen occurrences = _N
reshape wide @id  @occurrences, i(couple) j(_j) string
list if max(p_occurrences, q_occurrences) > 1
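
A complementary check, offered as a sketch rather than as part of the code above: once the data have been through the first -reshape long- and -drop _j-, -duplicates tag- can point directly at the combinations of id, p_id, wave and hgsex that block the -reshape wide-. The two couples below are copied from examples in this thread (one opposite-sex, one same-sex), and the variable name dup is made up for illustration.

Code:

clear
input long(id p_id) byte(wave hgsex p_hgsex edhigh1 p_edhigh1)
 108  109  1 1 2 9 9
1449 1364 13 2 2 2 1
end

rename (hgsex edhigh1) q_=
reshape long @hgsex @edhigh1, i(id p_id wave) j(_j) string
drop _j

* dup > 0 marks observations that share id, p_id, wave and hgsex
* with at least one other observation
duplicates tag id p_id wave hgsex, generate(dup)
list id p_id wave hgsex edhigh1 if dup > 0, sepby(id)

Any observation listed here is one that -reshape wide ..., i(id p_id wave) j(hgsex)- would reject.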

Last edited by Clyde Schechter; 02 Sep 2020, 22:16.


Chris Boulis replied

02 Sep 2020, 20:49

Hi William Lisowski. Thank you very much for your reply. I am reading the material about -reshape- as recommended by Clyde. Yes, I can see what you are referring to. My apologies, I didn't realise that I had previously -group-ed mrcurr and edhigh1 (named gmrcurr, gedhigh). They are not needed and have been dropped. I did not -group- hgsex and cannot find ghgsex in my data.

New Stata output states that

There are observations within i(id p_id wave) with the same value of j(hgsex)

Could this be because the values of wave == 1 & wave == 2 are the same as hgsex == 1 (male) & hgsex == 2 (female)? That said, I notice that Clyde's code specifies that "_j" would take on "string" values so this may not be an issue. (I understand "_j" refers to hgsex mrcurr edhigh1).

Code:

. drop if missing(mrcurr, p_mrcurr, edhigh1, p_edhigh1)
(197,226 observations deleted)

. rename (hgsex mrcurr edhigh1) q_=

. reshape long @hgsex @mrcurr @edhigh1, i(id p_id wave) j(_j) string
(note: j = p_ q_)

Data                               wide   ->   long
-----------------------------------------------------------------------------
Number of obs.                    78919   ->  157838
Number of variables                 393   ->     391
j variable (2 values)                     ->   _j
xij variables:
                        p_hgsex q_hgsex   ->   hgsex
                      p_mrcurr q_mrcurr   ->   mrcurr
                    p_edhigh1 q_edhigh1   ->   edhigh1
-----------------------------------------------------------------------------

. drop _j

. reshape wide mrcurr edhigh1, i(id p_id wave) j(hgsex)
(note: j = 1 2)
values of variable hgsex not unique within id p_id wave
    Your data are currently long.  You are performing a reshape wide.  You specified i(id p_id
    wave) and j(hgsex).  There are observations within i(id p_id wave) with the same value of
    j(hgsex).  In the long data, variables i() and j() together must uniquely identify the
    observations.

         long                                wide
        +---------------+                   +------------------+
        | i   j   a   b |                   | i   a1 a2  b1 b2 |
        |---------------| <--- reshape ---> |------------------|
        | 1   1   1   2 |                   | 1   1   3   2  4 |
        | 1   2   3   4 |                   | 2   5   7   6  8 |
        | 2   1   5   6 |                   +------------------+
        | 2   2   7   8 |
        +---------------+
    Type reshape error for a list of the problem variables.
r(9);

My understanding of the code: Line (1) renames the three unprefixed variables with the prefix q_ to parallel the partner equivalents prefixed "p_". Line (2) converts the data from wide to long by sex, with the @ marking where the p_ or q_ prefix sits in each variable name. Line (3) drops the prefixes from the variables to end up with hgsex mrcurr edhigh1 (which I believe means I lose the partner equivalents p_hgsex p_mrcurr p_edhigh1). Line (4) converts back from long to wide by id, p_id and wave, by sex.
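
One way to test a reading like this is to trace a couple of observations through each step and -list- the data along the way. The sketch below does that with two couple-waves copied from the example data in this thread. The step comments describe what the -reshape- output shown earlier reports, and are an editorial illustration, not part of the original code.

Code:

clear
input int(id p_id wave) byte(hgsex p_hgsex mrcurr p_mrcurr edhigh1 p_edhigh1)
108 109  1 1 2 1 1 9 9
110 163 12 2 1 2 2 1 3
end

* Step 1: give the respondent's variables a q_ prefix to parallel p_
rename (hgsex mrcurr edhigh1) q_=

* Step 2: stack respondent (q_) and partner (p_) as separate observations;
* _j records which of the two each observation came from
reshape long @hgsex @mrcurr @edhigh1, i(id p_id wave) j(_j) string
list, sepby(id)

* Step 3: drop the p_/q_ indicator; the partner's values stay in the data
drop _j

* Step 4: back to one observation per couple-wave, with variables
* now suffixed by sex (1 = male, 2 = female)
reshape wide mrcurr edhigh1, i(id p_id wave) j(hgsex)
list

After Step 4, edhigh11 holds the male partner's education and edhigh12 the female partner's, whichever of them was the respondent.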

As noted in #7 I have multiple waves of panel data (I did not note that it is in long form as I thought it would be assumed).

Stata v.15.1

Last edited by Chris Boulis; 02 Sep 2020, 20:55.
