Identifying partner diagnosis in survey data

Philmon Amasalu

Join Date: May 2019
Posts: 35


08 Feb 2025, 15:19

Dear Listers,

I am using survey data in which I want to identify the year of diagnosis for people who have a partner. It is a large survey with a large number of participants.
My intention is to identify the year of diagnosis of different disease types (cancer, diabetes, asthma, ...) for either of the partners, but not when both of them are diagnosed. My concern is that, when analysing the behavioural response of either partner (doing more sport, reducing drinking, stopping smoking, visiting the doctor more frequently), these responses might reflect a reaction to the person's own diagnosis rather than to the partner's diagnosis.
So I first want to exclude those couples and later check the sensitivity of the results to including them.

My problem is that both the person ID (pid) and the partner ID (parid) columns contain both partners. For example, a person with pid == 1501 in the years 2000-2010 has parid == 1502 in the same years (if they stay in the partnership for the whole sample period), and that partner appears in turn with pid == 1502 and parid == 1501. I may not have described this clearly; please excuse my English and look at the data example below.

The second problem (less pressing, as I see it, since my intention is to remove any couple in which the partner is also diagnosed with any disease, whether the same as his/her partner's or a different one) is to create columns that tell me whether the partner is diagnosed with any of the diseases and, if so, whether it is the same disease as his/her partner's or not.

I tried to use the following code to reach my goal, but the outcome shows that both partners are coded 1 even when only one of them is diagnosed with a particular disease.
I would appreciate any suggestion or hint. Thank you very much for your time.

Code:

foreach disease in diabetes asthma cardiopathy cancer stroke ///
    bloodp depression dementia migraine {
    gen byte temp_`disease' = `disease' == 1

    bysort pid (syear): egen year_diagnosed_`disease' = min(syear) ///
        if temp_`disease' == 1

    bysort pid (syear): egen year_diag_`disease' = min(year_diagnosed_`disease')

    drop temp_`disease' year_diagnosed_`disease'
    rename year_diag_`disease' year_diagnosed_`disease'

    label variable year_diagnosed_`disease' "Year of `disease' diagnosis"
}

format year_diagnosed_* %9.0g

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input int(pid parid syear) byte(cancer diabetes asthma)
 901  902 2000 . . .
 901  902 2001 . . .
 901  902 2002 . . .
 901  902 2003 . . .
 901  902 2004 . . .
 901  902 2005 1 . .
 901  902 2006 . . .
 901  902 2007 . . .
 901  902 2008 . . .
 901  902 2009 . . .
 901  902 2010 . . .
 902  901 2000 . . .
 902  901 2001 . . .
 902  901 2002 . . .
 902  901 2003 . . .
 902  901 2004 . . .
 902  901 2005 . . .
 902  901 2006 1 . .
 902  901 2007 . . .
 902  901 2008 . . .
 902  901 2009 . . .
 902  901 2010 . . .
1501 1502 2000 . . .
1501 1502 2001 . . .
1501 1502 2002 . . .
1501 1502 2003 . 1 .
1501 1502 2004 . . .
1501 1502 2005 . . .
1501 1502 2006 . . .
1501 1502 2007 . . .
1501 1502 2008 . . .
1501 1502 2009 . . .
1501 1502 2010 . . .
1502 1501 2000 . . .
1502 1501 2001 . . .
1502 1501 2002 . . .
1502 1501 2003 . . .
1502 1501 2004 . . .
1502 1501 2005 . . .
1502 1501 2006 . . .
1502 1501 2007 . . .
1502 1501 2008 . . 1
1502 1501 2009 . . .
1502 1501 2010 . . .
3214 3215 2000 . . .
3214 3215 2001 . . .
3214 3215 2002 . . .
3214 3215 2003 . . .
3214 3215 2004 . . .
3214 3215 2005 . . .
3214 3215 2006 . . .
3214 3215 2007 . . .
3214 3215 2008 . . .
3214 3215 2009 . . .
3214 3215 2010 . . .
3215 3214 2000 . . .
3215 3214 2001 . . .
3215 3214 2002 . . .
3215 3214 2003 . . .
3215 3214 2004 . . 1
3215 3214 2005 . . .
3215 3214 2006 . . .
3215 3214 2007 . . .
3215 3214 2008 . . .
3215 3214 2009 . . .
3215 3214 2010 . . .
3540 3541 2000 . . .
3540 3541 2001 . . .
3540 3541 2002 . . .
3540 3541 2003 . . .
3540 3541 2004 . . .
3540 3541 2005 . 1 .
3540 3541 2006 . . .
3540 3541 2007 . . .
3540 3541 2008 . . .
3540 3541 2009 . . .
3540 3541 2010 . . .
3541 3540 2000 . . .
3541 3540 2001 . . .
3541 3540 2002 . . .
3541 3540 2003 . . .
3541 3540 2004 . . .
3541 3540 2005 . . .
3541 3540 2006 . . .
3541 3540 2007 . . .
3541 3540 2008 . . .
3541 3540 2009 . . .
3541 3540 2010 . . .
end


Clyde Schechter

Join Date: Apr 2014

Posts: 30192
#2

08 Feb 2025, 16:00

I did not understand the specific end result you are looking for, nor could I discern it from your code attempts. What I can offer you is code that reorganizes the data so that it is easy to identify when both members of a partnership pair have the same disease, or any disease. Perhaps you can finish it from there.

Code:

local diseases cancer diabetes asthma

// CHANGE DISEASES TO 0/1 CODING
mvencode `diseases', mv(0)

// RE-ORGANIZE DATA TO LINK PERSON AND PARTNER'S INFORMATION
frame put pid syear `diseases', into(partners)
frlink m:1 parid syear, frame(partners pid syear)
frget `diseases', from(partners) prefix(par_)
drop partners
frame drop partners

foreach d of local diseases {
    order par_`d', after(`d')
    by pid (syear), sort: gen byte has_`d' = sum(`d'), after(`d')
    by pid (syear): gen byte par_has_`d' = sum(par_`d'), after(par_`d')
}
egen byte has_any_disease = rowmax(has_*)
egen byte par_has_any_disease = rowmax(par_has_*)
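For the original aim of recovering the year of diagnosis for both the person and the partner, the same linking idea can be applied to a person-level diagnosis year. What follows is only a minimal sketch on made-up toy data for a single disease; the variable names year_cancer and par_year_cancer and the frame name diag are hypothetical and not part of the survey data.

```stata
clear
input int(pid parid syear) byte(cancer)
901 902 2004 0
901 902 2005 1
901 902 2006 0
902 901 2004 0
902 901 2005 0
902 901 2006 1
end

* Year of first diagnosis per person (missing if never diagnosed)
bysort pid: egen int year_cancer = min(cond(cancer == 1, syear, .))

* Copy one row per person into a helper frame (hypothetical name: diag)
frame put pid year_cancer, into(diag)
frame diag: duplicates drop

* Link each observation to the partner's row and fetch the partner's year
frlink m:1 parid, frame(diag pid)
frget par_year_cancer = year_cancer, from(diag)
drop diag
frame drop diag

list, sepby(pid)
```

In this toy example, person 901 gets year_cancer 2005 and par_year_cancer 2006, and person 902 the reverse. A couple in which only one partner is diagnosed would show a missing value in exactly one of the two variables.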

If you want further assistance, when posting back I suggest that you work up a small example of how you want the end result to look and show that. That is often clearer than anything achievable in words.
Philmon Amasalu

Join Date: May 2019

Posts: 35
#3

08 Feb 2025, 16:30

Dear Clyde,

Thank you very much for your quick help.
Your code pretty much took me where I wanted to go.

If you want further assistance, when posting back I suggest that you work up a small example of how you want the end result to look and show that. That is often clearer than anything achievable in words.

Apologies. I wouldn't have got any response if it were not for you. I will do that next time I post.
Thank you once again.
Philmon Amasalu

Join Date: May 2019

Posts: 35
#4

13 Feb 2025, 05:49

Dear Clyde,

I am sorry to come back, but I got stuck. I next wanted to identify the partner pairs in which only one partner is diagnosed (so that I can see how the non-diagnosed partner reacts behaviourally to the partner's diagnosis).
When I use the code

Code:

bysort pid: gen wanted= has_any_disease==1& par_has_any_disease==0

or

Code:

bysort pid (syear): gen wanted= has_any_disease==1& par_has_any_disease==0

the two commands generate the same column, and I do not understand why. The condition is checked for every year, but I want it to be checked by pid.
Thank you for your help as always.

Last edited by Philmon Amasalu; 13 Feb 2025, 06:01.
Hemanshu Kumar

Join Date: Mar 2015

Posts: 1499
#5

13 Feb 2025, 08:07

The only difference between those two lines of code is in defining how the data is sorted. But the expression that decides how wanted is created operates on one row at a time, and thus for one combination of pid and syear at a time.

Instead of these, you might want to do:

Code:

bysort pid (syear): egen byte wanted = max(has_any_disease == 1 & par_has_any_disease == 0)
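
Note that this flags a person as soon as there is at least one year in which he or she is diagnosed and the partner is not (yet) diagnosed. If the aim is instead to pick out couples in which only one partner is ever diagnosed during the sample, one possibility is to compare person-level "ever diagnosed" indicators. The following is a minimal sketch on made-up data, assuming has_any_disease and par_has_any_disease as created in #2; the names ever_diag, par_ever_diag and only_self are hypothetical.

```stata
clear
input int(pid parid syear) byte(has_any_disease par_has_any_disease)
1 2 2000 0 0
1 2 2001 1 0
1 2 2002 1 1
2 1 2000 0 0
2 1 2001 0 1
2 1 2002 1 1
3 4 2000 0 0
3 4 2001 1 0
3 4 2002 1 0
4 3 2000 0 0
4 3 2001 0 1
4 3 2002 0 1
end

* Row-level condition spread over the person: 1 if it holds in at least one year
bysort pid (syear): egen byte wanted = max(has_any_disease == 1 & par_has_any_disease == 0)

* Person-level "ever diagnosed" indicators; > 0 is used because the running
* sums from #2 can exceed 1 if a disease is reported in more than one year
bysort pid: egen byte ever_diag     = max(has_any_disease > 0)
bysort pid: egen byte par_ever_diag = max(par_has_any_disease > 0)

* 1 only if the person is ever diagnosed and the partner never is
gen byte only_self = ever_diag == 1 & par_ever_diag == 0

list pid syear wanted ever_diag par_ever_diag only_self, sepby(pid)
```

In this toy data, pid 1 gets wanted == 1 even though the partner is diagnosed a year later, whereas only_self is 1 only for pid 3, whose partner is never diagnosed.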