How to count each binary variable and generate a single variable that holds all the count information?

Geralt Ji

Join Date: May 2020

Posts: 27
#1


29 Jun 2021, 02:26

Hi there,

I have a set of binary variables, each indicating a particular disease (e.g. heart = heart disease, dep = depression, asth = asthma...).
I also have a variable phi with three values (1, 2, 3), which stand for three different kinds of insurance.

Currently, I want to look at the distribution of the three insurance types, by disease.

I wonder:

1. How do I get the number of patients for each disease and put the counts into a new variable, count? Or is there an easier way to show how many patients there are for each disease? The code I tried below overwrites the value set for an earlier disease if the respondent has two or more diseases, so it cannot correctly count the patients with a given disease.

Code:

gen diagnosed=.
replace diagnosed = 1 if heart == 1
replace diagnosed = 2 if dep == 2
replace diagnosed = 3 if asth == 3
....

2. How do I get the insurance distribution by disease? E.g. among patients with each disease, how many buy the 1st insurance and how many buy the 2nd?

Thanks in advance!

Last edited by Geralt Ji; 29 Jun 2021, 02:30.
Geralt Ji

Join Date: May 2020

Posts: 27
#2

29 Jun 2021, 04:45

Sorry, the code example I posted was wrong. The code below is what I actually tried. I tried to generate a new variable diagnosed and wanted to use the tab command to show the number of patients for each disease, but it runs into the "overwrite" problem I described in #1.

Code:

gen diagnosed=.
replace diagnosed = 1 if heart == 1
replace diagnosed = 2 if dep == 1
replace diagnosed = 3 if asth == 1
....
label define diagnosedl 1 "heart diseases" 2 "depression" 3 "asthma"....
label value diagnosed diagnosedl

Last edited by Geralt Ji; 29 Jun 2021, 04:48.

Richard Williams

Join Date: Apr 2014

Posts: 5025
#3

29 Jun 2021, 08:05

This is klutzy but I think it works. Perhaps there is a simpler way.

Code:

webuse nhanes2f, clear
preserve
gen idnum = _n
expand 3
bysort idnum: gen recnum = _n

gen diagnosed = .
* each of the 3 copies of a case carries at most one disease code
replace diagnosed = 1 if heartatk == 1 & recnum == 1
replace diagnosed = 2 if diabetes == 1 & recnum == 2
replace diagnosed = 3 if highbp == 1 & recnum == 3
* ... (for more diseases, add lines here and raise the expand count to match)
label define diagnosedl 1 "heart attack" 2 "diabetes" 3 "high blood pressure"
label values diagnosed diagnosedl
tab2 diagnosed race
restore

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
WWW: https://academicweb.nd.edu/~rwilliam/


Geralt Ji

Join Date: May 2020

Posts: 27
#4

29 Jun 2021, 11:02

Hi Richard,

Thanks for your reply! I tried your code and got the right numbers!

Code:

.....
preserve
gen idnum = _n
expand 11 // there are a total of 11 kinds of diseases
bysort idnum: gen recnum = _n

gen diagnosed = .
replace diagnosed = 1 if heart == 1 & recnum == 1 // heart diseases
replace diagnosed = 2 if asth == 1 & recnum == 2 // asthma
replace diagnosed = 3 if cancer == 1 & recnum == 3 // cancer
......
replace diagnosed = 10 if dep == 1 & recnum == 10 // Depression or anxiety
replace diagnosed = 11 if mental == 1 & recnum == 11 // other mental illness
tab2 diagnosed phi
restore

Last edited by Geralt Ji; 29 Jun 2021, 11:23.


Richard Williams

Join Date: Apr 2014

Posts: 5025
#5

29 Jun 2021, 11:22

Does phi have missing cases that get dropped in the tab2 command?

Geralt Ji

Join Date: May 2020

Posts: 27
#6

29 Jun 2021, 11:29

Hi there,

I found that the diagnosed variable is not stored in the dataset. How can I keep it? I want to use it as the independent variable in a graph showing insurance status by disease.

Here is what I'm going to plot...

Code:

graph bar phistatus_1 phistatus_2 phistatus_3 , over(diagnosed)
Geralt Ji

Join Date: May 2020

Posts: 27
#7

29 Jun 2021, 11:31

Originally posted by Richard Williams

Does phi have missing cases that get dropped in the tab2 command?

Yes! Sorry for the inconvenience. It does have some missing values.
Richard Williams

Join Date: Apr 2014

Posts: 5025
#8

29 Jun 2021, 12:01

Originally posted by Geralt Ji

Hi there,

I found that diagnosed variable is not stored in the dataset. I'm also confused how to keep diagnosed variable? Because I want to use it as independent variables to plot a graph to show the insurance status, by diseases.

Here is what I'm going to plot...

Code:

graph bar phistatus_1 phistatus_2 phistatus_3 , over(diagnosed)

It isn't saved because my code restored the original data set. If there are additional things you want to do with it, do not restore the data set until you have done so.
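A minimal sketch of that ordering, using the nhanes2f example from #3: the graph is drawn while the expanded data are still in memory, and only then is the original data set restored. The graph specification and the file name expanded_diagnoses are illustrative assumptions, not part of the original code.

```stata
webuse nhanes2f, clear
preserve
gen long idnum = _n
expand 3
bysort idnum: gen byte recnum = _n

gen byte diagnosed = .
replace diagnosed = 1 if heartatk == 1 & recnum == 1
replace diagnosed = 2 if diabetes == 1 & recnum == 2
replace diagnosed = 3 if highbp == 1 & recnum == 3
label define diagnosedl 1 "heart attack" 2 "diabetes" 3 "high blood pressure"
label values diagnosed diagnosedl

* draw the graph BEFORE restore, while diagnosed still exists;
* records with diagnosed == . are left out of over() by default
graph bar (count) idnum, over(race) over(diagnosed) asyvars

* optional: keep a copy of the expanded data on disk (hypothetical file name)
save expanded_diagnoses, replace
restore
```

After restore, the original one-record-per-person data are back in memory, and the saved copy can be reloaded with use whenever diagnosed is needed again.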

If you want to save the variable permanently, that will be trickier. Basically, you are trying to have 11 different values stored in one variable, which, as you already saw, can't be done. My approach solved the problem by expanding each case 11 times and having each record for a case store one of the 11 values.

If you were pushing this further, you might reshape the data long. There would be 11 records for each case, one for each disease, i.e. 11 person_disease records. The first record would be for heart, the 2nd would be for asthma, the third for cancer, etc. This might be especially good if you had other disease-specific variables, e.g. do you have a family history of cancer, do you have a family history of asthma, etc. You might then use commands like clogit or xtlogit or melogit.

I can elaborate, but if all you want is to create a graph you don't really need to know all this! But if you want to get an idea about what I am talking about, see

https://www3.nd.edu/~rwilliam/Taiwan...xedEffects.pdf

That handout describes a situation where there is one record for each person for each year, i.e. it is panel data. But, in your case, there would be one record for each person for each disease. That is fine too.
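As a rough sketch of the long layout described above, again with the nhanes2f example rather than the poster's data: the three disease indicators are renamed to a common stub so that reshape long can stack them. The stub dx and the names idnum, disease, and diseasel are assumptions chosen for illustration.

```stata
webuse nhanes2f, clear
gen long idnum = _n

* give the disease indicators a common stub (dx is a made-up name)
rename (heartatk diabetes highbp) (dx1 dx2 dx3)

* one record per person per disease: disease = 1, 2, 3; dx = 0/1
reshape long dx, i(idnum) j(disease)

label define diseasel 1 "heart attack" 2 "diabetes" 3 "high blood pressure"
label values disease diseasel

* counts of people with each disease, by race
tab disease race if dx == 1
```

In this layout disease is an ordinary variable in the data set, so it can be saved and used in later commands without the preserve/restore step.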

Last edited by Richard Williams; 29 Jun 2021, 12:20. Reason: Edit: Original post said reshape wide when it should have said reshape long.

Geralt Ji

Join Date: May 2020

Posts: 27
#9

29 Jun 2021, 12:07

Got it! Thanks for your detailed response!
Richard Williams

Join Date: Apr 2014

Posts: 5025
#10

29 Jun 2021, 14:46

This FAQ by Nick Cox may be helpful. It covers some of the same things I did and a lot more.

https://www.stata.com/support/faqs/d...ple-responses/

At the end, it mentions Ben Jann's mrtab command (get the version from SSC). It is much less klutzy than what I showed before and produces the same results:

Code:

. webuse nhanes2f, clear

. mrtab heartatk diabetes highbp, by(race)

                             |                Race
                             |      White       Black       Other |      Total
-----------------------------+------------------------------------+-----------
 heartatk Prior heart attack |        421          47           5 |        473
    diabetes Diabetes status |        404          86           9 |        499
  highbp High blood pressure |       3744         541          87 |       4372
-----------------------------+------------------------------------+-----------
                       Total |       4569         674         101 |       5344
                       Cases |       4051         583          90 |       4724

Valid cases:        4724
Missing cases:      5613

It also comes with an mrgraph command. Perhaps it will do what you want. If not, you can stick with my klutzy approach. I tried

Code:

mrgraph bar heartatk diabetes highbp, sort by(race)

and it looked ok. In your case it might be

Code:

mrgraph bar phistatus_1 phistatus_2 phistatus_3, sort by(diagnosed)

Both mrtab and mrgraph have lots of options, and Nick's FAQ lists several other things you can try.

Again, if all you want is one table and one graph, there are probably relatively painless ways to get them. If you are thinking about more complicated analyses, reshaping long and using things like clogit and melogit may be the way to go.

Richard Williams

Join Date: Apr 2014

Posts: 5025
#11

29 Jun 2021, 14:52

Incidentally, I don't know what your phistatus vars are (why are there 3 of them?) and how they are coded, so I don't know if this does what you want.

Nick Cox

Join Date: Mar 2014

Posts: 35803
#12

29 Jun 2021, 15:10

Thanks for the mention in #10.

See also https://www.statalist.org/forums/for...lable-from-ssc for a command that perhaps is of help, but I haven't read this thread carefully.
Geralt Ji

Join Date: May 2020

Posts: 27
#13

29 Jun 2021, 21:24

Thanks Richard and Nick! Really helpful comments, and the code works perfectly!

But it seems that I still need the code you showed before to generate the diagnosed variable when I run the following:

Code:

mrgraph bar phistatus_1 phistatus_2 phistatus_3, sort by(diagnosed)

Last edited by Geralt Ji; 29 Jun 2021, 21:30.
Geralt Ji

Join Date: May 2020

Posts: 27
#14

29 Jun 2021, 21:44

Hi there,

I also have a question about grouping some diseases together: how can I use mrtab to get the result? I want to combine the count of depression patients (dep == 1) with the count of patients with other mental illness (mental == 1), then label the combined group "mental illness".

When I use the mrtab command to show the distribution of diseases, how do I avoid double counting someone who has both depression and another mental illness?
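One possible approach, sketched under assumptions: build a single 0/1 indicator that is 1 if either condition is present, and pass that indicator to mrtab in place of dep and mental. A person with both conditions then contributes one record to the combined row, not two. The name anymental is made up for illustration, the variable list is shortened to a few of the diseases named in this thread, and the sketch assumes the indicators are coded 0/1.

```stata
* mrtab is a user-written command; install it once from SSC
ssc install mrtab

* 1 if dep or mental is 1, 0 if neither is, missing only if both are missing
egen byte anymental = rowmax(dep mental)
label variable anymental "mental illness"

* use the combined indicator instead of dep and mental
mrtab heart asth cancer anymental, by(phi)
```

Note that rowmax() ignores missing values, so a person with dep == 0 and mental missing is coded 0; whether that is appropriate depends on how missing values should be treated in the data.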