
  • How to count each binary variable and generate a single variable that holds all the counts?

    Hi there,

    I have a set of binary variables, each of which indicates a particular disease (e.g. heart = heart disease, dep = depression, asth = asthma...).
    I also have a variable phi with three values (1, 2, 3), which stand for three different kinds of insurance.

    I want to look at the distribution of the three insurance types by disease.

    I wonder:

    1. How do I get the number of patients with each disease and put the counts into a new variable? Or is there an easier way to show how many patients have each disease? The code I tried below overwrites the value for an earlier disease if the respondent has two or more diseases, so it cannot count the patients with a given disease correctly.
    Code:
    gen diagnosed=.
    replace diagnosed = 1 if heart == 1
    replace diagnosed = 2 if dep  == 2
    replace diagnosed = 3 if asth == 3
    ....
    2. How do I get the insurance distribution by disease? E.g. among patients with each disease, how many bought the first kind of insurance, how many bought the second, and so on?


    Thanks in advance!
    Last edited by Geralt Ji; 29 Jun 2021, 02:30.

  • #2
    Sorry, the code example in #1 was wrong. The code below is what I actually tried. I generate a new variable diagnosed and want to use the tab command to show the number of patients with each disease, but it runs into the "overwrite" problem I described in #1.

    Code:
    gen diagnosed=.
    replace diagnosed = 1 if heart == 1
    replace diagnosed = 2 if dep  == 1
    replace diagnosed = 3 if asth == 1
    ....
    label define diagnosedl 1 "heart diseases" 2 "depression" 3 "asthma"....
    label values diagnosed diagnosedl
    Last edited by Geralt Ji; 29 Jun 2021, 04:48.



    • #3
      This is klutzy but I think it works. Perhaps there is a simpler way.

      Code:
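      webuse nhanes2f, clear
      preserve
      * one identifier per original observation
      gen idnum = _n
      * make 3 copies of each observation, one per disease
      expand 3
      bysort idnum: gen recnum = _n

      * copy k of each case records only disease k, so nothing is overwritten
      gen diagnosed=.
      replace diagnosed = 1 if heartatk == 1 & recnum == 1
      replace diagnosed = 2 if diabetes  == 1 & recnum == 2
      replace diagnosed = 3 if highbp == 1 & recnum == 3
      * .... (one replace line per additional disease)
      label define diagnosedl 1 "heart attack" 2 "diabetes" 3 "high blood pressure"
      label values diagnosed diagnosedl
      tab2 diagnosed race
      * restore discards the expanded copies and the diagnosed variable
      restore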
      -------------------------------------------
      Richard Williams, Notre Dame Dept of Sociology
      StataNow Version: 19.5 MP (2 processor)

      EMAIL: [email protected]
      WWW: https://academicweb.nd.edu/~rwilliam/



      • #4

        Hi Richard,

        Thanks for your reply! I tried your code and got the right numbers!

        Code:
        .....
        preserve
        gen idnum = _n
        expand 11 // there are a total of 11 kinds of diseases
        bysort idnum: gen recnum = _n
        
        gen diagnosed = .
        replace diagnosed = 1 if heart == 1 & recnum == 1 // heart diseases
        replace diagnosed = 2 if asth == 1 & recnum == 2 // asthma
        replace diagnosed = 3 if cancer == 1 & recnum == 3 // cancer
        ......
        replace diagnosed = 10 if dep == 1 & recnum == 10 // Depression or anxiety
        replace diagnosed = 11 if mental== 1 & recnum == 11 // other mental illness
        tab2 diagnosed phi
        restore
        Last edited by Geralt Ji; 29 Jun 2021, 11:23.



        • #5
          Does phi have missing cases that get dropped in the tab2 command?


          • #6
            Hi there,

            I found that the diagnosed variable is not stored in the dataset. How can I keep it? I want to use it as the grouping variable in a graph that shows insurance status by disease.

            Here is what I'm going to plot...
            Code:
            graph bar phistatus_1 phistatus_2 phistatus_3 , over(diagnosed)



            • #7
              Originally posted by Richard Williams View Post
              Does phi have missing cases that get dropped in the tab2 command?
              Yes! Sorry for the inconvenience. It does have some missing values.



              • #8
                Originally posted by Geralt Ji View Post
                Hi there,

                I found that the diagnosed variable is not stored in the dataset. How can I keep it? I want to use it as the grouping variable in a graph that shows insurance status by disease.

                Here is what I'm going to plot...
                Code:
                graph bar phistatus_1 phistatus_2 phistatus_3 , over(diagnosed)
                It isn't saved because my code restored the original data set. If there are additional things you want to do with it, do not restore the data set until you have done so.
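
                A minimal sketch of doing the extra work before restore, reusing the nhanes2f example from #3. The graph bar (count) call and the use of race as the grouping variable are assumptions chosen for illustration, not something stated in this thread; with the poster's data, phi would take the place of race.

                ```stata
                webuse nhanes2f, clear
                preserve
                gen long idnum = _n
                expand 3
                bysort idnum: gen recnum = _n

                gen diagnosed = .
                replace diagnosed = 1 if heartatk == 1 & recnum == 1
                replace diagnosed = 2 if diabetes == 1 & recnum == 2
                replace diagnosed = 3 if highbp   == 1 & recnum == 3
                label define diagnosedl 1 "heart attack" 2 "diabetes" 3 "high blood pressure"
                label values diagnosed diagnosedl

                * the expanded data are still in memory here, so diagnosed can be used;
                * records with diagnosed missing are left out of over() by default
                graph bar (count) idnum, over(race) over(diagnosed)

                * only now go back to the original one-record-per-person data
                restore
                ```

                Everything that needs diagnosed has to sit between preserve and restore, because restore brings back the data exactly as they were at preserve.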

                If you want to save the variable permanently, that will be trickier. Basically, you are trying to have 11 different values stored in one variable, which, as you already saw, can't be done. My approach solved the problem by expanding each case 11 times and having each record for a case store one of the 11 values.

                If you were pushing this further, you might reshape the data long. There would be 11 records for each case, one for each disease, i.e. 11 person_disease records. The first record would be for heart, the 2nd would be for asthma, the third for cancer, etc. This might be especially good if you had other disease-specific variables, e.g. do you have a family history of cancer, do you have a family history of asthma, etc. You might then use commands like clogit or xtlogit or melogit.

                I can elaborate, but if all you want is to create a graph you don't really need to know all this! But if you want to get an idea about what I am talking about, see

                https://www3.nd.edu/~rwilliam/Taiwan...xedEffects.pdf

                That handout describes a situation where there is one record for each person for each year, i.e. it is panel data. But, in your case, there would be one record for each person for each disease. That is fine too.
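
                As a rough sketch of the reshape long idea, again with nhanes2f: the stub name has_ and the file name person_disease are hypothetical, and the three indicators are renamed only so that reshape can find them by a common stub.

                ```stata
                webuse nhanes2f, clear
                gen long idnum = _n

                * give the indicators a common stub plus a number
                rename (heartatk diabetes highbp) (has_1 has_2 has_3)

                * one record per person per disease; diagnosed takes the values 1, 2, 3
                reshape long has_, i(idnum) j(diagnosed)
                label define diagnosedl 1 "heart attack" 2 "diabetes" 3 "high blood pressure"
                label values diagnosed diagnosedl

                * count only the person-disease records where the disease is present
                tab diagnosed race if has_ == 1

                * saving under a new name keeps diagnosed without touching the original file
                save person_disease, replace
                ```

                Unlike the preserve/restore version, the long file is kept, so diagnosed remains available for later tables, graphs, or models.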
                Last edited by Richard Williams; 29 Jun 2021, 12:20. Reason: Edit: Original post said reshape wide when it should have said reshape long.


                • #9
                  Got it! Thanks for your detailed response!



                  • #10
                    This FAQ by Nick Cox may be helpful. It covers some of the same things I did and a lot more.

                    https://www.stata.com/support/faqs/d...ple-responses/

                    At the end, it mentions Ben Jann's mrtab command (get the version from SSC). It is much less klutzy than what I showed before and produces the same results:

                    Code:
                    . webuse nhanes2f, clear
                    
                    . mrtab heartatk diabetes highbp, by(race)
                    
                                                 |                Race                
                                                 |      White       Black       Other |      Total
                    -----------------------------+------------------------------------+-----------
                    heartatk  Prior heart attack |        421          47           5 |        473 
                    diabetes     Diabetes status |        404          86           9 |        499 
                      highbp High blood pressure |       3744         541          87 |       4372 
                    -----------------------------+------------------------------------+-----------
                                           Total |       4569         674         101 |       5344 
                                           Cases |       4051         583          90 |       4724 
                    
                    Valid cases:       4724
                    Missing cases:     5613
                    Also it comes with a mrgraph command. Perhaps it will do what you want. If not you can stick with my klutzy approach. I tried

                    Code:
                    mrgraph bar heartatk diabetes highbp, sort by(race)
                    and it looked ok. In your case it might be

                    Code:
                    mrgraph bar phistatus_1 phistatus_2 phistatus_3, sort by(diagnosed)
                    Both mrtab and mrgraph have lots of options, and Nick's FAQ lists several other things you can try.

                    Again, if all you want is one table and one graph, there are probably relatively painless ways to get them. If you are thinking about more complicated analyses, reshaping long and using things like clogit and melogit may be the way to go.


                    • #11
                      Incidentally, I don't know what your phistatus vars are (why are there 3 of them?) and how they are coded, so I don't know if this does what you want.


                      • #12
                        Thanks for the mention in #10.

                        See also https://www.statalist.org/forums/for...lable-from-ssc for a command that perhaps is of help, but I haven't read this thread carefully.



                        • #13
                          Thanks Richard and Nick! Really helpful comments, and the code works perfectly!

                          But it seems that I still need the code you showed earlier to generate the diagnosed variable before I can run the following:
                          Code:
                          mrgraph bar phistatus_1 phistatus_2 phistatus_3, sort by(diagnosed)
                          Last edited by Geralt Ji; 29 Jun 2021, 21:30.



                          • #14
                            Hi there,

                            I also have a question about grouping some diseases together: how can I use mrtab to get the result? I want to add up the count of depression patients (dep == 1) and the count of patients with other mental illness (mental == 1), then label the total as "mental illness".

                            When I use mrtab to show the distribution of diseases, how do I avoid double counting someone who has both depression and another mental illness?
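
                            One possible approach, offered only as a sketch: build a single combined indicator first and pass that to mrtab in place of the two originals. The variable name mental_any is hypothetical, and treating the indicator as missing only when both dep and mental are missing is an assumption about how missing values should be handled.

                            ```stata
                            * 1 if the respondent has either condition; someone with both still counts once
                            gen byte mental_any = (dep == 1 | mental == 1)

                            * leave the indicator missing when neither answer was recorded
                            replace mental_any = . if missing(dep) & missing(mental)
                            label variable mental_any "mental illness"

                            * use the combined indicator instead of dep and mental
                            mrtab heart asth cancer mental_any, by(phi)
                            ```

                            Because mental_any is one 0/1 variable per person, a respondent with both conditions contributes a single count to the "mental illness" row.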
