Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Create a new variable that shows the result of tabulation

    Hello Statalist,

    I consider myself a beginner, so I would appreciate your help, and thank you for this learning opportunity.

    out of 800 observations, I am interested to tabulate 3 variables:

    1- "biz" : which is patient id
    2- "status" : which is categorical with two level= surgery and not-surgery
    3- "quality" : which is categorical with two level= fail and pass

    I am interested to create a new variable (let's call it percent), which has the percentage of "pass" for each biz & status.

    So far I use these two commands separately, but they are not giving me what I am interested in:

    . by biz status, sort: tab quality

    . forvalues i=1/40 {
    asdoc tab2 status quality if biz==`i', append col nof save(asa)
    }

    They show the tabulation tables with percent fail&pass, but I don't know how to transform the percent pass as a new variable to my data set.
    Again, I appreciate your help.

    Best,
    Mahmoud
    Last edited by Mahmoud Amoli; 16 Nov 2020, 06:27.

  • #2
    See the thread https://www.statalist.org/forums/for...ncy-categories for a very similar question.

    Comment


    • #3
      Thanks Nick Cox!
      I tried to reproduce, but I do not know where did I missed. Could you please kindly take a look? here is my codes:
      where level 1 is status=surgery and level 2 is status= not-surgery

      egen new1 = mean(cond(level == 1, 100 * (1 - quality), .)), by(biz)


      egen new2 = mean(cond(level == 2, 100 * (1 - quality), .)), by(biz)

      Not only the it gives negative %, but I do not think the results are correct.
      a portion of data-editior after performing the codes:

      biz level new1 new2
      1 1 -93.33334 -100
      1 1 -93.33334 -100
      1 1 -93.33334 -100
      1 1 -93.33334 -100
      1 1 -93.33334 -100
      1 2 -93.33334 -100
      1 2 -93.33334 -100
      1 2 -93.33334 -100
      1 2 -93.33334 -100
      1 2 -93.33334 -100
      2 1 -84.21053 -100
      2 1 -84.21053 -100
      2 1 -84.21053 -100
      2 1 -84.21053 -100
      2 1 -84.21053 -100
      2 1 -84.21053 -100
      2 1 -84.21053 -100
      2 1 -84.21053 -100
      2 2 -84.21053 -100

      Comment


      • #4
        Your data example doesn't say anything about the variable quality. The code you used is based on the presumption that you are averaging an indicator variable with values 0 and 1. If that is not true, results are likely to be nonsensical. So, please show details about all the variables concerned.

        Comment


        • #5
          Agree completely with Nick #4. Please take a look at the FAQ section and read through the forum posting rules and use of -dataex- to produce data example. In absence of a proper data example, I can only assume your data probably look like this:

          Code:
              +------------------------+
               | biz   status   quality |
               |------------------------|
            1. |   1        0         0 |
            2. |   1        0         0 |
            3. |   1        1         0 |
            4. |   1        1         1 |
            5. |   1        1         1 |
            6. |   1        1         1 |
               |------------------------|
            7. |   2        0         1 |
            8. |   2        0         0 |
            9. |   2        1         0 |
           10. |   2        1         0 |
           11. |   2        1         1 |
           12. |   2        1         0 |
               +------------------------+
          If the structure above is true here below one example and probably not the best:

          Code:
          byso biz status: egen pass = sum(quality)
          byso biz status : gen percentpass = ((pass/_N))*100
          drop pass
          
               +-----------------------------------+
               | biz   status   quality   percen~s |
               |-----------------------------------|
            1. |   1        0         0          0 |
            2. |   1        0         0          0 |
            3. |   1        1         0         75 |
            4. |   1        1         1         75 |
            5. |   1        1         1         75 |
            6. |   1        1         1         75 |
               |-----------------------------------|
            7. |   2        0         1         50 |
            8. |   2        0         0         50 |
            9. |   2        1         0         25 |
           10. |   2        1         0         25 |
           11. |   2        1         1         25 |
           12. |   2        1         0         25 |
               +-----------------------------------+
          Roman

          Comment


          • #6
            Dear Nick Cox and Roman Mostazir,

            Thank you so much for your comments and supports. Sorry if my explanation was confusing, I am really novice, but eager to learn. Thanks for helping me. Here I try again to make sense.

            This is how my data look like, the "vha_status" and "qualitymeasure" were string, which I used encode and created status and quality.
            ID vha_status status qualitymeasure quality
            1 prostate prostate pass pass
            1 prostate prostate pass pass
            1 prostate_surgery prostate_surgery pass pass
            2 prostate prostate fail fail
            When I tried the tabulation, I get the result like this:

            . tabulation of status by qm_1 if biz==1


            Tabulation of status by quality when ID=1
            status QualityMeasure1
            Fail Pass Total
            Prostate 100.00 89.47 90.00
            Prostate_Surgery 0.00 10.53 10.00
            Total 100.00 100.00 100.00


            However, I am only interested in pass percentage. My goal is to have a column which shows the pass percentage for ID when status is prostate and when status is prostate_surgery. like this:
            ID_status pass_percent
            1_prostate 89.47
            1_prostate_surgery 10.53
            2_prostate 95
            2_prostate_surgery 100
            3_prostate 80
            So far, I was not able to create something meaningful, for example, when I tried the what Roman suggested, my result looks like this:
            ID percentpass
            1 193.33
            1 193.33
            1 193.33
            1 200
            Do you think I should use collapse? I truly appreciate your help.

            Mahmoud
            Last edited by Mahmoud Amoli; 17 Nov 2020, 05:15.

            Comment


            • #7
              We are trying to help. But as Roman Mostazir signalled you need to use dataex to give data examples. Too much guessing and engineering is needed otherwise to suggest code.

              Comment


              • #8
                Thanks Nick Cox! I did not get what Roman Mostazir said. Here it is:


                Code:
                * Example generated by -dataex-. To install: ssc install dataex
                clear
                input float ID str23 vha_status long status str4 qualitymeasure1 long quality
                1 "Prostate" 1 "Pass" 2
                1 "Prostate" 1 "Pass" 2
                1 "Prostate" 1 "Pass" 2
                1 "Prostate" 1 "Pass" 2
                1 "Prostate" 1 "Pass" 2
                1 "Prostate" 1 "Pass" 2
                1 "Prostate" 1 "Pass" 2
                end
                label values status status
                label def status 1 "Prostate", modify
                label values quality qm_1
                label def qm_1 2 "Pass", modify
                Last edited by Mahmoud Amoli; 17 Nov 2020, 06:10.

                Comment


                • #9
                  .

                  Comment


                  • #10
                    Sorry, but that's not enough to show the possible values. All I see are Pass values.

                    Comment


                    • #11
                      Nick Cox Thanks for getting back to me so quickly, here it is:

                      Code:
                      * Example generated by -dataex-. To install: ssc install dataex
                      clear
                      input float ID str23 vha_status long status str4 qualitymeasure1 long quality
                      1 "Prostate"         1 "Pass" 2
                      1 "Prostate"         1 "Pass" 2
                      1 "Prostate"         1 "Pass" 2
                      1 "Prostate"         1 "Pass" 2
                      1 "Prostate"         1 "Pass" 2
                      1 "Prostate"         1 "Pass" 2
                      1 "Prostate"         1 "Pass" 2
                      1 "Prostate"         1 "Pass" 2
                      1 "Prostate"         1 "Pass" 2
                      1 "Prostate"         1 "Pass" 2
                      1 "Prostate"         1 "Fail" 1
                      1 "Prostate"         1 "Pass" 2
                      1 "Prostate"         1 "Pass" 2
                      1 "Prostate"         1 "Pass" 2
                      1 "Prostate"         1 "Pass" 2
                      1 "Prostate_Surgery" 2 "Pass" 2
                      1 "Prostate_Surgery" 2 "Pass" 2
                      1 "Prostate_Surgery" 2 "Pass" 2
                      1 "Prostate_Surgery" 2 "Pass" 2
                      1 "Prostate_Surgery" 2 "Pass" 2
                      2 "Prostate"         1 "Pass" 2
                      2 "Prostate"         1 "Fail" 1
                      2 "Prostate"         1 "Pass" 2
                      2 "Prostate"         1 "Pass" 2
                      2 "Prostate"         1 "Fail" 1
                      2 "Prostate"         1 "Pass" 2
                      2 "Prostate"         1 "Pass" 2
                      2 "Prostate"         1 "Fail" 1
                      2 "Prostate"         1 "Pass" 2
                      2 "Prostate"         1 "Pass" 2
                      2 "Prostate"         1 "Pass" 2
                      2 "Prostate"         1 "Pass" 2
                      2 "Prostate"         1 "Pass" 2
                      2 "Prostate"         1 "Pass" 2
                      2 "Prostate"         1 "Pass" 2
                      2 "Prostate"         1 "Pass" 2
                      2 "Prostate"         1 "Pass" 2
                      2 "Prostate"         1 "Pass" 2
                      2 "Prostate"         1 "Pass" 2
                      2 "Prostate_Surgery" 2 "Pass" 2
                      3 "Prostate"         1 "Pass" 2
                      3 "Prostate"         1 "Pass" 2
                      3 "Prostate"         1 "Pass" 2
                      3 "Prostate"         1 "Pass" 2
                      3 "Prostate"         1 "Pass" 2
                      3 "Prostate"         1 "Pass" 2
                      3 "Prostate"         1 "Pass" 2
                      3 "Prostate"         1 "Pass" 2
                      3 "Prostate"         1 "Pass" 2
                      3 "Prostate"         1 "Pass" 2
                      3 "Prostate"         1 "Pass" 2
                      3 "Prostate"         1 "Pass" 2
                      3 "Prostate"         1 "Pass" 2
                      3 "Prostate"         1 "Pass" 2
                      3 "Prostate"         1 "Pass" 2
                      3 "Prostate"         1 "Pass" 2
                      3 "Prostate"         1 "Pass" 2
                      3 "Prostate"         1 "Pass" 2
                      3 "Prostate"         1 "Pass" 2
                      3 "Prostate"         1 "Pass" 2
                      4 "Prostate"         1 "Pass" 2
                      4 "Prostate"         1 "Pass" 2
                      4 "Prostate"         1 "Pass" 2
                      4 "Prostate"         1 "Pass" 2
                      4 "Prostate"         1 "Pass" 2
                      4 "Prostate"         1 "Fail" 1
                      4 "Prostate"         1 "Pass" 2
                      4 "Prostate"         1 "Pass" 2
                      4 "Prostate"         1 "Fail" 1
                      4 "Prostate"         1 "Pass" 2
                      4 "Prostate"         1 "Pass" 2
                      4 "Prostate"         1 "Pass" 2
                      4 "Prostate"         1 "Pass" 2
                      4 "Prostate"         1 "Pass" 2
                      4 "Prostate"         1 "Pass" 2
                      end
                      label values status status
                      label def status 1 "Prostate", modify
                      label def status 2 "Prostate_Surgery", modify
                      label values quality qm_1
                      label def qm_1 1 "Fail", modify
                      label def qm_1 2 "Pass", modify

                      Comment


                      • #12
                        Thanks for more detail. This may help:


                        Code:
                        . egen wanted = mean(100 * (quality == 2)), by(vha_status)
                        
                        . format wanted %2.1f  
                        
                        . tabdisp vha_status , c(wanted)
                        
                        -----------------------------
                              vha_status |     wanted
                        -----------------+-----------
                                Prostate |       91.3
                        Prostate_Surgery |      100.0
                        -----------------------------

                        Comment


                        • #13
                          I recoded the quality to 0 and 1 from 1 and 2. However, the 'percent' column will have several rows of repeated values of percentage as you have repeated ID. See below:

                          Code:
                          recode quality (1=0) (2 = 1)
                          lab define qual 0 "Fail" 1 "Pass", modify
                          lab val quality qual
                          
                          gen percent = .
                          
                          forval i = 1/4 { // replace 4 with the maximum for your ID range
                          
                              forval j = 1/2 { //This correspond to status value
                               su quality if ID == `i' & status == `j'
                                  replace percent = r(mean)*100 if ID == `i'& status == `j'
                             }
                          }
                          
                              ID   status   quality    percent  
                               1        1      Pass   93.33334  
                               1        1      Pass   93.33334  
                               1        1      Pass   93.33334  
                               1        1      Pass   93.33334  
                               1        1      Pass   93.33334  
                               1        1      Pass   93.33334  
                               1        1      Pass   93.33334  
                               1        1      Pass   93.33334  
                               1        1      Pass   93.33334  
                               1        1      Pass   93.33334  
                               1        1      Fail   93.33334  
                               1        1      Pass   93.33334  
                               1        1      Pass   93.33334  
                               1        1      Pass   93.33334  
                               1        1      Pass   93.33334  
                               1        2      Pass        100  
                               1        2      Pass        100  
                               1        2      Pass        100  
                               1        2      Pass        100  
                               1        2      Pass        100
                          Note: Seems like I posted at the smae time Nick posted. Nick's solution is neat (I learn everyday). However, the code needs ID in by option for percent within ID by status:

                          Code:
                           
                           egen wanted = mean(100 * (quality == 2)), by(ID vha_status)
                          Last edited by Roman Mostazir; 17 Nov 2020, 08:59.
                          Roman

                          Comment


                          • #14
                            I was harking back to #6 which had a table with ID_Status -- which isn't in this data example.

                            The principle is simple. The mean of an indicator is a probability. The mean of 100 x an indicator is a percent. Use by() to indicate groups of observations.

                            Indicators are 0 or 1 and given when true or false expressions are evaluated.

                            Comment


                            • #15
                              Thanks dear Nick Cox and Roman Mostazir ; I am still working on it, and I will be in touch on how it plays out. much appreciate your helps.
                              Mahmoud

                              Comment

                              Working...
                              X