Create a new variable that shows the result of tabulation

Mahmoud Amoli

Join Date: Nov 2020

Posts: 23
#1

Create a new variable that shows the result of tabulation

16 Nov 2020, 05:50

Hello Statalist,

I consider myself a beginner, so I would appreciate your help, and thank you for this learning opportunity.

out of 800 observations, I am interested to tabulate 3 variables:

1- "biz" : which is patient id
2- "status" : which is categorical with two level= surgery and not-surgery
3- "quality" : which is categorical with two level= fail and pass

I am interested to create a new variable (let's call it percent), which has the percentage of "pass" for each biz & status.

So far I use these two commands separately, but they are not giving me what I am interested in:

. by biz status, sort: tab quality

. forvalues i=1/40 {
asdoc tab2 status quality if biz==`i', append col nof save(asa)
}

They show the tabulation tables with percent fail&pass, but I don't know how to transform the percent pass as a new variable to my data set.
Again, I appreciate your help.

Best,
Mahmoud

Last edited by Mahmoud Amoli; 16 Nov 2020, 06:27.
Tags: None
Nick Cox

Join Date: Mar 2014

Posts: 35721
#2

16 Nov 2020, 06:44

See the thread https://www.statalist.org/forums/for...ncy-categories for a very similar question.
1 like
Comment
Mahmoud Amoli

Join Date: Nov 2020

Posts: 23
#3

16 Nov 2020, 08:29

Thanks Nick Cox!
I tried to reproduce, but I do not know where did I missed. Could you please kindly take a look? here is my codes:
where level 1 is status=surgery and level 2 is status= not-surgery

egen new1 = mean(cond(level == 1, 100 * (1 - quality), .)), by(biz)

egen new2 = mean(cond(level == 2, 100 * (1 - quality), .)), by(biz)

Not only the it gives negative %, but I do not think the results are correct.
a portion of data-editior after performing the codes:

biz level new1 new2
1 1 -93.33334 -100
1 1 -93.33334 -100
1 1 -93.33334 -100
1 1 -93.33334 -100
1 1 -93.33334 -100
1 2 -93.33334 -100
1 2 -93.33334 -100
1 2 -93.33334 -100
1 2 -93.33334 -100
1 2 -93.33334 -100
2 1 -84.21053 -100
2 1 -84.21053 -100
2 1 -84.21053 -100
2 1 -84.21053 -100
2 1 -84.21053 -100
2 1 -84.21053 -100
2 1 -84.21053 -100
2 1 -84.21053 -100
2 2 -84.21053 -100
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35721
#4

16 Nov 2020, 10:39

Your data example doesn't say anything about the variable quality. The code you used is based on the presumption that you are averaging an indicator variable with values 0 and 1. If that is not true, results are likely to be nonsensical. So, please show details about all the variables concerned.
Comment

Roman Mostazir

Join Date: Apr 2014
Posts: 874

16 Nov 2020, 12:33

Agree completely with Nick #4. Please take a look at the FAQ section and read through the forum posting rules and use of -dataex- to produce data example. In absence of a proper data example, I can only assume your data probably look like this:

Code:

    +------------------------+
     | biz   status   quality |
     |------------------------|
  1. |   1        0         0 |
  2. |   1        0         0 |
  3. |   1        1         0 |
  4. |   1        1         1 |
  5. |   1        1         1 |
  6. |   1        1         1 |
     |------------------------|
  7. |   2        0         1 |
  8. |   2        0         0 |
  9. |   2        1         0 |
 10. |   2        1         0 |
 11. |   2        1         1 |
 12. |   2        1         0 |
     +------------------------+

If the structure above is true here below one example and probably not the best:

Code:

byso biz status: egen pass = sum(quality)
byso biz status : gen percentpass = ((pass/_N))*100
drop pass

     +-----------------------------------+
     | biz   status   quality   percen~s |
     |-----------------------------------|
  1. |   1        0         0          0 |
  2. |   1        0         0          0 |
  3. |   1        1         0         75 |
  4. |   1        1         1         75 |
  5. |   1        1         1         75 |
  6. |   1        1         1         75 |
     |-----------------------------------|
  7. |   2        0         1         50 |
  8. |   2        0         0         50 |
  9. |   2        1         0         25 |
 10. |   2        1         0         25 |
 11. |   2        1         1         25 |
 12. |   2        1         0         25 |
     +-----------------------------------+

Roman

Comment

Mahmoud Amoli

Join Date: Nov 2020

Posts: 23
#6

17 Nov 2020, 05:09

Dear Nick Cox and Roman Mostazir,

Thank you so much for your comments and supports. Sorry if my explanation was confusing, I am really novice, but eager to learn. Thanks for helping me. Here I try again to make sense.

This is how my data look like, the "vha_status" and "qualitymeasure" were string, which I used encode and created status and quality.
ID vha_status status qualitymeasure quality

1 prostate prostate pass pass

1 prostate prostate pass pass

1 prostate_surgery prostate_surgery pass pass

2 prostate prostate fail fail

When I tried the tabulation, I get the result like this:

. tabulation of status by qm_1 if biz==1

Tabulation of status by quality when ID=1
status QualityMeasure1
Fail Pass Total
Prostate 100.00 89.47 90.00
Prostate_Surgery 0.00 10.53 10.00
Total 100.00 100.00 100.00

However, I am only interested in pass percentage. My goal is to have a column which shows the pass percentage for ID when status is prostate and when status is prostate_surgery. like this:
ID_status pass_percent

1_prostate 89.47

1_prostate_surgery 10.53

2_prostate 95

2_prostate_surgery 100

3_prostate 80

So far, I was not able to create something meaningful, for example, when I tried the what Roman suggested, my result looks like this:

ID percentpass

1 193.33

1 193.33

1 193.33

1 200

Do you think I should use collapse? I truly appreciate your help.

Mahmoud

Last edited by Mahmoud Amoli; 17 Nov 2020, 05:15.
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35721
#7

17 Nov 2020, 05:42

We are trying to help. But as Roman Mostazir signalled you need to use dataex to give data examples. Too much guessing and engineering is needed otherwise to suggest code.
1 like
Comment

Mahmoud Amoli

Join Date: Nov 2020
Posts: 23

17 Nov 2020, 06:03

Thanks Nick Cox! I did not get what Roman Mostazir said. Here it is:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float ID str23 vha_status long status str4 qualitymeasure1 long quality
1 "Prostate" 1 "Pass" 2
1 "Prostate" 1 "Pass" 2
1 "Prostate" 1 "Pass" 2
1 "Prostate" 1 "Pass" 2
1 "Prostate" 1 "Pass" 2
1 "Prostate" 1 "Pass" 2
1 "Prostate" 1 "Pass" 2
end
label values status status
label def status 1 "Prostate", modify
label values quality qm_1
label def qm_1 2 "Pass", modify

Last edited by Mahmoud Amoli; 17 Nov 2020, 06:10.

Comment

Mahmoud Amoli

Join Date: Nov 2020

Posts: 23
#9

17 Nov 2020, 06:07

.
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35721
#10

17 Nov 2020, 06:14

Sorry, but that's not enough to show the possible values. All I see are Pass values.
Comment

Mahmoud Amoli

Join Date: Nov 2020
Posts: 23

#11

17 Nov 2020, 06:29

Nick Cox Thanks for getting back to me so quickly, here it is:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float ID str23 vha_status long status str4 qualitymeasure1 long quality
1 "Prostate"         1 "Pass" 2
1 "Prostate"         1 "Pass" 2
1 "Prostate"         1 "Pass" 2
1 "Prostate"         1 "Pass" 2
1 "Prostate"         1 "Pass" 2
1 "Prostate"         1 "Pass" 2
1 "Prostate"         1 "Pass" 2
1 "Prostate"         1 "Pass" 2
1 "Prostate"         1 "Pass" 2
1 "Prostate"         1 "Pass" 2
1 "Prostate"         1 "Fail" 1
1 "Prostate"         1 "Pass" 2
1 "Prostate"         1 "Pass" 2
1 "Prostate"         1 "Pass" 2
1 "Prostate"         1 "Pass" 2
1 "Prostate_Surgery" 2 "Pass" 2
1 "Prostate_Surgery" 2 "Pass" 2
1 "Prostate_Surgery" 2 "Pass" 2
1 "Prostate_Surgery" 2 "Pass" 2
1 "Prostate_Surgery" 2 "Pass" 2
2 "Prostate"         1 "Pass" 2
2 "Prostate"         1 "Fail" 1
2 "Prostate"         1 "Pass" 2
2 "Prostate"         1 "Pass" 2
2 "Prostate"         1 "Fail" 1
2 "Prostate"         1 "Pass" 2
2 "Prostate"         1 "Pass" 2
2 "Prostate"         1 "Fail" 1
2 "Prostate"         1 "Pass" 2
2 "Prostate"         1 "Pass" 2
2 "Prostate"         1 "Pass" 2
2 "Prostate"         1 "Pass" 2
2 "Prostate"         1 "Pass" 2
2 "Prostate"         1 "Pass" 2
2 "Prostate"         1 "Pass" 2
2 "Prostate"         1 "Pass" 2
2 "Prostate"         1 "Pass" 2
2 "Prostate"         1 "Pass" 2
2 "Prostate"         1 "Pass" 2
2 "Prostate_Surgery" 2 "Pass" 2
3 "Prostate"         1 "Pass" 2
3 "Prostate"         1 "Pass" 2
3 "Prostate"         1 "Pass" 2
3 "Prostate"         1 "Pass" 2
3 "Prostate"         1 "Pass" 2
3 "Prostate"         1 "Pass" 2
3 "Prostate"         1 "Pass" 2
3 "Prostate"         1 "Pass" 2
3 "Prostate"         1 "Pass" 2
3 "Prostate"         1 "Pass" 2
3 "Prostate"         1 "Pass" 2
3 "Prostate"         1 "Pass" 2
3 "Prostate"         1 "Pass" 2
3 "Prostate"         1 "Pass" 2
3 "Prostate"         1 "Pass" 2
3 "Prostate"         1 "Pass" 2
3 "Prostate"         1 "Pass" 2
3 "Prostate"         1 "Pass" 2
3 "Prostate"         1 "Pass" 2
3 "Prostate"         1 "Pass" 2
4 "Prostate"         1 "Pass" 2
4 "Prostate"         1 "Pass" 2
4 "Prostate"         1 "Pass" 2
4 "Prostate"         1 "Pass" 2
4 "Prostate"         1 "Pass" 2
4 "Prostate"         1 "Fail" 1
4 "Prostate"         1 "Pass" 2
4 "Prostate"         1 "Pass" 2
4 "Prostate"         1 "Fail" 1
4 "Prostate"         1 "Pass" 2
4 "Prostate"         1 "Pass" 2
4 "Prostate"         1 "Pass" 2
4 "Prostate"         1 "Pass" 2
4 "Prostate"         1 "Pass" 2
4 "Prostate"         1 "Pass" 2
end
label values status status
label def status 1 "Prostate", modify
label def status 2 "Prostate_Surgery", modify
label values quality qm_1
label def qm_1 1 "Fail", modify
label def qm_1 2 "Pass", modify

Comment

Nick Cox

Join Date: Mar 2014
Posts: 35721

#12

17 Nov 2020, 08:34

Thanks for more detail. This may help:

Code:

. egen wanted = mean(100 * (quality == 2)), by(vha_status)

. format wanted %2.1f  

. tabdisp vha_status , c(wanted)

-----------------------------
      vha_status |     wanted
-----------------+-----------
        Prostate |       91.3
Prostate_Surgery |      100.0
-----------------------------

Comment

Roman Mostazir

Join Date: Apr 2014
Posts: 874

#13

17 Nov 2020, 08:38

I recoded the quality to 0 and 1 from 1 and 2. However, the 'percent' column will have several rows of repeated values of percentage as you have repeated ID. See below:

Code:

recode quality (1=0) (2 = 1)
lab define qual 0 "Fail" 1 "Pass", modify
lab val quality qual

gen percent = .

forval i = 1/4 { // replace 4 with the maximum for your ID range

    forval j = 1/2 { //This correspond to status value
     su quality if ID == `i' & status == `j'
        replace percent = r(mean)*100 if ID == `i'& status == `j'
   }
}

    ID   status   quality    percent  
     1        1      Pass   93.33334  
     1        1      Pass   93.33334  
     1        1      Pass   93.33334  
     1        1      Pass   93.33334  
     1        1      Pass   93.33334  
     1        1      Pass   93.33334  
     1        1      Pass   93.33334  
     1        1      Pass   93.33334  
     1        1      Pass   93.33334  
     1        1      Pass   93.33334  
     1        1      Fail   93.33334  
     1        1      Pass   93.33334  
     1        1      Pass   93.33334  
     1        1      Pass   93.33334  
     1        1      Pass   93.33334  
     1        2      Pass        100  
     1        2      Pass        100  
     1        2      Pass        100  
     1        2      Pass        100  
     1        2      Pass        100

Note: Seems like I posted at the smae time Nick posted. Nick's solution is neat (I learn everyday). However, the code needs ID in by option for percent within ID by status:

Code:

 
 egen wanted = mean(100 * (quality == 2)), by(ID vha_status)

Last edited by Roman Mostazir; 17 Nov 2020, 08:59.

Roman

Comment

Nick Cox

Join Date: Mar 2014

Posts: 35721
#14

17 Nov 2020, 09:58

I was harking back to #6 which had a table with ID_Status -- which isn't in this data example.

The principle is simple. The mean of an indicator is a probability. The mean of 100 x an indicator is a percent. Use by() to indicate groups of observations.

Indicators are 0 or 1 and given when true or false expressions are evaluated.
1 like
Comment
Mahmoud Amoli

Join Date: Nov 2020

Posts: 23
#15

18 Nov 2020, 06:17

Thanks dear Nick Cox and Roman Mostazir ; I am still working on it, and I will be in touch on how it plays out. much appreciate your helps.
Mahmoud
Comment

ID	vha_status	status	qualitymeasure	quality
1	prostate	prostate	pass	pass
1	prostate	prostate	pass	pass
1	prostate_surgery	prostate_surgery	pass	pass
2	prostate	prostate	fail	fail

ID_status	pass_percent
1_prostate	89.47
1_prostate_surgery	10.53
2_prostate	95
2_prostate_surgery	100
3_prostate	80

ID	percentpass
1	193.33
1	193.33
1	193.33
1	200

Announcement