How to loop over columns of 70 to get the categorical counts based on a particular observation?

Sitaram Sahoo

Join Date: Aug 2019

Posts: 15
#1

01 Sep 2019, 09:57

Hi all,

I have a hospital cases dataset with 30 columns and 70 observations. The hospital keeps a daily tracker by type of case: cases received are recorded as "Y", cases that are not available as "NA", and there are also blank entries shown as ".". Here is a sample of the dataset:

Day  braintumor  Fracture  heartbypass  Accident  Stroke
1    Y           NA        Y            Y         Y
2    NA          Y         Y            Y         .
3    Y           .         NA           .         Y
4    .           .         .            Y         NA
5    Y           Y         .            Y         NA

I need help producing a table like the one below. For each category, it should show the count of "Y" cases divided by the number of days, with the percentage in the format xx.xx%. For example, for the 5 days in my sample the output should look like:

total braintumor 3/5 (60.00%)
total Fracture 2/5 (40.00%)
total Accident 4/5 (80.00%)
total Stroke 2/5 (40.00%)

I'm very new to Stata; I started learning it about 1.5 months ago. So far I have worked out how to create the denominator, but I'm stuck at that point. Any guidance on how to proceed from there would be helpful.

With regards,
Sitaram
Tags: categorical, data
Clyde Schechter

Join Date: Apr 2014

Posts: 30101
#2

01 Sep 2019, 10:15

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte day str2(braintumor fracture heartbypass) str1 accident str2 stroke
1 "Y"  "NA" "Y"  "Y" "Y"
2 "NA" "Y"  "Y"  "Y" "."
3 "Y"  "."  "NA" "." "Y"
4 "."  "."  "."  "Y" "NA"
5 "Y"  "Y"  "."  "Y" "NA"
end

foreach v of varlist braintumor-stroke {
    egen `v'total = total(`v' == "Y")
}
count
local denominator `r(N)'
keep *total
keep in 1
gen obs_no = _n
reshape long @total, i(obs_no) j(condition) string
drop obs_no
gen percent = 100*total/`denominator'

In the future, when showing data examples, please use the -dataex- command to do so, as I have in this response. If you are running version 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
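The code in #2 stops at a numeric percent variable. As a minimal sketch, assuming the same example data, the n/N (xx.xx%) layout requested in #1 could be assembled as a string with the string() function. The variable name result is an illustrative choice, not part of the original answer, and the /// line continuation assumes the code is run from a do-file.

```stata
* Sketch only: rebuild the counts as in #2, then format them as "n/N (xx.xx%)".
clear
input byte day str2(braintumor fracture heartbypass) str1 accident str2 stroke
1 "Y"  "NA" "Y"  "Y" "Y"
2 "NA" "Y"  "Y"  "Y" "."
3 "Y"  "."  "NA" "." "Y"
4 "."  "."  "."  "Y" "NA"
5 "Y"  "Y"  "."  "Y" "NA"
end

* One total per column: the number of observations equal to "Y"
foreach v of varlist braintumor-stroke {
    egen `v'total = total(`v' == "Y")
}
count
local denominator `r(N)'

* Keep one row of totals and turn it into one row per condition
keep *total
keep in 1
gen obs_no = _n
reshape long @total, i(obs_no) j(condition) string
drop obs_no
gen percent = 100*total/`denominator'

* string(x, "%5.2f") returns text with two decimals, for example "60.00".
* result is a hypothetical variable name chosen for this sketch.
gen result = string(total) + "/" + string(`denominator') + ///
    " (" + string(percent, "%5.2f") + "%)"

list condition result, noobs
```

Because result is a string variable, it is suitable for display or export only; percent remains available for any further calculation.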
Sitaram Sahoo

Join Date: Aug 2019

Posts: 15
#3

01 Sep 2019, 10:54

Thank you, Prof. Clyde Schechter, for your suggestion. I'm trying my best to improve by seeking help from the Stata experts in this forum. As you suggest, I will start by using dataex; I have just checked, and it is not installed with the Stata version I'm using. I have gone through your code and have a few doubts on which I would like your attention and reply. Kindly help.

1. local denominator `r(N)' : the r() which is used to store the result as i read from online materials, how's it working when we don't have a N in the entire code stored?

2. reshape long @total, i(obs_no) j(condition) string : How is j(condition) is storing all the categories when it isn't mentioned anywhere to store the strings and what is @ ? how i , j in reshape can be inferred does i &j represents row,col or col,row? how it's working in this particular case?

Apologies if my questions sound too general. I'm very new and am trying out various options to learn as much as possible, so please let me know if there is a faster way to bridge the gaps through any course or link. I hope to have a reply from you soon.

With regards,
Sitaram
Clyde Schechter

Join Date: Apr 2014

Posts: 30101
#4

01 Sep 2019, 12:02

1. local denominator `r(N)' : the r() which is used to store the result as i read from online materials, how's it working when we don't have a N in the entire code stored?

r(N) is created by the immediately preceding -count- command. See -help count-.
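As a small illustration of that point, the sketch below runs -count- on an empty five-observation dataset (a stand-in for five days, not the real data) and then inspects what it left behind. Results in r() are replaced by the next command that stores r() results, which is why the value is copied into a local macro straight away.

```stata
* Sketch only: five empty observations stand in for five days of data.
clear
set obs 5

* -count- displays the number of observations and stores it in r(N)
count

* Show everything currently held in r()
return list

* Copy the value immediately, before another command overwrites r()
local denominator `r(N)'
display "denominator = `denominator'"
```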

2. reshape long @total, i(obs_no) j(condition) string : How is j(condition) is storing all the categories when it isn't mentioned anywhere to store the strings...

The -reshape- command is complicated and is one of the more difficult commands for people to learn to use. Reading the manual section about it (run -help reshape- and click on the blue link near the top of the page for the manual section where the command is explained in detail) will get you started, but most users don't quite "get it" until they have used it often enough that it suddenly "clicks" in their brains. Suffice it to say that storing those values in the new variable condition is precisely what the -j()- option of -reshape- does.

...and what is @ ?

Normally in -reshape- commands, the material to be stored in the variable created by the -j()- option is found at the end of the variable name. But you can override that behavior of -reshape- by using an @ character to tell -reshape- where the material to be stored in the new variable actually is found. I could have avoided this complication by using a different variable name in the -egen- command. This code would work the same as what is shown in #2 but avoids needing to use the @:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte day str2(braintumor fracture heartbypass) str1 accident str2 stroke
1 "Y"  "NA" "Y"  "Y" "Y"
2 "NA" "Y"  "Y"  "Y" "."
3 "Y"  "."  "NA" "." "Y"
4 "."  "."  "."  "Y" "NA"
5 "Y"  "Y"  "."  "Y" "NA"
end

foreach v of varlist braintumor-stroke {
    egen total`v' = total(`v' == "Y")
}
count
local denominator `r(N)'
keep total*
keep in 1
gen obs_no = _n
reshape long total, i(obs_no) j(condition) string
drop obs_no
gen percent = 100*total/`denominator'

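(Changes from the original: the -egen- variables are now named total`v' instead of `v'total, -keep- uses total* instead of *total, and -reshape long- uses the stub total instead of @total.)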
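The @ is not limited to the start of a name. As a hedged sketch, the hypothetical variables nfractureyes and nstrokeyes below (names and values invented for illustration, not from the thread) carry the varying text in the middle, and @ marks that position in the stub.

```stata
* Sketch only: nfractureyes, nstrokeyes and their values are hypothetical.
clear
input byte id nfractureyes nstrokeyes
1 2 2
end

* @ marks where the varying text sits: n + <condition> + yes
reshape long n@yes, i(id) j(condition) string

* The long data have one row per condition and a variable named nyes
list, noobs
```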

how i , j in reshape can be inferred does i &j represents row,col or col,row? how it's working in this particular case?

As I said in response to an earlier question, the -reshape- command is complicated and a clear explanation of the -i()- and -j()- options of -reshape- would be very lengthy. It is all set out in the manual chapter about the -reshape- command, which I commend to you. But to really grasp it intuitively, you actually need to use the -reshape- command several times before it "sinks in." Also near the bottom of the -help reshape- page there are links to videos that you may find helpful.
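For a first feel of the two options, here is a minimal sketch on made-up data; the variable id and the counts are hypothetical. In the wide layout, i() names the variable that identifies a row, and j() names the new variable that receives the part of each wide variable name left over after the stub.

```stata
* Sketch only: id and the counts below are hypothetical illustration data.
clear
input byte id totalfracture totalstroke
1 2 2
2 0 1
end

* Wide layout: one row per id, one column per condition
list, noobs

* i(id): identifies a row in the wide layout
* j(condition): new variable that receives the text after the stub "total"
reshape long total, i(id) j(condition) string

* Long layout: one row per id-condition pair
list, sepby(id) noobs
```

In the thread's code, obs_no plays the role of id: there is only one row of totals, so it exists purely to give -reshape- something to put in i().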