frequency table & trend

Buyadaa Oyunchimeg

Join Date: Jan 2018

Posts: 195
#1

frequency table & trend

18 Feb 2018, 23:46

Dear all,

I hope everything goes well with you.

I am again asking for your help. This time I ma having problem creating table.

Data looks as below. (but as suggested I will reshape them before analysis)

input byte(id aspirin2006 clopidogrel2006 statin2006 fibrate2006 aspirin2007 clopidogrel2007 statin2007 fibrate2007 aspirin2008 clopidogrel2008 statin2008 fibrate2008 aspirin2009 clopidogrel2009 statin2009 fibrate2009 aspirin2010 clopidogrel2010 statin2010 fibrate2010 sex) int str12 first_date last_date
1 0 0 0 1 0 0 0 1 0 0 0 0 0 0 1 0 0 1 1 0 1 2007 2010
2 0 0 1 0 1 0 0 1 0 0 0 0 0 1 0 1 1 0 0 1 2 2006 2010
3 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 0 0 1 0 1 2007 2009
4 0 0 1 0 0 1 1 0 0 1 0 1 0 0 1 0 0 1 0 0 1 2006 2010
5 0 0 0 1 0 0 0 1 0 0 0 1 0 1 1 1 1 1 1 1 2 2007 2009
6 1 1 1 1 1 1 1 0 0 0 0 0 1 0 0 0 1 0 0 1 2 2006 2010
7 0 0 0 1 0 0 0 1 1 0 0 1 0 0 1 0 0 1 0 0 1 2006 2009
8 1 0 0 1 0 0 0 0 1 0 1 0 0 1 0 0 1 0 1 0 1 2007 2010
9 0 0 0 0 0 1 0 1 0 1 0 1 0 1 1 1 1 0 0 0 2 2006 2010

end

I would like to create the table first then figure showing the trend of prescription.
Table 1. Prescribing rate by gender

I would like to create the following table:
Year Male (1) Female (2)

Aspirin Clopidogrel Statin Fibrate Aspirin Clopidogrel Statin Fibrate

2006

2007

2008

2009

2010

I would really appreciate if you may find time to help me solve the problem.

Thank you in advance.
Sincerely,
Oyun
Tags: None
Buyadaa Oyunchimeg

Join Date: Jan 2018

Posts: 195
#2

19 Feb 2018, 00:00

This is the figure I would like to create.
Comment

Clyde Schechter

Join Date: Apr 2014
Posts: 30100

19 Feb 2018, 09:09

Code:

clear
input byte(id aspirin2006 clopidogrel2006 statin2006 fibrate2006 aspirin2007 clopidogrel2007 statin2007 fibrate2007 aspirin2008 clopidogrel2008 statin2008 fibrate2008 aspirin2009 clopidogrel2009 statin2009 fibrate2009 aspirin2010 clopidogrel2010 statin2010 fibrate2010 sex) int str12 first_date last_date
1 0 0 0 1 0 0 0 1 0 0 0 0 0 0 1 0 0 1 1 0 1 2007 2010
2 0 0 1 0 1 0 0 1 0 0 0 0 0 1 0 1 1 0 0 1 2 2006 2010
3 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 0 0 1 0 1 2007 2009
4 0 0 1 0 0 1 1 0 0 1 0 1 0 0 1 0 0 1 0 0 1 2006 2010
5 0 0 0 1 0 0 0 1 0 0 0 1 0 1 1 1 1 1 1 1 2 2007 2009
6 1 1 1 1 1 1 1 0 0 0 0 0 1 0 0 0 1 0 0 1 2 2006 2010
7 0 0 0 1 0 0 0 1 1 0 0 1 0 0 1 0 0 1 0 0 1 2006 2009
8 1 0 0 1 0 0 0 0 1 0 1 0 0 1 0 0 1 0 1 0 1 2007 2010
9 0 0 0 0 0 1 0 1 0 1 0 1 0 1 1 1 1 0 0 0 2 2006 2010

end

destring first_date last_date, replace
reshape long aspirin clopidogrel statin fibrate, i(id) j(year)
keep if inrange(year, first_date, last_date)

//    CALCULATE RATES BY SEX BY YEAR FOR EACH DRUG
collapse (sum) aspirin-fibrate (count) denom = id, by(sex year)
foreach v of varlist aspirin-fibrate{
    gen `v'_rate = `v'/denom
}

//    GRAPH THE RESULTS
label define sex    1    "Male"    2    "Female"
label values sex sex
graph twoway connect *_rate year, by(sex) sort

//    TABULAR FORM OF RESULTS
keep year sex *_rate
reshape long @_rate, i(year sex) j(drug_class) string
table year drug_class sex, c(mean _rate)

-graph twoway- has many options that will allow you to customize the appearance of your graph. Do read about them in the PDF manuals that come with your installation and apply them as desired.

Again, the long layout works best, and, in the end for this table, a fully long layout where even the name of the drug is a separate variable, is best of all. (Notice, though that for graphing purposes, the partially-wide layout with each drug being a separate variable is needed. Graphing multiple variables on the same axes requires that those variables be separate, hence the semi-wide layout.)

Comment

Buyadaa Oyunchimeg

Join Date: Jan 2018

Posts: 195
#4

19 Feb 2018, 18:06

Dear prof Schechter,

Thank you for taking time to help me.
I've tried the code for real dataset but I get the following errors:

1.
collapse (sum) aspirin-fibrate (count) denom = id, by(sex supply_year)

type mismatch (probably because ID contains nonnumeric characters)
r (109)
foreach v of varlist aspirin-fibrate{
gen `v'_rate = `v'/denom
}

denom not found
r (111)

I would really appreciate if you may help me to solve the problem.

Thank you
Sincerely,
Oyun

Last edited by Buyadaa Oyunchimeg; 19 Feb 2018, 18:43.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#5

19 Feb 2018, 18:45

Well, the second error, denom not found, is because denom is created by the -collapse- command that didn't run! Once you hit an error in Stata, it is not a good idea to continue to the next command. You should fix the problem you have before proceeding.

As for the -collapse- error, this must have arisen because one or more of id, aspirin, clopidogrel, statin, or fibrate is a string variable. Run -describe- to see which it is, and then -destring- it (or them).
Comment
Buyadaa Oyunchimeg

Join Date: Jan 2018

Posts: 195
#6

19 Feb 2018, 21:50

I've removed nonnumeric characters from ID and tried it again. It works perfectly!

Thank you so much.
Sincerely,
Oyun
Comment
Buyadaa Oyunchimeg

Join Date: Jan 2018

Posts: 195
#7

19 Feb 2018, 22:25

I have one more question.

When I applied the following command in real dataset, I found that each year has same denominator. But what I was expecting that each year will have different numbers of people, so denominators also will be different each year. How if I define prescription rate as the number of each lipid lowering drug prescription (statin, fibrate) divided by the total number of lipid lowering prescriptions?

collapse (sum) aspirin-fibrate (count) denom = id, by(sex year)
foreach v of varlist aspirin-fibrate{
gen `v'_rate = `v'/denom
}

Thank you again,

Last edited by Buyadaa Oyunchimeg; 19 Feb 2018, 23:09.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#8

20 Feb 2018, 08:38

Well, this is different from what you originally asked for. A prescription rate is, by definition, the number of prescriptions issued divided by the number of people in the population to which they are prescribed. What you are looking for is the proportional prescription rate within drug class. The calculations for that would be:

Code:

// CALCULATE PROPORTIONS OF RX WITIN CLASS BY SEX BY YEAR FOR EACH DRUG collapse (sum) aspirin-fibrate, by(sex year) gen aspirin_prop = aspirin/(aspirin+clopidogrel) gen clopidogrel_prop = 1 - aspirin_prop gen statin_prop = statin/(statin+fibrate) gen fibrate_prop = 1-statin_prop

And in the code for the graphs and tables, just replace rate by prop wherever it occurs.

Note: I have simplified the code by relying on the (presumably coincidental) fact that in your data you have exactly two anticoagulants and exactly two lipid-lowering drugs. This means that the proportional prescription rate for the second drug is necessarily one minus the proportional prescription rate for the first. If your real data has more drugs in classes, then you will need to modify the code to calculate each proportional prescription rate as the number of prescriptions for the drug divided by the sum of the number of prescriptions for each drug in the class. Also, if the number of drugs in a class is large, you might want to first calculate the total by using -egen, rowtotal()- instead of writing out the sum over and over again.

Finally, in the future, if your ID variable is a string, don't post a data example in which the id variable is numeric. That's just asking for trouble. The whole point of using -dataex- is that it gives those who want to help you a data set to work with that is exactly like your real data, so that code that works in the example will work in your real data. Of course there are some aspects of the real data that cannot be shown in example; if nothing else the sample size in the example will typically be be much smaller than the real data. But data storage types are really important: Stata treats string and numeric variables quite differently in many commands. Even the difference between float and double in numeric variable can require different code to work properly. Don't change data storage types or labeling when posting examples: they often matter!
Comment
Buyadaa Oyunchimeg

Join Date: Jan 2018

Posts: 195
#9

20 Feb 2018, 18:19

Thank so much prof. Schechter.

But I still would like to know a prescription rate for each drug.

I've applied the following command:

collapse (sum) aspirin-fibrate (count) denom = id, by(sex year)

foreach v of varlist aspirin-fibrate{

gen `v'_rate = `v'/denom

}

I got the following results.

Here denominators differ year by year. But in real dataset denominators are same in all year (I thought it would be different because number of patients varies each year ). Is there any possibility to solve this problem?

Thank you so much
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#10

20 Feb 2018, 18:30

How do you expect anyone to figure out why you are getting unexpected results in the real data set when you haven't shown the real data set? I'm happy to troubleshoot the code--but you have to make it possible. Post a new example, using -dataex- that produces incorrect results and I'll try to figure out what the problem is. But all I can say at this point is that the code you have produces correct results with the data you have shown, and it appears to be correct. So perhaps there is something about the real data that I don't understand. But until I see it here, there is nothing I can do to help you.
Comment
Buyadaa Oyunchimeg

Join Date: Jan 2018

Posts: 195
#11

20 Feb 2018, 21:39

There is some problem replying to my old post so I am posting a new one. This is continuation of my old post. Please refer to link below.

https://www.statalist.org/forums/for...cy-table-trend

I would like to find prescription rate by year and by sex. As professor Schechter suggested I've used the loops, but there is problem with denominator.

input byte(id aspirin2006 clopidogrel2006 statin2006 fibrate2006 aspirin2007 clopidogrel2007 statin2007 fibrate2007 aspirin2008 clopidogrel2008 statin2008 fibrate2008 aspirin2009 clopidogrel2009 statin2009 fibrate2009 aspirin2010 clopidogrel2010 statin2010 fibrate2010 sex) int str12 first_date last_date
1 0 0 0 1 0 0 0 1 0 0 0 0 0 0 1 0 0 1 1 0 1 2007 2010
2 0 0 1 0 1 0 0 1 0 0 0 0 0 1 0 1 1 0 0 0 1 2006 2010
3 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 0 0 1 0 2 2007 2009
4 0 0 1 0 0 1 1 0 0 1 0 1 0 0 1 0 0 1 0 1 2 2006 2010
5 0 0 0 1 0 0 0 1 0 0 0 1 0 1 1 1 1 1 1 0 1 2007 2009
6 1 1 1 1 1 1 1 0 0 0 0 0 1 0 0 0 1 0 0 0 1 2006 2010
7 0 0 0 1 0 0 0 1 1 0 0 1 0 0 1 0 0 1 0 1 2 2006 2009
8 1 0 0 1 0 0 0 0 1 0 1 0 0 1 0 0 1 0 1 0 2 2007 2010
9 0 0 0 0 0 1 0 1 0 1 0 1 0 1 1 1 1 0 0 0 2 2006 2010
10 0 0 1 0 1 0 0 0 0 0 0 0 0 1 0 0 1 0 1 0 1 2008 2010
11 0 0 0 0 0 1 1 0 0 0 0 1 1 0 0 0 0 0 0 1 1 2006 2009
12 0 0 1 0 1 0 0 0 0 0 0 1 0 1 0 0 0 0 0 1 2 2006 2010
13 1 1 0 0 1 0 1 0 0 0 0 1 0 0 0 0 0 0 1 0 1 2007 2009
14 0 0 0 0 0 0 0 1 0 0 1 0 0 1 0 1 0 0 0 0 1 2006 2009
end

destring first_date last_date, replace
reshape long aspirin clopidogrel statin fibrate, i(id) j(year)
keep if inrange(year, first_date, last_date)

// CALCULATE RATES BY SEX BY YEAR FOR EACH DRUG
collapse (sum) aspirin-fibrate (count) denom = id, by(sex year)
foreach v of varlist aspirin-fibrate{
gen `v'_rate = `v'/denom

As shown above each year has same denominator (8 for male and 6 for female). But I thought that each year will have different denominator because the number of patients differ from year to year.

I would really appreciate if you may find time to help me solve the problem.

Thank you.
Sincerely,
Oyun
Comment

Clyde Schechter

Join Date: Apr 2014
Posts: 30100

#12

20 Feb 2018, 23:02

Oyun, I don't know what you are doing wrong, but I copied the code directly from #1 (including the data input) and ran it with no modification whatsoever, except for adding the closing brace (}) on a separate line after the last line of code to close the -foreach- loop. Here is what the results look like:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input int year byte sex double(aspirin clopidogrel statin fibrate) long denom float(aspirin_rate clopidogrel_rate statin_rate fibrate_rate)
2006 1 1 1 2 1 4       .25      .25        .5       .25
2007 1 3 2 3 4 7  .4285714 .2857143  .4285714  .5714286
2008 1 0 0 1 3 8         0        0      .125      .375
2009 1 2 4 2 3 8       .25       .5       .25      .375
2010 1 3 1 2 0 4       .75      .25        .5         0
2006 2 0 0 2 1 4         0        0        .5       .25
2007 2 2 2 1 2 6  .3333333 .3333333 .16666667  .3333333
2008 2 3 2 1 4 6        .5 .3333333 .16666667  .6666667
2009 2 1 3 3 1 6 .16666667       .5        .5 .16666667
2010 2 2 1 1 2 4        .5      .25       .25        .5
end

Notice that the denominators are different for different years (and sexes). These results do not match what you show in your screenshot.

I don't know what you are doing, but you are not running this code with this data and getting those results.

Comment

Buyadaa Oyunchimeg

Join Date: Jan 2018

Posts: 195
#13

21 Feb 2018, 17:34

Dear professor Schechter, I think the reason I got the different result is because of "
keep if inrange(year, first_date, last_date" command. (It seems I did not run it before loops).
As I understood we should run the above (keep if inrange(year, first_date, last_date))command because first need to get rid of additional observations that were created in wide format, am I right or is there any other reason?

I am really sorry for inconvenience. In future I will try to be more attentive and accurate.

Thank you so much for all your help and generosity.

Sincerely,
Oyun
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#14

21 Feb 2018, 19:03

As I understood we should run the above (keep if inrange(year, first_date, last_date))command because first need to get rid of additional observations that were created in wide format, am I right or is there any other reason?

You are absolutely right.

I am really sorry for inconvenience.

No problem. No apologies needed.
Comment
Buyadaa Oyunchimeg

Join Date: Jan 2018

Posts: 195
#15

09 Mar 2018, 02:04

Dear all,

I hope everything going well with you.
This time I am having trouble in presenting the incidence rate of disease (ex: Myocardial infarction) and medication trends in one graph.
Is it possible?

I would really appreciate if you may help me to solve the problem.

Thank you in advance.

Data looks as below.

input byte(id aspirin clopidogrel heparin lipid_lowering fibrate) int age str6 supply_date date_at_diagnosis date_of_death
1 1 0 1 0 0 65 2002 2005 2006
1 0 0 0 1 1 55 2003 2005 0
1 0 0 1 0 1 55 2003 0 0
1 0 0 0 0 1 75 2002 0 0
2 1 1 0 0 1 65 2003 2008 2008
2 0 1 0 0 0 66 2007 0 0
2 0 1 0 1 0 76 2006 0 0
3 0 0 0 1 1 76 2009 0 2010
3 1 0 0 0 0 46 2010 0 0
3 1 1 0 0 0 56 2011 2009 0
3 1 1 1 1 0 46 2008 0 0
3 0 0 0 1 0 56 2008 2009 0
4 0 0 1 0 1 55 2011 0 2012
4 0 0 0 0 1 75 2011 0 0
5 1 1 0 0 0 65 2003 2008 2008
5 0 1 0 0 1 66 2007 0 0
5 0 1 0 1 0 76 2006 2006 0
end
Comment

Year	Male (1)				Female (2)
Aspirin	Clopidogrel	Statin	Fibrate	Aspirin	Clopidogrel	Statin	Fibrate
2006
2007
2008
2009
2010

Announcement