Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • frequency table & trend

    Dear all,

    I hope everything goes well with you.

    I am again asking for your help. This time I ma having problem creating table.

    Data looks as below. (but as suggested I will reshape them before analysis)

    input byte(id aspirin2006 clopidogrel2006 statin2006 fibrate2006 aspirin2007 clopidogrel2007 statin2007 fibrate2007 aspirin2008 clopidogrel2008 statin2008 fibrate2008 aspirin2009 clopidogrel2009 statin2009 fibrate2009 aspirin2010 clopidogrel2010 statin2010 fibrate2010 sex) int str12 first_date last_date
    1 0 0 0 1 0 0 0 1 0 0 0 0 0 0 1 0 0 1 1 0 1 2007 2010
    2 0 0 1 0 1 0 0 1 0 0 0 0 0 1 0 1 1 0 0 1 2 2006 2010
    3 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 0 0 1 0 1 2007 2009
    4 0 0 1 0 0 1 1 0 0 1 0 1 0 0 1 0 0 1 0 0 1 2006 2010
    5 0 0 0 1 0 0 0 1 0 0 0 1 0 1 1 1 1 1 1 1 2 2007 2009
    6 1 1 1 1 1 1 1 0 0 0 0 0 1 0 0 0 1 0 0 1 2 2006 2010
    7 0 0 0 1 0 0 0 1 1 0 0 1 0 0 1 0 0 1 0 0 1 2006 2009
    8 1 0 0 1 0 0 0 0 1 0 1 0 0 1 0 0 1 0 1 0 1 2007 2010
    9 0 0 0 0 0 1 0 1 0 1 0 1 0 1 1 1 1 0 0 0 2 2006 2010

    end


    I would like to create the table first then figure showing the trend of prescription.
    Table 1. Prescribing rate by gender

    I would like to create the following table:
    Year Male (1) Female (2)
    Aspirin Clopidogrel Statin Fibrate Aspirin Clopidogrel Statin Fibrate
    2006
    2007
    2008
    2009
    2010












    I would really appreciate if you may find time to help me solve the problem.

    Thank you in advance.
    Sincerely,
    Oyun






  • #2
    Click image for larger version

Name:	Prescribing rate by sex.png
Views:	1
Size:	109.6 KB
ID:	1430511

    This is the figure I would like to create.

    Comment


    • #3
      Code:
      clear
      input byte(id aspirin2006 clopidogrel2006 statin2006 fibrate2006 aspirin2007 clopidogrel2007 statin2007 fibrate2007 aspirin2008 clopidogrel2008 statin2008 fibrate2008 aspirin2009 clopidogrel2009 statin2009 fibrate2009 aspirin2010 clopidogrel2010 statin2010 fibrate2010 sex) int str12 first_date last_date
      1 0 0 0 1 0 0 0 1 0 0 0 0 0 0 1 0 0 1 1 0 1 2007 2010
      2 0 0 1 0 1 0 0 1 0 0 0 0 0 1 0 1 1 0 0 1 2 2006 2010
      3 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 0 0 1 0 1 2007 2009
      4 0 0 1 0 0 1 1 0 0 1 0 1 0 0 1 0 0 1 0 0 1 2006 2010
      5 0 0 0 1 0 0 0 1 0 0 0 1 0 1 1 1 1 1 1 1 2 2007 2009
      6 1 1 1 1 1 1 1 0 0 0 0 0 1 0 0 0 1 0 0 1 2 2006 2010
      7 0 0 0 1 0 0 0 1 1 0 0 1 0 0 1 0 0 1 0 0 1 2006 2009
      8 1 0 0 1 0 0 0 0 1 0 1 0 0 1 0 0 1 0 1 0 1 2007 2010
      9 0 0 0 0 0 1 0 1 0 1 0 1 0 1 1 1 1 0 0 0 2 2006 2010
      
      end
      
      destring first_date last_date, replace
      reshape long aspirin clopidogrel statin fibrate, i(id) j(year)
      keep if inrange(year, first_date, last_date)
      
      //    CALCULATE RATES BY SEX BY YEAR FOR EACH DRUG
      collapse (sum) aspirin-fibrate (count) denom = id, by(sex year)
      foreach v of varlist aspirin-fibrate{
          gen `v'_rate = `v'/denom
      }
      
      //    GRAPH THE RESULTS
      label define sex    1    "Male"    2    "Female"
      label values sex sex
      graph twoway connect *_rate year, by(sex) sort
      
      //    TABULAR FORM OF RESULTS
      keep year sex *_rate
      reshape long @_rate, i(year sex) j(drug_class) string
      table year drug_class sex, c(mean _rate)
      -graph twoway- has many options that will allow you to customize the appearance of your graph. Do read about them in the PDF manuals that come with your installation and apply them as desired.

      Again, the long layout works best, and, in the end for this table, a fully long layout where even the name of the drug is a separate variable, is best of all. (Notice, though that for graphing purposes, the partially-wide layout with each drug being a separate variable is needed. Graphing multiple variables on the same axes requires that those variables be separate, hence the semi-wide layout.)

      Comment


      • #4
        Dear prof Schechter,

        Thank you for taking time to help me.
        I've tried the code for real dataset but I get the following errors:

        1.
        collapse (sum) aspirin-fibrate (count) denom = id, by(sex supply_year)


        type mismatch (probably because ID contains nonnumeric characters)
        r (109)
        foreach v of varlist aspirin-fibrate{
        gen `v'_rate = `v'/denom
        }


        denom not found
        r (111)

        I would really appreciate if you may help me to solve the problem.

        Thank you
        Sincerely,
        Oyun
        Last edited by Buyadaa Oyunchimeg; 19 Feb 2018, 18:43.

        Comment


        • #5
          Well, the second error, denom not found, is because denom is created by the -collapse- command that didn't run! Once you hit an error in Stata, it is not a good idea to continue to the next command. You should fix the problem you have before proceeding.

          As for the -collapse- error, this must have arisen because one or more of id, aspirin, clopidogrel, statin, or fibrate is a string variable. Run -describe- to see which it is, and then -destring- it (or them).

          Comment


          • #6
            I've removed nonnumeric characters from ID and tried it again. It works perfectly!

            Thank you so much.
            Sincerely,
            Oyun

            Comment


            • #7
              I have one more question.

              When I applied the following command in real dataset, I found that each year has same denominator. But what I was expecting that each year will have different numbers of people, so denominators also will be different each year. How if I define prescription rate as the number of each lipid lowering drug prescription (statin, fibrate) divided by the total number of lipid lowering prescriptions?
              collapse (sum) aspirin-fibrate (count) denom = id, by(sex year)
              foreach v of varlist aspirin-fibrate{
              gen `v'_rate = `v'/denom
              }



              Thank you again,
              Last edited by Buyadaa Oyunchimeg; 19 Feb 2018, 23:09.

              Comment


              • #8
                Well, this is different from what you originally asked for. A prescription rate is, by definition, the number of prescriptions issued divided by the number of people in the population to which they are prescribed. What you are looking for is the proportional prescription rate within drug class. The calculations for that would be:

                Code:
                //    CALCULATE PROPORTIONS OF RX WITIN CLASS BY SEX BY YEAR FOR EACH DRUG
                collapse (sum) aspirin-fibrate,  by(sex year)
                gen aspirin_prop = aspirin/(aspirin+clopidogrel)
                gen clopidogrel_prop = 1 - aspirin_prop
                gen statin_prop = statin/(statin+fibrate)
                gen fibrate_prop = 1-statin_prop
                And in the code for the graphs and tables, just replace rate by prop wherever it occurs.

                Note: I have simplified the code by relying on the (presumably coincidental) fact that in your data you have exactly two anticoagulants and exactly two lipid-lowering drugs. This means that the proportional prescription rate for the second drug is necessarily one minus the proportional prescription rate for the first. If your real data has more drugs in classes, then you will need to modify the code to calculate each proportional prescription rate as the number of prescriptions for the drug divided by the sum of the number of prescriptions for each drug in the class. Also, if the number of drugs in a class is large, you might want to first calculate the total by using -egen, rowtotal()- instead of writing out the sum over and over again.

                Finally, in the future, if your ID variable is a string, don't post a data example in which the id variable is numeric. That's just asking for trouble. The whole point of using -dataex- is that it gives those who want to help you a data set to work with that is exactly like your real data, so that code that works in the example will work in your real data. Of course there are some aspects of the real data that cannot be shown in example; if nothing else the sample size in the example will typically be be much smaller than the real data. But data storage types are really important: Stata treats string and numeric variables quite differently in many commands. Even the difference between float and double in numeric variable can require different code to work properly. Don't change data storage types or labeling when posting examples: they often matter!

                Comment


                • #9
                  Thank so much prof. Schechter.

                  But I still would like to know a prescription rate for each drug.

                  I've applied the following command:
                  collapse (sum) aspirin-fibrate (count) denom = id, by(sex year)
                  foreach v of varlist aspirin-fibrate{
                  gen `v'_rate = `v'/denom
                  }

                  I got the following results.
                  Click image for larger version

Name:	Data ex.png
Views:	1
Size:	17.8 KB
ID:	1430864



                  Here denominators differ year by year. But in real dataset denominators are same in all year (I thought it would be different because number of patients varies each year ). Is there any possibility to solve this problem?



                  Thank you so much

                  Comment


                  • #10
                    How do you expect anyone to figure out why you are getting unexpected results in the real data set when you haven't shown the real data set? I'm happy to troubleshoot the code--but you have to make it possible. Post a new example, using -dataex- that produces incorrect results and I'll try to figure out what the problem is. But all I can say at this point is that the code you have produces correct results with the data you have shown, and it appears to be correct. So perhaps there is something about the real data that I don't understand. But until I see it here, there is nothing I can do to help you.

                    Comment


                    • #11
                      There is some problem replying to my old post so I am posting a new one. This is continuation of my old post. Please refer to link below.

                      https://www.statalist.org/forums/for...cy-table-trend




                      I would like to find prescription rate by year and by sex. As professor Schechter suggested I've used the loops, but there is problem with denominator.

                      input byte(id aspirin2006 clopidogrel2006 statin2006 fibrate2006 aspirin2007 clopidogrel2007 statin2007 fibrate2007 aspirin2008 clopidogrel2008 statin2008 fibrate2008 aspirin2009 clopidogrel2009 statin2009 fibrate2009 aspirin2010 clopidogrel2010 statin2010 fibrate2010 sex) int str12 first_date last_date
                      1 0 0 0 1 0 0 0 1 0 0 0 0 0 0 1 0 0 1 1 0 1 2007 2010
                      2 0 0 1 0 1 0 0 1 0 0 0 0 0 1 0 1 1 0 0 0 1 2006 2010
                      3 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 0 0 1 0 2 2007 2009
                      4 0 0 1 0 0 1 1 0 0 1 0 1 0 0 1 0 0 1 0 1 2 2006 2010
                      5 0 0 0 1 0 0 0 1 0 0 0 1 0 1 1 1 1 1 1 0 1 2007 2009
                      6 1 1 1 1 1 1 1 0 0 0 0 0 1 0 0 0 1 0 0 0 1 2006 2010
                      7 0 0 0 1 0 0 0 1 1 0 0 1 0 0 1 0 0 1 0 1 2 2006 2009
                      8 1 0 0 1 0 0 0 0 1 0 1 0 0 1 0 0 1 0 1 0 2 2007 2010
                      9 0 0 0 0 0 1 0 1 0 1 0 1 0 1 1 1 1 0 0 0 2 2006 2010
                      10 0 0 1 0 1 0 0 0 0 0 0 0 0 1 0 0 1 0 1 0 1 2008 2010
                      11 0 0 0 0 0 1 1 0 0 0 0 1 1 0 0 0 0 0 0 1 1 2006 2009
                      12 0 0 1 0 1 0 0 0 0 0 0 1 0 1 0 0 0 0 0 1 2 2006 2010
                      13 1 1 0 0 1 0 1 0 0 0 0 1 0 0 0 0 0 0 1 0 1 2007 2009
                      14 0 0 0 0 0 0 0 1 0 0 1 0 0 1 0 1 0 0 0 0 1 2006 2009
                      end

                      destring first_date last_date, replace
                      reshape long aspirin clopidogrel statin fibrate, i(id) j(year)
                      keep if inrange(year, first_date, last_date)

                      // CALCULATE RATES BY SEX BY YEAR FOR EACH DRUG
                      collapse (sum) aspirin-fibrate (count) denom = id, by(sex year)
                      foreach v of varlist aspirin-fibrate{
                      gen `v'_rate = `v'/denom

                      Click image for larger version

Name:	data1.png
Views:	7
Size:	19.2 KB
ID:	1430889


                      As shown above each year has same denominator (8 for male and 6 for female). But I thought that each year will have different denominator because the number of patients differ from year to year.

                      I would really appreciate if you may find time to help me solve the problem.

                      Thank you.
                      Sincerely,
                      Oyun

                      Comment


                      • #12
                        Oyun, I don't know what you are doing wrong, but I copied the code directly from #1 (including the data input) and ran it with no modification whatsoever, except for adding the closing brace (}) on a separate line after the last line of code to close the -foreach- loop. Here is what the results look like:

                        Code:
                        * Example generated by -dataex-. To install: ssc install dataex
                        clear
                        input int year byte sex double(aspirin clopidogrel statin fibrate) long denom float(aspirin_rate clopidogrel_rate statin_rate fibrate_rate)
                        2006 1 1 1 2 1 4       .25      .25        .5       .25
                        2007 1 3 2 3 4 7  .4285714 .2857143  .4285714  .5714286
                        2008 1 0 0 1 3 8         0        0      .125      .375
                        2009 1 2 4 2 3 8       .25       .5       .25      .375
                        2010 1 3 1 2 0 4       .75      .25        .5         0
                        2006 2 0 0 2 1 4         0        0        .5       .25
                        2007 2 2 2 1 2 6  .3333333 .3333333 .16666667  .3333333
                        2008 2 3 2 1 4 6        .5 .3333333 .16666667  .6666667
                        2009 2 1 3 3 1 6 .16666667       .5        .5 .16666667
                        2010 2 2 1 1 2 4        .5      .25       .25        .5
                        end
                        Notice that the denominators are different for different years (and sexes). These results do not match what you show in your screenshot.

                        I don't know what you are doing, but you are not running this code with this data and getting those results.

                        Comment


                        • #13
                          Dear professor Schechter, I think the reason I got the different result is because of "
                          keep if inrange(year, first_date, last_date" command. (It seems I did not run it before loops).
                          As I understood we should run the above (keep if inrange(year, first_date, last_date))command because first need to get rid of additional observations that were created in wide format, am I right or is there any other reason?


                          I am really sorry for inconvenience. In future I will try to be more attentive and accurate.


                          Thank you so much for all your help and generosity.

                          Sincerely,
                          Oyun

                          Comment


                          • #14
                            As I understood we should run the above (keep if inrange(year, first_date, last_date))command because first need to get rid of additional observations that were created in wide format, am I right or is there any other reason?
                            You are absolutely right.

                            I am really sorry for inconvenience.
                            No problem. No apologies needed.

                            Comment


                            • #15
                              Dear all,

                              I hope everything going well with you.
                              This time I am having trouble in presenting the incidence rate of disease (ex: Myocardial infarction) and medication trends in one graph.
                              Is it possible?

                              I would really appreciate if you may help me to solve the problem.

                              Thank you in advance.

                              Data looks as below.

                              input byte(id aspirin clopidogrel heparin lipid_lowering fibrate) int age str6 supply_date date_at_diagnosis date_of_death
                              1 1 0 1 0 0 65 2002 2005 2006
                              1 0 0 0 1 1 55 2003 2005 0
                              1 0 0 1 0 1 55 2003 0 0
                              1 0 0 0 0 1 75 2002 0 0
                              2 1 1 0 0 1 65 2003 2008 2008
                              2 0 1 0 0 0 66 2007 0 0
                              2 0 1 0 1 0 76 2006 0 0
                              3 0 0 0 1 1 76 2009 0 2010
                              3 1 0 0 0 0 46 2010 0 0
                              3 1 1 0 0 0 56 2011 2009 0
                              3 1 1 1 1 0 46 2008 0 0
                              3 0 0 0 1 0 56 2008 2009 0
                              4 0 0 1 0 1 55 2011 0 2012
                              4 0 0 0 0 1 75 2011 0 0
                              5 1 1 0 0 0 65 2003 2008 2008
                              5 0 1 0 0 1 66 2007 0 0
                              5 0 1 0 1 0 76 2006 2006 0
                              end

                              Comment

                              Working...
                              X