
  • Graphing percentages as a line graph for a categorical variable


    Hi all,

    I have a variable, "tx_order", that contains 7 categories of a treatment. A second variable, "YEAR_OF_DIAGNOSIS", indicates the year administered.

    I'm trying to plot the percentage of each category per year as a line graph, with one line per category showing its percentages (see the tabulation below). For example, in 2004 "Surgery" would be 34.55%, "Radiation" 0%, etc.

    Tabulation:

    [Image: Screen Shot 2020-10-02 at 15.38.24.png]


    I created these graphs with binary variables, using the code below. However, I don't know how to do it for a non-binary variable, like the one described above.

    tab chemo YEAR_OF_DIAGNOSIS, col
    by YEAR_OF_DIAGNOSIS, sort: egen pc_chemo = mean(100*chemo)
    label variable pc_chemo "Chemotherapy"

    *Graph
    twoway (connected pc_chemo YEAR_OF_DIAGNOSIS), ///
    xtitle(Year) xlabel(#14) ///
    ytitle(Received Chemotherapy (%)) ylabel(0(10)100)
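One way to generalize the binary-variable code above to a categorical variable is to create one 0/1 indicator per category and apply the same egen mean() step to each indicator, so that each category gets its own percent-by-year variable and its own line. A minimal sketch, assuming tx_order is numeric with integer codes and value labels (a string variable could be encode'd first); the toy data and the names is_*, pc_*, lines and onepery are hypothetical, chosen only for illustration. With real data, skip the toy-data part, since it starts with clear.

```stata
* Toy data standing in for one observation per patient (hypothetical values)
* Skip this part with real data: clear wipes the data in memory
clear
input int YEAR_OF_DIAGNOSIS byte tx_order
2004 1
2004 1
2004 2
2005 1
2005 2
2005 3
2005 3
end
label define tx 1 "Surgery" 2 "Radiation" 3 "Chemotherapy"
label values tx_order tx

* One 0/1 indicator per category, then percent per year as in the binary case
levelsof tx_order, local(levels)
local lines
foreach l of local levels {
    generate byte is_`l' = (tx_order == `l') if !missing(tx_order)
    bysort YEAR_OF_DIAGNOSIS: egen pc_`l' = mean(100 * is_`l')
    * use the value label (e.g. "Surgery") as the variable label for the legend
    label variable pc_`l' "`: label (tx_order) `l''"
    local lines `lines' pc_`l'
}

* Percents repeat within each year, so plot one observation per year
egen byte onepery = tag(YEAR_OF_DIAGNOSIS)
twoway connected `lines' YEAR_OF_DIAGNOSIS if onepery, sort ///
    xtitle("") ytitle("Percent of patients") ylabel(0(10)100)
```

Because every observation in a year carries the same percent, tagging one observation per year avoids drawing each line over itself many times. With 7 categories the legend gets busy, which is the problem taken up in #3.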






    I would really appreciate your help!
    Last edited by Roberto Vidri; 02 Oct 2020, 13:49.

  • #2
    I have exactly the same question. I do not want to create a bar graph or stacked bar graph because it looks too "crowded".
    I have 2 categorical variables:
    a. Year: 2014, 2015, 2016, 2017, 2018
    b. Types of Malformation: cardiac, orofacial, etc (11 in total).

    Attached is my table.
    I would appreciate the help.
    Thanks

    [Image: table.png]




    • #3
      These questions interest many people, but I guess the main reason #1 didn't get an answer was the absence of a data example we can use easily, and #2 is no different. Please see FAQ Advice #12 for an explanation of why screenshots (images) are not as helpful as you hope, and what to do instead.

      #1 has 7 rather queasy medical categories -- in my view as never more than a patient -- over 4 years in the table, and #2 has 11 categories over 5 years. So, let's fake some data of the second size to make things as realistic as possible.

      A glance at both tables shows a very common pattern.

      There are some frequent categories and rather more infrequent categories, which are going to be hard to tease apart on a standard line graph.

      A popular but not always effective remedy is to supply a legend, which takes up a large fraction of the total display and doesn't usually help much.


      The code comes first, and then some commentary.

      Code:
      clear 
      
      set obs 55 
      set seed 2803 
      egen year = seq(), from(2014) to(2018)
      egen cat = seq(), to(11) block(5)
      label def cat 1 algebra 2 biology 3 chemistry 4 drama 5 English 6 forestry 7 geography 8 history 9 idiosyncrasy 10 judo 11 kudos 
      label val cat cat 
      gen freq = (cat  - 4)^2 * runiformint(1, 10)
      
      egen pc = pc(freq), by(year)
      
      * install from SSC (ssc install fabplot)
      * graph 1 
      fabplot line pc year, by(cat)
      
      
      egen median = median(-pc), by(cat)
      egen group = group(median cat)
      
      * install from Stata Journal 
      labmask group, val(cat) decode 
      
      * graph 2 
      fabplot line pc year, by(group, l1title(%, orient(horizontal))) frontopts(lw(thick)) front(connected) xtitle("")

      The first graph tried is a fabplot (front and back plot), in which each panel shows (a) the line graph for one series, in front, and (b) the line graphs for all the other series, in back.

      See https://www.statalist.org/forums/for...ailable-on-ssc for the fuller story, although even people who find this interesting will want to skim and skip through a slow and repetitive story.


      [Graph 1: fabplot_med1.png]

      That's a start, but we can do a lot better.

      1. Alphabetical order is a natural default for Stata graphs, but dopey for showing patterns in the data. We should sort the series by magnitude.

      2. Each individual series needs more emphasis.

      3. A small peeve of mine is that axis titles like "year" should be cut as obvious. (Put the other way round, a reader who needs to be told what 2014 to 2018 mean needs even more help than that!)

      4. pc is just a short name I thought up, and there the reader does deserve better.


      We could reorder by hand, but that is not much fun. I chose to order by median. Note that negating the median means that the highest median gets rank 1 from egen's group() function. It is possible that two categories have the same median; if so, ties are broken by the original category codes. Then a little extra work is needed to copy the value labels of the original categorical variable over to the new ordered categorical variable. That is what labmask does. The slightly whimsical name comes from the idea that we give a variable new value labels to be worn like a mask; the mask is what you see.
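For readers who cannot install labmask, the label-copying step can be sketched by hand: loop over the new groups and copy the value label of the category each group came from. A minimal sketch that re-creates the fake data above so it runs on its own; the value label name grp is hypothetical, and labmask does the same job more tidily.

```stata
* Same fake data as above
clear
set obs 55
set seed 2803
egen year = seq(), from(2014) to(2018)
egen cat = seq(), to(11) block(5)
label define cat 1 algebra 2 biology 3 chemistry 4 drama 5 English 6 forestry ///
    7 geography 8 history 9 idiosyncrasy 10 judo 11 kudos
label values cat cat
gen freq = (cat - 4)^2 * runiformint(1, 10)
egen pc = pc(freq), by(year)

* Negated median: the highest median gets group 1; ties are broken by cat
egen median = median(-pc), by(cat)
egen group = group(median cat)

* Copy the label of each original category onto its new group code
levelsof group, local(groups)
foreach g of local groups {
    * cat is constant within a group, so its minimum is that category
    summarize cat if group == `g', meanonly
    label define grp `g' "`: label (cat) `r(min)''", add
}
label values group grp
```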


      [Graph 2: fabplot_med2.png]


      You see that idiosyncrasy scores higher than kudos, but there you go.
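The fake data above are already one row per year and category, with a count in freq. With one observation per case, as in #1 and #2, contract can produce that layout first; its zero option keeps year-category combinations with no cases, so they show up as 0% instead of vanishing. A minimal sketch with hypothetical toy data; the variable name malformation is illustrative only.

```stata
* Toy data: one observation per case (hypothetical values)
clear
input int year str10 malformation
2014 "cardiac"
2014 "cardiac"
2014 "orofacial"
2015 "cardiac"
2015 "neural"
2015 "neural"
2015 "neural"
end

* Numeric version with value labels, matching cat in the code above
encode malformation, generate(cat)

* One row per year x category with its count; zero keeps empty cells
contract year cat, freq(freq) zero

* Percent of each year's total, as in the code above
egen pc = pc(freq), by(year)
```

From there, the median, group and labmask steps above should apply unchanged, since the result has the same year, cat, freq and pc variables.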



      • #4
        Thank you so much for the answer, Dr. Cox. I will try this code. Also, sorry about the picture; I will be more careful next time.



        • #5
          Thank you for your post and help, Dr. Cox.



          • #6
            I want to create a graph like the one below.

            I want to see the percentage of treatment 1 and treatment 2 over time.

            I have the time variable as D14, D30, D45, ... separately, and the treatment variable coded as 1 and 2.
            [Image: IMG_0863.png]
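Reading #6 as one observation per patient and time point, with a string time variable holding values such as D14, D30, D45 and a treatment variable coded 1 and 2, the approach from #1 carries over once the time points are turned into numbers of days. A minimal sketch under those assumptions; the variable names time, treatment, day, pc1, pc2 and oneperday and the toy data are hypothetical. If instead each time point is held in a separate variable, the data would need reshape long first.

```stata
* Toy data: one row per patient and time point (hypothetical values)
clear
input str4 time byte treatment
"D14" 1
"D14" 1
"D14" 2
"D30" 1
"D30" 2
"D45" 2
end

* "D14" -> 14, so time points sort and are spaced correctly on the x axis
generate day = real(substr(time, 2, .))

* Percent receiving each treatment at each time point
bysort day: egen pc1 = mean(100 * (treatment == 1)) if !missing(treatment)
bysort day: egen pc2 = mean(100 * (treatment == 2)) if !missing(treatment)
label variable pc1 "Treatment 1"
label variable pc2 "Treatment 2"

* One observation per day is enough to draw the lines
egen byte oneperday = tag(day)
levelsof day, local(days)
twoway connected pc1 pc2 day if oneperday, sort ///
    xlabel(`days') xtitle("Day") ytitle("Percent") ylabel(0(20)100)
```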
