
  • bar chart not sorting

    Hello!

    I have some data that looks like this
    index d__bacteriap__firmicutes sort d__bacteriap__bacteroidota d__bacteriap__actinobacteriota d__bacteriap__verrucomicrobiota d__bacteriap__fusobacteriota d__bacteriap__proteobacteria d__bacteriap__desulfobacterota d__archaeap__euryarchaeota d__bacteriap__cyanobacteria d__bacteriap__synergistota d__bacteriap__patescibacteria d__bacteriap__campylobacterota d__bacteriap__elusimicrobiota d__bacteriap__spirochaetota
    1 3.56E+11 1 2.69E+11 3.54E+10 2.48E+10 0 6.74E+09 7.38E+09 0 0 0 1.38E+09 0 0 0
    2 6.19E+11 2 8.45E+10 1.02E+11 0 0 8.26E+08 5.62E+08 0 0 0 0 0 0 0
    3 6.21E+11 3 1.52E+11 2.16E+10 1.03E+09 0 1.79E+09 7.41E+09 0 1.27E+09 0 1.55E+09 0 0 0
    4 8.68E+11 4 4.60E+10 7.78E+10 1.32E+09 0 1.42E+08 0 0 0 0 0 0 0 0
    (sorry I am unable to do dataex since "input statement exceeds linesize limit. Try specifying fewer variables")

    And I am trying to run...
    Code:
    graph bar (asis) d__bacteriap__firmicutes d__bacteriap__bacteroidota d__bacteriap__actinobacteriota d__bacteriap__proteobacteria d__bacteriap__verrucomicrobiota d__bacteriap__fusobacteriota d__bacteriap__desulfobacterota d__archaeap__euryarchaeota d__bacteriap__cyanobacteria d__bacteriap__synergistota d__bacteriap__patescibacteria d__bacteriap__campylobacterota d__bacteriap__elusimicrobiota, over(sort, sort(sort) gap(0) axis (off)) percentages stack bargap(0) ytitle(Relative abundance (%)) ytitle(, size(small) alignment(top)) ylabel(#5, labsize(small) angle(horizontal)) title(Relative abundance (%) of bacteria phyla, size(small)) legend(on nocolfirst notextfirst stack rows(3) keygap(zero) size(small) nobox linegap(minuscule) bexpand title(Legend, size(small) position(11) nobox) span) clegend(on)
    ...to generate a stacked barchart. I would like my data to be sorted by my "d__bacteriap__firmicutes" variable, which is why I generated the "sort" variable that specifies the order each bar should be plotted.

    I want my plot to look like this
    [Image: Screenshot 2023-09-15 at 17.08.44.png]


    But no matter what I do, my plot looks like this (it is not sorting!)
    [Image: Screenshot 2023-09-15 at 17.09.38.png]


    I have tried the code above, as well as:

    Code:
    graph bar (asis) d__bacteriap__firmicutes d__bacteriap__bacteroidota d__bacteriap__actinobacteriota d__bacteriap__proteobacteria d__bacteriap__verrucomicrobiota d__bacteriap__fusobacteriota d__bacteriap__desulfobacterota d__archaeap__euryarchaeota d__bacteriap__cyanobacteria d__bacteriap__synergistota d__bacteriap__patescibacteria d__bacteriap__campylobacterota d__bacteriap__elusimicrobiota, over(index, sort(d__bacteriap__firmicutes) gap(0) axis (off)) percentages stack bargap(0) ytitle(Relative abundance (%)) ytitle(, size(small) alignment(top)) ylabel(#5, labsize(small) angle(horizontal)) title(Relative abundance (%) of bacteria phyla in the main trial (N=78), size(small)) legend(on nocolfirst notextfirst stack rows(3) keygap(zero) size(small) nobox linegap(minuscule) bexpand title(Legend, size(small) position(11) nobox) span) clegend(on)
    and various combinations of "over" and "sort".

    Thanks for your help!

  • #2
    To get more help on principles, follow the advice of dataex and give us a data example based on, say, 3 to 5 variables. And explain your criterion for sorting.
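
    For example, something along these lines should keep the input statement within the linesize limit (the particular variables and the first five observations are only an illustrative choice):

    Code:
    dataex index d__bacteriap__firmicutes d__bacteriap__bacteroidota d__bacteriap__actinobacteriota in 1/5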



    • #3
      Thank you! Here are the first 5 variables and 5 IDs:

      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input str5 index double d__bacteriap__firmicutes float sort double(d__bacteriap__bacteroidota d__bacteriap__actinobacteriota)
      "1" 356216253410 1 268860203445  35405926030
      "2" 618841390027 2  84527213870 102328230621
      "3" 621153615149 3 152036403024  21563536386
      "4" 868193065951 4  46021603643  77829337828
      "5" 963534398246 5  85292976302 150023127938
      end
      label values sort sort
      label def sort 1 "1", modify
      label def sort 2 "2", modify
      label def sort 3 "3", modify
      label def sort 4 "4", modify
      label def sort 5 "5", modify
      I want to sort by the values of d__bacteriap__firmicutes. I generated a variable called "sort" which numbers the IDs in the order I want them to be plotted using this code:

      Code:
      generate sort=., after(d__bacteriap__firmicutes)
      sort d__bacteriap__firmicutes
      replace sort= _n
      labmask sort, values(index)
      But when I use this variable in my 'graph bar' code above
      Code:
       over(index, sort(sort) gap(0) axis (off))
      , it is not plotting in the order I want it to.

      Thank you for your help!
      Last edited by Sabrina Ayoub-Charette; 18 Sep 2023, 08:10.



      • #4
        Thanks for the data example! Putting together code from #1 and #3 and simplifying out some contradictory (*) and unneeded details, I got this:

        Code:
        * Example generated by -dataex-. For more info, type help dataex
        clear
        input str5 index double(d__bacteriap__firmicutes  d__bacteriap__bacteroidota d__bacteriap__actinobacteriota)
        "1" 356216253410  268860203445  35405926030
        "2" 618841390027   84527213870 102328230621
        "3" 621153615149  152036403024  21563536386
        "4" 868193065951  46021603643  77829337828
        "5" 963534398246   85292976302 150023127938
        end
        
        sort d__bacteriap__firmicutes
        gen sort= _n
        labmask sort, values(index)
        
        graph bar (asis) d__bacteriap__firmicutes d__bacteriap__bacteroidota d__bacteriap__actinobacteriota, over(index, sort(d__bacteriap__firmicutes) gap(0) axis (off)) percentages stack bargap(0) ytitle(Relative abundance (%)) ytitle(, size(small) alignment(top)) ylabel(#5, labsize(small) angle(horizontal)) title(Relative abundance (%) of bacteria phyla in the main trial (N=78), size(small)) legend(on nocolfirst notextfirst stack rows(3) keygap(zero) size(small) nobox linegap(minuscule) bexpand title(Legend, size(small) position(11) nobox) span) clegend(on)
        
        egen double total = rowtotal(d*)
        gen pc_firmicutes = 100 * d__bacteriap__firmicutes / total 
        
        l sort *firmicutes 
        
             +-----------------------------+
             | sort   d__bact~s   pc_fir~s |
             |-----------------------------|
          1. |    1   3.562e+11   53.93274 |
          2. |    2   6.188e+11   76.80822 |
          3. |    3   6.212e+11   78.15676 |
          4. |    4   8.682e+11   87.51558 |
          5. |    5   9.635e+11   80.37152 |
             +-----------------------------+
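        The issue is this: the graph is sorted on the absolute amount of firmicutes, as you asked. But what you see is a graph of percentages, as you also asked, and the order on absolute firmicutes is not exactly the same as the order on percent firmicutes. In the listing above, observation 5 has the largest absolute amount but a smaller percent (80.4) than observation 4 (87.5).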

        There are various ways around this, and one is to calculate the percentages before you ask for the graph.
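
        A minimal sketch of that approach, assuming pc_firmicutes has been calculated as in the code above (so that total covers exactly the variables being plotted) and handing it to the sort() suboption of over():

        Code:
        * order the bars on percent firmicutes, not on the absolute amount
        graph bar (asis) d__bacteriap__firmicutes d__bacteriap__bacteroidota d__bacteriap__actinobacteriota, over(index, sort(pc_firmicutes) gap(0) axis(off)) percentages stack

        With more variables in the graph, the row total would need to run over all of the plotted variables for the precomputed percent to match what graph bar shows.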

        (*) Your code first inputs the variable sort, but then tries to generate it. In practice you generated sort and then ran dataex, which is OK, but the generate won't work in the code you presented.

        labmask is from the Stata Journal. You're asked to indicate where community-contributed commands you use come from.
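
        For anyone who needs it, a sketch of how to locate and install it (the second line assumes the labutil package on SSC, which as far as I know also includes labmask):

        Code:
        * find the Stata Journal version and follow the links
        search labmask, all
        * or install the labutil package from SSC
        ssc install labutil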




        • #5
          Thank you so much, it worked! And I understand my error now.

          Many thanks!
