  • different color per person

    Dear Stata users,
    I am looking at eGFR over time and would like to graph individual eGFR trajectories using a different colour for each person (the number of people in each group varies from 300 to 2000). I am wondering how I can do this in STATA. I have searched for a solution everywhere, but could not find an answer.

    I would like to create something like this.



    [Image: Fig 1.png]



    Any help would be greatly appreciated. Thank you
    Sincerely,
    Oyun

  • #2
    Oyun, see the presentations from the 2013 and 2018 Stata user group meetings:
    "Strategy and tactics for graphic multiples in Stata"
    and
    "Strategy and tactics for graphic multiples in Stata"
    both by Nick Cox.

    IMHO the graph you show is not conveying any usable information. With thousands of lines you just get a colored mess.

    Best, Sergiy

    • #3
      Code:
      use https://www.rug.nl/ggdc/docs/pwt90.dta, clear
      gen gdppc = rgdpe/pop
      
      bys year : egen tgdp = total(rgdpe)
      bys year : egen tpop = total(pop)
      gen mgdpc = tgdp / tpop if countrycode == "USA"
      
      sort country year
      
      set scheme s1mono
      
      twoway line gdppc year, c(L) yscale(log) lcolor(%20)       ///
             ylab(500 1000 5000 10000 50000,                     ///
                  format(%9.0gc) angle(0))                       ///
             xlab(1950(10)2010)                               || ///
             line mgdpc year ,                                   ///
             lwidth(*2) lpattern(solid) lcolor(black)            ///
             ytitle("real GDP per capita (in 2011 US{c S|})")    ///
             legend(order(1 "individual" "countries"             ///
                          2 "average")                           ///
                    pos(4) cols(1) symxsize(*.5) size(*.75)      ///
                    region(lcolor(none)))            
      [Image: Graph.png]
      The obvious difference is that I don't give the different countries different colors. That is intentional. Here we have "only" 182 countries, and I am never going to follow individual countries in this graph even if I give them different colors. The purpose of showing the individual countries is to give an impression of the variability.
      ---------------------------------
      Maarten L. Buis
      University of Konstanz
      Department of history and sociology
      box 40
      78457 Konstanz
      Germany
      http://www.maartenbuis.nl
      ---------------------------------

      • #4
        I'm going to agree with Sergiy and Maarten here. We can intuitively tell that each line is a cluster (e.g. a person, a country). You generally can't pick out individual trajectories in a spaghetti plot, so I agree that coloring each individual trajectory differently doesn't add any information. I will endorse Maarten's solution in general.

        If you see page 69 of Nick Cox's 2018 slides, that approach of graphing each trajectory separately with all other trajectories in the backdrop is likely to be infeasible for a clinical dataset where you have a lot of people. However, you could perform that with some selected trajectories, if you find any that are interesting (e.g. people whose eGFR declines slowly vs declines fast). You could also make their background lines a different color.
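
Highlighting a few selected trajectories against a faint backdrop could be sketched as follows. This is only an illustration under assumptions: it uses the pig weight dataset loaded with webuse (variables id, week, weight), which is not part of this thread, and the three highlighted ids are arbitrary.

```stata
* assumption: -webuse pig- (id, week, weight) stands in for a clinical dataset
webuse pig, clear
sort id week

* all trajectories as thin grey lines, a few selected ones on top in black
* c(L) starts a new line segment whenever week stops increasing, i.e. at each new id
twoway line weight week, c(L) lcolor(gs12) lw(vthin)               || ///
       line weight week if inlist(id, 1, 2, 3), c(L) lcolor(black)    ///
       legend(order(1 "all pigs" 2 "selected pigs"))
```

With real data the `if` condition would pick out the people of interest, for example those whose eGFR declines fastest.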
        Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

        When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.

        • #5
          Wow, this looks very nice! Thank you so much Sergiy, Maarten and Weiwen.


          I will try this.

          Sincerely,
          Oyun

          • #6
            Thanks for the references to two of my talks in #2 and #4 but there is some small confusion.

            A 2013 talk "Strategy and tactics for graphic multiples in Stata" can be found via https://www.stata.com/meeting/uk13/abstracts/

            The version cited by Weiwen is an unauthorised copy. When I click on it I see advertisements too and other stuff completely unrelated to anything I've done. Bizarre.

            A 2018 talk "Spaghetti, paella, and alternatives: Graphics for multiple series and groups" can be found via https://www.stata.com/meeting/uk18/

            That aside, I agree with the idea that different colours would be futile here, but many thin lines in the same colour can helpfully underline variability.

            • #7
              Thank you so much Nick.

              I've tried Maarten's code and it created the following graph (slightly different). Stata says: (note: named style % 20 not found in class color, default attributes used)
              I am wondering why this may be?

              [Image: Fig 1a.png]

              Many thanks.
              Sincerely,
              Oyun
              Last edited by Buyadaa Oyunchimeg; 23 Oct 2018, 18:36.

              • #8
                Which version of Stata are you using? Transparency was introduced in Stata 15, as you should be able to read in

                www.stata.com/help.cgi?whatsnew14to15

                regardless of whether you have Stata 15 installed.

                Recall the advice in the FAQ at https://www.statalist.org/forums/help#version

                11. What should I say about the version of Stata I use?

                The current version of Stata is 15.1. Please specify if you are using an earlier version; otherwise, the answer to your question may refer to commands or features unavailable to you. Moreover, as bug fixes and new features are issued frequently by StataCorp, make sure that you update your Stata before posting a query, as your problem may already have been solved.
                While you are there, scroll down to #18 too, on Stata not STATA.

                In Stata <15, try something like

                Code:
                lcolor(gs12) lw(vthin)
                rather than

                Code:
                lcolor(%20)
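
A do-file shared between Stata 14 and Stata 15 or later could choose between the two options at run time. The following is a minimal sketch under assumptions: the pig dataset from webuse (id, week, weight) is not part of this thread, and `faint` is a hypothetical macro name.

```stata
* assumption: -webuse pig- (id, week, weight) stands in for real data
webuse pig, clear
sort id week

* c(stata_version) holds the version number of the running Stata
if c(stata_version) >= 15 {
    * transparency is available from Stata 15 on
    local faint lcolor(%20)
}
else {
    * fallback for older versions: light grey, very thin lines
    local faint lcolor(gs12) lw(vthin)
}

twoway line weight week, c(L) `faint'
```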

                • #9
                  Thanks Nick. It works!

                  I am using Stata 14.

                  • #10
                    When I applied Maarten's code to my real data, it created the graph below. I am wondering:

                    1. How can I deal with the overlapping mean line?
                    2. How can I extend the distance between x-axis labels to give a better impression of the variability at different time points?
                    (As shown in the -dataex- example, the number of visits varies between individuals.)
                    3. Is it possible to create small dots at each time point in Stata 14, as shown in post #3 (Maarten's post)?


                    I would really appreciate any help.

                    Thank you so much.

                    Data looks as below:

                    Code:
                    * Example generated by -dataex-. To install: ssc install dataex
                    clear
                    input long ID double gfr float numeric_visit int days_from_baseline float(t_years gr)
                    1 53.7  0    0         0 4
                    1 43.8  4  115  .3150685 4
                    1 50.4 12  349  .9561644 4
                    1 46.7 24  731 2.0027397 4
                    1 38.1 36 1101  3.016438 4
                    1 46.4 48 1452  3.978082 4
                    1   39 72 2187  5.991781 4
                    1 34.1 84 2558  7.008219 4
                    2 59.8  0    0         0 4
                    2 54.9  4  122  .3342466 4
                    2 54.8 12  367 1.0054795 4
                    2 43.6 24  731 2.0027397 4
                    2 46.7 28  850  2.328767 4
                    2 50.3 32  962 2.6356165 4
                    2 49.9 36 1102  3.019178 4
                    2 48.3 40 1214 3.3260274 4
                    2 49.4 44 1326 3.6328766 4
                    2 57.5 48 1459   3.99726 4
                    2 50.8 52 1585  4.342466 4
                    2 53.2 56 1690  4.630137 4
                    2 52.3 60 1830  5.013699 4
                    2 49.9 64 1949  5.339726 4
                    2 55.8 68 2053  5.624658 4
                    2 49.8 72 2192  6.005479 4
                    end

                    Code:

                    ***to generate mean gfr over time I've used the following code:

                    gen range=.
                    replace range=1 if (t_years>=0 & t_years<=0.999)
                    replace range=2 if (t_years>=1.0 & t_years<=1.999)
                    replace range=3 if (t_years>=2.0 & t_years<=2.999)
                    replace range=4 if (t_years>=3.0 & t_years<=3.999)
                    replace range=5 if (t_years>=4.0 & t_years<=4.999)
                    replace range=6 if (t_years>=5.0 & t_years<=5.999)
                    replace range=7 if (t_years>=6.0 & t_years<=6.999)
                    replace range=8 if (t_years>=7.0 & t_years<=7.999)

                    bysort gr range: egen avg=mean(gfr)


                    * to create graph

                    set scheme s1mono

                    twoway line gfr t_years if gr==4, c(L) yscale(log) lcolor(gs12) lw(vthin)   ///
                           ylab(20 25 30 35 40 45 50 55 60 65 70 75 80 85 90,            ///
                                format(%9.0gc) angle(0))                          ///
                           xlab(0(1)8)                               ||                    ///
                           line avg t_years if gr==4 ,                              ///
                           lwidth(*2) lpattern(solid) lcolor(black)            ///
                           ytitle("eGFR (ml/min per 1.73 m2)")              ///
                           legend(order(1 "individual" "eGFR"               ///
                                        2 "average")                                      ///
                                  pos(4) cols(1) symxsize(*.5) size(*.75)     ///
                                  region(lcolor(none))) 


                    The graph looks as below (group 4, which has the smallest number of people, n~400):

                    [Image: Graph gr 4.png]

                    • #11
                      To solve (1) replace:

                      Code:
                      line avg t_years if gr==4 , ///

                      with
                      Code:
                      line avg t_years if gr==4 , sort ///
                      I don't understand (2)

                      The small dots are actually an artifact, but you can create them by replacing:
                      Code:
                      twoway line gfr t_years if gr==4, c(L) ...
                      with

                      Code:
                      twoway connected gfr t_years if gr==4, c(L)
                      You may want to tweak the symbols with the msymbol() and mcolor() options.

                      An easier way to create range is:
                      Code:
                      gen range = floor(t_years) + 1
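
Put together, the suggestions above might look like the sketch below. It is only an illustration: the input lines are a handful of rows copied from the -dataex- example in #10 (unused variables dropped), and the graph options are pared down to the essentials.

```stata
* a few rows copied from the -dataex- example in #10
clear
input long ID double gfr float(t_years gr)
1 53.7 0         4
1 43.8 .3150685  4
1 50.4 .9561644  4
1 46.7 2.0027397 4
2 59.8 0         4
2 54.9 .3342466  4
2 54.8 1.0054795 4
2 43.6 2.0027397 4
end

* year bands: [0,1) -> 1, [1,2) -> 2, and so on, with no gaps between bands
gen range = floor(t_years) + 1

* mean eGFR within group and year band
bysort gr range: egen avg = mean(gfr)

* c(L) relies on the data being sorted by person and time
sort ID t_years

* -sort- on the second plot orders the averages by t_years before connecting them,
* so the mean line no longer jumps back to time 0 at the start of every person
twoway line gfr t_years if gr==4, c(L) lcolor(gs12) lw(vthin)        || ///
       line avg t_years if gr==4, sort lwidth(*2) lcolor(black)         ///
       legend(order(1 "individual eGFR" 2 "average"))
```

Unlike the chain of replace statements with bounds such as 0.999 and 1.0, floor() leaves no values unassigned between bands.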
                      ---------------------------------
                      Maarten L. Buis
                      University of Konstanz
                      Department of history and sociology
                      box 40
                      78457 Konstanz
                      Germany
                      http://www.maartenbuis.nl
                      ---------------------------------

                      • #12
                        On a different note: while log scale is surely the right thing for Maarten's data, it is not obviously helping much here. But much depends on whether low or high values are of clinical interest, on which I am ignorant.
