different color per person

Buyadaa Oyunchimeg

Join Date: Jan 2018

Posts: 195
#1


20 Oct 2018, 21:47

Dear Stata users,
I am looking at eGFR over time and would like to graph individual eGFR trajectories using a different colour for each person (the number of people in each group varies from 300 to 2000). I am wondering how I can do this in STATA. I have searched everywhere for a solution, but could not find an answer.

I would like to create something like this.

Any help would be greatly appreciated. Thank you
Sincerely,
Oyun
Sergiy Radyakin

Join Date: Apr 2014

Posts: 1867
#2

22 Oct 2018, 01:32

Oyun, see the presentations from the 2013 and 2018 Stata user group meetings:
"Strategy and tactics for graphic multiples in Stata"
and
"Strategy and tactics for graphic multiples in Stata"
both by Nick Cox.

IMHO the graph you show is not conveying any usable information. With thousands of lines you just get a colored mess.

Best, Sergiy

Maarten Buis

Join Date: Mar 2014
Posts: 3458
#3

22 Oct 2018, 02:30

Code:

use https://www.rug.nl/ggdc/docs/pwt90.dta, clear
gen gdppc = rgdpe/pop

bys year : egen tgdp = total(rgdpe)
bys year : egen tpop = total(pop)
gen mgdpc = tgdp / tpop if countrycode == "USA"

sort country year

set scheme s1mono

twoway line gdppc year, c(L) yscale(log) lcolor(%20)       ///
       ylab(500 1000 5000 10000 50000,                     ///
            format(%9.0gc) angle(0))                       ///
       xlab(1950(10)2010)                               || ///
       line mgdpc year ,                                   ///
       lwidth(*2) lpattern(solid) lcolor(black)            ///
       ytitle("real GDP per capita (in 2011 US{c S|})")    ///
       legend(order(1 "individual" "countries"             ///
                    2 "average")                           ///
              pos(4) cols(1) symxsize(*.5) size(*.75)      ///
              region(lcolor(none)))

[Image: Graph.png]

The obvious difference is that I don't give the different countries different colors. That is intentional. Here we have "only" 182 countries, and I am never going to follow individual countries in this graph even if I give them different colors. The purpose of showing the individual countries is to give an impression of the variability.

---------------------------------
Maarten L. Buis
University of Konstanz
Department of history and sociology
box 40
78457 Konstanz
Germany
http://www.maartenbuis.nl
---------------------------------


Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#4

22 Oct 2018, 13:09

I'm going to agree with Sergiy and Maarten here. We can intuitively tell that each line is a cluster (e.g. a person, a country). You generally can't pick out individual trajectories in a spaghetti plot, so I agree that coloring each individual trajectory differently doesn't add any information. I will endorse Maarten's solution in general.

If you see page 69 of Nick Cox's 2018 slides, that approach of graphing each trajectory separately with all other trajectories in the backdrop is likely to be infeasible for a clinical dataset where you have a lot of people. However, you could perform that with some selected trajectories, if you find any that are interesting (e.g. people whose eGFR declines slowly vs declines fast). You could also make their background lines a different color.
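A minimal sketch of the "selected trajectories against a backdrop" idea, not a definitive implementation. The rows are a small subset of the -dataex- listing posted later in #10, and picking ID 1 as the highlighted patient is an arbitrary assumption for illustration. The grey option set avoids transparency, so it should also run in Stata 14.

```stata
* Sketch only: data are a subset of the -dataex- example in #10.
clear
input long ID double gfr float t_years
1 53.7 0
1 46.7 2.0027397
1 38.1 3.016438
1 34.1 7.008219
2 59.8 0
2 43.6 2.0027397
2 49.9 3.019178
2 49.8 6.005479
end

* connect(L) only joins points while t_years is ascending,
* so sorting by ID and time keeps each patient's line separate
sort ID t_years

* backdrop: every patient in thin grey; on top: one selected patient
* (ID 1 is an arbitrary choice made for this illustration)
twoway line gfr t_years, connect(L) lcolor(gs12) lwidth(vthin)      ///
    || line gfr t_years if ID == 1, lcolor(red) lwidth(medthick)    ///
       ytitle("eGFR") xtitle("years from baseline")                 ///
       legend(order(1 "all patients" 2 "selected patient"))
```

To highlight several patients, one more `|| line ... if ID == ...` layer per patient could be added, each with its own lcolor().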

Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.
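As a small illustration of what that looks like in practice, here is a sketch using the auto dataset that ships with Stata; the choice of variables and of the first five observations is arbitrary.

```stata
* Sketch only: auto.dta and the variables below are just for illustration.
* If dataex is not already available: ssc install dataex
sysuse auto, clear

* write a copy-and-paste data example for three variables, first 5 rows
dataex make price mpg in 1/5
```

The output in the Results window can then be pasted into a post between code delimiters.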

When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.
Buyadaa Oyunchimeg

Join Date: Jan 2018

Posts: 195
#5

23 Oct 2018, 00:45

Wow, this looks very nice! Thank you so much Sergiy, Maarten and Weiwen.

I will try this.

Sincerely,
Oyun
Nick Cox

Join Date: Mar 2014

Posts: 35724
#6

23 Oct 2018, 02:23

Thanks for the references to two of my talks in #2 and #4 but there is some small confusion.

A 2013 talk "Strategy and tactics for graphic multiples in Stata" can be found via https://www.stata.com/meeting/uk13/abstracts/

The version cited by Weiwen is an unauthorised copy. When I click on it I see advertisements too and other stuff completely unrelated to anything I've done. Bizarre.

A 2018 talk "Spaghetti, paella, and alternatives: Graphics for multiple series and groups" can be found via https://www.stata.com/meeting/uk18/

That aside, I agree with the idea that different colours would be futile here, but many thin lines in the same colour can helpfully underline variability.
Buyadaa Oyunchimeg

Join Date: Jan 2018

Posts: 195
#7

23 Oct 2018, 17:51

Thank you so much Nick.

I've tried Maarten's code and it created the following graph (slightly different). Stata says: (note: named style % 20 not found in class color, default attributes used)
I am wondering why this may be?

Many thanks.
Sincerely,
Oyun

Last edited by Buyadaa Oyunchimeg; 23 Oct 2018, 18:36.
Nick Cox

Join Date: Mar 2014

Posts: 35724
#8

23 Oct 2018, 18:25

Which version of Stata are you using? Transparency was introduced in Stata 15, as you should be able to read in

www.stata.com/help.cgi?whatsnew14to15

regardless of whether you have Stata 15 installed.

Recall the advice in the FAQ at https://www.statalist.org/forums/help#version

11. What should I say about the version of Stata I use?

The current version of Stata is 15.1. Please specify if you are using an earlier version; otherwise, the answer to your question may refer to commands or features unavailable to you. Moreover, as bug fixes and new features are issued frequently by StataCorp, make sure that you update your Stata before posting a query, as your problem may already have been solved.

While you are there, scroll down to #18 too, on Stata not STATA.

In Stata <15, try something like

Code:

lcolor(gs12) lw(vthin)

rather than

Code:

lcolor(%20)
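If the same do-file has to run under both older and newer Stata, one possible sketch is to choose the options from the running version. The local macro name lineopts is made up for this illustration, and the data rows are a small subset of the -dataex- listing in #10.

```stata
* Sketch only: lineopts is a hypothetical macro name.
clear
input long ID double gfr float t_years
1 53.7 0
1 46.7 2.0027397
1 34.1 7.008219
2 59.8 0
2 43.6 2.0027397
2 49.8 6.005479
end

* c(stata_version) holds the version of the Stata that is running
if c(stata_version) >= 15 {
    * transparency is available: 20% opacity in the default line colour
    local lineopts lcolor(%20)
}
else {
    * no transparency: fall back to thin light-grey lines
    local lineopts lcolor(gs12) lwidth(vthin)
}

sort ID t_years
twoway line gfr t_years, connect(L) `lineopts'
```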
Buyadaa Oyunchimeg

Join Date: Jan 2018

Posts: 195
#9

23 Oct 2018, 18:37

Thanks Nick. It works!

I am using Stata 14.
Buyadaa Oyunchimeg

Join Date: Jan 2018

Posts: 195
#10

24 Oct 2018, 20:04

When I applied Maarten's code to my real data, it created the following graph. I am wondering:

1. How can I deal with the overlapping mean line?
2. How can I extend the distance between the x-axis labels to give a better impression of the variability at different time points? (As the -dataex- output shows, the number of visits varies between individuals.)
3. Is it possible to create small dots at each time point in Stata 14, as shown in post #3 (Maarten's post)?

I would really appreciate any help.

Thank you so much.

Data looks as below:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input long ID double gfr float numeric_visit int days_from_baseline float(t_years gr)
1 53.7 0 0 0 4
1 43.8 4 115 .3150685 4
1 50.4 12 349 .9561644 4
1 46.7 24 731 2.0027397 4
1 38.1 36 1101 3.016438 4
1 46.4 48 1452 3.978082 4
1 39 72 2187 5.991781 4
1 34.1 84 2558 7.008219 4
2 59.8 0 0 0 4
2 54.9 4 122 .3342466 4
2 54.8 12 367 1.0054795 4
2 43.6 24 731 2.0027397 4
2 46.7 28 850 2.328767 4
2 50.3 32 962 2.6356165 4
2 49.9 36 1102 3.019178 4
2 48.3 40 1214 3.3260274 4
2 49.4 44 1326 3.6328766 4
2 57.5 48 1459 3.99726 4
2 50.8 52 1585 4.342466 4
2 53.2 56 1690 4.630137 4
2 52.3 60 1830 5.013699 4
2 49.9 64 1949 5.339726 4
2 55.8 68 2053 5.624658 4
2 49.8 72 2192 6.005479 4
end

Code:

***to generate mean gfr over time I've used the following code:
gen range=.
replace range=1 if (t_years>=0 & t_years<=0.999)
replace range=2 if (t_years>=1.0 & t_years<=1.999)
replace range=3 if (t_years>=2.0 & t_years<=2.999)
replace range=4 if (t_years>=3.0 & t_years<=3.999)
replace range=5 if (t_years>=4.0 & t_years<=4.999)
replace range=6 if (t_years>=5.0 & t_years<=5.999)
replace range=7 if (t_years>=6.0 & t_years<=6.999)
replace range=8 if (t_years>=7.0 & t_years<=7.999)
bysort gr range: egen avg=mean(gfr)

* to create graph
set scheme s1mono
twoway line gfr t_years if gr==4, c(L) yscale(log) lcolor(gs12) lw(vthin) ///
       ylab(20 25 30 35 40 45 50 55 60 65 70 75 80 85 90,                 ///
            format(%9.0gc) angle(0))                                      ///
       xlab(0(1)8)                                                     || ///
       line avg t_years if gr==4 ,                                        ///
       lwidth(*2) lpattern(solid) lcolor(black)                           ///
       ytitle("eGFR (ml/min per 1.73 m2)")                                ///
       legend(order(1 "individual" "eGFR"                                 ///
                    2 "average")                                          ///
              pos(4) cols(1) symxsize(*.5) size(*.75)                     ///
              region(lcolor(none)))

The graph looks as below (group 4, which has the smallest number of people, n~400).
Maarten Buis

Join Date: Mar 2014

Posts: 3458
#11

25 Oct 2018, 01:22

To solve (1) replace:

Code:

line avg t_years if gr==4 , ///

with

Code:

line avg t_years if gr==4 , sort ///

I don't understand (2)

The small dots are actually an artifact, but you can create them by replacing:

Code:

twoway line gfr t_years if gr==4, c(L) ...

with

Code:

twoway connected gfr t_years if gr==4, c(L)

You may want to tweak the symbols with the msymbol() and mcolor() options
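Putting the two replacements together, a sketch of the combined command might look like the following. It is an illustration under assumptions: the rows are a small subset of the -dataex- listing in #10, the marker settings (small grey circles) are arbitrary choices, and yscale(log) and the axis labels from #10 are left out to keep it short.

```stata
* Sketch only: data are a subset of the -dataex- example in #10.
clear
input long ID double gfr float(t_years gr)
1 53.7 0 4
1 46.7 2.0027397 4
1 34.1 7.008219 4
2 59.8 0 4
2 43.6 2.0027397 4
2 49.8 6.005479 4
end

* yearly bins and the mean eGFR per group and bin
gen range = floor(t_years) + 1
bysort gr range: egen avg = mean(gfr)

* connect(L) relies on the data order, so sort by person and time
sort ID t_years

* -connected- draws a marker at every visit; the sort option on the
* second plot orders the mean line by t_years so it no longer doubles back
twoway connected gfr t_years if gr==4, connect(L)                   ///
       msymbol(o) msize(small) mcolor(gs8)                           ///
       lcolor(gs12) lwidth(vthin)                                    ///
    || line avg t_years if gr==4, sort                               ///
       lwidth(*2) lpattern(solid) lcolor(black)                      ///
       legend(order(1 "individual eGFR" 2 "average"))
```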

An easier way to create range is:

Code:

gen range = floor(t_years) + 1
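A short sketch comparing this with the chained replace statements from #10. The value 0.9995 is hypothetical, added only to show that a time falling between 0.999 and 1.0 is left unassigned by the chained version, whereas floor() leaves no such gaps. The names range_old and range_new are made up for the comparison.

```stata
* Sketch only: 0.9995 is a made-up value; the others come from #10.
clear
input float t_years
0
.3150685
.9561644
.9995
2.0027397
7.008219
end

* chained version from #10 (first three bins only, for brevity)
gen range_old = .
replace range_old = 1 if (t_years>=0   & t_years<=0.999)
replace range_old = 2 if (t_years>=1.0 & t_years<=1.999)
replace range_old = 3 if (t_years>=2.0 & t_years<=2.999)

* floor() version: [0,1) -> 1, [1,2) -> 2, and so on
gen range_new = floor(t_years) + 1

list t_years range_old range_new
```

Note that floor() of a missing t_years is missing, so observations without a time stay unbinned in both versions.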

---------------------------------
Maarten L. Buis
University of Konstanz
Department of history and sociology
box 40
78457 Konstanz
Germany
http://www.maartenbuis.nl
---------------------------------
Nick Cox

Join Date: Mar 2014

Posts: 35724
#12

25 Oct 2018, 02:32

On a different note: while log scale is surely the right thing for Maarten's data, it is not obviously helping much here. But much depends on whether low or high values are of clinical interest, on which I am ignorant.