kdensity for 10,000 variables

Canh Dang

Join Date: Apr 2015

Posts: 29
#1

29 Dec 2018, 17:59

Hello, I plan to make an illustrative graph that shows kdensity plots for about 10,000 variables. I use the code below:
Code:

local call
forvalues j = 1(1)10000 {
    local call `call' (kdensity norm if id == `j')
}
twoway `call', legend(off)

However, twoway returns an error saying that there are too many graphs. Is there any other way I can make a joint kdensity graph for so many variables?

Thank you.
Joseph Coveney

Join Date: Apr 2014

Posts: 4421
#2

29 Dec 2018, 18:16

I'm curious—what do you expect a graph with 10 000 diverse kernel density plots to look like?
Canh Dang

Join Date: Apr 2015

Posts: 29
#3

29 Dec 2018, 18:30

Hi, I just want to graphically show that the 10,000 variables should all have a skewed distribution to the left. Is there any suggestion that I should look at to do so?

I tried it with 50 variables and it worked (I got a bold, bunched set of density lines depicting the 50 variables' distributions). It doesn't look nice, but it at least shows what I want.
Joseph Coveney

Join Date: Apr 2014

Posts: 4421
#4

29 Dec 2018, 18:56

Originally posted by Canh Dang

I just want to graphically show that the 10,000 variables should all have a skewed distribution to the left. Is there any suggestion that I should look at to do so?

Maybe something like the following.

Code:

set type double
collapse (mean) mean = norm (median) median = norm, by(id) // With 10 000 IDs, you might need to be patient here
generate double delta = mean - median
summarize delta

You can plot summary statistics of the delta, too, if that will help you make your case, although I doubt that such a plot would satisfy Tufte's data-ink ratio criterion.

You could also look into something like

Code:

quietly tabstat norm, by(id) statistics(skewness) save
tempname Skew
matrix define `Skew' = r(StatTotal)
drop _all
svmat double `Skew'

Again, you might need to be patient.
Nick Cox

Join Date: Mar 2014

Posts: 35724
#5

30 Dec 2018, 03:19

Note that (mean MINUS median) / SD is bounded and falls in [-1, 1]. It is an underused measure of skewness, in my view.
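A short sketch of why the bound holds (a standard argument, not part of the original post), assuming a variable X with finite variance, mean μ, a median m and SD σ:

```latex
\begin{aligned}
|\mu - m| &= |\mathrm{E}(X - m)| \\
          &\le \mathrm{E}|X - m|   && \text{(Jensen: } |\mathrm{E}\,Y| \le \mathrm{E}|Y|\text{)} \\
          &\le \mathrm{E}|X - \mu| && \text{(a median minimizes } c \mapsto \mathrm{E}|X - c|\text{)} \\
          &\le \sqrt{\mathrm{E}(X - \mu)^2} = \sigma && \text{(Cauchy--Schwarz)}
\end{aligned}
\qquad\Longrightarrow\qquad -1 \le \frac{\mu - m}{\sigma} \le 1
```

Applied to a sample, the same chain gives the bound with the SD computed using divisor n. The SD from egen's sd() uses divisor n - 1 and is slightly larger, so the per-id ratio computed that way also lies in [-1, 1].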

So, it's easy enough to get an overall idea of 10,000 values of skewness that way:

Code:

egen mean = mean(norm), by(id)
egen median = median(norm), by(id)
egen SD = sd(norm), by(id)
gen skew = (mean - median) / SD
egen tag = tag(id)
quantile skew if tag

Last edited by Nick Cox; 30 Dec 2018, 03:38.

Nick Cox

Join Date: Mar 2014

Posts: 35724
#6

30 Dec 2018, 06:26

Here's a self-contained example. It's a little silly, but it shows some technique.

Code:

webuse grunfeld, clear

* ssc install rangestat 
rangestat (mean) mean=invest (median) med=invest (sd) sd=invest, int(year 0 0)

gen skewness = (mean - med) / sd if company == 1

quantile skewness, scheme(s1color)

su skewness

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
    skewness |         20    .4252359    .0330337   .3626413   .4761327

Here there are 10 companies and 20 years. Populating observations for company 1 only is a way of getting just 20 skewness values in a variable. (It's what egen, tag() would do, in essence.)


Canh Dang

Join Date: Apr 2015

Posts: 29
#7

30 Dec 2018, 13:41

Thank you both, Nick and Joseph. I've decided to follow your advice, at least in part: I will calculate the skewness for each variable and then plot the values in a graph. I only need to show that they are negative.
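A minimal sketch of that plan, run on simulated data rather than the real data. The simulation is an assumption for illustration: 10,000 ids with 50 observations each, drawn as ln(runiform()), which is a mirrored exponential and so skewed to the left. The variable names id and norm follow the thread; the seed, the sample sizes and the names skew_mom and skew_mm are arbitrary. It computes both the moment-based skewness from egen's skew() and Nick's (mean - median) / SD, keeps one observation per id with egen's tag(), and then shows all 10,000 values in a single graph.

```stata
* Simulated stand-in for the real data (hypothetical): 10,000 ids x 50 obs.
* ln(runiform()) is a mirrored exponential, so every id is skewed to the left.
clear
set seed 2018
set obs 10000
generate long id = _n
expand 50
sort id
generate double norm = ln(runiform())

* Two skewness measures per id: moment-based, and (mean - median) / SD
egen double skew_mom = skew(norm), by(id)
egen double mean = mean(norm), by(id)
egen double median = median(norm), by(id)
egen double SD = sd(norm), by(id)
generate double skew_mm = (mean - median) / SD

* Keep one observation per id, then look at all 10,000 values at once
egen byte tag = tag(id)
summarize skew_mom skew_mm if tag
count if tag & skew_mm >= 0
kdensity skew_mm if tag, xline(0)
```

With real data, drop the simulation lines and start at the egen lines. The count reports how many ids are not negatively skewed by the (mean - median) / SD measure.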
