  • kdensity for 10,000 variables

    Hello, I plan to make an illustrative graph showing kdensity plots for about 10,000 variables. I use the code below:
    [QUOTE]
    forvalues j = 1(1)10000 {
        local call `call' (kdensity norm if id == `j', legend(off)) ||
    }
    twoway `call'
    [/QUOTE]

    However, twoway returns an error saying there are too many graphs. Is there any other way to make a joint kdensity graph for so many variables?

    Thank you.

  • #2
    I'm curious—what do you expect a graph with 10 000 diverse kernel density plots to look like?


    • #3
      Hi, I just want to graphically show that the 10,000 variables should all have a skewed distribution to the left. Is there any suggestion that I should look at to do so?

      I tried it with 50 variables and it worked (I got a bold, bunched set of density lines depicting the 50 variables' distributions). It doesn't look nice, but at least it represents what I want.


      • #4
        Originally posted by Canh Dang
        I just want to graphically show that the 10,000 variables should all have a skewed distribution to the left. Is there any suggestion that I should look at to do so?
        Maybe something like the following.
        Code:
        set type double
        collapse (mean) mean = norm (median) median = norm, by(id) // With 10 000 IDs, you might need to be patient here
        generate double delta = mean - median
        summarize delta
        You can plot summary statistics of the delta, too, if that will help you make your case, although I doubt that such a plot would satisfy Tufte's data-ink ratio criterion.

        You could also look into something like
        Code:
        quietly tabstat norm, by(id) statistics(skewness) save
        tempname Skew
        matrix define `Skew' = r(StatTotal)
        drop _all
        svmat double `Skew'
        Again, you might need to be patient.
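
        A possible variation, offered as a sketch rather than a tested replacement: per the tabstat documentation, save returns the per-group results in r(Stat1), r(Stat2), ... and the overall result in r(StatTotal), so getting one skewness value per id may be easier with egen's skew() function. The example below uses Stata's grunfeld example data, with invest and company standing in for norm and id (an assumption made only so that the example is self-contained).

        ```stata
        * Stand-in data: invest plays the role of norm, company the role of id
        webuse grunfeld, clear

        * Moment-based skewness of invest within each company
        egen skew = skew(invest), by(company)

        * Flag one observation per company so each group is counted once
        egen tag = tag(company)

        * Distribution of the per-group skewness values
        summarize skew if tag, detail
        ```

        With the data from this thread, the same two egen calls would use norm and id in place of invest and company.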


        • #5
          Note that (mean MINUS median) / SD is bounded and falls in [-1, 1]. It is an underused measure of skewness, in my view.
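
          One way to see the bound, sketched here with notation that is not used elsewhere in the thread (mu for the mean, m for the median, sigma for the SD of a variable X): the median minimizes the mean absolute deviation, and the mean absolute deviation about the mean cannot exceed the SD.

          ```latex
          % Assumed notation: \mu = E[X], m = median of X, \sigma = SD of X
          \begin{align*}
          |\mu - m| &= \bigl| E[X - m] \bigr| \\
                    &\le E|X - m|   && \text{(absolute value of a mean $\le$ mean of absolute values)} \\
                    &\le E|X - \mu| && \text{(the median minimizes $E|X - c|$ over $c$)} \\
                    &\le \sqrt{E[(X - \mu)^2]} = \sigma && \text{(Jensen's inequality)}
          \end{align*}
          % Dividing through by \sigma > 0 gives -1 \le (\mu - m)/\sigma \le 1.
          ```

          The same argument applies to a sample with the SD computed using divisor n; the usual n - 1 divisor gives a slightly larger SD, so the bound still holds.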

          So, it's easy enough to get an overall idea of 10,000 values of skewness that way:

          Code:
          egen mean = mean(norm), by(id)
          egen median = median(norm), by(id)
          egen SD = sd(norm), by(id) 
          gen skew = (mean - median) / SD 
          egen tag = tag(id) 
          
          quantile skew if tag
          Last edited by Nick Cox; 30 Dec 2018, 03:38.


          • #6
            Here's a self-contained example. It's a little silly, but it shows some technique.

            Code:
            webuse grunfeld, clear
            
            * ssc install rangestat 
            rangestat (mean) mean=invest (median) med=invest (sd) sd=invest, int(year 0 0)
            
            gen skewness = (mean - med) / sd if company == 1
            
            quantile skewness, scheme(s1color)
            
            su skewness
            
                Variable |        Obs        Mean    Std. Dev.       Min        Max
            -------------+---------------------------------------------------------
                skewness |         20    .4252359    .0330337   .3626413   .4761327
            Here there are 10 companies and 20 years. Populating observations for company 1 only is a way of getting just 20 skewness values in a variable. (It's what egen, tag() would do, in essence.)


            • #7
              Thank you both, Nick and Joseph. I've decided to follow your advice with a slight variation: calculate the skewness value for each variable manually, then plot the values on a graph. I only need to show that they are negative.
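
              A minimal sketch of what that plan might look like, assuming the (mean - median) / SD measure from post #5 and using Stata's grunfeld example data as a stand-in (invest for norm, company for id; both substitutions are assumptions made only for illustration):

              ```stata
              * Stand-in data: invest plays the role of norm, company the role of id
              webuse grunfeld, clear

              * Per-group mean, median and SD
              egen mean = mean(invest), by(company)
              egen median = median(invest), by(company)
              egen SD = sd(invest), by(company)

              * Bounded skewness measure, one distinct value per group
              gen skew = (mean - median) / SD
              egen tag = tag(company)

              * Histogram of the per-group values with a reference line at zero
              histogram skew if tag, xline(0)

              * How many groups have negative skewness on this measure
              count if tag & skew < 0
              ```

              With the real data, values falling to the left of the reference line are the groups with negative skewness.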
