  • Histogram with log scale both below and above 0

    Dear Statalisters.

    I have been puzzling over what I thought would be a simple challenge. I have an index whose distribution is Pareto-like on both sides of zero. It is distributed over the interval [-1, 1], with a mass point at zero and most of the distribution very close to zero (either positive or negative), with a very thin tail extending to 1 or -1. I want to produce a histogram in the spirit of the screenshot I have attached (which I believe was produced in R, but not by me), but using xscale(log) does not seem to work.

    Here is a way to generate a toy dataset with something that looks like the index:

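    * toy data: two-sided, Pareto-like index on [-1, 1] with a mass point at zero
    clear
    set obs 10000
    set seed 876
    gen u=runiform()
    gen v=runiform()
    gen w1=runiform()
    gen w2=runiform()
    gen t=runiform()
    * positive side: heavy-tailed draw rescaled so that its maximum is 1
    gen double ypareto = 1000/(u^2.1)
    quietly sum ypareto
    gen y=ypareto/`r(max)'
    * mass point at zero
    replace y=0 if w1<0.025
    * negative side, built the same way
    gen double zpareto = 1000/(v^2.1)
    quietly sum zpareto
    gen z=zpareto/`r(max)'
    replace z=0 if w2<0.025
    * flip about half of the observations to the negative side
    gen d=y
    replace d=-z if t>0.5
    keep d
    sum d, d
    twoway histogram d, bin(100)
    * not usable here: d contains zeros and negative values
    twoway histogram d, bin(100) xscale(log)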

    If someone could point me in the general direction to produce a graph like this, I would be grateful.

    Regards,

    Pierre

  • #2
    You don't give a reference and it is natural to wonder why this scale isn't explained in your source (otherwise why would you be asking?).

    It clearly isn't a logarithmic scale because log 0 is undefined and log of negative numbers is complex and not plottable in this way.

    Nor is it -- as some numerical experiments confirm --

    Code:
    sign(x) * log(1 + abs(x))
    or

    Code:
    asinh(x)
    Both of these accommodate negative, zero and positive values symmetrically, mapping zero to zero and behaving like log(x) for x >> 0 and like -log(-x) for x << 0.
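    The kind of numerical experiment meant here can be sketched as follows. This is only an illustration: the variable names slog and asinh_x are made up for the purpose. For |x| well below 1 both functions stay close to x itself, so 0.001, 0.01 and 0.1 do not come out equally spaced the way they do in the screenshot.

    ```stata
    * compare the two candidate transformations at the tick values of interest
    clear
    input double x
    0.001
    0.01
    0.1
    1
    end

    * signed log(1 + |x|) and inverse hyperbolic sine
    gen double slog = sign(x) * log(1 + abs(x))
    gen double asinh_x = asinh(x)

    * on a truly logarithmic axis these differences would all be equal
    gen double step_slog = slog - slog[_n-1]
    gen double step_asinh = asinh_x - asinh_x[_n-1]

    list, sep(0)
    ```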

    What I see is that 0.001, 0.01, 0.1 and their negatives appear logarithmically spaced, yet the distance between -0.001 and 0.001 on the scale appears to be twice the step between other tick marks.

    One candidate for this is a hybrid of linear and logarithmic, as exemplified by this code:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float x
        -1
       -.1
      -.01
     -.001
    -.0005
         0
     .0005
      .001
       .01
        .1
         1
    end
    
    gen transf = cond(x >= 0.001, log10(x) + 4, cond(x <= -0.001, -log10(-x) -4, 1000 * x))
    
    l, sep(0)
    
         +-----------------+
         |      x   transf |
         |-----------------|
      1. |     -1       -4 |
      2. |    -.1       -3 |
      3. |   -.01       -2 |
      4. |  -.001       -1 |
      5. | -.0005      -.5 |
      6. |      0        0 |
      7. |  .0005       .5 |
      8. |   .001        1 |
      9. |    .01        2 |
     10. |     .1        3 |
     11. |      1        4 |
         +-----------------+
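    Applied to the toy variable d from #1, the same hybrid transformation could be used to draw the histogram on the transformed scale and then label the axis with the original values. This is a sketch under assumptions, not a reconstruction of how the screenshot was made: the bin width of 0.25, the variable name d_transf and the axis title are arbitrary choices.

    ```stata
    * assumes the toy variable d from #1 is in memory
    confirm variable d

    * linear within (-0.001, 0.001), logarithmic outside; maps [-1, 1] to [-4, 4]
    gen double d_transf = cond(d >= 0.001, log10(d) + 4, ///
        cond(d <= -0.001, -log10(-d) - 4, 1000 * d))

    * histogram on the transformed scale, ticks labelled in original units
    * (run from a do-file because of the /// line continuations)
    twoway histogram d_transf, width(0.25) start(-4) ///
        xlabel(-4 "-1" -3 "-0.1" -2 "-0.01" -1 "-0.001" 0 "0" ///
               1 "0.001" 2 "0.01" 3 "0.1" 4 "1") ///
        xtitle("d (hybrid linear-log scale)")
    ```

    Note that bar heights are then densities with respect to the transformed variable, not with respect to d itself.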
    The graph makes me wonder if there is a simpler algebraic version of something very similar, with different constants in my first two functions.
    EDIT: In fact further experiments suggest asinh(k x) for k say 1000 or more.
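    One way to experiment with asinh(k x) is to tabulate the spacing it gives at the tick values for a trial k. The sketch below is an assumption-laden illustration: the local macro k and the variable names t, step and gap0 are mine, and k = 1000 is just a starting value to vary.

    ```stata
    * spacing of the tick values under asinh(k * x) for a trial k
    clear
    input double x
    0.001
    0.01
    0.1
    1
    end

    local k = 1000
    gen double t = asinh(`k' * x)

    * step between successive decades
    gen double step = t - t[_n-1]

    * asinh is odd, so the distance from -0.001 to 0.001 is twice t at 0.001
    gen double gap0 = 2 * t[1]

    list, sep(0)
    ```

    Rerunning with larger k shows how the gap across zero grows relative to the step between decades.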
    Last edited by Nick Cox; 23 Mar 2019, 01:30.
