
  • Histogram - Assigning min and max values to bins

    Dear Stata users,
    For my project, I need to compute the histogram of the intraday returns for each day and save the densities.
    As far as I can tell, I can save the densities with the twoway__histogram_gen command.
    However, I also need to set the minimum and maximum values of the bins myself.

    I am not allowed to share the original data, but let's say I am working with the following fake data:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(DATE_stata DAY MONTH YEAR TIME_stata HOUR MINUTE SECOND return)
    21306 2 5 2018 3.57e+07  9 55 0 .004712995
    21306 2 5 2018 3.60e+07 10  0 0  .03359007
    21306 2 5 2018 3.63e+07 10  5 0   .0395165
    21306 2 5 2018 3.66e+07 10 10 0 .021177117
    21306 2 5 2018 3.69e+07 10 15 0  .02277919
    21306 2 5 2018 3.72e+07 10 20 0 .033864293
    21306 2 5 2018 3.75e+07 10 25 0    .020484
    21306 2 5 2018 3.78e+07 10 30 0 .018225307
    21306 2 5 2018 3.81e+07 10 35 0  .01916204
    21306 2 5 2018 3.84e+07 10 40 0 .035961933
    21306 2 5 2018 3.87e+07 10 45 0  .02108501
    21306 2 5 2018 3.90e+07 10 50 0 .029303394
    21306 2 5 2018 3.93e+07 10 55 0 .005171665
    21306 2 5 2018 3.96e+07 11  0 0 .002619719
    21306 2 5 2018 3.99e+07 11  5 0 .003468564
    21306 2 5 2018 4.02e+07 11 10 0  .02528198
    21306 2 5 2018 4.05e+07 11 15 0  .03248618
    21306 2 5 2018 4.08e+07 11 20 0  .02698752
    21306 2 5 2018 4.11e+07 11 25 0   .0193095
    21306 2 5 2018 4.14e+07 11 30 0  .01664686
    21306 2 5 2018 4.17e+07 11 35 0  .00985556
    21306 2 5 2018 4.20e+07 11 40 0 .031419493
    21306 2 5 2018 4.23e+07 11 45 0 .008919027
    21306 2 5 2018 4.26e+07 11 50 0  -.0100898
    21306 2 5 2018 4.29e+07 11 55 0  .03562382
    21306 2 5 2018 4.32e+07 12  0 0  .01907039
    21306 2 5 2018 4.35e+07 12  5 0  .03053173
    21306 2 5 2018 4.38e+07 12 10 0  .02947596
    21306 2 5 2018 4.41e+07 12 15 0 .003836877
    21306 2 5 2018 4.44e+07 12 20 0 .032339774
    21306 2 5 2018 4.47e+07 12 25 0 .022225296
    21306 2 5 2018 4.50e+07 12 30 0  .03634135
    21306 2 5 2018 4.53e+07 12 35 0 .005161573
    21306 2 5 2018 4.56e+07 12 40 0 .036354125
    21306 2 5 2018 4.59e+07 12 45 0 .007547986
    21306 2 5 2018 4.62e+07 12 50 0  .03292318
    21306 2 5 2018 4.65e+07 12 55 0  .03106813
    21306 2 5 2018 4.68e+07 13  0 0 .020866474
    21306 2 5 2018 4.71e+07 13  5 0 .001889656
    21306 2 5 2018 4.74e+07 13 10 0  .03760757
    21306 2 5 2018 4.77e+07 13 15 0 .032291573
    21306 2 5 2018 4.80e+07 13 20 0  .03772156
    21306 2 5 2018 4.83e+07 13 25 0 .017582364
    21306 2 5 2018 4.86e+07 13 30 0  .03873967
    21306 2 5 2018 4.89e+07 13 35 0  .01911121
    21306 2 5 2018 4.92e+07 13 40 0  .03079806
    21306 2 5 2018 4.95e+07 13 45 0 .031644672
    21306 2 5 2018 4.98e+07 13 50 0  .02339589
    21306 2 5 2018 5.01e+07 13 55 0          .
    21306 2 5 2018 5.04e+07 14  0 0          .
    end
    format %tdDD/NN/CCYY DATE_stata
    format %tc TIME_stata
    When I check the range with
    Code:
    sum return
    the output is

        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
          return |         48    .0226268    .0122959  -.0100898   .0395165

    As you can see, the range of return is [-.0100898, .0395165]. However, the methodology I follow requires that the first bin start at -0.01, that the last bin end at 0.035, and that each bin have a width of 0.001. Consequently, there should be (0.035 - (-0.01)) / 0.001 = 45 bins. Values less than -0.01 or greater than 0.035 should be counted as tails. In other words, what I want is:
    Lower tail : less than -.01
    Bin-1: between -0.01 and -0.009
    Bin-2: between -0.009 and -0.008
    Bin-3: between -0.008 and -0.007
    ...
    Bin-45: between 0.034 and 0.035
    Upper tail: more than 0.035

    There is a start() option for the twoway__histogram_gen command (and for the histogram command as well), but I think it is not what I am looking for. The manual (https://www.stata.com/manuals13/rhistogram.pdf) says that start(#), if specified, must be less than or equal to m (the minimum value of the variable), or else an error will be issued.

    Do you have any suggestions? I think that there is an easy way to do it but I cannot figure it out. I appreciate any help or opinion.
    Thank you in advance.

    Best
    Merve

  • #2
    Bin-1: between -0.01 and -0.009
    Bin-2: between -0.009 and -0.008
    Bin-3: between -0.008 and -0.007
    So the width of your bin is 0.001. Consider the following:

    Code:
    . di 0.001*floor(-0.0085/0.001)
    -.009
    
    . di 0.001*floor(-0.009/0.001)
    -.009
    
    . di 0.001*floor(-0.0091/0.001)
    -.01
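
    The same floor() idea can be applied to a whole variable. What follows is only a minimal sketch, not a tested solution for the real data: it uses five return values taken from the example in #1, and the names bin, tail, n_bin, n_day and density are hypothetical. It also assumes that the density should be normalised by all non-missing returns of the day, tails included; if the methodology normalises differently, the last line must change. Values lying exactly on a bin edge may land in a neighbouring bin because of floating-point precision.

    ```stata
    * Small subset of the example data from #1 (only the variables needed here)
    clear
    input float(DATE_stata return)
    21306  .004712995
    21306  .03359007
    21306  .0395165
    21306  -.0100898
    21306  .003468564
    end

    * Bin limits and width as stated in #1
    local lo    = -0.01
    local hi    =  0.035
    local width =  0.001

    * Bin index 1..45: bin k covers [lo + (k-1)*width, lo + k*width)
    * Tails and missing returns get a missing bin
    gen int bin = floor((return - (`lo')) / `width') + 1 if return >= `lo' & return < `hi'

    * Tail marker: -1 = lower tail, 1 = upper tail, 0 = inside the bins
    gen byte tail = cond(return < `lo', -1, cond(return >= `hi', 1, 0)) if !missing(return)

    * Observations per day and bin, and non-missing observations per day
    bysort DATE_stata bin: gen long n_bin = _N if !missing(bin)
    bysort DATE_stata: egen long n_day = count(return)

    * Density of the bin each observation falls in (assumption: tails count in n_day)
    gen double density = n_bin / (n_day * `width')
    ```

    Note that this attaches a density to each observation, so bins with no observations on a given day do not appear in the data at all; they would have to be added separately if a full 45-bin grid per day is needed.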



    • #3
      Thank you, Andrew. I am a little late in saying so, though. Forgive me, I have not had time to check here for a long time.
