  • Flexible number format

    I am converting data from an external source and have to choose an appropriate Stata display format for each variable automatically. Neither the f nor the g format seems to fit the task, but perhaps someone can advise. The problem is that I don't know the overall width of the number, only the number of decimals, for example, 2 digits after the decimal point. I do want trailing zeros to appear. So for the numbers 1, 1.03, 11.36, 1000.9, 0.04, 0.2 I wish to see:
    Code:
       1.00
       1.03
      11.36
    1000.90
       0.04
       0.20
    I can specify something like %25.2f, and it works fine with the list command, but the columns will be very wide and inconvenient in the Browse window. See, for example, the variable headroom formatted with %25.2f here:
    [Image: Formats.png]

    I prefer list's behavior to browse's, as it produces more compact and usable output. Ideally, I would like to specify something like %*.2f, meaning: determine the width as needed and always show 2 decimals. Is there anything like this that I could use?

    Thank you, Sergiy Radyakin
    Last edited by Sergiy Radyakin; 01 Jun 2016, 10:52. Reason: added tags

  • #2
    So, if the largest value of your variable is of the order of 10^M, you need M characters for its integer part, 2 more for the two decimal places, and 1 for the decimal point. (I guess if it can be negative you need another one for the sign, but let's leave that out for illustrative purposes.)

    Code:
    summ x, meanonly
    local m = ceil(log(`r(max)')/log(10) )+ 3
    format x %`m'.2f
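
    One caveat on the formula above: it comes out one character short when the maximum is an exact power of 10 (for 1000 it gives 6, but 1000.00 needs 7 characters) or below 1 (for 0.5 it gives 3, but 0.50 needs 4), and it leaves out the sign. A minimal sketch that covers those cases, assuming the variable is named x and has at least one nonmissing value; the local names big, digits, sign and w are only illustrative:

    ```stata
    * Sketch: width for a %W.2f format from the data currently in x
    quietly summarize x, meanonly
    * Largest magnitude, floored at 1 so values below 1 keep a leading "0"
    local big = max(abs(r(min)), abs(r(max)), 1)
    * Integer digits: floor(log10()) + 1 also works for exact powers of 10
    local digits = floor(log10(`big')) + 1
    * One extra character for a minus sign if any value is negative
    local sign = (r(min) < 0)
    * Integer digits + sign + decimal point + 2 decimals
    local w = `digits' + `sign' + 1 + 2
    format x %`w'.2f
    ```

    This still does not account for rounding at display time (9.999 is shown as 10.00, which needs one more character), so adding a spare character may be prudent.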

    • #3
      That can be

      Code:
       
       ceil(log10(r(max)))+ 3

      • #4
        Dear Clyde, Nick, thank you for your advice.

        1) Perhaps I was not clear in my question. The complication is that r(max) will not be known at the time my program runs.
        The program has to decide on the formatting of the variables first; another stream then populates them with data. So, unfortunately, I can't tell what width I will need based on the max, although the number of decimals is known for every column. Basically, I am looking for a way to tell Stata to do exactly what you wrote, but automatically at display time, not precomputed and hardwired into the data file. Also, if the user types list in 1/10, only the values in observations 1/10 affect the width of the column. I would expect the same from browse, but it uses the variable's format property and hence does not react to the 1/10 restriction.

        2) The second thing, which I noticed just now, is that when I use a fixed format and the value is so large that Stata switches to scientific notation, the number of decimals is applied to the mantissa inconsistently. If I specify 0 decimals (e.g. the %18.0f format), then the whole mantissa is displayed for large numbers:
        Code:
        1.2345678900e+201
        But if I specify, say, 1 decimal (%18.1f format), then only 1 decimal of the mantissa is displayed for large numbers:
        headroom
        Code:
        1.2e+201
        It is not clear to me why I should be losing detail for large numbers when the parameter D in %W.Df is responsible for the fractional part only. Or, if there is a justification for it, why is 0 a special case?
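
        The behavior described in 2) can be checked without any data by formatting a literal directly with display. A minimal sketch, assuming 1.23456789e+201 as a stand-in for the large value shown above and 12345.678 as an arbitrary value that fits in 18 characters; no expected output is listed, since it may depend on the Stata version:

        ```stata
        * The same large value under the two formats discussed above
        display %18.0f 1.23456789e+201
        display %18.1f 1.23456789e+201
        * For comparison, a value that fits within the requested width
        display %18.0f 12345.678
        display %18.1f 12345.678
        ```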

        [Images: digits0.png, digits1.png]


        Thank you, Sergiy
