
  • Number format: only display digits if non-integer

    I let STATA display summary statistics for me using esttab, as follows:

    Code:
    sysuse auto, clear
    
    local sumvars mpg headroom gear_ratio
    estpost tabstat `sumvars', s(n mean p50 min max sd) col(stat)
    
    esttab . , ///
    c("mean(fmt(3)) p50(fmt(3)) sd(fmt(3)) min(fmt(3)) max(fmt(3)) count(fmt(0))") label noobs nodepvar nomtitle nonumber replace tex ///
    coeflabels()
    Now, I would like to tweak the formatting so that a number, say the minimum of one of the variables, is displayed with three decimal places only if it is NOT an integer. For example, the min of Mileage (mpg) is 12, so I would like esttab to produce "12" rather than "12.000". By contrast, the min of Gear Ratio is 2.190, so three decimal places are fine there, because 2.190 is not an integer. Of course I could check this manually, but it would be much faster and safer to have STATA do this for me. Can I tweak the format to achieve this?

  • #2
    esttab is from the Stata Journal (FAQ Advice #12). The format option -fmt(#)- specifies the number of decimal places to display. You requested 3 and you got 3 in your example. To display no decimal places, you need

    Code:
    fmt(0)
    or

    Code:
    fmt(%12.0f)


    • #3
      Thanks, I know that one, and also use it in my code for the number of observations:
      Code:
       
       count(fmt(0))
      What I want is STATA to decide the number of decimals itself in a clever way, based on whether the specific number is an integer or not. In my example, for mpg it would choose zero digits because all numbers are integers, while for gear_ratio it would choose three, because it is not an integer. You see what I mean?


      • #4
        The desire makes sense but it calls for quite different code. You're mapping tabstat output to esttab output and there is nothing in the middle to tweak. That is, you can specify formats for each different statistic in the table but what you also want is for formats to depend on the properties of each variable. There is no hook for that in the code.

        I use tabstat a lot, but its output is rarely exactly what I want. It gives too many decimal places for most of my purposes but if I specify fewer decimal places I have to watch out for the sample size being reported as say 42.0 or 42.00, which just looks silly. This doesn't really matter much as most of the time I don't want to publish the results, so I just round results mentally. When I do want to publish there is just some ad hoc programming or TeX coding to do.

        esttab is a wonderful command I never use. That arises from knowing just enough Stata to be able to write code for most of what I want directly without wondering whether my friend Ben Jann allows it in his command or how to specify it if he does. Ben would, I guess, say the same of some of my commands.

        Much more important is what you, the OP, or any interested reader could do for yourself and the summary answer is "More than you think if you are willing to learn some tricks".

        I took your example and wanted to see how far I could get by just writing custom code for a collapse giving me a new dataset. (Some people would want to use frames here in Stata 16 up, which is more than fine by me.)
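
        As a minimal sketch of the frames route, assuming Stata 16 or later: the frame name summary and the new variable names are hypothetical, and only mpg is summarised to keep it short. The original data stay untouched in the current frame while the collapse happens in a copy.

        ```stata
        sysuse auto, clear

        * Copy the current frame so the original data are preserved
        * ("summary" is a hypothetical frame name)
        frame copy `c(frame)' summary, replace

        * Collapse inside the copy only
        frame summary: collapse (count) n_mpg=mpg (mean) mean_mpg=mpg (min) min_mpg=mpg (max) max_mpg=mpg
        frame summary: list, noobs

        * The full auto dataset is still in memory in the current frame
        describe, short
        ```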


        Code:
        sysuse auto, clear
        
        local call 
        foreach v in mpg headroom gear_ratio { 
            local call  `call' (count) n`v'=`v' (mean) mean`v'=`v' (sd) SD`v'=`v'  (min) min`v'=`v' (p50) p50`v'=`v' (max) max`v'=`v' 
            local label`v' "`: var label `v''"
            if "`label`v''" == "" local label`v' "`v'"
        }
        
        collapse `call'
        
        gen i = 1
        reshape long n mean SD min p50 max  , j(variable) string i(i)
        drop i
        
        quietly foreach v in mpg headroom gear_ratio { 
            replace variable = "`label`v''" if variable == "`v'"
        }
        
        format mean SD %3.2f
        
        list, noobs 
        
          +----------------------------------------------------------+
          |       variable    n    mean     SD    min     p50    max |
          |----------------------------------------------------------|
          |     Gear Ratio   74    3.01   0.46   2.19   2.955   3.89 |
          | Headroom (in.)   74    2.99   0.85    1.5       3      5 |
          |  Mileage (mpg)   74   21.30   5.79     12      20     41 |
          +----------------------------------------------------------+
        Without trying very hard, a fairly civilised table is at hand. I wanted mean and SD formatted consistently, not so much because of these variables but more because in most of my applications the variables in the table are more nearly similar.
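
        For the original question, one ad hoc route with a dataset like this is to build string versions of the statistics, with zero decimal places when a value is an integer and three otherwise. A minimal sketch, typing in the min, p50 and max values from the listing above; the names smin, sp50 and smax are hypothetical:

        ```stata
        clear
        input str14 variable min p50 max
        "Gear Ratio"     2.19 2.955 3.89
        "Headroom (in.)" 1.5  3     5
        "Mileage (mpg)"  12   20    41
        end

        * String copies: no decimals for integers, three decimals otherwise
        foreach s in min p50 max {
            gen s`s' = cond(`s' == floor(`s'), string(`s', "%9.0f"), string(`s', "%9.3f"))
        }

        list variable smin sp50 smax, noobs
        ```

        The same loop should apply unchanged to the reshaped dataset produced by the collapse code above.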

        Much more in similar spirit at https://www.stata-journal.com/articl...article=pr0053

        In this thread for STATA read Stata throughout as explained at https://www.statalist.org/forums/help#spelling


        • #5
          What I want is STATA to decide the number of decimals itself in a clever way, based on whether the specific number is an integer or not. In my example, for mpg it would choose zero digits because all numbers are integers, while for gear_ratio it would choose three, because it is not an integer.
          Sorry, I misread your question. To add to Nick's helpful reply, Ben did indeed envision such a scenario and created an automatic format.


          3. Automatic format: You may specify a1, a2, ..., or a9 to cause esttab to choose a reasonable display format for each number depending on the number's value. a may be used as a synonym for a3. The # in a# determines the minimum precision according to the following rules:

          o Absolute numbers smaller than 1 are displayed with # significant decimal places (i.e. with # decimal places ignoring any leading zeros after the decimal point). For example, 0.00123456 is displayed as 0.00123 if the format is a3.

          o Absolute numbers greater than 1 are displayed with as many digits required to retain at least one decimal place and are displayed with a minimum of (# + 1) digits. For example, if the format is a3, 1.23456 is displayed as 1.235, 12.3456 is displayed as 12.35, and 1234.56 is displayed as 1234.6.

          o In any case, integers are displayed with zero decimal places, and very large or very small absolute numbers are displayed in exponential format.
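
          One way to check these rules directly might be to push the help-file example values through esttab's matrix mode. This is only a sketch: it assumes the estout package is installed (ssc install estout) and that matrix() accepts fmt(a3) in the same way as cells() does; the matrix name A and its row and column names are hypothetical.

          ```stata
          * Values taken from the help-file examples above, plus one integer
          matrix A = (0.00123456 \ 1.23456 \ 12.3456 \ 1234.56 \ 12)
          matrix rownames A = small one ten thousand whole
          matrix colnames A = value

          * Display the matrix using the automatic a3 format
          esttab matrix(A, fmt(a3))
          ```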

          So, for your example in #1

          Code:
          sysuse auto, clear
          
          local sumvars mpg headroom gear_ratio
          estpost tabstat `sumvars', s(n mean p50 min max sd) col(stat)
          
          esttab . , ///
          c("mean(fmt(a3)) p50(fmt(a3)) sd(fmt(a3)) min(fmt(a3)) max(fmt(a3)) count(fmt(a3))") ///
          label noobs nodepvar nomtitle nonumber replace coeflabels() compress
          Result:

          Code:
          ----------------------------------------------------------------------------
                                mean       p50        sd       min       max     count
          ----------------------------------------------------------------------------
          Mileage (mpg)        21.30        20     5.786        12        41        74
          Headroom (in.)       2.993         3     0.846     1.500         5        74
          Gear Ratio           3.015     2.955     0.456     2.190     3.890        74
          ----------------------------------------------------------------------------


          • #6
            Thanks to Andrew Musau for a fine complementary answer -- and a very gentle tacit correction. Comparing his answer and mine shows that there are more hooks in esttab for controlling format than I implied.


            • #7
              Thanks to both of you for your great answers! This is really useful.
