
  • Format float variable to long

    Hi y'all,

    I'm working with a variable that is numeric but is formatted as a float. The numbers are very large, up to the hundreds of millions, but when I use the univar or sum command, I'm losing precision on my five-number summary. For example:

    part of univar output:


         Mean     S.D.      Min      .25      Mdn      .75      Max
    ----------------------------------------------------------------
      3.6e+07  6.4e+07   548.48  3.5e+06  1.0e+07  3.5e+07  4.0e+08


    part of sum output:
    50% 1.00e+07 Mean 3.60e+07


    I'm able to get precise means and SDs when I use the tab command (for the above example, mean=35985642 & SD=63948306), so right now the issue is getting precise medians. The best idea I could come up with is reformatting the variable from float to long. It's currently:

    storage   display
    type      format
    --------------------
    float     %9.0f


    But I'm having difficulty figuring out how to do that. I'm open to any suggestions, even outside of reformatting. I'm using Stata Version 12.1.

    Thanks for your help!

  • #2
    A change of storage type (which is not at all the same as a change in (display) format) is possible with recast if and only if you wouldn't lose information:


    Code:
    . sysuse auto
    (1978 Automobile Data)
    
    . d
    
    Contains data from C:\Program Files (x86)\Stata15\ado\base/a/auto.dta
      obs:            74                          1978 Automobile Data
     vars:            12                          13 Apr 2016 17:45
     size:         3,182                          (_dta has notes)
    ---------------------------------------------------------------------------------
                  storage   display    value
    variable name   type    format     label      variable label
    ---------------------------------------------------------------------------------
    make            str18   %-18s                 Make and Model
    price           int     %8.0gc                Price
    mpg             int     %8.0g                 Mileage (mpg)
    rep78           int     %8.0g                 Repair Record 1978
    headroom        float   %6.1f                 Headroom (in.)
    trunk           int     %8.0g                 Trunk space (cu. ft.)
    weight          int     %8.0gc                Weight (lbs.)
    length          int     %8.0g                 Length (in.)
    turn            int     %8.0g                 Turn Circle (ft.)
    displacement    int     %8.0g                 Displacement (cu. in.)
    gear_ratio      float   %6.2f                 Gear Ratio
    foreign         byte    %8.0g      origin     Car type
    ---------------------------------------------------------------------------------
    Sorted by: foreign
    
    . recast long mpg
    
    . recast long headroom
    headroom:  37 values would be changed; not changed

    But your problem is with univar, which is from the Stata Technical Bulletin. (You are asked to explain where such commands come from: FAQ Advice #12.) By the way, its author left the Stata community circa 2000, which was a great pity for us, so the email addresses below should be considered defunct.


    sg67_1 from http://www.stata.com/stb/stb51
    STB-51 sg67_1. Update to univar. / STB insert by John R. Gleason,
    Syracuse University / Support: [email protected] / After
    installation, see help univar

    sg67 from http://www.stata.com/stb/stb36
    STB36 sg67. Univariate summaries with boxplots. / STB insert by John R.
    Gleason, Syracuse University. / Support: [email protected] /
    After installation, see help univar.

    A quick look at the code suggests that you would need to rewrite a chunk of univar to get it to do what you want. There's not enough space to print out results as you wish and you'd need to rewrite that display.

    On the other hand, consider this:


    Code:
    . clear
    
    . set obs 2
    number of observations (_N) was 0, now 2
    
    . gen long foo = cond(_n == 1, 1e7, 9e8)
    
    . tabstat foo, s(mean sd min p25 p50 p75 max)
    
        variable |      mean        sd       min       p25       p50       p75
    -------------+------------------------------------------------------------
             foo |  4.55e+08  6.29e+08  1.00e+07  1.00e+07  4.55e+08  9.00e+08
    --------------------------------------------------------------------------
    
        variable |       max
    -------------+----------
             foo |  9.00e+08
    ------------------------
    
    . tabstat foo, s(mean sd min p25 p50 p75 max) format(%9.0f)
    
        variable |      mean        sd       min       p25       p50       p75
    -------------+------------------------------------------------------------
             foo | 455000000 629325035  10000000  10000000 455000000 900000000
    --------------------------------------------------------------------------
    
        variable |       max
    -------------+----------
             foo | 900000000
    ------------------------
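
If only a few numbers are needed, another route is to read the stored results that summarize leaves behind and display them with whatever format you like. This is a minimal sketch that reuses the two-observation foo example above; the %12.0f format is just one choice that is wide enough for values in the hundreds of millions.

```stata
* Rebuild the two-observation example from the post above
clear
set obs 2
generate long foo = cond(_n == 1, 1e7, 9e8)

* summarize, detail stores its results in r(); quietly suppresses the table
quietly summarize foo, detail

* Display the stored results with an explicit fixed format
display "median = " %12.0f r(p50)
display "mean   = " %12.0f r(mean)
display "sd     = " %12.0f r(sd)
```

The displayed precision here depends only on the format given to display, not on the storage type of the variable.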



    • #3
      Hi Nick,

      Wow thanks for the informative post. I will perhaps one day take a whack at rewriting the univar code, but will absolutely include, next time, more detail on any installed commands I'm using. Your tabstat suggestion did the trick, especially as I was allowed to use the by(varname) option....


      Summary for variables: dollar
           by categories of: YR

            YR |      mean        sd       min       p25       p50       p75       max
      ---------+----------------------------------------------------------------------
          2012 |  25791771  52084887      2266   2721950   7230274  20500000 400000000
          2018 |  35985642  63948306       548   3500000  10000000  35100100 400000000



      For the recast command... The difference between the two examples you gave is that with recast long mpg information is NOT lost, but with recast long headroom information WOULD BE lost if I used the force option (because 37 values would be changed)?

      Thank you again!
      Laura



      • #4
        That is right. recast long mpg is obeyed without error and so no news is good news. But headroom contains fractional parts and can't be recast long. The point of using the auto data is that you can run the code examples and check out details yourself.
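
One way to see in advance whether recast long would go through is to count the values that have a fractional part. This is a sketch using the auto data again, not something recast requires you to do. Note that the univar output in #1 shows a minimum of 548.48, which suggests the variable in question may itself hold fractional values.

```stata
sysuse auto, clear

* Count values with a fractional part; recast long loses nothing only if this is 0
quietly count if headroom != floor(headroom) & !missing(headroom)
display "headroom: " r(N) " non-integer values"

quietly count if mpg != floor(mpg) & !missing(mpg)
display "mpg: " r(N) " non-integer values"
```

The count for headroom should agree with the number of values that recast reported it would have changed.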



        • #5
          Perfect. Thanks so much for your help and guidance!
