  • Rounding off long decimals after importing from excel

    Dear Statalist forum,

    I am encountering an issue that seems trivial, but so far I have not found a solution, and several posts in this forum did not help:

    I imported data from Excel that contains cells with very long decimals, for instance 19.464398743793332. When I then destring the variables in Stata, they are automatically stored in scientific format (e.g. 1.94e+16).
    So far I have simply not found out how to round these numbers off, or even just display them in rounded form. Neither
    Code:
    gen value1_new = round(value1, 0.01)
    nor
    Code:
    gen value1_new2 = value1
    format value1_new2 %16.2f
    nor anything similar worked. I was able to limit the displayed decimals, but the numbers were still shown with "e+16" afterwards, as in this example:


    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input double value1 str17 value2 float(value1_new value1_new2)
                   . "19.9958939998468"             .             .
          1994506003 "2106108788.4"        1994505984    1994505984
          1994109003 "2105711788.4"        1994109056    1994109056
     199364246972543 "2202037379.66668" 1.9936424e+14 1.9936424e+14
                   . "12403617031.1715"             .             .
     329069794229935 "38.8560613126543"  3.290698e+14  3.290698e+14
         22937260873 "2874058234.3"       22937260032   22937260032
         19563967873 "2487128423.3"       19563968512   19563968512
     375534994458024 "3438619106.27153"   3.75535e+14   3.75535e+14
     129361584607353 "11851853961.071"   1.293616e+14  1.293616e+14
     367571998616128 "5.4559588978817"    3.67572e+14   3.67572e+14
          3108453055 "2883679090.4"        3108453120    3108453120
     9.247447612e+15 "955602845.3"       9.247448e+15  9.247448e+15
    8665486245366631 "52860813856.1447"  8.665486e+15  8.665486e+15
     169777451612066 "135893483715.184" 1.6977746e+14 1.6977746e+14
    end
    Is this a common problem and I am missing an important step, or is there no simple solution in Stata (and I should rather round off in Excel)?

    Many thanks in advance!
    Last edited by Roman von Vorgestern; 23 Sep 2021, 13:31.
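
    [A note on the two attempts above: round() changes the stored value, while format only changes the display, and a width-16 format cannot hold a 15-digit integer part plus two decimals, so Stata falls back to scientific notation. A minimal sketch combining both fixes, assuming the variable names from the example and untested against the original data:

    Code:
    * store as double (not float) and widen the display format
    gen double value1_new = round(value1, 0.01)
    format value1_new %20.2f

    ]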

  • #2
    I have more questions than answers.

    There is no context here to guide anyone, just cryptic variable names like value1 and value2.

    ("In mathematics context obscures structure.... In data analysis, context provides meaning." (George W. Cobb)) http://www.amazon.com/Introduction-D.../dp/047041216X

    Why are the data read in as string in the first place: was that automatic by Stata or done by you on purpose?

    When I run -destring value2, replace- the result is a double; I can't see any point in forcing that into a float.

    What kind of variable ranges from about 5 to about 136 billion? Do you trust those values? Almost nothing outside certain kinds of experimental physics justifies that many decimal places.
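
    [To make the float-versus-double point concrete: a float carries only about 7 significant decimal digits, which is why 1994506003 appears as 1994505984 in the float columns of the data example above. A quick check, using a value taken from that example:

    Code:
    display %15.0f float(1994506003)
    display %15.0f 1994506003

    ]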



    • #3
      Thanks for your answer.

      Sorry, to provide some context:

      The Excel file is automatically read in as string by Stata (after
      Code:
      import excel test, clear firstrow
      ). In the Excel file, the different variables are denoted in rows and the years in columns (e.g. Value1 is 2010), which explains the different ranges.

      And I am sorry, I guess the issue was a slip on my side. Here I am only displaying a subset of the data, but the original data also contained ".." for missing values in the string variables. Therefore, I was running
      Code:
      destring Year_2009, ignore("..") gen(Year_2009_2)
      ). But then this command also strips the "." decimal points from my decimal values, correct? So I guess the correct way to destring the variables in this case would be
      Code:
      destring Year_2009, gen(Year_2009_3) force
      ?
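
      [One alternative that avoids both ignore() and force might be to blank out the ".." codes first and then destring normally. A sketch, with Year_2009_fixed as an assumed name:

      Code:
      replace Year_2009 = "" if Year_2009 == ".."
      destring Year_2009, gen(Year_2009_fixed)

      ]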

      Here is the corresponding data example:

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input str72 Variables str17 Year_2009 double(Year_2009_2 Year_2009_3)
      "var1" "19.9958939998468" 199958939998468 19.9958939998468
      "var2" "2106108788.4"         21061087884     2106108788.4
      "var3" "2105711788.4"         21057117884     2105711788.4
      "var4" "2202037379.66668" 220203737966668 2202037379.66668
      "var5" "12403617031.1715" 124036170311715 12403617031.1715
      "var1" "38.8560613126543" 388560613126543 38.8560613126543
      "var2" "2874058234.3"         28740582343     2874058234.3
      "var3" "2487128423.3"         24871284233     2487128423.3
      "var4" "3438619106.27153" 343861910627153 3438619106.27153
      "var5" "11851853961.071"   11851853961071  11851853961.071
      "var1" "5.4559588978817"   54559588978817  5.4559588978817
      "var2" "2883679090.4"         28836790904     2883679090.4
      "var3" "955602845.3"           9556028453      955602845.3
      "var4" "52860813856.1447" 528608138561447 52860813856.1447
      "var5" "135893483715.184" 135893483715184 135893483715.184
      end
      Best regards.



      • #4
        That makes more sense, thanks. But the force option is strictly a last resort. The real challenge is to find out why some variables containing numeric characters were read in as string. You need something like

        Code:
        ds Year*, has(type string)
        foreach v in `r(varlist)' {
            list `v' if missing(real(`v'))
        }
        which will show any observations that can't be converted to numeric.

        For more on the logic see https://www.statalist.org/forums/for...iable-problems



        • #5
          Many thanks for the tips, I will try this!
