Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • exporting to excel vs outsheet to csv - different effects on data.

    When I export data to excel, excel appears to add additional decimals, whereas exporting to a csv preserves the data. Converting the csv to an excel file also preserves the data.

    For example, one observation in stata (and in the raw data source) has the value: 310.744. When this observation is exported to excel it becomes: 310.743988037109.

    If I outsheet to a csv it stays as 310.744 and if I save that csv as an excel file it still remains at 310.744.

    Is there anyway I can get excel not to change my data?

    Thanks



  • #2
    Is there anyway I can get excel not to change my data?
    Well, Excel isn't actually changing your data; exporting to csv is.

    What's going on this. -export excel- sends to Excel the actual data that is contained in your Stata data set: display formats are ignored. What you like seeing as 310.744, and can see in Stata's output lists and in -browse- if you specify format %4.3f, is actually internally represented as something else, because 310.744 has no exact representation in binary. In your instance, it seems to be represented as 310.743988037109. With -export excel- that number gets sent to Excel. If you want to only see it rounded to 3 decimal places in Excel, you can use Excel's formatting commands to do that. But you cannot control that from within Stata.

    By contrast, when you export to a .csv using -export delimited- (or the older -outsheet-), your Stata display formats are applied before exporting and the resulting .csv file (which is, after all, a text file) contains character representations that correspond to that number of decimal places. When you then read that into Excel, Excel holds the data that way.

    I do have to say that it would be nice if it were possible to first "prepare" the target Excel spreadsheet by specifying the numeric formats you want and then having an option in -export excel- to leave the display formats unchanged. But there is no such option, I'm afraid. -putexcel- gives you a lot of control over the formatting of the results in Excel, but it is really intended for exporting command output, not whole data sets, and would be very cumbersom to adapt to the latter function.

    Comment


    • #3
      Thanks Clyde - that's really interesting. So is this behavior normal for all statistical software (i.e. SAS, R etc)?

      Comment


      • #4
        I haven't used SAS since the 1980's, and I've never used R, so perhaps others who are familiar with these packages can comment.

        Comment


        • #5
          I would say that the underlying principle - the way a number appears when displayed in a typical base-10 format does not necessarily reflect the value the data has in storage - is common to all statistical packages. As Clyde said, 310.744 has no exact representation in binary.

          Comment


          • #6
            No, Richard,not all packages behave like that. SPSS saves the value 310.744 to Excel as 310.744.
            Click image for larger version

Name:	310-744.png
Views:	1
Size:	5.3 KB
ID:	1356394


            (value entered from keyboard exactly as it appears on the screen.)

            Best, Sergiy

            Comment


            • #7
              The result Sergly demonstrates with SPSS can be achieved with Stata as well, by creating and storing the data using double precision. The code below demonstrates the difference between the results of calculations using storage type double and storage type float. Note that in neither case is the value 310.744 stored precisely in Stata. The exported file can be opened in Excel, in which case both ratios display by default as 310.744 (Excel for Mac 2011). Changing the Excel format from general to fixed with 7 decimal places will show the lesser precision of the number exported from float relative to that exported from double. However, increasing the number of decimal places for the number exported from double does not lead to a display of the same imprecision apparent in Stata. My recollection from work in a previous life is that Excel takes steps to mask this sort of imprecision in its display.

              From this I conclude that SPSS created the number in double precision. This doesn't contradict anything Clyde, Richard, or I wrote; it does present a way of reducing the visual effects of the imprecise storage of the exported data.
              Code:
              . clear
              
              . set obs 1
              number of observations (_N) was 0, now 1
              
              . generate double d_n = 310744
              
              . generate double d_d = 1000
              
              . generate double d_x = d_n/d_d
              
              . generate float  f_n = 310744
              
              . generate float  f_d = 1000
              
              . generate float  f_x = f_n/f_d
              
              . list, clean noobs
              
                     d_n    d_d       d_x      f_n    f_d       f_x  
                  310744   1000   310.744   310744   1000   310.744  
              
              . export excel using precision.xlsx, replace firstrow(variables)
              file precision.xlsx saved
              
              . format * %27.0g
              
              . list, clean noobs
              
                     d_n    d_d                    d_x      f_n    f_d                   f_x  
                  310744   1000   310.7440000000000282   310744   1000   310.743988037109375  
              
              . format * %21x
              
              . list *_x, clean noobs
              
                                    d_x                     f_x  
                  +1.36be76c8b4396X+008   +1.36be760000000X+008  
              
              .
              Here's the code if you want to try this at home.
              Code:
              clear
              set obs 1
              generate double d_n = 310744
              generate double d_d = 1000
              generate double d_x = d_n/d_d
              generate float  f_n = 310744
              generate float  f_d = 1000
              generate float  f_x = f_n/f_d
              list, clean noobs
              export excel using precision.xlsx, replace firstrow(variables)
              format * %27.0g
              list, clean noobs
              format * %21x
              list *_x, clean noobs
              Last edited by William Lisowski; 13 Sep 2016, 06:25.

              Comment

              Working...
              X