Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Loss of precision for large values when using -import excel- and -insheet-

    Hi,

    I have a two-variable one-observation dataset in Excel. Variable 1 is 1.26684846 * (10^-9)and Variable 2 is 266555659858927. Both are, respectively, just arbitrarily small and large numbers I chose to show my point. Further, in Excel, both values are stored using the "Number" format, meaning they are displayed not in scientific notation but in full precision if the right number of decimals is set. If I use -import excel- to import said dataset to Stata, I will get the data both displayed and stored in scientific notation. Although the first variable (the small one) is stored at full precision, the latter does lose some information. I am puzzled by this information loss.

    I have tried to emphasize precision in the command statement, but to my knowledge, -import excel- does not have any option (at least, none shown in the helpfile) that allows the user to change the default storage type, as does -insheet- with option -double-, although this option does not help in the loss of precision for large numbers. Thus, it seems that there is no way to increase precision for large values, although -insheet ...,double- does offer such precision increase for small values (at least until the max precision of the double storage type is met).

    What prevents Stata from importing very large values from Excel at *full* precision? Is there any way to do so when using -insheet- or -import excel-? I haven't been able to find an answer to this problem in earlier posts. I did, however, find a reference to -xmluse-, but it seems that saving Excel data in .xml format is not as straightforward as it sounds. Also, I understand it is rarely useful to demand much precision at very large values, but still is a question worth asking.

    Thanks in advance,
    José

  • #2
    Here's an image of a made-up file:

    Click image for larger version

Name:	test_xlsx.png
Views:	1
Size:	46.8 KB
ID:	213699

    Here's the code to import it:

    Code:
    clear
    set more off
    
    import excel using test.xlsx, firstrow
    
    list
    
    format var1 %40.20f
    format var2 %40.0g
    
    list
    And here is the result:

    Code:
    . list
    
         +-----------------------+
         |      var1        var2 |
         |-----------------------|
      1. | 5.748e-13   2.666e+14 |
         +-----------------------+
    
    . list
    
         +------------------------------------------+
         |                   var1              var2 |
         |------------------------------------------|
      1. | 0.00000000000057483900   266555659858927 |
         +------------------------------------------+
    So basically, I can't reproduce your problem. You need to show us how to reproduce it.

    (I used LibreOffice Calc to produce the MS Excel file and the file is attached.)

    I think there is confusion as to what formats and storage types mean in Stata. The -format- command handles the display (format) of the variable. Note that by changing the format of the variables in my example, you may get the impression that they are stored with different precision, but they're not. Only the display has been changed.

    See -help data_types- and -help format- for details.
    Attached Files
    You should:

    1. Read the FAQ carefully.

    2. "Say exactly what you typed and exactly what Stata typed (or did) in response. N.B. exactly!"

    3. Describe your dataset. Use list to list data when you are doing so. Use input to type in your own dataset fragment that others can experiment with.

    4. Use the advanced editing options to appropriately format quotes, data, code and Stata output. The advanced options can be toggled on/off using the A button in the top right corner of the text editor.

    Comment


    • #3
      A reference is:

      Stata tip 113: Changing a variable's format: What it does and does not mean
      N. J. Cox. 2012
      Stata Journal Volume 12 Number 4
      You should:

      1. Read the FAQ carefully.

      2. "Say exactly what you typed and exactly what Stata typed (or did) in response. N.B. exactly!"

      3. Describe your dataset. Use list to list data when you are doing so. Use input to type in your own dataset fragment that others can experiment with.

      4. Use the advanced editing options to appropriately format quotes, data, code and Stata output. The advanced options can be toggled on/off using the A button in the top right corner of the text editor.

      Comment

      Working...
      X