Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • export delimited csv format

    I am exporting data into a csv file with the command "export delimited variables using test.csv"
    For some reason some of the variables are appearing in the csv file with 14 decimal places.

    An example is var1 which has a value (when I list it) of 529.2 in the 9th observation

    I have rounded it to 1 decimal place using the round function. However in the .csv file it appears as 529.2000000000001
    This doesn't happen again with this variable until the 16th and the the 33rd and some subsequent observations but it does occur with other variables. About every 3rd or 4th observation has at least one variable that is written in the .csv file with 13 '0's and a '1' trailing the 'true' value even though none display that way in stata or in the data window. The pattern seems random.

    This doesn't effect how the data are read into the subsequent program but I would like to know how I can control the formats in the export file. All the variables are stored as double and displayed as %12.0g or %9.0f
    Martin


  • #2
    Try storing them as int before exporting?

    Comment


    • #3
      However they are presented, all numbers are held within Stata as binary numbers; so decimals can only be held exactly if they have binary equivalents that can be held exactly within the storage type being used. That's a main message of numerous discussions of precision, for which

      Code:
      search precision
      provides leading examples. It follows that round(whatever,0.1) will work as you want if and only if the numbers concerned round to integers or half-integers. Here is a demonstration:

      Code:
      . clear 
      
      . set obs 100
      obs was 0, now 100
      
      . set seed 2803
      
      . gen y = round(runiform(), 0.1)
      
      . clonevar Y = y
      
      . format y %21x
      
      . groups y Y
      
        +----------------------------------------------+
        |                     y    Y   Freq.   Percent |
        |----------------------------------------------|
        | +0.0000000000000X-3ff    0       6      6.00 |
        | +1.99999a0000000X-004   .1      13     13.00 |
        | +1.99999a0000000X-003   .2       9      9.00 |
        | +1.3333340000000X-002   .3       9      9.00 |
        | +1.99999a0000000X-002   .4      13     13.00 |
        |----------------------------------------------|
        | +1.0000000000000X-001   .5      11     11.00 |
        | +1.3333340000000X-001   .6       8      8.00 |
        | +1.6666660000000X-001   .7       6      6.00 |
        | +1.99999a0000000X-001   .8       7      7.00 |
        | +1.cccccc0000000X-001   .9      15     15.00 |
        |----------------------------------------------|
        | +1.0000000000000X+000    1       3      3.00 |
        +----------------------------------------------+
      Converting to strings using a %2.1f display format is another possibility.

      Comment


      • #4
        While Nick is, as ever, right and crystal clear about the precision issue underlying Martin's problem, it doesn't get to the question Martin has raised, which is how to control the formatting of the results exported to the .csv file. -export delimited- does not have any options that allow the user to force it to honor the display formats in the data set. The only way I know to do this is to convert the variables to strings (which, in a situation like this, will require the -force- option of -tostring-) and do the export that way. See the following examples:

        Code:
        . clear*
        . set obs 3
        obs was 0, now 3
        . gen x = _n/3
        .
        . list
             +----------+
             |        x |
             |----------|
          1. | .3333333 |
          2. | .6666667 |
          3. |        1 |
             +----------+
        
        .
        . export delimited using unformatted.csv, replace
        file unformatted.csv saved
        .
        . type unformatted.csv, showtabs
        x
        .33333334
        .66666669
        1
        .
        . format x %3.2f
        .
        . export delimited using formatted.csv, replace
        file formatted.csv saved
        .
        . type formatted.csv, showtabs
        x
        .33333334
        .66666669
        1
        .
        . tostring x, replace usedisplayformat force
        x was float now str4
        x was forced to string; some loss of information
        .
        . export delimited using as_strings.csv, replace
        file as_strings.csv saved
        .
        . type as_strings.csv, showtabs
        x
        0.33
        0.67
        1.00
        Actually, a -format- option on -export delimited- and also on -export excel- is part of my (as yet unposted) wish list for Stata 14.

        Comment


        • #5
          That's correct. As usual, changing a display format changes only how something is displayed and has precisely zero effect on the value stored.

          Comment

          Working...
          X