Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Export to CSV data with high values with dot as decimal separator and no rounding

    Good afternoon everyone,

    I would like to ask: what is the best way to export a dataset to CSV in Stata so that numeric values are saved in the format (e.g., 320453000,30), using a comma as the decimal separator (not a dot I made a mistake in the title), and ensuring that values are not rounded during export?

    I'm looking for the most reliable syntax or method to preserve this formatting during the export process.

    Moreover, is it possible—when exporting to CSV—to specify which variables are strings and which are numeric? I need to export a dataset containing both types, and I’d like to avoid formatting issues (e.g., quotation marks in strings, incorrect decimal handling, etc.).

    I'm familiar with several options used during import—such as encoding("utf-8"), binquotes(strict), parselocale(it_IT), or decimalseparator(,)—but these don't seem to be supported during export.

    Any help or best practices would be greatly appreciated.

    Best regards,
    Last edited by Chiara Tasselli; 22 Jul 2025, 09:03.

  • #2
    I'm not sure I fully understand your concerns.

    You say you
    Originally posted by Chiara Tasselli View Post
    [...] need to export a dataset containing both types [numeric and string], and I’d like to avoid formatting issues (e.g., quotation marks in strings, incorrect decimal handling, etc.).
    and mention
    Originally posted by Chiara Tasselli View Post
    [...] ensuring that values are not rounded during export
    Could you provide an example of such a "formatting issue" or "rounding" where (you believe) Stata exports your data incorrectly?

    Also, why do you insist on a comma as the decimal separator in a comma-separated values file? Even if that's possible, it seems like a bad idea.

    Generally, it seems better to adjust the importing settings in the target software rather than messing with Stata's export routines.

    Comment


    • #3
      My read of this is that there are three separate issues. First, you want to ensure that variables are not rounded. Second, you want to export numeric values using commas as the decimal point separator. Third, you want to avoid "formatting issues."

      Re: rounding issues. Independently of Daniel's reply in #2, Bill Gould's excellent blog post on precision covers most issues (see link, https://blog.stata.com/2012/04/02/th...-to-precision/). As the blog post indicates, one way to avoid rounding issues is to multiply variables such that they become integers, then precision issues are avoided for most use cases -- see sections 4 and 5 of Bill's blog post. Once all values are converted to integers there will not be any concerns regarding the use of periods or commas to denote the decimal place separator.

      Re: writing a comma-delimited file using commas as a decimal separator. I agree with Daniel in #2 that this is a bad idea. To be more specific, how will you tell any other program interpreting the output whether a comma in the data is separating two columns (e.g. "Variable #1" and "Variable #2") or is separating the integer and non-integer parts of a numeric value (e.g. 100,01 denoting 100 dollars and 1 cent, for example)?

      This all being said. Getting Stata to write a file with export delimited using commas as a separator seems to be difficult. In my version of Stata 17, Stata seemed to ignore the comma decimal point separator even when using the datafmt option in export delimited. If you have to do this, one option would be to:
      1. Convert all variables to string (e.g. tostring *, replace).
        1. Generally speaking, this is a bad idea! You will probably lose precision when doing this!
      2. Substitute periods for commas for the relevant variables using the replace command and the subinstr function
      3. Export the data using the export delimited command.
        1. My prior would be that nearly every value will be enclosed in double quotation marks ("), since the formerly-numeric variables will now be treated as string variables and will contain the same character in their value that also serves to separate variables -- the comma!
      Regarding formatting issues. I am not sure precisely what you mean by this. But some examples would be ideal. if I have to write files to external sources, I like to use uncommon characters to delimit fields. For example the "|" character (on US keyboards this is the character produced by holding down the Shift key and the backslash "" key) is often a good choice.



      Comment


      • #4
        I believe you can avoid "rounding" by setting the display format of your your variables before exporting, and you can change the delimiter to something other than comma (tab, pipe, etc.) My experience is that most (?) other software allows one to define a delimiter other than "," , so eventual import should work ok.

        Code:
        sysuse auto, clear
        keep make headroom gear_ratio
        format headroom gear_ratio %15.8c // as many decimal digits as you want
        set dp comma
        list in 1/3
             +---------------------------------------+
             | make            headroom   gear_ratio |
             |---------------------------------------|
          1. | AMC Concord   2,50000000   3,57999992 |
          2. | AMC Pacer     3,00000000   2,52999997 |
          3. | AMC Spirit    3,00000000   3,07999992 |
             +---------------------------------------+
        
        export delimited using c:/temp/myfile.csv, nolabel datafmt quote delim(|) replace
        // admire the result
        type c:/temp/myfile.csv, lines(3)
        make|headroom|gear_ratio
        "AMC Concord"|2.50000000|3.57999992
        "AMC Pacer"|3.00000000|2.52999997
        Note, however, that contrary to how I would interpret the documentation for the -datafmt- option "use the variables' display format upon export," setting the comma as decimal separator is not respected in the export, even with "|" as the delimiter


        Comment


        • #5
          Originally posted by Mike Lacy View Post
          you can change the delimiter to something other than comma (tab, pipe, etc.)
          As a person who is from a country that uses the comma as the decimal "point": What I see that is most commonly used (by e.g. national statistics bureaus) is the semi-colon (. I guess the reason is that it is sufficiently obscure that it is not used somewhere else, and at the same time common enough that most end users will not be scared by it and know where to find it on their key board. A tab would lead to a lot of confusion among many end users as they don't see it.

          ---------------------------------
          Maarten L. Buis
          University of Konstanz
          Department of history and sociology
          box 40
          78457 Konstanz
          Germany
          http://www.maartenbuis.nl
          ---------------------------------

          Comment


          • #6
            Thank you all for the helpful suggestions – I’ve learned a lot from your contributions and I’m currently studying the blog post recommended by Michael Gropper.

            Unfortunately, I cannot decide the format myself: I’ve been assigned the task of building open data that must follow a specific structure – using semicolon ( ; ) as the field separator and comma (,) as the decimal mark, since the format must comply with Italian conventions (the open data is intended for Italy).

            That said, I’ll definitely keep in mind the limitations regarding the use of commas in CSV files and will try to raise this issue where possible.

            Also, if I may add a general thought: Stata seems rather inflexible when it comes to exporting, especially regarding decimal handling and export formatting. I’m not sure if it's just me...

            Thanks again!

            Comment


            • #7
              There is no limitation regarding commas in CSV files, and there is no issue for you to raise: Using semicolons as a separator and commas for the decimal mark is perfectly standard and normal (at least in (continental) Europe).

              The fact that export delim does not respect the decimal mark you set seems like either a bug or an oversight to me. To get that fixed (if it really is a bug or oversight) you can contact Stata's techsupport:
              https://www.stata.com/support/tech-support/contact/

              (I have taken a quick look at the code of export and the part that actually writes the file is built-in. This means that we users don't have access to it. So this something only StataCorp can change)
              ---------------------------------
              Maarten L. Buis
              University of Konstanz
              Department of history and sociology
              box 40
              78457 Konstanz
              Germany
              http://www.maartenbuis.nl
              ---------------------------------

              Comment


              • #8
                Originally posted by Maarten Buis View Post
                The fact that export delim does not respect the decimal mark you set seems like either a bug or an oversight to me.
                Obviously, I have to leave the authoritative answer to StataCorp, but this could well be intended behavior. Allowing the comma as a decimal separator would require ensuring that the field delimiter is not also a comma. That, in turn, would mean the default delimiter might have to change (or the appropriate option be required) depending on the chosen decimal separator. This seems cumbersome both to program and to document.

                On a somewhat different note, I'm not even sure StataCorp maintains the code that actually writes the files. I believe many of their I/O routines rely heavily on third-party libraries.

                Edit: I have to take some of my comments back. Stata happily accepts the period as a field delimiter even when the decimal separator is also a period. Unsurprisingly, importing such data results in a mess no matter what you try:
                Code:
                . sysuse auto
                (1978 automobile data)
                
                . tempfile tmp
                
                . export delimited headroom gear_ratio mpg using "`tmp'" in 1/5 , delimiter(".")
                file C:\Users\klein\AppData\Local\Temp\ST_4b0c_000001.tmp saved
                
                . type "`tmp'"
                headroom.gear_ratio.mpg
                2.5.3.5799999.22
                3.2.53.17
                3.3.0799999.22
                4.5.2.9300001.20
                4.2.4100001.15
                
                . import delimited "`tmp'" ,  varnames(1) clear
                (encoding automatically selected: ISO-8859-9)
                (1 var, 5 obs)
                
                . list
                
                     +------------------+
                     | headroomgear_r~g |
                     |------------------|
                  1. | 2.5.3.5799999.22 |
                  2. |        3.2.53.17 |
                  3. |   3.3.0799999.22 |
                  4. | 4.5.2.9300001.20 |
                  5. |   4.2.4100001.15 |
                     +------------------+
                
                . import delimited "`tmp'" , delimiter(".") varnames(1) clear
                (encoding automatically selected: ISO-8859-9)
                (5 vars, 5 obs)
                
                . list 
                
                     +----------------------------------------------+
                     | headroom   gear_r~o       mpg        v4   v5 |
                     |----------------------------------------------|
                  1. |        2          5         3   5799999   22 |
                  2. |        3          2        53        17    . |
                  3. |        3          3    799999        22    . |
                  4. |        4          5         2   9300001   20 |
                  5. |        4          2   4100001        15    . |
                     +----------------------------------------------+
                Last edited by daniel klein; 24 Jul 2025, 12:47.

                Comment


                • #9
                  Minority Report: why anyone ever uses the comma as a separator is a mystery to me.

                  Seriously, most data sets contain a mixture of text and numeric data. The notion that numbers will not contain commas as a decimal point is a matter of American exceptionalism and should not be baked into data sets that, in the modern world, are likely to be used internationally. And text entries are very likely to contain commas no matter where the data comes from. As has already been pointed out several times in this thread, commas in the data and commas as the separator is a recipe for an import that is fubar. Why is this dangerous practice the norm? I would guess that there is some historical reason for it, but I've been engaged in data work since the early 1960's and I don't know how this has come to pass.

                  While semicolons are less common in text than commas, they are hardly a rarity either. I don't understand why the pipe (|) character has not become the default (in general, not just in Stata) separator in delimited text files.

                  Comment


                  • #10
                    Originally posted by Chiara Tasselli View Post
                    Also, if I may add a general thought: Stata seems rather inflexible when it comes to exporting, especially regarding decimal handling and export formatting. I’m not sure if it's just me...
                    Frankly, I think that’s a bit of an exaggeration. I'm still not sure what you mean by "decimal handling", but as far as I can tell, the only limitation is that Stata doesn’t support the comma as a decimal separator. Other than that, in what way does the datafmt option not do what you need?

                    Anyway, I think we've made our points. If you want this badly, convert your numeric variables to strings using whichever format (including decimal separator) you prefer, and feed it to export delimited. Here's a minimal example:
                    Code:
                    sysuse auto
                    keep make headroom gear_ratio
                    preserve // <- we're about to destroy the data
                    tostring headroom gear_ratio , replace force format("%18,2f")
                    tempfile csv
                    export delimited make headroom gear_ratio using "`csv'" in 1/5 , delimiter(";")
                    type "`csv'"
                    which yields
                    Code:
                    . type "`csv'"
                    make;headroom;gear_ratio
                    AMC Concord;2,50;3,58
                    AMC Pacer;3,00;2,53
                    AMC Spirit;3,00;3,08
                    Buick Century;4,50;2,93
                    Buick Electra;4,00;2,41

                    Comment


                    • #11
                      If we're willing to assume that any instance of two digits surrounding a period (i.e., 123.456) represents a number with a decimal point, another approach is:

                      1) Export to a temp file using the "." decimal point, but adopt some delimiter other than the comma.
                      2) Read this temp file into Stata as a strL and replace all apparent decimal periods with a comma
                      3) Write this file out to the desired csv file.

                      2) and 3) could be done in a text processing program, but we can do it all in Stata.

                      Code:
                      sysuse auto, clear
                      keep make headroom gear_ratio
                      format headroom gear_ratio %15.8f // as many decimal digits as you want
                      //
                      // Export to a temp file, using something other than the comma
                      // as the delimiter and the default "." as the decimal point
                      tempfile temp
                      export delimited using "`temp'", nolabel datafmt quote delim(|) replace
                      type "`temp'", lines(3)
                      //
                      // Use -fileread- to allow us to process this temp file in Stata as 
                      // just a string of bytes. Within that file, replace all instances 
                      // of a period "." between two digits with a comma.
                      clear
                      set obs 1 
                      gen strL s = fileread("`temp'")
                      // Replace "." with "," and inspect the result.
                      replace s = ustrregexra(s,"([0-9])\.([0-9])", "$1,$2" )
                      list s 
                      //
                      // Save this to the desired "csv" [sic] file
                      gen b = filewrite("Your File.csv", s)

                      Comment


                      • #12
                        Thank you all for your contributions and perspectives — I've learned a lot!

                        Comment

                        Working...
                        X