Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Issues importing (or) converting numeric variables from CSV in Stata

    Hello everyone,

    I’m encountering a problem when importing numeric variables (and/or converting from strings) into Stata, and I can’t figure out what causes it.

    The data values I’m working with have this format:
    205000000.00

    Code:
    I have tried several import syntaxes, for example:
    import delimited "NAMEFILE.csv", bindquote(strict) delimiter(";") varnames(1) encoding("utf-8") clear
    
    or 
    
    import delimited "NAMEFILE.csv", bindquote(strict) stringcols(_all) delimiter(";") varnames(1) encoding("utf-8") clear
    destring varibles list, replace
    When I import everithing as a string the values are correct, but....

    The issue is that when I import as numeric or convert from string to numeric, strange things happen:
    • Sometimes decimal places are lost (and ther is write . 00 )
    • Sometimes decimals remain intact
    • Sometimes numbers get rounded automatically
    • But the strangest part is that sometimes the last digits of the non-decimal part of the number change unpredictably, producing results that are close but not exactly equal to the true value, with discrepancies usually smaller than around 100.
    How can I import large numbers with decimals without altering their values?

  • #2
    Please provide a minimal example of the imported string variable(s), and the exact code that you use to convert to numeric.

    Comment


    • #3
      Originally posted by Chiara Tasselli View Post
      . . . when I import as numeric or convert from string to numeric, strange things happen:
      Maybe try adding the asdouble option to the command in order to override the single-precision floating point default.
      Code:
      display %12.2f 205000001.11
      display %12.2f float(205000001.11)

      Comment


      • #4
        Originally posted by daniel klein View Post
        Please provide a minimal example of the imported string variable(s), and the exact code that you use to convert to numeric.

        Here the destring code:

        Code:
            
            destring   variables list, replace float
                    
            format %20,2gc   variables list
        Many thanks in advance for your time-

        below a small example with some variables:

        Code:
        * Example generated by -dataex-. For more info, type help dataex
        clear
        input str13(finanziamentoprivato finanziamentopnrr finanziamentototalepubblico)
        "15004263.00"   "135031969.00"  "135031969.00" 
        "1947023.95"    "212401131.40"  "279598955.05" 
        "136519376.00"  "194772949.00"  "194772949.00" 
        "98610458.00"   "230077817.00"  "230077817.00" 
        "17138759.00"   "105718284.00"  "105718284.00" 
        "0.00"          "398976182.34"  "498251029.62" 
        "0.00"          "251732410.00"  "385569850.00" 
        "13863912.00"   "124769169.00"  "124769169.00" 
        "135577579.00"  "222060483.00"  "222060483.00" 
        "0.00"          "314164346.00"  "314164346.00" 
        "0.00"          "155208808.81"  "155208808.81" 
        "121865151.00"  "255510293.00"  "255510293.00" 
        "14729134.00"   "132562088.00"  "132562088.00" 
        "0.00"          "74081720.00"   "134398984.00" 
        "152686558.00"  "356268499.00"  "356268499.00" 
        "316228898.00"  "290724367.00"  "290724367.00" 
        "0.00"          "205000000.00"  "705094967.91" 
        "113779676.00"  "242695375.00"  "242695375.00" 
        "149226695.00"  "193140035.00"  "193140035.00" 
        "1223256779.45" "289693256.56"  "686840000.00" 
        "1778190000.00" "226810000.00"  "226810000.00" 
        "0.00"          "317071777.54"  "470486133.05" 
        "0.00"          "73539182.00"   "110289946.00" 
        "0.00"          "74168392.00"   "301491500.00" 
        "0.00"          "130800000.00"  "153807525.87" 
        "0.00"          "30685000.00"   "297635728.92" 
        "0.00"          "130000000.00"  "138791431.83" 
        "0.00"          "32807715.61"   "318750000.00" 
        "0.00"          "488676472.00"  "757225416.00" 
        "0.00"          "9652.43"       "148498875.00" 
        "0.00"          "362405637.00"  "362405637.00" 
        "0.00"          "347464169.00"  "347464169.00" 
        "0.00"          "222142224.26"  "272197353.92" 
        "0.00"          "398822272.70"  "511324369.00" 
        "0.00"          "213495400.00"  "213495400.00" 
        "0.00"          "99607907.24"   "119671581.45" 
        "0.00"          "383942297.00"  "383942297.00" 
        "0.00"          "84500000.00"   "180994730.00" 
        "0.00"          "192091370.00"  "192091370.00" 
        "0.00"          "383175903.00"  "542060775.00" 
        "0.00"          "2073027699.44" "2958500635.56"
        "0.00"          "3615351664.07" "8772907326.07"
        "0.00"          "167889914.28"  "242089914.28" 
        "0.00"          "181967155.30"  "192160985.00" 
        "0.00"          "93000000.00"   "234971770.13" 
        "0.00"          "251243883.00"  "251243883.00" 
        "68036421.49"   "150000000.00"  "389643911.65" 
        "0.00"          "308501539.82"  "485831255.91" 
        "8301257.51"    "500000000.00"  "500000000.00" 
        "0.00"          "41311187.00"   "73848395.00"  
        "0.00"          "169844330.00"  "169844330.00" 
        "0.00"          "798781720.94"  "2523649886.27"
        "0.00"          "62606776.22"   "140487690.00" 
        "0.00"          "110000000.00"  "179400010.00" 
        "0.00"          "374661035.86"  "484036841.89" 
        "0.00"          "128018256.12"  "143798305.68" 
        "0.00"          "128989806.05"  "438189200.00" 
        "0.00"          "2396972300.56" "3059076950.42"
        "0.00"          "642159014.87"  "1137105493.76"
        "0.00"          "35560000.00"   "149052089.68" 
        "0.00"          "176000000.00"  "350102325.44" 
        "0.00"          "195127300.07"  "317459700.00" 
        "0.00"          "213254965.65"  "213497010.11" 
        "0.00"          "432733668.53"  "437246771.93" 
        "0.00"          "24592583.18"   "145226632.25" 
        "0.00"          "184692167.00"  "354447895.00" 
        "0.00"          "124752349.10"  "276632615.59" 
        "0.00"          "73564648.17"   "275561651.96" 
        "0.00"          "1800000000.00" "2755000003.47"
        "0.00"          "215664629.26"  "215664629.26" 
        "0.00"          "612181982.76"  "946211720.75" 
        "0.00"          "100000000.00"  "194256600.00" 
        "0.00"          "2268050053.73" "2282561452.00"
        "0.00"          "291384918.83"  "358482318.52" 
        "0.00"          "126724905.10"  "220339164.38" 
        end
        -----------




        Comment


        • #5
          Originally posted by Joseph Coveney View Post
          Maybe try adding the asdouble option to the command in order to override the single-precision floating point default.
          Code:
          display %12.2f 205000001.11
          display %12.2f float(205000001.11)
          Thank you very much for the suggestion! Do you mean in the import or in the destring? Because it doesn't seem like destring supports that option.

          Comment


          • #6
            Try
            Code:
            format varname %20,2fc
            and see whether that works for you. If it doesn't point out an example where things (apparently) go wrong.
            Last edited by daniel klein; 08 Jul 2025, 03:47.

            Comment


            • #7
              Things get confused now; sorry.

              There are (at least) two potential pitfalls here.

              Joseph Coveney addresses potential problems stemming from precision issues. In a nutshell, some floating point numbers (your "numbers with decimals") cannot be held precisely in binary - just like you cannot represent 1/3 as a finite decimal number. Depending on the storage format (float or double), you might observe differences in the values. I don't think this is a problem here, but I might be wrong.

              I believe the problem is merely a matter of formatting the numbers. Hence, my suggestion above.

              Comment


              • #8
                Originally posted by Chiara Tasselli View Post
                Do you mean in the import or in the destring? Because it doesn't seem like destring supports that option.
                I meant for the import delimited command.

                For the destring command, you would omit the float option. (I'm not sure why you added it.)

                Comment


                • #9
                  Originally posted by Joseph Coveney View Post
                  For the destring command, you would omit the float option. (I'm not sure why you added it.)
                  I completely overlooked that option. Joseph is right, it's a precision issue. Do not use float!

                  Comment


                  • #10
                    Thank you very much everyone!

                    I’ve now updated my import syntax to the following:

                    Code:
                    import delimited "01_dati\csv\Progetti.csv", bindquote(strict) asdouble delimiter(";") varnames(1) encoding("utf-8") clear
                    
                    format %20,2fc finanziamento* altrifondi

                    However, I’m still experiencing rounding issues in some cases that I can’t explain.

                    I noticed the issue because I’m trying to sum various funding components that should add up exactly to the total funding amount. Strangely, when I export the values that don't match and sum the individual components in Excel, the total there matches the expected value — whereas the totals computed in Stata, it differs.

                    To check this, I tried summing in two different ways:
                    Code:
                    gen finanziamento_TOT_check2 = finanziamentostato + finanziamentostatobilancio + finanziamentostatofoi + finanziamentoprosecuzioneoperepu + finanziamentouediversodapnrr + finanziamentoregione + finanziamentoprovincia + finanziamentocomune + finanziamentoaltropubblico + finanziamentoprivato + finanziamentodareperire + finanziamentopnrr + finanziamentopnc + altrifondi
                    
                    egen finanziamento_TOT_check3 = rowtotal(finanziamentostato finanziamentostatobilancio finanziamentostatofoi finanziamentoprosecuzioneoperepu finanziamentouediversodapnrr finanziamentoregione finanziamentoprovincia finanziamentocomune finanziamentoaltropubblico finanziamentoprivato finanziamentodareperire finanziamentopnrr finanziamentopnc altrifondi)
                    But the problem persists.

                    Comment


                    • #11
                      Probably, still the same issue: use
                      Code:
                      generate double ... = ...

                      Comment


                      • #12
                        Originally posted by daniel klein View Post
                        Probably, still the same issue: use
                        Code:
                        generate double ... = ...
                        Yes, you're right.
                        Many thanks, the results are better now.

                        Comment


                        • #13
                          Good afternoon everyone,

                          Following up on a previous conversation, I would like to ask: what is the best way to export a dataset to CSV in Stata so that numeric values are saved in the format for example 1324500000.30, using a comma as the decimal separator (not a dot), and ensuring that values are not rounded during export?


                          I'm looking for the most reliable syntax or method to preserve this formatting during the export process.

                          Moreover, is it possible—when exporting to CSV—to specify which variables are strings and which are numeric? I need to export a dataset containing both types, and I’d like to avoid formatting issues (e.g., quotation marks in strings, incorrect decimal handling, etc.).

                          I'm familiar with several options used during import—such as encoding("utf-8"), binquotes(strict), parselocale(it_IT), or decimalseparator(,)—but these don't seem to be supported during export.

                          Any help or best practices would be greatly appreciated.

                          Best regards,
                          Last edited by Chiara Tasselli; 22 Jul 2025, 08:28.

                          Comment

                          Working...
                          X