Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Stata generating garbage decimal places in float variable

    Hi -

    I'm trying to figure out why Stata (v14.1) seems to be added a bunch of extra non-sensical decimal places to my data. I'm working with Census Tract ID numbers, which are always either 4-digits long (i.e. XXXX) or 4 digits with two decimal places (XXXX.XX). When I ask Stata to "list" the tracts, they appear as normal, but when I look in the Data Editor I can see that, for example, Tract 4502.01 is actually being read by Stata as though it were an approximation for 4502.0098, and Tract 4507.41 is being read as though it were an approximation for 4507.4102 (pic attached to demonstrate what I mean). Note that none of my observations that are only 4 digits long have any issues, and this doesn't even affect all of the tracts with decimal places, just a handful of tracts at random is seems. [Edit: I've triple-checked the original data and these decimal places are not present, so this is clearly happening only after I import the data into Stata.]

    So my first question is why Stata is taking creative license with my data, and how I should go about telling Stata to stop.

    I have tried converting the variable from a float to a string, thinking maybe that would accomplish something -- no dice. I still have garbage decimal places.

    I've also tried circumventing this problem altogether by trying to import the variable (which came from a .csv file) as a string variable rather than as a float, but being a novice I'm struggling to figure out how to do this using the "stringcols" option. I tried using the following....

    import delimited using "\\Client\C$\Users\Heather\Desktop\Tax base capacity trial\Raw Crosswalk Data.csv", stringcols(4)

    ....thinking that because the field that I want to be imported as a string is the 4th field in the dataset that I should be specifying the number 4... but having reviewed the result this is obviously not how it's done and I can't seem to find any examples online of how to properly use the stringcols option. My second question is therefore how to import .csv data with some fields as strings.

    Thanks!
    -Heather
    Attached Files
    Last edited by Heather Bromfield; 11 Mar 2018, 17:26.

  • #2
    Welcome to Statalist, Heather.

    As you expect, your problem can be solved by treating your tract codes as string data, not as numeric data. Converting the current values to string is not the best approach to solving the problem. Your approach of importing the tract as a string variable is the appropriate solution.

    To import the tract as a string, your approach seems correct, given that I cannot see precisely how the raw data looks. Try working interactively: use the File > Import > Text data (Delimited, *.csv, ...) menu and in the dialog box that previews your data, follow the instructions to right-click on the tract column and select from the popup menu the option to import that column as a string. You'll see in your Results window the actual command Stata generated to read the data, which you can then copy and paste into your do-file for future use.

    Your problem can be understood as one of numeric precision. It is no more possible to accurately represent 1/100 as a binary fraction than it is to represent 1/3 as a decimal fraction - as 1/3 is a continuing decimal fraction, so 1/100 is a continuing binary fraction. See the output of help precision for an overview, and William Gould's four blog entries that are shown, among other results, by search precision. The bottom line is, there is no good way you store your tract codes in Stata as numeric with precisely two decimal digits of precision.
    Last edited by William Lisowski; 11 Mar 2018, 19:39.

    Comment


    • #3
      Thanks William for all of the suggestions -- glad to know I was on the right path. The reason that I was trying to use commands rather than working interactively was because I had already tried using the File > Import > Text data etc. method, but when I right-clicked on the columns, the option to import specified columns as strings was grayed out (no idea why). After some more trial and error I've now realized that, for whatever reason, if I convert the file from .csv format to .xlsx format and then import the first row as field names, the precision error goes away. This workaround doesn't make much sense, but it's all I've got.

      Comment


      • #4
        You are no longer on the right path.

        You have not solved your problem, only hidden it from your view by causing the tract to be stored as a Stata double rather than a float. While 0.01 is rounded on display to what you expect to see, it is still not stored precisely. If you are lucky this won't come back to haunt you. If you are not, it will cause problems, perhaps in a difficult-to-find way.

        You would be better off continuing to work to read the tract in as a string. I find that when using the interactive dialog, I have to left-click once to select a column, then right-click on the selected column in order for the "import the selected columns as strings" to not be greyed out (because now a column is selected). I'm on a Mac, for which Stata has a number of interface peculiarities; I can't attest to the Windows or Linux implementations.

        Comment


        • #5
          On reflection, the problem with the import delimited command in post #1 may be that you omitted the option that tells Stata the first row of your input contains variable names, so Stata treated them as part of the data and imported every variable as string. You only said that the command didn’t do what you expected without giving the symptoms, so I can’t be sure this is the problem.

          Comment


          • #6
            https://www.stata.com/manuals/d.pdf p.347ff has more on the stringcols() option.

            Similar material is bundled with your Stata as .pdf documentation (given that you're using 14.1).

            Your syntax looks exactly right from here.

            Comment


            • #7
              William - I tried what you suggested as far as left-clicking on the column (instead of the column header) to highlight the column and THEN right-clicking, which worked. What a frustratingly simple solution. (I'm using Stata remotely through a receiver so I'm actually wondering if something wasn't functioning properly, because I swear I tried clicking on anything and everything yesterday).

              Nick - thanks for pointing me to that reference. Having gotten the interactive function to work, I can confirm that the code that I was trying to use yesterday should have indeed worked. I've looked through what I typed up yesterday and I can't find a typo or anything... clueless as to why this didn't work yesterday.

              Code:
              . import delimited "\\Client\C$\Users\Heather\Desktop\Tax base capacity trial\R
              > aw Crosswalk Data.csv", stringcols(4) 
              (10 vars, 11,660 obs)

              Comment


              • #8
                A dollar sign can be problematic as Stata is always looking for global macro references. But that won't bite here.

                Can you type the first 10 or so lines of the file?
                Last edited by Nick Cox; 13 Mar 2018, 01:28.

                Comment


                • #9
                  Sorry, I'm not sure how to do that yet (new to Stata and even newer to Statalist) - can you give more specific instructions?

                  Comment


                  • #10
                    There is a Stata command

                    Code:
                    type 
                    that feeds on a filename. The request is to copy and paste the first few lines of the result into this thread. For help on type, type

                    Code:
                    help type
                    in the Stata command window. Copy and pasting from (e.g.) a text editor would be fine too.

                    Comment

                    Working...
                    X