
  • float to string format error results in wrong values -- why?

    I need to create the fipscode for census tracts that were formatted incorrectly.

    The error occurs when I try to make a string variable from the newly created trtid90 variable. For some reason, line 204 and line 204 have the same fipscode, even though their censusTract and trtid90 values show that they are different. Do you have any idea why this would occur?

    In the second screenshot, you can see that lines 163 and 164 don't have this issue. I'm very confused. I hope you can see the error.

    Here is the code I used:
    Code:
    format state %02.0f
    format county %03.0f
    codebook censusTract
    format censusTract %6.2f
    gen trtid90 = censusTract * 100
    label variable trtid90 "1990 census tracts w/o decimal"

    gen str_state = string(int(state), "%02.0f")
    gen str_county = string(int(county), "%03.0f")
    gen str_censusTract = string(int(trtid90), "%06.0f")
    egen fipscode = concat(str_state str_county str_censusTract)
    sort fipscode activityYear
    duplicates report fipscode
    duplicates list fipscode, nolabel sepby(fipscode) // 13
    duplicates tag fipscode, gen(dup_fipscode)

  • #2
    You are getting messed up by precision issues. Numbers like 319.02 cannot be represented exactly in finite binary, just as 1/3 cannot be represented exactly in finite decimal notation. When you create trtid90 by multiplying by 100, you do not get the census tract without the decimal point. You get a number that is close to it, with some extra bits lurking beyond the binary point where your display doesn't show them. What Stata shows you in the display is rounded off to 5 digits.

    But when you then create str_censusTract, you truncate to an integer with int(), and truncation does not produce the same result as rounding in some cases. That is why you are seeing discrepancies. I think a better approach is:

    Code:
    tostring censusTract, gen(str_census_tract) format(%07.2f)
    replace str_census_tract = subinstr(str_census_tract, ".", "", .)
    and then use str_census_tract in your call to -egen, concat()-.
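To see the effect described above outside Stata, here is a minimal Python sketch (not Stata code). It round-trips each tract through a 4-byte IEEE 754 float, which is the format of Stata's default `float` storage type, on the assumption that both censusTract and trtid90 are stored as `float`. It then compares truncating with `int()` against rounding, and against formatting to two decimals and stripping the point. The helper names are hypothetical, and 319.01/319.02 are illustrative values, not the ones in the screenshots.

```python
import struct


def as_float32(x: float) -> float:
    """Round-trip x through a 4-byte IEEE 754 float, the format of
    Stata's default `float` storage type."""
    return struct.unpack("f", struct.pack("f", x))[0]


def trtid90(tract: float) -> float:
    # Mirrors `gen trtid90 = censusTract * 100` with both variables stored as float.
    return as_float32(as_float32(tract) * 100)


def tract_code_truncated(tract: float) -> str:
    # Mirrors string(int(trtid90), "%06.0f"): int() truncates toward zero.
    return f"{int(trtid90(tract)):06d}"


def tract_code_rounded(tract: float) -> str:
    # Rounds to the nearest integer first, so 31901.998... becomes 31902.
    return f"{round(trtid90(tract)):06d}"


def tract_code_formatted(tract: float) -> str:
    # Mirrors tostring ..., format(%07.2f) plus subinstr(..., ".", "", .):
    # formatting to two decimals rounds, then the decimal point is dropped.
    return f"{as_float32(tract):07.2f}".replace(".", "")


# Illustrative tract values (not taken from the screenshots).
for tract in (319.01, 319.02, 594.01):
    print(tract, tract_code_truncated(tract),
          tract_code_rounded(tract), tract_code_formatted(tract))
# 319.01 031901 031901 031901
# 319.02 031901 031902 031902
# 594.01 059401 059401 059401
```

Stored as a float, 319.02 becomes 319.019989013671875. Multiplying by 100 and truncating therefore gives 31901, which is the same code as 319.01 and produces a duplicate fipscode. Rounding, or formatting to two decimals, gives the intended 031902.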



    • #3
      Dear Clyde, thank you for your helpful insights.

      Unfortunately, when I tried the code, I received the following message:

      Code:
       tostring  censusTract, gen(str_census_tract) format(%07.2f)
      censusTract cannot be converted reversibly; no generate
      I tried the following, but lost the precision again; e.g., 594.01 became 059400.

      Code:
       gen str_census_tract =string(int(censusTract), "%07.2f")
       replace str_census_tract = subinstr(str_census_tract, ".", "", .)

      Here is a visual of my goal.
      The fipscode needs to be 11 digits long: 2 digits for the state, 3 for the county, and 6 for the tract, with leading (and trailing) zeros where necessary.
      [Image: FIPSCode_Part4.png]
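The 2-3-6 layout described above can be sketched in Python as follows. This sketch assumes that state and county are whole numbers and that the tract is already a six-character string, such as the output of the format-and-strip step. The function name `build_fipscode` and the sample values are hypothetical.

```python
def build_fipscode(state: int, county: int, tract6: str) -> str:
    """Concatenate a 2-digit state, 3-digit county and 6-digit tract code
    into an 11-character FIPS code (hypothetical helper)."""
    if len(tract6) != 6 or not tract6.isdigit():
        raise ValueError(f"tract code must be 6 digits, got {tract6!r}")
    # Like %02.0f and %03.0f in Stata, :02d and :03d pad with leading zeros.
    code = f"{state:02d}{county:03d}{tract6}"
    if len(code) != 11:
        raise ValueError(f"state/county out of range: {code!r}")
    return code


print(build_fipscode(6, 37, "031902"))  # 06037031902
```

Checking the length explicitly catches a tract that lost a digit before it can silently produce a short or colliding code.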


      Thank you again!
      Last edited by Brennan RBratton; 11 Feb 2020, 20:56.



      • #4
        Just add the -force- option to the -tostring- command.



        • #5
          Hi Clyde, thanks a bunch.
          It turns out the error persisted because, when I imported the censusTract data as float, Stata stored some of the values as 0317.010010 instead of 0317.01. When I imported censusTract as a string and used your code:

          Code:
           replace str_census_tract = subinstr(str_census_tract, ".", "", .)
          It worked like a charm!

          Thank you again for sharing your insights about precision and how to remove the decimal logically.
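The all-string route avoids floating point entirely. If some imported strings lack the leading zero (e.g., `317.01` rather than `0317.01`), removing the decimal point alone yields only five characters. Here is a rough Python sketch that also pads, assuming each tract string has up to four integer digits and an optional decimal part of up to two digits. The function name `tract_from_string` is hypothetical.

```python
def tract_from_string(tract: str) -> str:
    """Turn a tract string such as '317.01' or '317' into a 6-digit code
    without ever converting it to a floating-point number.

    Assumed input layout: up to 4 integer digits, optionally followed by
    '.' and up to 2 decimal digits.
    """
    whole, _, frac = tract.strip().partition(".")
    if (not whole.isdigit() or len(whole) > 4
            or len(frac) > 2 or (frac and not frac.isdigit())):
        raise ValueError(f"unexpected tract format: {tract!r}")
    # Leading zeros on the integer part, trailing zeros on the decimal part.
    return whole.zfill(4) + frac.ljust(2, "0")


for raw in ("317.01", "0317.01", "594.01", "317", "4.1"):
    print(raw, tract_from_string(raw))
```

Because the digits are only moved and padded, never parsed as a number, a value like 317.01 cannot turn into 317.010010 along the way.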
