float to string format error results in wrong values -- why?

Brennan RBratton

Join Date: Mar 2019

Posts: 43
#1


11 Feb 2020, 17:25

I need to create the fipscode for census tracts that were formatted incorrectly.

The error occurs when I try to make a string variable from the newly created trtid90 variable. For some reason, line 204 and line 204 have the same fipscode, but their censusTract and trtid90 values show that they are different. Do you have any idea why this would occur?

In the second screenshot you can see that in lines 163 and 164 there isn’t an issue. I’m very confused. I hope you can see the error.

Here is the code I used:

Code:

format state %02.0f
format county %03.0f
codebook censusTract
format censusTract %6.2f
gen trtid90 = censusTract * 100
label variable trtid90 "1990 census tracts w/o decimal"
gen str_state = string(int(state), "%02.0f")
gen str_county = string(int(county), "%03.0f")
gen str_censusTract = string(int(trtid90), "%06.0f")
egen fipscode = concat(str_state str_county str_censusTract)
sort fipscode activityYear
duplicates report fipscode
duplicates list fipscode, nolabel sepby(fipscode) // 13
duplicates tag fipscode, gen(dup_fipscode)
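
The mismatch can be reproduced outside Stata. The following is a minimal Python sketch, not Stata code, for one illustrative tract value, 319.02. It assumes censusTract and trtid90 are held in Stata's default float type, which is IEEE 754 single precision; the helper `to_float32` is hypothetical and only emulates that storage with the standard-library `struct` module.

```python
import struct


def to_float32(x: float) -> float:
    """Round a double to the nearest IEEE 754 single-precision value,
    emulating how a Stata variable of type float stores a number."""
    return struct.unpack("f", struct.pack("f", x))[0]


# censusTract as held in a float variable: slightly below 319.02
census_tract = to_float32(319.02)

# gen trtid90 = censusTract * 100, stored back as float
trtid90 = to_float32(census_tract * 100)

print("%.6f" % census_tract)        # 319.019989
print("%.6f" % trtid90)             # 31901.998047
print("%06.0f" % int(trtid90))      # int() truncates: 031901
print("%06.0f" % round(trtid90))    # rounding instead: 031902
```

The stored product sits just below 31902, so `int()` drops it to 31901 even though a rounded display still looks like 31902. Creating trtid90 as double would not help here, because the error is already present in the float censusTract.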

Clyde Schechter

Join Date: Apr 2014

Posts: 30116
#2

11 Feb 2020, 17:45

You are getting messed up by precision issues. Numbers like 319.02 cannot be represented exactly in finite binary, just as 1/3 cannot be represented exactly in finite decimal notation. When you create trtid90 by multiplying by 100, you do not get the census tract without the decimal point. You get a number that is close to it, but there are extra bits lurking beyond the binary point that your display does not show. What Stata shows you in the display is rounded off to 5 digits. But when you then create str_censusTract, you truncate to an integer with int(), which in some cases does not produce the same result as rounding. That is why you are seeing discrepancies. I think a better approach is:

Code:

tostring censusTract, gen(str_census_tract) format(%07.2f)
replace str_census_tract = subinstr(str_census_tract, ".", "", .)

and then use str_census_tract in your call to -egen, concat()-.
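
A minimal Python sketch of why this avoids the problem, again emulating float storage with a hypothetical `to_float32` helper: formatting with two decimals rounds the stored value (for example 319.019989...) back to 319.02 before the point is removed, so nothing is truncated. It assumes tract numbers have at most two decimal places and are below 10000, which is what the `%07.2f` width implies; `tract_code` is also a hypothetical name.

```python
import struct


def to_float32(x: float) -> float:
    """Emulate Stata float storage (IEEE 754 single precision)."""
    return struct.unpack("f", struct.pack("f", x))[0]


def tract_code(census_tract: float) -> str:
    """Mimic tostring ..., format(%07.2f) followed by
    subinstr(..., ".", "", .): round to 2 decimals, then drop the point."""
    text = "%07.2f" % census_tract   # e.g. "0319.02"
    return text.replace(".", "")      # e.g. "031902"


for value in (319.02, 594.01, 12.0):
    print(value, tract_code(to_float32(value)))
# 319.02 031902
# 594.01 059401
# 12.0 001200
```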
Brennan RBratton

Join Date: Mar 2019

Posts: 43
#3

11 Feb 2020, 20:25

Dear Clyde, thank you for your helpful insights.

Unfortunately, when I tried the code I received the following message:

Code:

tostring censusTract, gen(str_census_tract) format(%07.2f)
censusTract cannot be converted reversibly; no generate

I tried the following, but lost the precision again, i.e., 594.01 became 059400:

Code:

gen str_census_tract = string(int(censusTract), "%07.2f")
replace str_census_tract = subinstr(str_census_tract, ".", "", .)
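
The loss in this attempt comes from `int()` rather than from floating-point precision: `int()` truncates toward zero, so 594.01 becomes 594 before `string()` formats it with two decimals. A short Python sketch of the same steps (Python's `int()` also truncates toward zero; the function name `attempted_code` is hypothetical):

```python
def attempted_code(census_tract: float) -> str:
    """Mimic string(int(censusTract), "%07.2f") followed by removing the point."""
    text = "%07.2f" % int(census_tract)   # int(594.01) -> 594 -> "0594.00"
    return text.replace(".", "")          # "059400"


print(attempted_code(594.01))                  # 059400
print(("%07.2f" % 594.01).replace(".", ""))    # without int(): 059401
```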

Visual of my goal.
The fipscode needs to be 11 digits long: a 2-digit state code, a 3-digit county code, and a 6-digit tract code, with leading (and trailing) zeros when necessary.
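
The target layout can be sketched in Python as below. This is illustrative only: `build_fipscode` is a hypothetical helper, it assumes the tract is available as text such as "317.01" with at most two decimal places, and the example state, county, and tract values are made up.

```python
def build_fipscode(state: int, county: int, tract: str) -> str:
    """Return an 11-digit code: 2-digit state, 3-digit county, 6-digit tract.

    The tract's whole part is zero-padded on the left to 4 digits and its
    decimal part on the right to 2 digits, so "317.01" -> "031701"
    and "12" -> "001200".
    """
    whole, _, decimals = tract.partition(".")
    if len(whole) > 4 or len(decimals) > 2:
        raise ValueError(f"unexpected tract format: {tract!r}")
    tract6 = whole.zfill(4) + decimals.ljust(2, "0")
    return f"{state:02d}{county:03d}{tract6}"


print(build_fipscode(6, 37, "317.01"))   # 06037031701
print(build_fipscode(6, 37, "12"))       # 06037001200
```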

Thank you again!

Last edited by Brennan RBratton; 11 Feb 2020, 20:56.
Clyde Schechter

Join Date: Apr 2014

Posts: 30116
#4

11 Feb 2020, 20:41

Just add the -force- option to the -tostring- command.
Brennan RBratton

Join Date: Mar 2019

Posts: 43
#5

12 Feb 2020, 08:51

Hi Clyde, thanks a bunch.
It turns out the error persisted because, when I imported the censusTract data as float, Stata read some of the values as 0317.010010 instead of 0317.01. When I imported censusTract as a string and used your code:

Code:

replace str_census_tract = subinstr(str_census_tract, ".", "", .)

It worked like a charm!
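
What was seen on import is consistent with float storage. A minimal Python sketch, assuming the tract column was read into a 4-byte float and then shown with six decimals (the `%011.6f` display format here is an assumption, since the post does not say which format was used), with a hypothetical `to_float32` helper emulating that storage:

```python
import struct


def to_float32(x: float) -> float:
    """Emulate Stata float storage (IEEE 754 single precision)."""
    return struct.unpack("f", struct.pack("f", x))[0]


stored = to_float32(317.01)
print("%011.6f" % stored)      # 0317.010010: single precision is off by about 1e-5
print("%011.6f" % 317.01)      # 0317.010000: a double is close enough at 6 decimals

# Importing the column as text keeps the characters from the source file,
# so removing the point is exact.
raw = "0317.01"
print(raw.replace(".", ""))    # 031701
```

Reading the column as a string skips the binary round trip entirely, which is why removing the decimal point afterwards gives the intended digits.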

Thank you again for sharing your insights about precision and how to remove the decimal logically.
