  • Data type/precision -import dbf-

    I'd like advice on double vs. float precision on some data values obtained via import of a dbf file.

    In the file of interest, the observation units are U.S. census tracts, with latitude and longitude among the variables that matter to me. I noticed that:

    1) Stata imported latitude/longitude as floats, not doubles, which could matter in some applications.
    2) -import dbase- (the command for dbf files) does not have any type options
    3) MS Excel gets different values for these variables than does Stata, with absolute differences around 1e-6

    I'm guessing (?) that Excel imported the dbf file at double precision?

    For example:

    Code:
    Lat (Excel)      Lat (Stata)      abs diff
    32.4770395818    32.4770393372    2.4464E-07
    32.4742513120    32.4742507935    5.1854E-07
    32.4754284379    32.4754295349    1.09701E-06
    32.4719824405    32.4719810486    1.39192E-06
    32.4586597838    32.4586601257    3.4193E-07

    The differences likely don't matter in my current application, but if I did need double precision lat/long, what should be done?
    All I can think of here is -odbc-, although I don't see anything about a double option in -odbc-. I'd hate to have to import multiple dbfs into Excel and then -import excel-.
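
One way to check whether the Stata column is simply the single-precision rounding of what Excel shows is to round each Excel value to the nearest 32-bit float and compare. The sketch below does this in Python rather than Stata, purely as a language-neutral check. The helper names `to_float32` and `matches_float_rounding` are made up for illustration, and the values are the five rows from the table above.

```python
import struct

def to_float32(x: float) -> float:
    """Round a double to the nearest IEEE 754 single and return it as a double."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

# (Excel value, Stata value) pairs copied from the table in the post.
rows = [
    (32.4770395818, 32.4770393372),
    (32.4742513120, 32.4742507935),
    (32.4754284379, 32.4754295349),
    (32.4719824405, 32.4719810486),
    (32.4586597838, 32.4586601257),
]

def matches_float_rounding(excel: float, stata: float) -> bool:
    # Compare at the 10 decimal places shown in the table.
    return f"{to_float32(excel):.10f}" == f"{stata:.10f}"

for excel, stata in rows:
    print(f"{excel:.10f} -> {to_float32(excel):.10f}  "
          f"matches Stata: {matches_float_rounding(excel, stata)}")
```

If every row reports a match, the differences are consistent with the values having been rounded to float on import while Excel kept more digits. That says nothing by itself about how the dbf file declares the field.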

  • #2
    This is just a stab in the dark, but have you tried doing the conversion from .dbf to .dta using StatTransfer? They're usually very smart about these things. (I don't know how they handle this particular issue as I've never come across .dbf files in my work.)

    • #3
      Stat/Transfer lets you import dbf files and set the "type". I do not have any such files to try, so I can't swear that "double" is not greyed out, but ...

      • #4
        Thanks Clyde and Rich. As it happens, I don't have StatTransfer, but that sounds like a good route. The only time I encounter dbf files is when I'm doing the minimal spatial things I do.

        I did check and found out (https://en.wikipedia.org/wiki/Decimal_degrees) that even at the equator, 1e-5 degrees of longitude is about 1 meter, so I'm not sure I'll ever need double precision for anything spatial that I do. This makes me wonder why I had taken for granted the common advice that lat/long needed to be stored as doubles. I wonder if that advice is correct. One of our spatially knowledgeable people will explain, I hope.
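
For a rough sense of scale, here is a back-of-envelope sketch in Python of how much position error float storage can introduce. It assumes IEEE 754 single precision (24-bit significand) and roughly 111,320 m per degree at the equator, which is only an approximation. The function names are illustrative, and -86.4 is a hypothetical longitude, not a value from the file.

```python
import math

METERS_PER_DEGREE = 111_320.0  # assumed approximate length of one degree at the equator

def float32_spacing(x: float) -> float:
    """Gap between adjacent single-precision values near x (normal range, x != 0)."""
    _, exponent = math.frexp(abs(x))   # abs(x) = m * 2**exponent with 0.5 <= m < 1
    return 2.0 ** (exponent - 24)      # a single carries 24 significand bits

def max_rounding_error_m(degrees: float) -> float:
    """Worst-case rounding error, in meters, from storing `degrees` as a float."""
    return 0.5 * float32_spacing(degrees) * METERS_PER_DEGREE

# 32.47 resembles the posted latitudes; -86.4 is a hypothetical longitude.
for value in (32.47, -86.4):
    print(f"{value:8.2f} deg: spacing {float32_spacing(value):.2e} deg, "
          f"max error about {max_rounding_error_m(value):.2f} m")
```

Under these assumptions the worst-case rounding error is a few tenths of a meter, the same order as the differences in the table in the first post.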

        • #5
          Originally posted by Mike Lacy
          Stata imported latitude/longitude as floats, not doubles
          The values must have been originally stored in the dBase file as single-precision. When they're stored as double-precision, then Stata imports them as such. See below.

          .
          . version 15.1

          .
          . clear *

          .
          . set seed `=strreverse("1492598")'

          .
          . local line_size `c(linesize)'

          . set linesize 72

          .
          . display in smcl as text c(type)
          float

          .
          . quietly set obs 3

          .
          . generate byte row = _n

          . generate double xd = rnormal()

          . generate float xf = float(xd)

          .
          . count if xd == xf
            0

          .
          . quietly export dbase using test.dbf, replace

          .
          . rename x? =_original

          .
          . tempfile tmpfil0

          . quietly save `tmpfil0'

          .
          . *
          . * See beginning here
          . *
          . drop _all

          . import dbase using test.dbf
          (3 vars, 3 obs)

          .
          . describe row x?

                        storage   display    value
          variable name   type    format     label      variable label
          ------------------------------------------------------------------------
          row             byte    %3.0f                 row
          xd              double  %20.0f                xd
          xf              float   %20.0f                xf

          .
          . merge 1:1 row using `tmpfil0', assert(match) nogenerate noreport

          . foreach var of varlist x? {
            2.         assert `var' == `var'_original
            3. }

          .
          . set linesize `line_size'

          .
          . erase test.dbf

          .
          . exit

          end of do-file

          .
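
The `count if xd == xf` result of 0 in the log reflects that a double rarely survives rounding to float unchanged. Here is a small Python sketch of the same comparison, using hand-picked values instead of random draws. The helper `to_float32` is illustrative and not part of Stata.

```python
import struct

def to_float32(x: float) -> float:
    """Round a double to the nearest IEEE 754 single and return it as a double."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

values = [0.1, 1.0 / 3.0, 0.25]

# Keep only the values that are unchanged by a round trip through single precision.
unchanged = [x for x in values if to_float32(x) == x]
print(unchanged)  # [0.25]
```

Only 0.25, which has an exact binary representation, comes back unchanged. The other two differ from their float versions, just as every `xd` differs from its `xf` in the log.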

          • #6
            Aha, thanks. Knowing that the dbf format *does* have typed variables and that Stata recognizes that feature closes the thread for me.
