  • Converting census tract IDs of varying lengths (some w/ decimals) to 6-digit string variable (w/ no decimals)

    Dear all,

    I am working with census tract numbers of varying lengths, some with decimals and some without. I am trying to get these into a 6-digit string format with no decimals.

    For example …

    Tract ID (As Is)
    401
    2301.02
    387.01

    6-Digit Tract ID (What I Want)
    040100
    230102
    038701

    Tracts can be up to 4 digits long before the decimal, so I want leading zeros up to 4 digits. Some tracts have up to 2 digits after the decimal, so I want trailing zeros up to 2 digits. I also want to get rid of the decimal.

    If anyone has suggestions or guidance about how I might go about this, I would be very grateful.



  • #2
    You do not show whether your starting variable is a string or a numeric variable in your Stata data set. If it is numeric, I would do this as:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float id
        401
    2301.02
     387.01
    end
    
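    * Format with leading zeros, total width 7, two decimals: 401 -> "0401.00"
    gen wanted = string(id, "%07.2f")
    * Drop the decimal point to leave six digits: "0401.00" -> "040100"
    replace wanted = subinstr(wanted, ".", "", .)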
    
    list

    If you are starting out with it as a string variable, then change the -gen wanted = ...- command to -gen wanted = string(real(id), "%07.2f")-.
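
    For the string case, a minimal sketch might look like the following. The variable name id_str and the str7 storage type are assumptions made for illustration; the original poster's actual variable may differ. The -assert- is only a sanity check that every result has six characters.

    Code:
    clear
    input str7 id_str
    "401"
    "2301.02"
    "387.01"
    end

    * real() converts the string to a number; string() then pads and formats it
    gen wanted = subinstr(string(real(id_str), "%07.2f"), ".", "", .)

    * Sanity check: every converted ID should be exactly six characters long
    assert strlen(wanted) == 6

    list

    Note that real() returns missing for any value that is not a valid number, so the -assert- would fail on such observations instead of silently producing a bad ID.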

    In the future, when showing data examples, please use the -dataex- command to do so. If you are running version 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.



    • #3
      I don't think it makes that much sense to convert them to strings instead of integers. I would just multiply them by 100 and save them as -long-. If you still want to see leading zeroes you can just format them that way:

      Code:
      clear
      input float tract_id
          401
      2301.02
       387.01
      end
      
      gen long tract = round(tract_id * 100)
      format %06.0f tract
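
      If a string version is needed after all (say, to merge with a file that stores tract IDs as strings), the integer can be converted with the same display format. This is only a sketch; the variable name tract_str is made up for illustration.

      Code:
      clear
      input float tract_id
          401
      2301.02
       387.01
      end

      gen long tract = round(tract_id * 100)
      * string() applies the zero-padded format, so 40100 becomes "040100"
      gen str6 tract_str = string(tract, "%06.0f")

      list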

      If what you ultimately want to do is to combine the census tracts with county and state IDs, you can do that as well, e.g.:

      Code:
      gen double id = 1e6 * state + tract
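
      The line above assumes a numeric variable named state already exists. As a rough sketch of how a full 11-digit ID could be built when state and county are held separately, assuming a 2-digit state code, a 3-digit county code, and the 6-digit tract from above (the variable names state, county, and geoid and the input values are all hypothetical):

      Code:
      clear
      input byte state int county float tract_id
       6  37  401
       6  37 2301.02
      36  61  387.01
      end

      gen long tract = round(tract_id * 100)
      * Shift state left by 9 digits and county by 6 digits, then add the tract
      gen double geoid = 1e9 * state + 1e6 * county + tract
      * Display all 11 digits, including a leading zero for single-digit states
      format %011.0f geoid

      list

      Storing the result as -double- matters here: an 11-digit integer is well beyond what -float- (or -long-) can hold exactly.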



      • #4
        Thanks to you both. In the future I'll be sure to use dataex and provide more detail. I am a long-time reader but first-time poster on Statalist, so perhaps you'll forgive the oversight.

        The first solution, by Dr. Schechter, worked perfectly. For future users, my variable was in numeric (float) format.
