  • Handling addresses in STATA

    Hi! The address variable in my dataset is in the following format: 59 NORTH STREET, PLAINFIELD, MASSACHUSETTS, 01070. I have been struggling to split this combined address into separate parts: street address (59 NORTH STREET), town (PLAINFIELD), state (MASSACHUSETTS), and ZIP code (01070). Is there an efficient way to do so? Any suggestions or insights will be much appreciated. Thank you.

  • #2
    If the separator is a comma:

    Code:
    split address, p(,) gen(split)
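
A minimal sketch of the same idea on the single address quoted in the question, so the result can be inspected before running it on real data. It assumes the variable is a string called address; the stub name part and the str80 width are illustrative choices, not from the thread. Here the separator is given as comma plus space, so the pieces do not start with a blank.

```stata
* Toy dataset holding only the example address from the question
clear
input str80 address
"59 NORTH STREET, PLAINFIELD, MASSACHUSETTS, 01070"
end

* Parse on ", " (comma followed by a space); part is a hypothetical stub name
split address, parse(", ") generate(part)

* part1-part4 are string variables, so the ZIP code keeps its leading zero
list part*
```

Because split creates string variables, the ZIP code should be left as a string; converting it to numeric would drop the leading zero.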



    • #3
      Assuming that the address is stored in a variable called address, you can split it into parts with: split address, gen(addr) parse(,)

      The problem you are likely to run into is that this relies on a comma separating the parts, and in real data there will almost always be observations where the comma is not used consistently: a simple typo when the data were entered, a street name that itself contains a comma, a respondent from a country with a different convention for separating the parts of an address, and so on. So at the very least you will need to check the parts carefully, and probably do some fixing.
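
One way such a check could look, as a rough sketch rather than a complete cleaning routine. It assumes the variable is called address, that a well-formed address has exactly three commas, and that the fourth part should be a five-digit ZIP code; ncommas and badzip are hypothetical names introduced here.

```stata
* Count the commas in each address: total length minus length without commas
generate ncommas = length(address) - length(subinstr(address, ",", "", .))
tabulate ncommas

* Addresses with anything other than three commas need a manual look
list address if ncommas != 3

* Split on commas, then trim any blank left at the start or end of each part
split address, generate(addr) parse(,)
foreach v of varlist addr* {
    replace `v' = trim(`v')
}

* Flag observations whose fourth part is not exactly five digits
* (assumes addr4 exists, i.e. at least one address has four parts)
generate byte badzip = !regexm(addr4, "^[0-9][0-9][0-9][0-9][0-9]$")
list address if badzip
```

Observations flagged by either list are the ones to fix by hand before trusting the split parts.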
      ---------------------------------
      Maarten L. Buis
      University of Konstanz
      Department of history and sociology
      box 40
      78457 Konstanz
      Germany
      http://www.maartenbuis.nl
      ---------------------------------



      • #4
        Thanks Andrew. You saved my day! Your suggested code worked perfectly.



        • #5
          Your suggested code also worked perfectly in my case, @Maarten. Thanks a lot for your help.
