  • Convert UK postcode string to combine with numerical variable

    Dear Statalist community,

    As part of my research I am trying to merge two data sets (WERS and ASHE) using firms' unique identifiers (IDBR number) in combination with their postcodes to establish a unique workplace identifier. The problem here with the UK postcodes is that they are not constructed in a completely systematic way. They can range from 5 to 7 characters. Sometimes they begin with one letter followed by one digit, sometimes followed by two digits and at other times they begin with two letters followed by two digits before a space.

    Examples:

    G2 5HN
    B37 5TT
    BS14 OTJ
    EC3A 2BE

    I have followed the discussion and comments on this topic (https://www.stata.com/statalist/arch.../msg00144.html). Unfortunately, deleting the space did not change the type of the variable: it is still a string variable, which cannot be combined with the IDBR number to generate the unique identifier. I always get a type mismatch error, even when using: . gen identifier = IDBR + string(postcode,"%02.0f").

    Any help is dearly appreciated.

    Felix

  • #2
    What you suggest doesn't work because postcode is already a string and, apparently, IDBR is not; instead try
    Code:
    gen id=string(IDBR) + postcode
    I am not familiar with IDBR, and you don't show any examples, so I don't know whether you want a format on this.
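
On the question of a format: the following is a minimal sketch of why one might matter, not a statement about real IDBR numbers. The data values and the choice of "%15.0f" are assumptions made up for illustration. By default string() converts with the %12.0g display format, so a very long integer may come out in scientific notation; an explicit format keeps every digit.

```stata
* Toy data: the IDBR values here are invented for illustration only
clear
input double IDBR str8 postcode
1234567890123 "G2 5HN"
42            "B37 5TT"
end

* Default conversion uses %12.0g, which may not show all digits
* of a very long integer
gen id_default = string(IDBR) + postcode

* An explicit fixed format keeps all digits of an integer held in a double
* (the format width 15 is an assumption; adjust to the real identifier)
gen id = string(IDBR, "%15.0f") + postcode

list, noobs
```

If the identifiers are short integers, the default conversion and the formatted one should give the same result, and the format is unnecessary.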



    • #3
      I am broadly with Rich on this, but I would insert a space, as in

      Code:
      gen id = string(IDBR) + " " + postcode
      or

      Code:
      egen id = concat(IDBR postcode), p(" ")
      on various grounds, such as

      1. The space does no predictable harm.

      2. It makes the variable more readable.

      3. It doesn't make any later processing more difficult and it may make it easier.
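
As an illustration of point 3, here is a small sketch, using invented IDBR values, of how the separating space lets the two components be recovered later. It assumes the IDBR part itself contains no spaces.

```stata
* Toy data: IDBR values are invented for illustration only
clear
input long IDBR str8 postcode
1001 "G2 5HN"
1002 "EC3A 2BE"
end

gen id = string(IDBR) + " " + postcode

* The first space-delimited word is the IDBR part
gen long idbr_back = real(word(id, 1))

* Everything after the first space is the postcode, internal space intact
gen pc_back = substr(id, strpos(id, " ") + 1, .)

* Both components round-trip exactly
assert idbr_back == IDBR
assert pc_back == postcode
```

Without the separator, the same recovery would need to search for the first non-digit character instead.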



      • #4
        Thank you both, Rich and Nick, for your quick help. I wasn't aware that one can merge on string variables, so I had been trying to convert the string variable postcode to numeric. But simply turning IDBR into a string and combining it with the string postcode worked! Thanks.



        • #5
          Nothing stops you merging on two or more identifiers. The matching just needs to be exact. You have to go to the manual entry for merge to see examples, but they aren't difficult to think about. In particular, it would be natural to merge panel data on an identifier and a time variable.
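
A minimal sketch of merging directly on two key variables, one numeric and one string, without building a combined identifier first. The two toy datasets, the variables wage and employees, and the tempfile name are all hypothetical; a 1:1 merge is assumed here, and real data may call for 1:m or m:1 instead.

```stata
* Hypothetical "using" file, keyed on IDBR and postcode
clear
input long IDBR str8 postcode wage
1001 "G2 5HN"   21.5
1001 "B37 5TT"  19.0
1002 "EC3A 2BE" 30.2
end
tempfile ashe
save `ashe'

* Hypothetical "master" file with the same two key variables
clear
input long IDBR str8 postcode employees
1001 "G2 5HN"   40
1001 "B37 5TT"  12
1003 "G2 5HN"   7
end

* Observations match only when both keys agree exactly; each key must be
* numeric in both files or string in both files
merge 1:1 IDBR postcode using `ashe'

tab _merge
```

Note that IDBR 1001 appears twice in each file, so it would not be a valid 1:1 key on its own; it is the combination with postcode that identifies a workplace in this toy example.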

