  • Variable Types - Importing from Excel

    Hi there,

    I searched the archives for my issue with no luck, so I would appreciate any helpful comments. I am confused by the output for my PhysCode variable. The original variable is primarycarecode, and I used encode PhysCode, gen(PhysCode) to create the PhysCode variable. As you may notice from the header, the original file is .csv, and I imported it into Stata. I made sure the column format was set to General in Excel before importing.

    The problem appears to be that my PhysCode variable is not storing my codes correctly. For example, going by the first PhysCode example in the codebook output below, my understanding is that Stata reads code "50024155" as "1461".

    Thanks again for your assistance with formatting this properly in Stata.

    -Adam


    codebook primarycarecode PhysCode

    -----------------------------------------------------------------------------------------------------------------------------------
    primarycarecode Primary Care Code
    -----------------------------------------------------------------------------------------------------------------------------------

                      type:  string (str10)

             unique values:  6,915                    missing "":  15,121/139,561

                  examples:  "50014385"
                             "50032165"
                             "50047861"
                             "50066131"

    -----------------------------------------------------------------------------------------------------------------------------------
    PhysCode Primary Care Code
    -----------------------------------------------------------------------------------------------------------------------------------

                      type:  numeric (long)
                     label:  PhysCode

                     range:  [1,6915]                     units:  1
             unique values:  6,915                    missing .:  15,121/139,561

                  examples:  1461   50024155
                             2971   50040123
                             4542   50058039
                             6301   50095422
    Last edited by Adam Bunnell; 19 Sep 2017, 11:29.

  • #2
    The problem is not with Stata; it is with your understanding of -encode- vs -destring-.

    -encode- does, and is supposed to do, exactly what you show here. -encode- first sorts the distinct values of the given string variable in alphabetical order. It then maps each distinct value to a consecutive integer starting from 1. Finally, it creates and attaches a value label in which each integer is labeled with the string it represents. So 1461 is simply the position of "50024155" in that sorted list, and the original code is still there as the label.

    If what you wanted to do is create a numeric variable whose actual numeric values are what the string values look like to human eyes, you need to use -destring- for that. See -help destring-.
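
    To make the difference concrete, here is a minimal sketch on toy data. The four values typed in below are made up for illustration and are not from the original file, and PhysNum is a hypothetical variable name.

    ```stata
    * Toy data: hypothetical values, not taken from the poster's file
    clear
    input str10 primarycarecode
    "50024155"
    "50014385"
    ""
    "50040123"
    end

    * encode: consecutive integer codes starting at 1, with the original
    * strings attached as a value label; empty strings become missing
    encode primarycarecode, gen(PhysCode)

    * destring: the stored number is the number the string spells out
    * (PhysNum is a hypothetical name used only for this sketch)
    destring primarycarecode, gen(PhysNum)

    * nolabel displays the integers actually stored in PhysCode
    list primarycarecode PhysCode PhysNum, nolabel
    ```

    Listing with the nolabel option should make it clear that PhysCode holds small integers while PhysNum holds the eight-digit codes themselves. One caution if you go the -destring- route: identifiers with leading zeros would lose them once converted to numbers.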


    • #3
      Code:
      encode PhysCode, gen(PhysCode)
      should presumably be

      Code:
      encode primarycarecode, gen(PhysCode)
      I don't think you've shown a problem. codebook isn't obligated to show corresponding examples for two variables, so far as I can see. When it sorts a string variable the missings go to the beginning; when it sorts a numeric variable they go to the end, so that may account for the mismatch between the two sets of examples here.

      You can check for one-to-one mapping by using

      Code:
      bysort primarycarecode (PhysCode) : gen diff = PhysCode[1] != PhysCode[_N]
      count if diff
      following principles of https://www.stata.com/support/faqs/d...ions-in-group/
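
      Another possible check, offered only as a sketch, is a round trip: decode the new variable back to a string and compare it with the original. It assumes PhysCode was created from primarycarecode by -encode- as above; check is a hypothetical variable name.

      ```stata
      * Turn the value labels of PhysCode back into a string variable
      * (check is a hypothetical name used only for this sketch)
      decode PhysCode, gen(check)

      * assert stops with an error if any observation differs;
      * missing values of PhysCode decode to the empty string ""
      assert check == primarycarecode

      drop check
      ```

      If the assert passes silently, every observation's label matches the original string, so nothing was lost in the encoding.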
      Last edited by Nick Cox; 19 Sep 2017, 12:02.


      • #4
        Thank you, gentlemen, for your assistance.
