  • Attaching a value label to an integer variable with values greater than Stata’s maximum for long integers

    I have a CSV file with a field storing SNOMED codes. SNOMED is a coded collection of medical terms, e.g.
    • 10623005 "Fibrous dysplasia of bone"
    • 111246005 "Arthrogryposis"
    • 111315008 "Longitudinal deficiency of tibia AND/OR fibula"
    • 12247221000119104 "Patellofemoral syndrome of bilateral knees"
    • 15670601000119100 "Congenital genu valgum of bilateral knees"

    Some of the codes are 17 digits long. When I open the CSV file in Notepad or WordPad I can see the values are stored as integers, e.g. 15670601000119100. However, Stata imports these values as a double and displays them in scientific notation. For example, 15670601000119100 appears as 1.567e+16. As a result, I’m unable even to define a value label. Stata gives me an error message like this one:
    may not label 1.22472e+16

    The values in the SNOMED variable are all integers, so, in theory, Stata should be able to define and attach value labels to it. However, the maximum value for a long integer in Stata is 2,147,483,620. Does anyone know of a way around this?
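
    The problem described above can be reproduced with a minimal sketch. It assumes a toy one-observation dataset typed in by hand rather than the real CSV file, and the names snomed and snomedlab are illustrative only.

    ```stata
    * Toy reproduction; the variable and label names are illustrative.
    clear
    set obs 1
    generate double snomed = 15670601000119100

    * The default display format shows scientific notation;
    * a wider fixed format shows the digits of the stored value.
    list snomed
    format snomed %20.0f
    list snomed

    * label define accepts only integers within a limited range,
    * so this command is expected to fail; capture keeps the do-file running.
    capture noisily label define snomedlab 15670601000119100 "Congenital genu valgum of bilateral knees"
    display "return code: " _rc
    ```

    Changing the display format only changes how the number is shown. It does not make the value acceptable to label define.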

  • #2
    You cannot store these as any numeric data type in Stata without loss of precision: they are just too long. A double, Stata's most precise numeric type, carries about 16 digits of precision, and integers are exact only up to 2^53 (9,007,199,254,740,992). You need to revise the import into Stata and bring them in as a string variable. If you also want to retain the text descriptions of these numbers, note that value labels cannot be attached to string variables, even ones that look like integers, so you will have to keep a separate variable for the text descriptions.
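
    A minimal sketch of this advice, under stated assumptions: the data are typed in with input instead of being imported from the CSV file, and the names dxtext, patients and lookup are hypothetical. It first shows where double precision runs out, then keeps the codes as strings and brings in the descriptions as a separate string variable through a lookup merge.

    ```stata
    * Above 2^53 a double can no longer represent every integer:
    * both lines display the same number.
    display %20.0f 2^53
    display %20.0f 2^53 + 1

    * Toy patient data with the codes held as strings (hypothetical values).
    clear
    input str18 dxsnomed
    "10623005"
    "12247221000119104"
    "10623005"
    end
    tempfile patients
    save `patients'

    * Lookup table: one row per code with its text description.
    clear
    input str18 dxsnomed str60 dxtext
    "10623005" "Fibrous dysplasia of bone"
    "12247221000119104" "Patellofemoral syndrome of bilateral knees"
    end
    tempfile lookup
    save `lookup'

    * Attach the description to each patient record as a separate variable.
    use `patients', clear
    merge m:1 dxsnomed using `lookup', keep(master match) nogenerate
    list dxsnomed dxtext
    ```

    Keeping the description in its own variable avoids numeric storage altogether, at the cost of a wider dataset.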

    • #3
      Thanks Clyde for confirming that I'm unable to store these values as integers in Stata. I've followed your advice and read this variable in as a string. Below is the Stata script I created to define a value label, in case anyone wants to do the same thing and finds it helpful.

      Code:
      clear
      
      /*The SNOMED variable dxsnomed contains integer values that are >16 digits long. Stata imports these
      values as doubles and displays them in scientific notation, and values that large cannot be given
      value labels. Reading in this variable as a string instead. It's in column 23 of the CSV file.*/
      
      import delimited <varlist> using "<filename>.csv", stringcol(23) varnames(nonames)
      
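      *Encoding dxsnomed to an integer in order to attach a value label.
      *encode also creates a value label named dxsnomed_n that maps each integer to its string code.
      encode dxsnomed, gen(dxsnomed_n)
      
      *Creating a data file that contains each value and its corresponding text.
      *uselabel replaces the data in memory with the contents of that value label.
      tempfile tmpdata
      uselabel dxsnomed_n, clear
      keep value label
      save `tmpdata'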
      
      
      /*Extracting each item in (what would have been) the "label define" command line. Each item will be
      saved in a data file for merging with `tmpdata'. The string codes will be saved in one column and the
      matching text in another column within the same row.*/
      
      *Creating a local macro for the contents of the "label define" command.
      local labdefine "`"10623005 "Fibrous dysplasia of bone" 111246005 "Arthrogryposis" ///
      111315008 "Longitudinal deficiency of tibia AND/OR fibula" ///
      12247221000119104 "Patellofemoral syndrome of bilateral knees" ///
      125584006 "Acquired deformity of lower limb" 13814009 "Hypertrophy of bone" ///
      15670601000119100 "Congenital genu valgum of bilateral knees" ///
      15714361000119104 "Acquired genu valgum of bilateral knees" ///
      17230005 "Acquired deformity of limb" 190860000 "Hypophosphatasia rickets" ///
      1926006 "Osteopetrosis" 203499004 "Hypertrophic nonunion of fracture" ///
      203517001 "Complete epiphyseal arrest" 205202001 "Congenital absence of leg and foot" ///
      205211001 "Proximal femoral focal deficiency" 205358006 "Split foot" ///
      205369009 "Congenital overgrowth of lower limb" 205395006 "Congenital angulation of tibia" ///
      205838004 "Congenital hemihypertrophy" 21708004 "Osteosarcoma" ///
      240139008 "Chondrolysis of articular cartilage" 240194000 "Disorder of continuity of bone" ///
      249779004 "Hypertrophy of leg" 249784005 "Rotation of lower limb" ///
      254044004 "Multiple congenital exostosis" 268274005 "Enchondromatosis" ///
      295041000119108 "Acquired deformity of ankle" 307576001 "Osteosarcoma of bone" ///
      397932003 "Talipes equinovarus" ///
      423610004 "Rhabdomyosarcoma of connective or soft tissue" ///
      42808000 "Longitudinal deficiency of tibia" 429696002 "Patellar instability" ///
      45939007 "Leg length inequality" 52837007 "Longitudinal deficiency of femur" ///
      5321004 "Acquired deformity of knee" 55379003 "Congenital pseudarthrosis of tibia" ///
      56007004 "Congenital hemihypertrophy" 59708000 "Multiple epiphyseal dysplasia" ///
      66078008 "Longitudinal deficiency of lower limb" 67341007 "Longitudinal deficiency of limb" ///
      68421004 "Epiphyseal arrest" 723311002 "Amputation of left foot" ///
      723689001 "Amputation of left and right leg through tibia and fibula" ///
      732213003 "Amputation of right lower limb" 76744005 "Longitudinal deficiency of fibula" ///
      78314001 "Osteogenesis imperfecta" 79353000 "Tibia vara" ///
      88312006 "Amputation of leg through tibia and fibula" ///
      9252005 "Congenital bowing of tibia and/or fibula" 92824003 "Neurofibromatosis, type 1" ///
      93298007 "Congenital hypoplasia of tibia" 95859009 "Traumatic amputation of foot""'"
      
      *Dividing the local macro into tokens.
      tokenize `labdefine'
      
      *Counting the number of tokens.
      local i=1
      while "``i''" != "" {
          local count = `i'
          local i = `i' + 1
      }
      
      *The new data file will contain an observation for each string code and matching text.
      *The number of observations will be half the number of tokens, so the count must be even.
      assert mod(`count', 2) == 0
      local obs = `count'/2
      
      *Setting up the new data file.
      clear
      set obs `obs'
      qui gen label=""
      qui gen text=""
      
      *Placing the code and matching text from the local macro `labdefine' into separate rows.
      local i=1
      local j=1
      while "``i''" != "" {
          qui replace label = "``i''" in `j'
          local i = `i' + 1
          qui replace text = "``i''" in `j'
          local j = `j' + 1
          local i = `i' + 1
      }
      
      *Merging with the temporary data file created from the encoded string dxsnomed.
      merge 1:1 label using `tmpdata'
      /*`labdefine' may contain more codes than there are in the CSV file.
      (I won't go into why, as it's difficult to explain succinctly and a distraction.)
      Checking _merge is not 2, then keeping only the records that merged.*/
      assert _merge!=2
      keep if _merge==3
      
      *Creating a value label for the SNOMED variable.
      *Looping over observations so that each value is paired with the text in its own row,
      *whatever the sort order of the data.
      forvalues k = 1/`=_N' {
          label define snomedlab `=value[`k']' `"`=text[`k']'"', modify
      }
      
      exit
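
      When the script finishes, the value label snomedlab exists in memory but the original data do not, so the label still has to be carried back to the encoded variable. One possible way to do that is sketched below. It assumes a toy dataset in place of the CSV import, a two-entry stand-in for snomedlab, and a hypothetical file name snomedlab.do. It also assumes the data are re-encoded from the same set of codes, so that encode assigns the same integers.

      ```stata
      * Stand-in for the label built by the script above (two entries only).
      clear
      label define snomedlab 1 "Fibrous dysplasia of bone" 2 "Arthrogryposis"

      * Write the label definition to a do-file so that it survives loading other data.
      * "snomedlab.do" is a hypothetical file name.
      label save snomedlab using "snomedlab.do", replace

      * Toy stand-in for re-importing the CSV file with dxsnomed as a string.
      clear
      input str18 dxsnomed
      "10623005"
      "111246005"
      "10623005"
      end

      * encode numbers the codes 1, 2, ... in sort order, matching the label values.
      encode dxsnomed, gen(dxsnomed_n)

      * Redefine the label from the saved do-file and attach it.
      do "snomedlab.do"
      label values dxsnomed_n snomedlab
      list dxsnomed dxsnomed_n
      ```

      If the set of codes changes between runs, the integers assigned by encode change too, and the label would need to be rebuilt.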
