  • Foreach variable search with string variable

    I am trying to make a foreach search through a variable list easier and would really appreciate some help.

    I'm currently using Stata 14 on Windows.

    I'm using an administrative database that contains ICD-9 diagnostic codes. The codes are numeric but are saved as string variables, and when I try to -destring- them, Stata won't let me.

    They are stored in variables dx1-dx15 and have a three-digit core followed by up to two more digits that further specify the type of diagnosis.
    For example, a code could be 434, 4341, 43412, etc.

    Searching for specific codes works fine, but I would like help figuring out a way to search for ALL codes that start with 434 (as an example), followed by any digits. That way I wouldn't have to list every code.

    My typical search goes something like this:

    gen diagnosis=0
    foreach var of varlist dx1-dx15 {
        replace diagnosis=1 if (`var'=="434"|`var'=="4340"|`var'=="43401"......)
    }

    I would appreciate any help in making that process easier!

    Thanks in advance!

  • #2
    You can use wildcards with the -strmatch()- function; see -help strmatch()-.

    More complicated situations might call for using some of the regular expression functions, but for what you describe, -strmatch()- is simpler and should suffice.
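A minimal sketch of the -strmatch()- suggestion above, run on a small made-up dataset. The three variables dx1-dx3 and their values are hypothetical stand-ins for dx1-dx15; on real data, drop the -clear- and -input- lines and loop over dx1-dx15 instead.

```stata
* Hypothetical toy data (3 dx variables instead of 15), for illustration only
clear
input str5 dx1 str5 dx2 str5 dx3
"434"   ""      ""
"25000" "43412" ""
"4010"  ""      "4341"
"2501"  "3051"  "E100"
end

* Flag observations with at least one code that begins with "434"
generate diagnosis = 0
foreach var of varlist dx1-dx3 {
    * In strmatch(), "*" matches zero or more characters,
    * so the pattern "434*" also matches "434" on its own
    replace diagnosis = 1 if strmatch(`var', "434*")
}
```

Here the first three observations get diagnosis = 1 and the last one gets 0. The regular-expression route mentioned above would use a condition such as regexm(`var', "^434"), where ^ anchors the match at the start of the string; for a plain prefix like this it flags the same observations.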


    • #3
      Another possibility is to use substr():

      Code:
      gen diagnosis = 0
      foreach var of varlist dx1-dx15 {
          replace diagnosis = 1 if substr(`var', 1, 3) == "434"
      }


      • #4
        Thanks for both answers! I found the substr() function the easiest to use and have already used it successfully.

        The problem I have now run into is how to exclude one particular code. For example, starting from

        substr(`var', 1, 3) == "305"

        the goal is to exclude only one diagnosis, such as 3051, and include all of the others (3052, 30521, 3053, 30531, ...).

        Thanks for the help!


        • #5
          Code:
          whatever if substr(`var', 1, 3) == "305" & `var' != "3051"
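A self-contained sketch of the condition in #5, placed inside the loop from #3. It assumes the -whatever- placeholder stands for -replace diagnosis = 1-; the two variables and their values are hypothetical.

```stata
* Hypothetical toy data, for illustration only
clear
input str5 dx1 str5 dx2
"3051"  ""
"30521" ""
"3053"  "3051"
"4010"  ""
end

generate diagnosis = 0
foreach var of varlist dx1-dx2 {
    * Any code starting with 305 counts, except the single code "3051"
    replace diagnosis = 1 if substr(`var', 1, 3) == "305" & `var' != "3051"
}
```

Observations 2 and 3 are flagged. Two details are worth checking against the goal. First, this only stops 3051 from triggering the flag; an observation that also has another 305 code is still flagged (observation 3 here). Second, `var' != "3051" excludes only the exact string 3051, so 30511 would still count; if the whole 3051 family should be excluded, substr(`var', 1, 4) != "3051" is the condition to use instead.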

          • #6
            How would the command change if the diagnosis code starts with a letter, such as E100, E101, and so on?
            Thanks


            • #7
              Basically, it wouldn't change.
              Code:
              whatever if substr(`var', 1, 4) == "E100" & `var' != "E1001"
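The same pattern applied to letter-prefixed codes, as a self-contained sketch. Letters are just characters in a string, so only the prefix length and value change. The variable name ecode and the toy data are hypothetical, and the upper() calls are an extra assumption, in case some codes are stored in lowercase (string comparisons in Stata are case-sensitive).

```stata
* Hypothetical toy data, for illustration only; "e1002" is deliberately lowercase
clear
input str5 dx1 str5 dx2
"E100"  ""
"E1001" ""
"e1002" "4340"
"E101"  ""
end

generate ecode = 0
foreach var of varlist dx1-dx2 {
    * upper() makes the test ignore case; the prefix is now 4 characters long
    replace ecode = 1 if substr(upper(`var'), 1, 4) == "E100" & upper(`var') != "E1001"
}
```

Observations 1 and 3 are flagged; E101 is not, because its first four characters are "E101".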
