  • Extracting DOB from String

    Hi Everyone,

    I am extracting relevant information from a radiology dictation database and have been trying to extract the DOB of individuals. This is what I am currently trying:
    gen age = regexs(0) if (regexm(ImpressionLow, "[0-00][\-][year][\-][old]")

    As you can see I am looking for ages listed as "12-year-old" etc. Can anyone help me out!? Also, this forum is unbelievably helpful. Thank you all for taking the time to help us beginners out!

  • #2
    That should be: gen age = regexs(0) if (regexm(ImpressionLow, "[0-100][\-][year][\-][old]")


    • #3
      My experience dealing with narrative dictation records like this is somewhat limited. But it is vividly memorable and quite unpleasant. Tasks like this are really onerous--the word hopeless springs to mind. You will see all sorts of combinations of years/year/yr/y mixed with old/o/of age, and other orders such as age XX years, as well as every imaginable misspelling and some you would never even imagine. And then there will be records where nothing about the age is recorded. And if there are infants in the database you will also have to find months, and possibly even days or weeks. Getting good information out of this kind of data set is something akin to making hot chocolate out of mud.
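      To give a rough idea of how a few of those variants might be caught in one pass, here is a minimal sketch. It assumes Stata 14 or later (for the ustrregex functions) and uses ImpressionLow, the lowercased text variable from post #1; the names age_num and pat are hypothetical. It only covers patterns such as "12-year-old", "12 yrs old", and "12 y/o", and will not catch misspellings, reversed orders such as "age 12 years", or ages given in months.

      ```stata
      * Sketch only: age_num and pat are hypothetical names, not from the original post.
      * Digits, then year/years/yr/yrs/y, then old/o, separated by spaces, slashes, or hyphens.
      local pat "([0-9]{1,3})[ /-]*(years?|yrs?|y)[ /-]*(old|o)"

      * Keep only the captured digits and convert them to a number
      gen age_num = real(ustrregexs(1)) if ustrregexm(ImpressionLow, "`pat'")

      * Count the records where no age was found, to see how much is being missed
      count if missing(age_num)
      ```

      Browsing a sample of the unmatched records is usually the quickest way to decide which further patterns, if any, are worth adding.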

      Since you have gained access to the radiology dictation database, perhaps you can also get limited access to the same facilities' administrative databases, link on the medical record number, and get the date of birth from the administrative records. It will almost certainly be there in an easily machine-friendly form, because insurance companies won't pay a claim that doesn't include the patient's date of birth. And it will be a full date of birth, not just how many years old the radiologist imagines the patient to be at the time he/she dictates the report. It will also have a high probability of being correct because it will have to match the insurer's records.
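
      If that kind of access is available, the link might look something like the sketch below. Every dataset and variable name in it is hypothetical: it assumes radiology_reports.dta holds the reports with a medical record number mrn and a Stata daily date exam_date, and that admin_records.dta has one row per patient with mrn and a Stata daily date dob.

      ```stata
      * Sketch only: all dataset and variable names here are assumptions.
      use radiology_reports, clear

      * Many reports per patient, one administrative row per patient
      merge m:1 mrn using admin_records, keepusing(dob) keep(master match)

      * Approximate age in completed years at the time of the exam
      gen age_at_exam = floor((exam_date - dob) / 365.25) if _merge == 3

      * Check how many reports found a matching administrative record
      tab _merge
      ```

      Dividing by 365.25 is only an approximation and can be off by one year for exams that fall on or very near a birthday.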


      • #4
        Hi Clyde,

        Thanks for your answer. I was able to solve the problem with:
        gen age = regexs(1) if regexm(ResultLow, "([0-9][0-9][\-]year[\-]old)")
        This, in fact, worked relatively well because the date order is pretty well standardized.
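
        As written, the capture group returns the whole matched string, for example "12-year-old", and only matches two-digit ages. A possible variation that keeps just the number is sketched below; age_num is a hypothetical variable name, and the pattern still assumes the standardized "NN-year-old" wording.

        ```stata
        * Sketch only: age_num is a hypothetical name; ResultLow is the variable used above.
        * [0-9]+ matches one or more digits. A class such as [0-100] matches
        * only a single character, 0 or 1, which is why the earlier attempts failed.
        gen age_num = real(regexs(1)) if regexm(ResultLow, "([0-9]+)-year-old")

        * Inspect the range to spot implausible values
        summarize age_num
        ```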
