  • False positives when trying to identify whether a string variable contains a certain keyword inside of it

    Suppose I have a long string variable "address", which has the following value:

    From the Department of Anaesthesiology & Intensive Care, Evangelical Hospital Vienna, Vienna, Austria (SAKL), Department of Anaesthesiology & Intensive Care, Glenfield Hospital, Leicester, United Kingdom (ABA), Department of Anaesthesiology, University Hospital of Copenhagen, Copenhagen, Denmark (AA, JS), Department of Anaesthesiology & Intensive Care, CHU De Grenoble Hôpital, Michallon, Grenoble, France (PA), Department of Anaesthesiology & Intensive Care, Hospital Universitario Rio Hortega, Valladolid, Spain (CA), Department of General Surgery, Lithuanian University of Health Sciences, Kaunas, Lithuania (GB), Department of Anaesthesiology & Intensive Care, University Hospital 'Federico II', Napoli, Italy (EDR), Department of Anaesthesiology, Boston Children's Hospital, Boston, Massachusetts, United States (DFa), Department of Anaesthesiology & Intensive Care, Emergency Institute for Cardiovascular Disease, Bucharest, Romania (DCF), Department of Anaesthesiology, University Hospital of Innsbruck, Innsbruck, Austria (DFr), Department of Anaesthesiology, Children's University Hospital Zurich, Zürich, Switzerland (TH), Department of Anaesthesiology & Intensive Care, Klinikum Straubing, Straubing, Germany (MJ), Department of Anaesthesiology & Pain Medicine, Maastricht University Medical Centre, Maastricht, the Netherlands (MDL), Department of Anaesthesiology & Intensive Care, Hospital Clinico Universitario Valencia, Valencia, Spain (JVLP), Department of Anaesthesia, Royal Free Hospital, London, United Kingdom (SM), Department of Anaesthesiology & Intensive Care, General Hospital Linz, Linz, Austria (JM), Department of Anaesthesiology & Intensive Care, University Hospital of Szeged, Szeged, Hungary (ZLM), Department of Anaesthesiology & Intensive Care, Franziskus Hospital, Bielefeld, Germany (NRM), Department of Anaesthesiology & Intensive Care, Groupe Hospitalier Cochin, Paris, France (CMS), Department of Anaesthesiology, CHU Brugmann, Brussels, Belgium (PJFVDL), Department of Anaesthesiology, Herlev University Hospital, Herlev, Denmark (AJW), Department of Anaesthesiology, Ghent University Hospital, Ghent, Belgium (PWo, PWy) and Department of Anaesthesiology & Intensive Care, University Frankfurt/Main, Frankfurt am Main, Germany (KZ)

    I would like to know whether the keyword "Charite" appears anywhere in this string and, if so, tag this value with the newly created variable "tag_charite", using the following code:

    gen tag_charite=.
    replace tag_charite = 1 if substr(address, strpos(lower(address),"charite"), .)!=""

    When I run this code, the above-mentioned string is being tagged (i.e., tag_charite==1), even though there is no keyword "Charite" in the above text. I would really appreciate it if you could point me to my mistake and suggest a better way to identify string variables containing a certain keyword (while avoiding false positives). Thank you.

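  • #2
    strpos() returns 0, not missing, when the keyword is not found. Your condition does not test that 0 directly: it passes it to substr() as the starting position and then checks whether the resulting substring is non-empty, which it evidently was here, so the observation was tagged. You probably want to test the position itself, i.e. a final condition of strpos(...) > 0 rather than substr(...) != "".

    added in edit: also note that strpos() returns a number, so it has to be compared with a number (such as > 0), not with the empty string "".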
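A quick way to see the point about strpos() is to display its result for a string that lacks the keyword and for one that contains it. This is a minimal sketch; the two short strings are made-up examples, not taken from the data above:

```stata
* strpos(s1, s2) returns the position of the first occurrence of s2 in s1,
* or 0 (not missing) when s2 does not occur.
* Both strings below are made-up examples.
display strpos(lower("Glenfield Hospital, Leicester"), "charite")   // 0
display strpos(lower("Charite Berlin"), "charite")                  // 1

* The result is numeric, so compare it with a number:
* "> 0" evaluates to 1 (true) whenever the keyword occurs anywhere.
display strpos(lower("Charite Berlin"), "charite") > 0              // 1
```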


    • #3
      sorry for my layman follow-up, but when I type:

      replace tag_charite = 1 if substr(address, strpos(lower(address),"charite"), .)>0

      I get the type mismatch error:

      type mismatch
      r(109);

      Did I misunderstand your suggestion?


      • #4
        sorry, I missed part of your code. Why are you using substr() at all when you just want to know whether a particular string appears? Test the position returned by strpos() directly, and keep lower() so that "Charite" with a capital C is still found:
        Code:
        replace tag_charite=1 if strpos(lower(address), "charite")>0
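Putting the pieces together, here is a minimal sketch that assumes Stata 14 or newer for the Unicode functions. The three addresses are made-up test values, and tag_charite_u is a hypothetical variable name. Using generate with a logical expression gives a 0/1 indicator directly, instead of the 1/missing you get from generate ... = . followed by replace. lower() keeps the match case-insensitive. ustrregexm() with a nonzero third argument (case-insensitive matching) also catches the accented spelling "Charité", which lower() plus strpos() would miss:

```stata
* Made-up test data: three hypothetical addresses
clear
input str100 address
"Department of Anaesthesiology, Ghent University Hospital, Ghent, Belgium"
"Department of Anaesthesiology, Charite Universitaetsmedizin Berlin, Germany"
"Klinik für Anästhesiologie, Charité Universitätsmedizin Berlin"
end

* 0/1 indicator: the logical expression is 1 if "charite" occurs anywhere
* (ignoring ASCII case via lower()) and 0 otherwise
generate byte tag_charite = strpos(lower(address), "charite") > 0

* Hypothetical Unicode-aware variant (Stata 14+): matches "charite" or
* "charité"; the third argument 1 requests a case-insensitive match
generate byte tag_charite_u = ustrregexm(address, "charit[eé]", 1)

list tag_charite tag_charite_u
```

In the listing, tag_charite should be 0, 1, 0 and tag_charite_u should be 0, 1, 1, because only the regular-expression version treats "Charité" as a match.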


        • #5
          works perfectly, thank you very much!


          • #6
            you're welcome
