
  • Find string which contains X but does not contain XY

    Hello,
    I am operating Stata/SE 15.1 on a Windows Server.
    I have a string variable, called object_description, and I want to create another string variable, called product_id, which will be equal to "XYZ" if object_description contains the substring "SHIP".
    To complicate the problem, variable object_description also contains observations with words such as: "LIGHTSHIP", "MEMBERSHIP" and "SHIPPING". I don't want my new variable product_id to be equal to "XYZ" if object_description contains such expressions. Therefore, I do the following:
    Code:
    gen  product_id = "XYZ" if strpos(object_description, "SHIP") & ( !(strpos(object_description, "MEMBER")) | !(strpos(object_description, "LIGHT")) | !(strpos(object_description, "PING")) )
    I also tried:
    Code:
    gen  product_id = "XYZ" if strpos(object_description, "SHIP") & ( !(strpos(object_description, "MEMBERSHIP")) | !(strpos(object_description, "LIGHTSHIP")) | !(strpos(object_description, "SHIPPING")) )
    Neither works.
    In fact, after I run the code and I browse the dataset, I find that whenever object_description contains, say, "MEMBERSHIP", my new variable product_id is equal to "XYZ", as if Stata were not reading what follows the &-statement.

    I am very confused. Any suggestion would be much appreciated.

    Cordially
    Edoardo
    Last edited by Edoardo Briganti; 01 Apr 2021, 17:14.

  • #2
    You got your boolean algebra mixed up. Think about what happens when object_description contains "MEMBERSHIP". First, strpos(object_description, "SHIP") is non-zero, that is, true.

    Now let's look at the second conjunct. !strpos(object_description, "MEMBER") will be false, because MEMBER does appear. So far, so good. But then we have the disjunct !strpos(object_description, "LIGHT"). Well, LIGHT does not appear in object_description, so !strpos(object_description, "LIGHT") is true. Therefore, when ORed together with !strpos(object_description, "MEMBER"), you get true, and, of course, it remains true when the other disjuncts get ORed in.

    So the whole -if- condition comes down to true & true, which is, wait for it..., true.
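
    To see this concretely, here is a minimal sketch that evaluates the original condition on the literal string "MEMBERSHIP" (the literal is only a stand-in for one observation of object_description):
    Code:
    * strpos() returns 7 (non-zero, so true); the parenthesis is false | true | true
    display strpos("MEMBERSHIP", "SHIP") & ( !strpos("MEMBERSHIP", "MEMBER") | !strpos("MEMBERSHIP", "LIGHT") | !strpos("MEMBERSHIP", "PING") )
    * displays 1, i.e. the condition is true and product_id would be set to "XYZ"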

    What you want is:
    Code:
    gen product_id = "XYZ" if strpos(object_description, "SHIP") & !strpos(object_description, "MEMBER") & !strpos(object_description, "LIGHT") & !strpos(object_description, "PING")
    If you would prefer to see fewer ! characters, you can apply De Morgan's law, which, done correctly, gives
    Code:
    gen product_id = "XYZ" if strpos(object_description, "SHIP") & !(strpos(object_description, "MEMBER") | strpos(object_description, "LIGHT") | strpos(object_description, "PING"))
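    A quick way to convince yourself that the two forms agree is to run both on a small test dataset. This is only a sketch: the five rows and the variable names and_form and or_form are made up for illustration and are not from the original data.
    Code:
    clear
    input str20 object_description
    "SHIP"
    "CARGO SHIP"
    "MEMBERSHIP"
    "LIGHTSHIP"
    "SHIPPING"
    end
    * version with one negation per excluded word, joined by &
    gen byte and_form = strpos(object_description, "SHIP") & !strpos(object_description, "MEMBER") & !strpos(object_description, "LIGHT") & !strpos(object_description, "PING")
    * version with a single negation around the ORed exclusions
    gen byte or_form = strpos(object_description, "SHIP") & !(strpos(object_description, "MEMBER") | strpos(object_description, "LIGHT") | strpos(object_description, "PING"))
    * stops with an error if the two versions ever disagree
    assert and_form == or_form
    list
    On these rows, both indicators should be 1 only for "SHIP" and "CARGO SHIP".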



    • #3
      What a silly mistake! Time for a break from working. At least it was not a coding error.
      Thanks Clyde for correcting me.

      Best,
      Edoardo

