
  • generating a new variable if strings match certain characters

    Hello,

    I have a dataset that contains information on 28 different drugs (variables drug1-drug28) and their positivity levels, but these are currently string variables. See below for an example:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str33(drug1 drug2 drug3 drug4) str34 drug5
    "Tramadol (94.78)"                  " Benzoylecgonine (125)"            " Cocaine (0)"                     ""                              ""                                  
    "Tramadol (95.426)"                 ""                                  ""                                 ""                              ""                                  
    "Tramadol (96.38)"                  " THC (4.57)"                       ""                                 ""                              ""                                  
    "meta-Hydroxycocaine (Hair) (18.8)" " para-Hydroxycocaine (Hair) (6.4)" " Carboxy-THC (Hair) (2.46)"       " Benzoylecgonine (Hair) (218)" " ortho-Hydroxycocaine (Hair) (2.8)"
    "meta-Hydroxycocaine (Hair) (4.9)"  " Norfentanyl (Hair) (24)"          " EDDP (Hair) (1421)"              " Benzoylecgonine (Hair) (773)" " ortho-Hydroxycocaine (Hair) (4.7)"
    "para-Hydroxycocaine (Hair) (11)"   " Cocaethylene (Hair) (113)"        " meta-Hydroxycocaine (Hair) (14)" " Carboxy-THC (Hair) (2.11)"    " Benzoylecgonine (Hair) (604)"     
    end
    What I want to do is create a separate indicator variable for each drug variable (drug1-drug28) that equals 1 if that variable contains the characters "(0)". I've tried, for example,
    Code:
    gen result=regexm(drug1, "(0)")
    , but this will return a value of 1 if 0 shows up anywhere. In the example above, in the first observation, "Cocaine (0)" should return a value of 1, but in the last observation, " Benzoylecgonine (Hair) (604)" should NOT return a value of 1.

    Thank you for any assistance with the syntax.

  • #2
    Include word boundaries. For these, you need Unicode regular expressions.

    Code:
    gen result3=ustrregexm(drug3, "\b(0)\b")
    Result:

    Code:
    . l drug3 result3, sep(0)
    
         +--------------------------------------------+
         |                            drug3   result3 |
         |--------------------------------------------|
      1. |                      Cocaine (0)         1 |
      2. |                                          0 |
      3. |                                          0 |
      4. |        Carboxy-THC (Hair) (2.46)         0 |
      5. |               EDDP (Hair) (1421)         0 |
      6. |  meta-Hydroxycocaine (Hair) (14)         0 |
         +--------------------------------------------+
    
    .
    Last edited by Andrew Musau; 08 Sep 2023, 11:52.
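
A note on the pattern: in a regular expression, unescaped parentheses form a group rather than matching literal characters, which is why "(0)" on its own matches any 0. The word-boundary version works on the example data, but it would presumably also match a standalone 0 inside a value such as "(0.5)". Below is a minimal sketch of two stricter alternatives, assuming the goal is to match the literal text "(0)": escape the parentheses, or skip regular expressions and use strpos(). The toy data and the variable names zero_regex and zero_strpos are made up for illustration.

```stata
clear
input str12 drug
"Cocaine (0)"
"THC (0.5)"
"EDDP (604)"
""
end

* Escape the parentheses so they match literally instead of forming a group
gen byte zero_regex = ustrregexm(drug, "\(0\)")

* Plain substring search: strpos() returns 0 when "(0)" is absent
gen byte zero_strpos = strpos(drug, "(0)") > 0

list, sep(0)
```

Both indicators should be 1 only for "Cocaine (0)". The strpos() version is probably the simpler choice when the target is a fixed string and no pattern matching is needed.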



    • #3
      Thanks so much, Andrew. Now I'm thinking that, for my purposes, it might be better to set drug3 to missing if "(0)" appears, instead of generating a new variable equal to 1. Would you know the syntax to do this?



      • #4
        Code:
        replace drug3 = "" if ustrregexm(drug3, "\b(0)\b")
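
Since the original question mentions 28 variables, the same replacement can be applied to all of them in a loop. This is only a sketch on a three-variable toy dataset, assuming the literal text "(0)" is what marks a zero result; it uses strpos() rather than the regular expression above, and the data values are invented for illustration.

```stata
clear
input str25(drug1 drug2 drug3)
"Tramadol (94.78)" " Benzoylecgonine (125)" " Cocaine (0)"
"Cocaine (0)" " THC (4.57)" ""
"EDDP (Hair) (604)" " THC (0)" " Tramadol (0)"
end

* Blank every entry that contains the literal text "(0)"
foreach v of varlist drug1-drug3 {
    replace `v' = "" if strpos(`v', "(0)") > 0
}

list, sep(0)
```

For the full dataset, the varlist would be drug1-drug28, provided the variables are stored in that order.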

