
  • string variable management

    Good afternoon,
    I would like to create a categorical variable from a string variable composed of several heterogeneous words.

    Is there a Stata command that works in this way (see below)?

    example:

    Code:
    if X contains "a word that could be in the string " gen = y
    Many thanks in advance for your time.

  • #2

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str9 var1
    "frog"     
    "toad"     
    "toad frog"
    "frog"     
    "newt"     
    "dragon"   
    end
    
    gen wanted = strpos(var1, "frog") > 0
    
    . list, sep(0)
    
         +--------------------+
         |      var1   wanted |
         |--------------------|
      1. |      frog        1 |
      2. |      toad        0 |
      3. | toad frog        1 |
      4. |      frog        1 |
      5. |      newt        0 |
      6. |    dragon        0 |
         +--------------------+
    Watch out for spelling differences, including differences of upper and lower case.
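
    One way to guard against upper and lower case differences is to search a lowercased copy of the string. This is only a sketch: wanted_ci is a hypothetical variable name, and it assumes the search word itself is typed in lower case.

    Code:
    * Search a lowercased copy of var1; the original variable is left unchanged.
    * lower() works in all versions; strlower() is the name used in newer releases.
    gen wanted_ci = strpos(lower(var1), "frog") > 0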



    • #3
      To Nick's advice about spelling differences, let me add that you should also watch out for "substring" matches. If you are looking for "ant" and your variable contains "pheasant", for example, you will get a match you perhaps do not want. The strpos() function matches strings of characters rather than "words", but that is often sufficient.
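
      If whole-word matches are needed, one possible approach is to pad both the variable and the search word with spaces before calling strpos(). This is a minimal sketch with made-up data; has_ant is a hypothetical variable name, and it assumes words are separated by single spaces with no punctuation.

      Code:
      clear
      input str12 var1
      "ant"
      "pheasant"
      "red ant"
      "antelope"
      end

      * Padding with spaces means " ant " can only match a complete word.
      gen has_ant = strpos(" " + var1 + " ", " ant ") > 0

      * In Stata 14 or later, a regular expression with word boundaries
      * may be an alternative:
      * gen has_ant2 = ustrregexm(var1, "\bant\b")

      Here only "ant" and "red ant" are flagged; "pheasant" and "antelope" are not.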
