  • Find occurrences of string across multiple variables

    Hi,

    I want to count the number of occurrences of a string across multiple variables, by different groups. There are a large number of variables with various names, so they might have to be renamed, but I don't know how to do that either. This also makes it difficult for me to use the reshape command, even though that might be a solution as well.

    Here's an example:
    ID   V1                V2                V3                count_relevant_string
    1    relevant_string   relevant_string   Gibberish word    2
    2    Gibberish word    Gibberish word    relevant_string   1
    3    Gibberish word    Gibberish word    Gibberish word    0
    Where I want to create the variable count_relevant_string.

    Thanks in advance
    Jonatan

  • David Benson
    replied
    Jonatan, Nick's code is probably the fastest & easiest way to get what you want. I'm just adding some alternate solutions (and ideas that might also help you along your way):

    You might also find the posts here and here to be helpful

    Code:
    * To rename a group of variables in one step, you could do: rename v* word*
    * See -help rename group- (official Stata) and -help renvar- (user-written, must be installed first) for renaming groups of variables
    
    * Data shared via -dataex- (generated with: dataex id v1 v2 v3). To install: ssc install dataex
    clear
    input byte id str15(v1 v2 v3)
    1 "relevant_string" "relevant_string" "Gibberish word"
    2 "Gibberish word"  "Gibberish word"  "relevant_string"
    3 "Gibberish word"  "Gibberish word"  "Gibberish word"
    end
    * ------------------ copy up to and including the previous line ------------------
    
    
    egen count_non_blanks = rownonmiss(v1-v3), strok  // counts non-blanks within v1-v3
    ssc install egenmore  // in case you don't have it already
    egen distinct = rowsvals(v*)   // counts unique values within v1 v2 v3, ignores missing
    
    
    . list, noobs abbrev(18)
    
      +----------------------------------------------------------------------------------------+
      | id                v1                v2                v3   count_non_blanks   distinct |
      |----------------------------------------------------------------------------------------|
      |  1   relevant_string   relevant_string    Gibberish word                  3          2 |
      |  2    Gibberish word    Gibberish word   relevant_string                  3          2 |
      |  3    Gibberish word    Gibberish word    Gibberish word                  3          1 |
      +----------------------------------------------------------------------------------------+
    
    * Reshaping to long
    drop count_non_blanks distinct
    reshape long v, i(id) j(word)
    bysort id (word): gen count = _N
    gen is_relevant = (strpos(v, "relevant_string") > 0)
    egen count_if_relevant = total(is_relevant), by(id)
    
    . list, sepby(id) noobs abbrev(18)
    
      +-----------------------------------------------------------------------+
      | id   word                 v   count   is_relevant   count_if_relevant |
      |-----------------------------------------------------------------------|
      |  1      1   relevant_string       3             1                   2 |
      |  1      2   relevant_string       3             1                   2 |
      |  1      3    Gibberish word       3             0                   2 |
      |-----------------------------------------------------------------------|
      |  2      1    Gibberish word       3             0                   1 |
      |  2      2    Gibberish word       3             0                   1 |
      |  2      3   relevant_string       3             1                   1 |
      |-----------------------------------------------------------------------|
      |  3      1    Gibberish word       3             0                   0 |
      |  3      2    Gibberish word       3             0                   0 |
      |  3      3    Gibberish word       3             0                   0 |
      +-----------------------------------------------------------------------+
    Last edited by David Benson; 13 Feb 2019, 13:05.


  • Nick Cox
    replied
    Your example implies that you're testing for equality:

    Code:
    gen wanted = 0
    
    quietly foreach v in V1 V2 V3 {
         replace wanted = wanted + (`v' == "relevant_string")
    }
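
    The loop above names V1, V2 and V3 explicitly. With a large number of variables under various names, one possible extension is to let Stata collect the string variables itself, so that no renaming or reshaping is needed. This is only a sketch under two assumptions that are not in the original posts: that every string variable in the dataset should be searched, and that a grouping variable exists (called group here, a hypothetical name).

    ```stata
    * Assumption: every string variable in the dataset is a candidate column.
    * -ds- leaves the list of matching variable names in r(varlist).
    ds, has(type string)
    local strvars `r(varlist)'

    * Row-wise count of exact matches, as in the loop above.
    gen count_relevant_string = 0
    quietly foreach v of local strvars {
        replace count_relevant_string = count_relevant_string + (`v' == "relevant_string")
    }

    * Hypothetical: if a grouping variable named group exists, the row counts
    * could be summed within groups. Uncomment and adapt the name to use it.
    * egen group_total = total(count_relevant_string), by(group)
    ```

    If only some of the string variables are relevant, the local macro can instead be filled by hand or with a wildcard, for example local strvars V*. The test stays an exact equality, so cells that merely contain the string are not counted.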
