  • string variable input for nss() or noccur()

    Hello!

    It seems that the command
    egen newvar = nss(myvariable), find(string)
    accepts only a string literal in find(), not the values of a string variable.

    I have data something like this (a toy problem):
    ID PETMET PETOWNED
    1 dog dog,dog,cat,cat,guinea pig
    2 cat dog, dog, cat, fish, snake
    I want a variable that counts the number of occurrences of petmet in petowned, preferably without reinflating the petowned variable.

    I cannot do either
    egen ownedandmet = nss(petowned), find(petmet)
    or
    egen ownedandmet = noccur(petowned), string(petmet)

    Neither function returns the desired value (both return all zeros), so I have to loop through all values of petmet instead. Is this the intended behavior?

    generate ownedandmet = .
    levelsof petmet, local(pets)
    foreach item in pets {
        egen temp = nss(petowned) if petmet == "`item'", find("`item'")
        replace ownedandmet = temp if petmet == "`item'"
        drop temp
    }


    I have quite a few levels of petmet (some 3000 in one case). Is there a better way to do this that I haven't considered or found in my searches?

    Grateful for any suggestions
    /Barbro

  • #2
    First of all, I'd note that the -noccur()- and -nss()- functions for -egen- require installation of the -egenmore- package from SSC, but, as noted, they won't work here anyway.

    The approach I'd think of is to remove the contents of petmet from petowned and count how much shorter the resulting string becomes. I vaguely recall Nick Cox applying an approach like this to a string problem, so the idea is not original with me.

    Code:
    replace petmet = strtrim(petmet) // clean up to be safe
    gen str ownsub = subinstr(petowned, petmet, "", .)
    gen int num_found = (strlen(petowned) - strlen(ownsub))/strlen(petmet)
    I'd note that this might not be a fully general solution to problems like this, as a search for something like "aa" in "aaaaa" might not return the desired count.
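    As a rough check, here is a minimal sketch that types in the toy data from #1 and applies the code above. The storage types (str10, str40) are assumptions chosen to fit the example values. The last line illustrates the pitfall: subinstr() removes non-overlapping matches only, so "aa" is counted twice in "aaaaa" rather than the four overlapping times one might want.

    ```stata
    * Sketch: the length-reduction count from #2 applied to the toy data in #1.
    clear
    input byte id str10 petmet str40 petowned
    1 "dog" "dog,dog,cat,cat,guinea pig"
    2 "cat" "dog, dog, cat, fish, snake"
    end

    replace petmet = strtrim(petmet)
    generate str ownsub = subinstr(petowned, petmet, "", .)
    generate int num_found = (strlen(petowned) - strlen(ownsub)) / strlen(petmet)
    list id petmet num_found

    * Pitfall: non-overlapping removal counts "aa" in "aaaaa" as 2, not 4.
    display (strlen("aaaaa") - strlen(subinstr("aaaaa", "aa", "", .))) / strlen("aa")
    ```

    With these data, num_found is 2 for id 1 and 1 for id 2, and the display line shows 2.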



    • #3
      Code:
      foreach item in pets
      is just a loop over one thing, the literal string "pets", which does not occur anywhere in your example.
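      A sketch of the loop from #1 with the macro reference fixed, so that it iterates over the levels stored in local pets rather than over the word pets. To keep it runnable without egenmore, the nss() call is swapped for the built-in length-reduction count from #2; that substitution is an assumption of this sketch, and with egenmore installed the egen ... nss() lines from #1 could go back inside the loop. Given #2, the loop is not needed at all; this only shows the macro fix.

      ```stata
      * Sketch: the loop from #1 with "of local" instead of "in".
      clear
      input byte id str10 petmet str40 petowned
      1 "dog" "dog,dog,cat,cat,guinea pig"
      2 "cat" "dog, dog, cat, fish, snake"
      end

      levelsof petmet, local(pets)
      generate int ownedandmet = .
      * "of local pets" loops over the contents of the macro;
      * "in pets" loops once over the literal word pets.
      foreach item of local pets {
          * Built-in count used here in place of egenmore's nss().
          replace ownedandmet = (strlen(petowned) - strlen(subinstr(petowned, "`item'", "", .))) / strlen("`item'") if petmet == "`item'"
      }
      list id petmet ownedandmet
      ```

      Writing foreach item in `pets' (with the macro dereferenced) would also loop over the levels.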

      But I agree with Mike Lacy and recommend his approach.

      Whether I was the first to invent or discover this technique I don't know, and it isn't important, but there is a write-up in the Stata Journal (stata-journal.com) that does flag the pitfall mentioned.
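      One way to sidestep substring matches such as "cat" inside "bobcat" is to count whole list items only. The sketch below is an assumption, not something posted in this thread: it assumes items are separated by a comma optionally followed by a single blank, removes that blank, doubles every comma, and wraps the list in commas so that each item is enclosed by its own pair of delimiters. The second observation is a modified version of the toy data (bobcat added) to show the difference.

      ```stata
      * Sketch: count whole comma-separated items, not substrings.
      clear
      input byte id str10 petmet str40 petowned
      1 "dog" "dog,dog,cat,cat,guinea pig"
      2 "cat" "dog, dog, bobcat, cat, snake"
      end

      * Assumption: items are separated by "," or ", " (at most one blank).
      generate str padded = subinstr(petowned, ", ", ",", .)
      * Double each comma and wrap in commas: ",dog,,dog,,cat,,...,"
      replace padded = "," + subinstr(padded, ",", ",,", .) + ","
      generate str target = "," + strtrim(petmet) + ","
      generate int num_found = (strlen(padded) - strlen(subinstr(padded, target, "", .))) / strlen(target)
      list id petmet num_found
      ```

      Here id 2 gets 1, whereas the plain substring count would give 2 because of "bobcat". Doubling the commas matters for repeated adjacent items: without it, removing the first ",dog," would consume the comma the second "dog" needs.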



      • #4
        Thank you, counting the length reduction will work for the actual problem! So clever! /Barbro

