  • Replacing var1 with a value from var2 for certain values.

    Hi,

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str27 str_name str14 str_name_other
    "Other" "xxx"
    "-oth-" "yyyy"
    "zzz" ""
    end
    I am trying to generate a variable called "name" that is a simplification of my two variables str_name and str_name_other. Basically, name would take the values of str_name unless str_name has "-oth-" or "Other" as a value, in which case it would take the value of str_name_other.

    I tried several ways of coding this, but none yields the expected result: either nothing happens or only some of the values are replaced.

    Here is what I tried :

    Code:
    gen name = ""
    replace name = subinstr(str_name, "-oth-", str_name_other, .)
    replace name = subinstr(str_name, "Other", str_name_other, .)
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str27 str_name str14 str_name_other str5 name
    "Other" "xxx"  "xxx"  
    "-oth-" "yyyy" "-oth-"
    "zzz"   ""     "zzz"  
    end

    Code:
    local other = strpos(str_name, "Other")
    local other2 = strpos(str_name, "-oth-")
    gen name = str_name
    
    if `other' > 0 | `other2' > 0 {
    replace name = str_name_other
    
    }
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str27 str_name str14 str_name_other str5 name
    "Other" "xxx"  "xxx"
    "-oth-" "yyyy" "yyyy"
    "zzz"   ""     ""    
    end
    Could anyone help me find what is wrong with my code?

    EDIT: By the way, I suspect my dataset contains names that include the word "other" but are NOT to be replaced (a firm name containing the word "another", for instance). Could anyone give me a solution that applies only to cells containing exactly one of the keywords "Other", "other", "-oth-", etc. and nothing else?
    Last edited by Julia Simon; 30 Jan 2023, 04:04.

  • #2
    The problem with the first block of code is a misunderstanding of what subinstr() does. You don't need it, as there is a more direct route. The task here is to replace entire strings in some observations, not to modify parts of existing strings.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str27 str_name str14 str_name_other
    "Other" "xxx"
    "-oth-" "yyyy"
    "zzz" ""
    end
    
    gen wanted = cond(inlist(str_name, "Other", "-oth-"), str_name_other, str_name) 
    
    list 
    
         +------------------------------+
         | str_name   str_na~r   wanted |
         |------------------------------|
      1. |    Other        xxx      xxx |
      2. |    -oth-       yyyy     yyyy |
      3. |      zzz                 zzz |
         +------------------------------+
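    On the EDIT in #1: inlist() compares the entire string, so a name such as "Another firm" is not matched and is left alone. If the keywords may also differ in case or carry stray spaces, one possible sketch is below. It assumes Stata 14 or later for strtrim(); the variable name wanted2 and the extra data lines are made up for illustration.

    Code:
    clear
    input str27 str_name str14 str_name_other
    "other" "xxx"
    " -OTH- " "yyyy"
    "Another firm" ""
    end

    * normalise case and outer spaces for the comparison only;
    * the stored values of str_name are not changed
    gen wanted2 = cond(inlist(lower(strtrim(str_name)), "other", "-oth-"), str_name_other, str_name)

    list

    With these lines, the first two observations should take the value of str_name_other and the third should keep "Another firm". Add any further keywords as extra arguments to inlist().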
    Your second block of code is going in the wrong direction. A local macro assignment like

    Code:
    local other = strpos(str_name, "Other")
    will in practice be interpreted as
    Code:
    local other = strpos(str_name[1], "Other")
    and so pushing values into a local macro, which is sometimes needed, would have to be coupled with a loop over observations. Similarly, the if command (as opposed to the if qualifier) does not imply a loop over observations. This is explained (in my view a little backwards) in an FAQ: https://www.stata.com/support/faqs/programming/if-command-versus-if-qualifier/
    Clyde Schechter and I have written a piece on this, but it won't be out (in the Stata Journal) until about June.
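
    To make the contrast concrete: in #1 the value of str_name[1] is "Other", so the if command was true once and the replace then ran on every observation, which is why "zzz" ended up empty. Here is a sketch of both routes on the same example data. The variable names name_q and name_loop are made up for illustration, and the loop is shown only for the mechanics; the if qualifier is the route you want.

    Code:
    clear
    input str27 str_name str14 str_name_other
    "Other" "xxx"
    "-oth-" "yyyy"
    "zzz" ""
    end

    * if qualifier: the condition is evaluated in each observation
    gen name_q = str_name
    replace name_q = str_name_other if inlist(str_name, "Other", "-oth-")

    * if command: evaluated once per call, so it needs an explicit loop
    gen name_loop = str_name
    forvalues i = 1/`=_N' {
        local other = inlist(str_name[`i'], "Other", "-oth-")
        if `other' {
            replace name_loop = str_name_other in `i'
        }
    }

    * both routes should agree
    assert name_q == name_loop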
