Hello all:
I have a list of 50,000 true gene names and a dataset of 150-odd entries, some with misspelt gene names, held in long form in the test variable. How do I tag the rows with misspelt gene names by looking them up in the associatedgenename variable, which contains the 50,000 gene names?
I played around with levelsof, but the code halts at the first row or gives something faulty. In the example below, only "AiBG" should be tagged as 1 for being incorrect (A1BG is the correct form).
levelsof test, local(testl)
levelsof associatedgenename, local(gold)
foreach v of local testl{
gen incorrect = inlist(associatedgenename, `testl')
recode incorrect (1=0) (0=1)
}
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input str22 associatedgenename str4 test
"5S_rRNA"     "7SK"
"5_8S_rRNA"   "AiBG"
"7SK"         "A1CF"
"A1BG"        "A2M"
"A1BG-AS1"    "7SK"
"A1CF"        "A1BG"
"A2M"         ""
"A2M-AS1"     ""
"A2ML1"       ""
"A2ML1-AS1"   ""
"A2ML1-AS2"   ""
"A2MP1"       ""
"A3GALT2"     ""
"A4GALT"      ""
"A4GNT"       ""
"AA06"        ""
"AAAS"        ""
"AACS"        ""
"AACSP1"      ""
"AADAC"       ""
"AADACL2"     ""
"AADACL2-AS1" ""
"AADACL3"     ""
"AADACL4"     ""
"AADACP1"     ""
"AADAT"       ""
"AAED1"       ""
"AAGAB"       ""
"AAK1"        ""
end
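One possible way to do the lookup, offered as a sketch rather than a definitive answer: the loop above is likely to fail because gen is asked to create incorrect again on the second pass, and because inlist() expects comma-separated arguments and accepts only a small number of string arguments, so it cannot take 50,000 names. A merge avoids both problems. The sketch assumes both variables sit in the same dataset, as in the data example, and that an exact, case-sensitive match is what counts as correct. The tempfile name gold and the variable name incorrect are illustrative choices.

```stata
* Sketch only: run as one block (do-file), since it relies on a temporary file.

* Step 1: save the unique reference names as a lookup file keyed on "test"
preserve
keep associatedgenename
drop if missing(associatedgenename)
duplicates drop
rename associatedgenename test
tempfile gold
save `gold'
restore

* Step 2: look each test value up in the reference list
merge m:1 test using `gold', keep(master match)

* Step 3: tag non-empty test values that found no match
gen byte incorrect = _merge == 1 & !missing(test)
drop _merge

list test if incorrect
```

With the data example, this should tag only the row where test is "AiBG". Note that merge may change the sort order of the rows, so generate an observation identifier beforehand if the original order matters.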