  • Report variables that have "abnormal" values

    Hello all,

    For data cleaning purposes, I'm trying to find a quick and efficient way to list variables that do not meet a certain condition. For instance, if I expect a group of variables (which share the same naming pattern, say) to range from 1 to 5, I would like to know every variable that has at least one observation greater than 5, or more generally outside the range 1 to 5. Or if I expect a group of variables to contain only "Yes", "No" or "I don't know" as answers, I would like to know every variable that contains any value other than these three. That way, I can store those variables in a macro and then make adjustments to each of them in a loop over that macro.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str3 A1 str4 A2 str2 B1 str1 B2
    "Yes" "Yes"  "2"  "3"
    "No"  "Blue" "30" "2"
    "No"  "Yes"  "2"  "5"
    "No"  "No"   "7"  "1"
    end
    For instance, here, if I ask Stata to report all the B variables that have at least one observation different from 1, 2, 3, 4 or 5 (yes, as strings, because I also want to check for any nonnumeric values before destringing), I expect it to return B1. And if I ask Stata to report all the A variables that have at least one observation different from "Yes" or "No", I expect it to return A2. Is there any command or function that can do something like this?

    Thanks a lot for your help.

  • #2
    findname from the Stata Journal can help. Here is code for your two questions.

    Code:
    clear
    input str3 A1 str4 A2 str2 B1 str1 B2
    "Yes" "Yes"  "2"  "3"
    "No"  "Blue" "30" "2"
    "No"  "Yes"  "2"  "5"
    "No"  "No"   "7"  "1"
    end
    
    findname B?, any(!inlist(@, "1", "2", "3", "4", "5"))
    
    findname A?, any(!inlist(@, "Yes", "No"))
    findname is functionally a superset of the official command ds. It can do more (that's the superset part) and I think the syntax is better. (In saying that I am criticising my own previous work.) Anyway, the option used here has no equivalent in ds. That said, the code concerned is just a loop over variables, and could be matched by (e.g.)

    Code:
    local badlist 
    
    foreach v of var A? { 
        qui count if !inlist(`v', "Yes", "No") 
        if r(N) > 0 local badlist `badlist' `v' 
    } 
    
    di "`badlist'"
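The loop above answers the A question. A similar loop can handle the numeric-range question for the B variables without spelling out every allowed string, which helps if the range is wider than 1 to 5. This is a sketch, assuming the example data from #1: real() converts each string to a number and returns missing for nonnumeric text (or blanks), inrange() is false for missing, so such values get flagged, and the floor() comparison additionally flags non-integers such as "2.5". The macro name badlist is just an illustrative choice.

```stata
* Sketch: assumes the example data posted in #1
clear
input str3 A1 str4 A2 str2 B1 str1 B2
"Yes" "Yes"  "2"  "3"
"No"  "Blue" "30" "2"
"No"  "Yes"  "2"  "5"
"No"  "No"   "7"  "1"
end

* Collect B? variables with any value that is not a whole number from 1 to 5.
* real() gives missing for nonnumeric text or blanks; inrange() is 0 for missing.
local badlist
foreach v of var B? {
    qui count if !inrange(real(`v'), 1, 5) | real(`v') != floor(real(`v'))
    if r(N) > 0 local badlist `badlist' `v'
}
di "`badlist'"

* The stored list can then drive further work, e.g. showing the offending values
foreach v of local badlist {
    list `v' if !inrange(real(`v'), 1, 5) | real(`v') != floor(real(`v')), noobs
}
```

Note that real() accepts variants such as "02" or "5.0" as valid, whereas the inlist() approach above would flag them; which behaviour you want depends on how strict the cleaning needs to be.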
    As is often the case with community-contributed commands, there was a write-up a while back that has been followed by various updates. So download from the latest public source, but look through the original paper if you care.

    Code:
    . search findname, sj
    
    Search of official help files, FAQs, Examples, and Stata Journals
    
    SJ-20-2 dm0048_4  . . . . . . . . . . . . . . . . Software update for findname
            (help findname if installed)  . . . . . . . . . . . . . . .  N. J. Cox
            Q2/20   SJ 20(2):504
            new options include columns()
    
    SJ-15-2 dm0048_3  . . . . . . . . . . . . . . . . Software update for findname
            (help findname if installed)  . . . . . . . . . . . . . . .  N. J. Cox
            Q2/15   SJ 15(2):605--606
            updated to be able to find strL variables
    
    SJ-12-1 dm0048_2  . . . . . . . . . . . . . . . . Software update for findname
            (help findname if installed)  . . . . . . . . . . . . . . .  N. J. Cox
            Q1/12   SJ 12(1):167
            correction for handling embedded double quote characters
    
    SJ-10-4 dm0048_1  . . . . . . . . . . . . . . . . Software update for findname
            (help findname if installed)  . . . . . . . . . . . . . . .  N. J. Cox
            Q4/10   SJ 10(4):691
            update for not option
    
    SJ-10-2 dm0048  . . . . . . . . . . . . . .  Speaking Stata: Finding variables
            (help findname if installed)  . . . . . . . . . . . . . . .  N. J. Cox
            Q2/10   SJ 10(2):281--296
            produces a list of variable names showing which variables
            have specific properties, such as being of string type, or
            having value labels attached, or having a date format
    
    (end of search)

    • #3
      Nick: Thank you for your help. Your comments are very useful, as always.
