
  • foreach loop based on whether variable has certain values

    Hello,

    I am trying to create a loop for a large cross-sectional dataset that identifies string variables with the response options "Yes; No; or, Unknown", replaces these values with "1, 0, or, -1", and destrings them. I would prefer to replace each original string variable rather than generate duplicate numeric variables. Do I have to go through the entire dataset and add each variable name to a local varlist, or can I use a conditional statement and loop over all the variables?

    Thank you,
    Tom

  • #2
    I guess that by

    Code:
    "Yes; No; or, Unknown"
    you really mean that the possible string values are

    Code:
    "Yes" "No"  "Unknown"
    given that programmers have to be literal about what is a literal string and what is not.

    It's certainly possible to find such variables automatically. You can write your own loop or (for example) use findname from the Stata Journal, which will do it for you.
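If you would rather not install anything, a do-it-yourself loop might look like the sketch below. It is an illustration under stated assumptions, not a copy of what findname does internally: it uses only the official commands ds and count, the small dataset is made up, and a variable qualifies only if every value is exactly "Yes", "No" or "Unknown" (case-sensitive, no blanks).

```stata
* Made-up data: s1 and s3 qualify, s2 has another value, n is numeric
clear
input str7 s1 str7 s2 str7 s3 n
"Yes"     "No"    "Yes" 1
"Unknown" "Maybe" "No"  2
end

* ds leaves the names of all string variables in r(varlist)
quietly ds, has(type string)
local strvars `r(varlist)'

local myvars
foreach v of local strvars {
    * count observations holding anything outside the allowed set
    quietly count if !inlist(`v', "Yes", "No", "Unknown")
    * no such observations: add the variable to the list
    if r(N) == 0 local myvars `myvars' `v'
}
display "`myvars'"
```

The local myvars can then be used in the same way as the one findname creates.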

    Absent a data example from you, I made one up.

    Code:
    clear
    input x y str7 (s1 s2 s3) 
    1  98 "Yes"     "No"  "Unknown"   
    2  76 "No"      "Unknown" "Yes" 
    3  65 "Unknown" "Yes" "No" 
    end 
    
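    * value label that fixes the mapping encode will use
    label def whatever -1 Unknown 0 No 1 Yes
    
    * put in local myvars the names of variables whose every value is Yes, No or Unknown
    findname, all(inlist(@, "Yes", "No", "Unknown")) local(myvars)
    
    foreach v of local myvars {
        * map strings to numbers through the existing label, then swap names
        encode `v', gen(work) label(whatever)
        drop `v'
        rename work `v'
    }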
    
    . list 
    
         +--------------------------------------+
         | x    y        s1        s2        s3 |
         |--------------------------------------|
      1. | 1   98       Yes        No   Unknown |
      2. | 2   76        No   Unknown       Yes |
      3. | 3   65   Unknown       Yes        No |
         +--------------------------------------+
    
    . 
    . list, nola 
    
         +-----------------------+
         | x    y   s1   s2   s3 |
         |-----------------------|
      1. | 1   98    1    0   -1 |
      2. | 2   76    0   -1    1 |
      3. | 3   65   -1    1    0 |
         +-----------------------+
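The label(whatever) option is what makes encode produce 1, 0 and -1 here. Without a predefined label, encode numbers the distinct strings 1, 2, 3, ... in alphabetical order. A minimal sketch of the difference, using a made-up variable s and an arbitrary label name yn:

```stata
* Made-up data: one string variable with the three answers
clear
input str7 s
"Yes"
"No"
"Unknown"
end

* No existing label: codes follow alphabetical order (No, Unknown, Yes)
encode s, gen(alpha)

* Predefined label: encode uses its mapping instead
label define yn -1 "Unknown" 0 "No" 1 "Yes"
encode s, gen(coded) label(yn)

list, nolabel
```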
    Notes:

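    1. encode is much more natural here than destring, and I have no bias against destring: destring expects strings that look like numbers, whereas encode maps each distinct string to an integer code through a value label. For much more discussion, see for example the review in https://www.stata-journal.com/articl...article=dm0098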

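    2. The all() test is stringent: a variable is selected only if every value is "Yes", "No" or "Unknown", so any other value (including an empty string) excludes it. The any() option offers a more lenient test, selecting a variable if at least one value satisfies the condition.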

    3. For findname,

    Code:
    . search findname, sj
    
    Search of official help files, FAQs, Examples, SJs, and STBs
    
    SJ-15-2 dm0048_3  . . . . . . . . . . . . . . . . Software update for findname
            (help findname if installed)  . . . . . . . . . . . . . . .  N. J. Cox
            Q2/15   SJ 15(2):605--606
            updated to be able to find strL variables
    
    SJ-12-1 dm0048_2  . . . . . . . . . . . . . . . . Software update for findname
            (help findname if installed)  . . . . . . . . . . . . . . .  N. J. Cox
            Q1/12   SJ 12(1):167
            correction for handling embedded double quote characters
    
    SJ-10-4 dm0048_1  . . . . . . . . . . . . . . . . Software update for findname
            (help findname if installed)  . . . . . . . . . . . . . . .  N. J. Cox
            Q4/10   SJ 10(4):691
            update for not option
    
    SJ-10-2 dm0048  . . . . . . . . . . . . . .  Speaking Stata: Finding variables
            (help findname if installed)  . . . . . . . . . . . . . . .  N. J. Cox
            Q2/10   SJ 10(2):281--296
            produces a list of variable names showing which variables
            have specific properties, such as being of string type, or
            having value labels attached, or having a date format
    Download from the most recent location (SJ-15-2), and note the longer write-up from 2010 (SJ-10-2), which does not require subscription access to the Journal. Running this search in Stata gives you clickable links.
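Because the condition inside all() and any() is ordinary Stata syntax, the stringent test in note 2 can be relaxed without switching to any(). A sketch, assuming findname is installed and using made-up data: blank cells are added to the allowed values for all(), and any() is shown for comparison.

```stata
* Assumes findname is installed (see the search results above)
* Made-up data: s1 has a blank cell, s3 has another answer
clear
input str7 s1 str7 s2 str7 s3
"Yes" "No"  "Maybe"
""    "Yes" "No"
end

* Stringent, but allowing empty strings so blank cells do not exclude s1
findname, all(inlist(@, "Yes", "No", "Unknown", "")) local(strict)

* Lenient: selected if at least one value is an expected answer
findname, any(inlist(@, "Yes", "No", "Unknown")) local(lenient)

display "strict:  `strict'"
display "lenient: `lenient'"
```

Here the stringent list should contain s1 and s2, while the lenient list also picks up s3, which a conversion loop would then have to handle.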





    • #3
      In the interest of getting to know one's data, I'd recommend tabulating each variable and making a list of the variables to which you want to apply the 1, 0, -1 definition. But the following should do the trick. If you summarize a string variable, your code will not error out; rather, summarize will return r(N) = 0.


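      Code:
      label define yesno 1 "yes" 0 "no" -1 "unknown"

      foreach x of varlist * {
          sum `x'
          * r(N) == 0 is taken to mean that the variable is a string variable
          if r(N) == 0 {
              * standardize case and spacing, then map the answers to numeric strings
              replace `x' = trim(lower(`x'))
              replace `x' = "1" if `x' == "yes"
              replace `x' = "0" if `x' == "no"
              replace `x' = "-1" if `x' == "unknown"
              * if other text remains, destring fails, capture hides the error,
              * and the variable stays string
              capture destring `x', replace

              * attach the labels only if the variable is now numeric with data
              sum `x'
              if r(N) > 0 {
                  label values `x' yesno
              }
          }
      }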




      • #4
        Wow, thank you Nick. The findname command is very cool. Thank you for your help with this. I'll make sure to provide my own data next time.

        Cyrus, I'm not sure your code would be restricted to variables whose only response options are "Yes", "No" and "Unknown".



        • #5
          On #3 and #4.

          Cyrus Grout could use ds, has(type string) or findname, type(string) to select string variables.

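          In principle, summarize reporting no observations might instead mean a numeric variable whose values are all missing. That would pass the test r(N) == 0 but then trigger an error, because the string functions in the loop cannot be applied to a numeric variable.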
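A quick way to see this: build a numeric variable whose values are all missing, check r(N), then run the first replace from the loop under capture. A sketch with a made-up variable z:

```stata
* Made-up data: a numeric variable with every value missing
clear
set obs 3
generate z = .

* summarize reports no observations, just as for a string variable
quietly summarize z
display r(N)

* lower() expects a string argument, so this replace fails;
* capture stores the error code in _rc instead of stopping
capture replace z = trim(lower(z))
display _rc
```

A nonzero _rc shows that the loop would have stopped at this point without capture.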

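          A feature of his code is that destring would fail if there were strings other than (after lower casing) "yes", "no" and "unknown". However, the replace statements run before destring, so a string variable that did contain other values would still be changed: trimmed, lower cased and partly recoded, then left as a string because capture hides the destring error.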
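One way to act on both points is to keep the recoding from #3 but restrict it to string variables via ds, and to skip any variable with other values before anything is changed. This is a sketch rather than a tested drop-in: the variables q1 to q3 are made up, and blank cells are allowed through and end up missing after destring.

```stata
* Made-up data: q1 and q2 hold only the expected answers (in mixed case);
* q3 also holds "maybe", so it should be left alone
clear
input str7 q1 str7 q2 str7 q3
"Yes"     "no"  "maybe"
"UNKNOWN" "Yes" "no"
end

label define yesno 1 "yes" 0 "no" -1 "unknown", replace

* restrict the loop to string variables
quietly ds, has(type string)
local strvars `r(varlist)'

foreach x of local strvars {
    * skip the variable if any cleaned value is outside the expected set
    quietly count if !inlist(trim(lower(`x')), "yes", "no", "unknown", "")
    if r(N) > 0 {
        display as text "`x' skipped: other values present"
        continue
    }
    * only now change the data: clean, recode, convert, label
    quietly replace `x' = trim(lower(`x'))
    quietly replace `x' = "1"  if `x' == "yes"
    quietly replace `x' = "0"  if `x' == "no"
    quietly replace `x' = "-1" if `x' == "unknown"
    quietly destring `x', replace
    label values `x' yesno
}
list
```

Checking first and changing second means a variable is either converted completely or not touched at all.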
