
  • Filter variables inside a loop

    Hi all,

    I am trying to filter variables inside a loop. I succeeded with a single variable, but I am having trouble with two or more variables. For example:

    This code works well for a single variable:

    Code:
    clear all
    set more off
    sysuse auto
    
    foreach y of varlist  price - turn {
    
     if `y' == mpg  {
        di "filter 1: `y'"
    
    }
    else {
    
        di " no filter: `y'"
    
    }
    }
    
     no filter: price
    filter 1: mpg
     no filter: rep78
     no filter: headroom
     no filter: trunk
     no filter: weight
     no filter: length
     no filter: turn
    Now I am trying to filter on two variables, and here I run into problems: neither variable is picked up by the filter:

    Code:
    foreach y of varlist  price - turn {
    
     if `y' == (price | mpg)  {
        di "filter 1: `y'"
    
    }
    else {
    
        di " no filter: `y'"
    
    }
    }
     no filter: price
     no filter: mpg
     no filter: rep78
     no filter: headroom
     no filter: trunk
     no filter: weight
     no filter: length
     no filter: turn



    Thanks in advance.
    Rodrigo

  • #2
    This code confuses testing whether a variable name is equal to a specific string with testing the values of variables.

    Your first block of code should have used double quotes around the elements being compared. It worked only by accident: your code is interpreted as comparing each variable's first value with mpg[1], and in auto.dta that comparison is true only for mpg itself.

    The second block of code is confused about what the operator | does and in any case needs inlist() to work. I can’t illustrate easily from my phone but someone else may jump in before I get back to a computer.
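A minimal sketch of the two working ways to write the two-variable test, both comparing the loop variable's *name* as a string: each comparison spelled out in full and joined by |, or inlist(). The locals hits1 and hits2 are hypothetical additions, there only so the two forms can be checked against each other.

```stata
sysuse auto, clear

local hits1                         // names caught by form 1 (hypothetical, for checking)
local hits2                         // names caught by form 2 (hypothetical, for checking)
foreach y of varlist price-turn {
    * Form 1: each string comparison written out in full, joined by |
    if "`y'" == "price" | "`y'" == "mpg" {
        di "filter 1: `y'"
        local hits1 : list hits1 | y
    }
    else {
        di " no filter: `y'"
    }
    * Form 2: the same condition via inlist()
    if inlist("`y'", "price", "mpg") {
        local hits2 : list hits2 | y
    }
}
di "form 1: `hits1'   form 2: `hits2'"
```

Note that | joins two complete comparisons here; it cannot be used to list alternative values on one side of ==.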



    • #3
      First, even your code that "works" does not actually work. It has a bug that doesn't happen to bite in auto.dta. Your code actually produces the "filter 1" output whenever the variable in question has the same value as mpg in the first observation of the data set. Watch:
      Code:
      . sysuse auto
      (1978 automobile data)
      
      . replace headroom = mpg in 1
      (1 real change made)
      
      .
      . foreach y of varlist  price - turn {
        2.
      .  if `y' == mpg  {
        3.     di "filter 1: `y'"
        4.
      . }
        5. else {
        6.
      .     di " no filter: `y'"
        7.
      . }
        8. }
       no filter: price
      filter 1: mpg
       no filter: rep78
      filter 1: headroom
       no filter: trunk
       no filter: weight
       no filter: length
       no filter: turn
      To get that code to work correctly, you need to put the names in double quotes so that the code treats them as variable names, not as the values of the variables in the first observation:
      Code:
      . sysuse auto
      (1978 automobile data)
      
      . replace headroom = mpg in 1
      (1 real change made)
      
      .
      . foreach y of varlist  price - turn {
        2.
      .  if "`y'" == "mpg"  {
        3.     di "filter 1: `y'"
        4.
      . }
        5. else {
        6.
      .     di " no filter: `y'"
        7.
      . }
        8. }
       no filter: price
      filter 1: mpg
       no filter: rep78
       no filter: headroom
       no filter: trunk
       no filter: weight
       no filter: length
       no filter: turn
      runs correctly.

      To extend this to filter on more than one variable, you just need to get the syntax right:

      Code:
      clear all
      set more off
      sysuse auto
      replace headroom = mpg in 1
      
      foreach y of varlist  price - turn {
      
       if inlist("`y'", "price", "mpg")  {
          di "filter 1: `y'"
      
      }
      else {
      
          di " no filter: `y'"
      
      }
      }
      does this. You can extend this syntax to up to 9 "filter" variables; after that, -inlist()- cannot accept any more. If you need 10 or more, you will have to break things up. The model for this syntax would be -if inlist("`y'", "v1", "v2", "v3", "v4", "v5", "v6", "v7", "v8", "v9") | inlist("`y'", "v10", "v11", ...)-. With string arguments, each -inlist()- allows 10 arguments: "`y'" plus 9 others.
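One alternative to chaining several inlist() calls, not taken from the thread but built on Stata's documented extended macro function `: list A in B`, is to keep the filter names in a local and test membership, which avoids the argument limit altogether. The local names filter, isin and hits are hypothetical; treat this as a sketch rather than the method the thread recommends.

```stata
sysuse auto, clear

* Hypothetical list of names to filter on; it may hold any number of names
local filter price mpg weight length turn

local hits                          // names caught by the filter (for checking)
foreach y of varlist price-turn {
    * isin is 1 if every word in `y' (here a single name) appears in `filter'
    local isin : list y in filter
    if `isin' {
        di "filter 1: `y'"
        local hits : list hits | y
    }
    else {
        di " no filter: `y'"
    }
}
di "filtered: `hits'"
```

Because `: list ... in ...` compares words, the names must match exactly; unlike varlist parsing, no abbreviations or wildcards are expanded.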

      Added: Crossed with #2.



      • #4
        Thanks, Nick Cox and Clyde Schechter, for your replies; your code works great!

        Regards
        Rodrigo



        • #5
          Clyde Schechter got there at about the same time and straight to the heart of the matter.

          Some more can be said.

          First, as an experienced Stata user, you're familiar with stuff like

          Code:
          sysuse auto, clear 
          
          list make mpg if mpg == 41 
          
          su mpg if foreign == 1
          and you understand similarly that

          Code:
          list mpg price if foreign == turn
          would be legal as a test (setting aside the quite different point that it makes no sense for these data).

          That is the if qualifier.

          The if command does, and does not, work the same way. What is the same is that

          Code:
          if mpg == 41
          or

          Code:
          if mpg == turn
          is the way to test for (1) equality of a numeric variable and a numeric value and (2) equality of two numeric variables. What is different is that when the if command is given a variable name, Stata interprets it as the value in the first observation. If this is a surprise, you're in good company, but

          A. What else would you expect Stata to do?

          B. If the answer to A is "reject the syntax as an error", I don't think that's a ridiculous answer, but it's not Stata's answer. For example, you are allowed to hold the same constant in every observation of a variable, and you're allowed to have a dataset with one observation; in such cases the first value is all that matters. In either case, or in other cases, the rule is that

          Code:
          if mpg == 41
          is interpreted as

          Code:
          if mpg[1] == 41
          while

          Code:
          if mpg == turn
          is interpreted as

          Code:
          if mpg[1] == turn[1]
          Before Stata can (try to) execute your commands, there is a syntax parser that reads your code and tries to translate it into Stata's internal code. Sometimes one wishes otherwise, but it's not part of the parser's job to guess what you mean rather than what you say, and still less to suggest that you don't mean what you say or that what you say is not a good idea.
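A small sketch of the rule just described, mirroring the trick in #3: make mpg and turn agree in the first observation only, then compare what the if command and the if qualifier report. The locals bare, explicit and n_equal are hypothetical names used only to record the results.

```stata
sysuse auto, clear
* Make the first observations agree; the variables still differ elsewhere
replace turn = mpg in 1

* if command: bare variable names mean their values in observation 1
if mpg == turn {
    local bare 1
}
else {
    local bare 0
}

* The same test with the subscripts written out explicitly
local explicit = (mpg[1] == turn[1])

* The if qualifier, by contrast, checks every observation
count if mpg == turn
local n_equal = r(N)

di "if command: `bare'  explicit [1]: `explicit'  obs with mpg == turn: `n_equal' of " _N
```

The if command reports "true" even though the two variables agree in only some of the observations, which is exactly the trap in the original loop.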

          There is more to be said about the "or" operator. Let's go back to the if qualifier. With the auto data you can go

          Code:
          . sysuse auto, clear
          (1978 automobile data)
          
          . list mpg if mpg == 1 | 2
          and you'll get a full listing. What is going on? mpg is never equal to 1 or 2 -- which is what many users want an expression like that to mean. So why any output at all, especially output of all values?

          Stata's rules are what count. Stata parses that as

          Code:
          .... if (mpg == 1) | 2
          and looks at every observation. Now mpg == 1 is false for every observation -- so far, so good -- but 2 is not zero, hence true, by Stata's rules. That is why we need either to spell out

          Code:
          .... if mpg == 1 | mpg == 2
          or to switch to inlist() or inrange().
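A quick way to check the parsing explanation above is to count rather than list, comparing the mistaken expression with the three correct forms. The locals n_wrong, n_spelled, n_inlist and n_inrange are hypothetical names that simply store r(N) from count.

```stata
sysuse auto, clear

* Parsed as (mpg == 1) | 2; the constant 2 is nonzero, so every observation passes
count if mpg == 1 | 2
local n_wrong = r(N)

* What was probably intended, written out in full
count if mpg == 1 | mpg == 2
local n_spelled = r(N)

* The same test via inlist() and, since mpg holds integers, inrange()
count if inlist(mpg, 1, 2)
local n_inlist = r(N)
count if inrange(mpg, 1, 2)
local n_inrange = r(N)

di "wrong: `n_wrong' of " _N "; spelled out: `n_spelled'; inlist(): `n_inlist'; inrange(): `n_inrange'"
```

The mistaken form counts every observation, while the three correct forms all count none, because mpg in the auto data never takes the value 1 or 2.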

