
  • Looping of a specific type of a variable list

    Hey Stata users!

    I am trying to loop over a list of integer variables to replace missing values. I am using the following code:


    Code:
    qui: ds S* PA* PB* PC* BST* H1* H2* H3* H4* H5*, not(type string)
    local Int_vars `r(varlist)'
    
    foreach v of local Int_vars {
            egen max_var = max(`v') - 4
            replace `v' = . if max_var>`v'
            drop max_var
    }
    When I run the code above, I get the error "varlist not allowed".

    What part of my code am I getting wrong?

    Thanks!
    Last edited by Hadah Hussain; 27 Oct 2021, 09:58.

  • #2
    Code:
    egen max_var = max(`v') - 4
    is not legal syntax. The error message is somewhat misleading here, as there is not actually an illegal varlist. The problem, rather, is that -egen- does not allow general expressions to the right of the = character. Only -egen- functions are allowed. Further calculation, such as subtracting 4, is not. So you have to replace that command with two commands:
    Code:
    egen max_var = max(`v')
    replace max_var = max_var - 4
    Added: That said, the whole loop can be done more efficiently and more simply without using -egen-:

    Code:
    foreach v of local Int_vars {
        summ `v', meanonly
        replace `v' = . if `v' >  `r(max)' - 4
    }
    Last edited by Clyde Schechter; 27 Oct 2021, 10:05.



    • #3
      By the by, that ds command as you know finds numeric variables and indeed

      Code:
      ds, has(type numeric)
      is a more direct alternative.

      If you wanted to find variables that are integers only you could use findname from the Stata Journal which allows

      Code:
      findname, all(@ == floor(@))
      where the criterion that all values are equal to the corresponding floor is necessary and sufficient to find integers.

      Naturally you can specify a varlist too.
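
      As a sketch of that last point, the variable patterns from the first post could presumably be passed straight to findname. This assumes findname is installed and that variables matching those patterns exist in the data:

      Code:
      * findname is community-contributed; -search findname, sj- should locate it
      findname S* PA* PB* PC* BST* H1* H2* H3* H4* H5*, all(@ == floor(@))
      * the names found are left behind in r(varlist)
      display "`r(varlist)'"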

      The history of commands here is confusing even to people who understand it. But findname is written as a superset of ds although some of the syntax is changed, which pivots on my disliking some of the syntax I had written into ds, a revision that was folded back into the official command.



      • #4
        Thank you so much Clyde Schechter and Nick Cox.



        • #5
          After correcting my code, I realized that I need to include an if statement inside the loop in case a variable has no missing values. If the maximum value of a variable is equal to one, then there are no missing values and I want my code to leave these variables alone. This is my new code:

          Code:
          qui: findname, all(@ == floor(@))
          local Num_vars `r(varlist)'
          
          foreach v of local Num_vars {
              summ `v', meanonly
              if `r(max)' > 1 {
                  replace `v' = . if `v' >  `r(max)' - 4
              }
              else {
              continue
              }
          }
          When I run the code, I get the following error: ">1 invalid name".



          • #6
            I cannot replicate your problem. When I run your code using the auto.dta, it proceeds without error messages and performs correctly.

            Please use the -dataex- command to show example data that exhibits the problem you are encountering.

            By the way, the -else { continue }-, though harmless, is not needed.



            • #7
              Turns out only one of the variables is causing the problem because it had no observations. When I removed it the code ran without any issues. Thank you for your help, Clyde Schechter.
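
              For anyone hitting the same message, here is a minimal sketch of a guard. It assumes (as the error suggests) that -summarize- leaves r(max) unset for a variable with no nonmissing values, so that the macro reference expands to nothing and the line reads -if > 1 {-. Checking r(N) first skips such variables, and Num_vars is the local defined in #5:

              Code:
              foreach v of local Num_vars {
                  summ `v', meanonly
                  * r(N) is the number of nonmissing observations summarized
                  if r(N) > 0 & r(max) > 1 {
                      replace `v' = . if `v' > r(max) - 4
                  }
              }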



              • #8
                Your code can be trimmed even more.


                Code:
                qui: findname, all(@ == floor(@)) local(Num_vars)

                foreach v of local Num_vars {
                    summ `v', meanonly
                    if r(max) > 1 {
                        replace `v' = . if `v' > r(max) - 4
                    }
                }
                There is no advantage and some disadvantage in calling up the local macro persona `r(max)' rather than the saved result r(max).

                Indeed, this should work too:


                Code:
                qui: findname, all(@ == floor(@))
                foreach v in `r(varlist)' {
                    summ `v', meanonly
                    if r(max) > 1 {
                        replace `v' = . if `v' > r(max) - 4
                    }
                }
                The difference is subtle and may even seem contradictory. The foreach loop won't evaluate r(varlist) on the fly, so you need the local macro persona, whereas the job of replace here is to do a numeric calculation and r(max) will be understood as something to be evaluated.
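
                A small sketch of the distinction, using the auto data shipped with Stata (the choice of mpg is only for illustration):

                Code:
                sysuse auto, clear
                summ mpg, meanonly
                * here r(max) is evaluated as a number inside the expression
                display r(max) - 4
                * here the macro form is expanded to text first, and display then evaluates that text
                display `r(max)' - 4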
                Last edited by Nick Cox; 28 Oct 2021, 12:12.
