
  • Filter variables based on middle and last characters of the var name

    I need to select variables scattered throughout a large-ish dataset based on a combination of characters in the middle and at the end of the var names. Although the wildcard notation awo*b~C is not valid, it represents conceptually what I would like to filter on. For example, I need to select all the variables from all waves (the wave number is the * in "awo*b~C") for scale "b" that have been recoded as "correct", indicated by the letter "C" added to the end of the original var name. I need to repeat variations on this for many different tasks.

    This seems like a simple thing to do, but so far I see no obvious way to do this efficiently.
    • Tried various wildcards (such as *, ?, and ~) in the Variables Manager and on the command line.
    • Searched archives such as Stata Tips (for example, inlist and inrange).
    • Also, the rather unsystematic construction of the var names does not lend itself to the ways I normally use loops such as foreach or forvalues.
    Can someone please advise how to do this? Thanks!

    ~~~Using ~ Stata14 ~ Windows~~~
    Cheers, wg
    ~ ~
    sapere aude ~~

  • #2
    Code:
     
    forval i = 0/9 { 
           capture confirm varlist awo`i'b*C 
           if _rc == 0  local mylist `mylist'  awo`i'b*C 
    }
    
    di "`mylist'"



    • #3
      Thanks Nick. I tried this and did not get an error message, but neither did I get a display of the filtered variables or anything else. Maybe I am not understanding what to do with "mylist"?

      Also, where the second wildcard is located in the var name (the ~), there are differing numbers of alphanumeric characters between the "b" and the "C". I thought the ~ wildcard was a placeholder for >=1 characters; however, so far I have not been successful in using it.
      Cheers, wg
      ~ ~
      sapere aude ~~



      • #4
        The tilde (~) character works like the asterisk (*) and means 0 or more characters, but adds the restriction that only one variable name is allowed to match. The documentation is misleading, because it states that both ~ and * mean 1 or more characters.
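
        The difference between the two wildcards can be checked on a toy dataset. The following is a minimal sketch, not tested against Stata 14 specifically; the variable names are made up for illustration (they are the ones used later in this thread), and the exact wording of the error message is not shown because it may differ across versions.

        ```stata
        clear
        set obs 1
        foreach v in varX3name varZ5name varQ7name {
            gen `v' = 42
        }

        * The asterisk may match any number of variables: all three are listed
        describe var*name

        * The tilde requires exactly one match. With three candidates the
        * abbreviation is ambiguous and Stata should stop with an error,
        * so the command is wrapped in capture noisily to let the do-file continue
        capture noisily describe var~name
        display "return code: " _rc

        * A tilde pattern that only one variable satisfies is accepted
        describe varX~name
        ```

        If the tilde pattern in the original question matches several variables per wave, this restriction alone would explain why ~ appeared not to work.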

        I would go with rename here. Try

        Code:
        rename (awo#b*C) (foo=) , dryrun r
        return list
        local myvars `r(oldnames)'
        display "`myvars'"
        Best
        Daniel



        • #5
          Sorry; syntax was wrong and the capture was eating the error.

          Code:
          clear 
          set obs 1 
          gen awo1bfrogC = 1 
          gen awo7btoadC = 1
          gen newt = 1 
          
          forval i = 0/9 { 
                 capture unab this : awo`i'b*C 
                 if _rc == 0  local mylist `mylist'  `this' 
          }
          
          di "`mylist'"
          
          awo1bfrogC awo7btoadC



          • #6
            Nick, I see the logic of the syntax, and it would be a great time saver. Unfortunately, I am not able to make this work. All I can find changed in my data are the three vars (*frog, *toad, *newt), with all cases = 1. Perhaps I just don't understand how to correctly use this approach.
            • When I run this with my data, it does display the two new var names after the di "`mylist'" command. But when I try to use it as a varlist with anything other than di (e.g., describe "`mylist'"), it produces nothing and returns the error "invalid name".
            • FYI: I had to change the * to ** since all my vars have at least 2 characters between the "b" and the "C".

            Thanks for the suggestion though - it gives me ideas that I can try for other tasks.
            Cheers, wg
            ~ ~
            sapere aude ~~



            • #7
              The approach gives you a complete set of wildcard varlists which you then can use in some command. Using display is just to show what is produced.
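
              As a minimal sketch of that last step, assuming the toy variables from #5: the local macro is passed to a command without double quotes. The quotes in describe "`mylist'" turn the list into a string literal, which is a likely source of the "invalid name" error reported in #6. The explicit `local mylist` line is an addition here; it only clears any value left over from an earlier run.

              ```stata
              clear
              set obs 1
              gen awo1bfrogC = 1
              gen awo7btoadC = 1
              gen newt = 1

              * Start from an empty macro so repeated runs do not accumulate names
              local mylist
              forval i = 0/9 {
                  * unab expands the wildcard into full variable names;
                  * capture skips waves for which nothing matches
                  capture unab this : awo`i'b*C
                  if _rc == 0 local mylist `mylist' `this'
              }

              * Use the macro as a varlist: no double quotes around it
              describe `mylist'
              summarize `mylist'
              ```

              Note that a local macro exists only within the do-file or interactive session that defines it, so the loop and the commands using `mylist' need to be run together.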

              I don't understand your comment about ** at all: manifestly, in my example * is quite sufficient to catch "frog" and "toad", which are multiple characters. It is not the case that ** has a different interpretation from * in wildcards.
              Last edited by Nick Cox; 14 Oct 2015, 18:41.



              • #8
                My comment about the ** vs a single * is based on my experience with Stata recently.
                • If I wanted to use a wildcard to filter for variables named varX3name, varZ5name, varQ7name, etc., I had to use var**name because it would not find all cases if I used only one * (like this: var*name). It worked fine once I tried using an * for each placeholder I wanted. Maybe it is an anomaly, since the manual says * should represent >=1 characters just like ~. But since I could not get ~ to work, I stumbled upon using multiple *, and it did filter as I wanted.
                Much thanks. I will use this approach a lot! ;-)
                Last edited by Wendy Garrard; 14 Oct 2015, 19:19.
                Cheers, wg
                ~ ~
                sapere aude ~~



                • #9
                  Thanks for the appreciation. I am not trying to be stubborn, but you have not given a reproducible example where ** works and * doesn't. I certainly can't reproduce that behaviour:

                  Code:
                   
                  . clear
                  
                  . set obs 1 
                  number of observations (_N) was 0, now 1
                  
                  . foreach v in varX3name varZ5name varQ7name { 
                    2.         gen `v' = 42
                    3. } 
                  
                  . 
                  . describe 
                  
                  Contains data
                    obs:             1                          
                   vars:             3                          
                   size:            12                          
                  ------------------------------------------------------------------------------
                                storage   display    value
                  variable name   type    format     label      variable label
                  ------------------------------------------------------------------------------
                  varX3name       float   %9.0g                 
                  varZ5name       float   %9.0g                 
                  varQ7name       float   %9.0g                 
                  ------------------------------------------------------------------------------
                  Sorted by: 
                       Note: Dataset has changed since last saved.
                  
                  . 
                  . d var*name 
                  
                                storage   display    value
                  variable name   type    format     label      variable label
                  ------------------------------------------------------------------------------
                  varX3name       float   %9.0g                 
                  varZ5name       float   %9.0g                 
                  varQ7name       float   %9.0g
