  • keep command & variable not found

    Dear Statalists,

    I need your help. First of all, I am quite new to this and still trying to figure out the basic functions, so please excuse me for asking "dumb" questions.
    To create a panel data set out of several annual surveys, I am currently trying to get rid of the many variables I do not need. For my purposes, the keep command seemed easier than drop, so I wrote a do-file:

    use XYZ.dta
    keep A B C D E F

    Unfortunately, not every variable exists in every survey. As a result, I received the error "variable D not found". Can I somehow execute the keep command even when a variable is not in my dataset? capture keep did not work (the other variables are not dropped).

    Thanks for your help!

  • #2
    Hello Felix,

    Welcome to the Stata Forum.

    I don't see the point of keeping or dropping variables without a given rule based on, say, name, prefix, order, values, etc.

    Since you consider yourself "quite new" in Stata, I strongly recommend you take a look at the Manual, particularly the User's Guide.

    Also, you may type - help drop - and see some nice examples.

    That said, please keep in mind that you can use both commands in data management to deal not only with variables but also with observations.

    Best,

    Marcos


    • #3
      Felix,

      capture keep does not work because if one variable in the list does not exist the whole command will fail. The capture command allows the do-file to keep running despite this error but will not cause Stata to keep the other variables (and drop the remaining ones). I don't think there is any way around this problem. As Marcos implies, if you were using drop, then you could split your drop statement into several pieces and add a capture so that even if some variables didn't exist the other drops would still work. For example, instead of

      Code:
      drop A B C D E F
      you could do:

      Code:
      capture drop A B
      capture drop C D
      capture drop E F
      In this way you can deal separately with groups of variables that appear in some surveys and not in others.
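
      Taken one step further, each variable can get its own capture drop inside a loop, so that a single missing name never blocks the others. A minimal sketch, not part of the original reply, using auto.dta with the made-up names frog and toad standing in for variables that do not exist:

      ```stata
      * illustration on auto.dta; frog and toad are hypothetical, nonexistent names
      sysuse auto, clear

      foreach v in mpg frog weight toad {
          * capture swallows the "not found" error for frog and toad,
          * while mpg and weight are dropped as usual
          capture drop `v'
      }

      describe, simple
      ```

      The cost is that capture hides every error, not just "variable not found", so a typo in a name passes silently.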

      I sympathize with the desire to use keep instead of drop because you have many variables to get rid of, but keep is probably not going to work here. Perhaps you can use variable lists and wildcards to drop large groups of variables at a time:

      Code:
      capture drop A-Z   // drop all variables between A and Z (in the order they appear in the dataset, not necessarily alphabetical order)
      capture drop A*   // drop all variables starting with A
      capture drop A?  // drop all variables that start with A and have one character after
      capture drop A?? // drop all variables that start with A and have two characters after

      Regards,
      Joe



      • #4
        Stolen from http://stackoverflow.com/questions/1...-may-not-exist

        The code below checks whether each variable exists and, if so, adds it to a local macro; it then keeps only the variables listed in that local.
        Your local masterlist is where you define the names of the variables to keep.
        As you say you're quite new: this type of code should be run from a do-file. Press Ctrl+9 to open a Do-file Editor window, paste it there, then run it.

        Code:
        /* example use on auto.dta */
        sysuse auto
        
        /* keep part */
        local masterlist "A B C mpg headroom trunk weight"
        local keeplist ""
        
        foreach i of local masterlist {
            capture confirm variable `i'
            if !_rc {
                local keeplist "`keeplist' `i'"
            }
        }
        
        keep `keeplist'
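
        For the original use case of several annual surveys, the same check can be wrapped in an outer loop over files. The following is only a sketch under assumptions: the two temporary files are stand-ins built from auto.dta, and the guard against an empty keeplist is my addition, since keep with no variable list stops with an error.

        ```stata
        * build two small stand-in "surveys" with different variables (illustration only)
        sysuse auto, clear
        keep make mpg weight trunk
        tempfile wave1
        save `wave1'

        sysuse auto, clear
        keep make mpg price
        tempfile wave2
        save `wave2'

        * names wanted from every survey, whether or not each survey has them
        local masterlist "make mpg weight price"

        foreach f in "`wave1'" "`wave2'" {
            use "`f'", clear
            local keeplist
            foreach v of local masterlist {
                * exact rules out matches on abbreviated variable names
                capture confirm variable `v', exact
                if !_rc local keeplist `keeplist' `v'
            }
            * skip keep when nothing matched, because an empty keep fails
            if "`keeplist'" != "" keep `keeplist'
            describe, simple
        }
        ```

        With real data, the inner part would be followed by a save under a new file name before the next file is loaded.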



        • #5
          Thank you a lot for your replies!
          Actually, that code might do the trick, apart from one minor aspect. Would it be possible to fill the masterlist by searching for variables instead of typing them?

          So let's say I don't want to add every variable individually to the masterlist, can I run a loop beforehand to search for variables?

          E.g.: If confirm existence A*==true add A* to masterlist.


          Best,
          Felix



          • #6
            Something like that.
            You can't add it to the masterlist, because a wildcard evaluated in the confirm step can match more than one variable.
            You can, however, add such patterns to the initial keeplist, where they are simply stored as text.

            Code:
            /* example use on auto.dta */
            sysuse auto
            /* keep part */
            local masterlist "A B C mpg"
            local keeplist = "t*"
            
            foreach i of local masterlist {
                capture confirm variable `i'
                if !_rc {
                    local keeplist "`keeplist' `i'"
                }
            }
            
            keep `keeplist'
            Edit: this will still fail if no variable starting with t exists.
            I can't really think of a quick answer that will always work.
            Last edited by Jorrit Gosens; 21 Nov 2016, 08:22.
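
            One possible way around the failure noted in the edit, offered as a sketch rather than a tested recipe: as far as I know, confirm variable accepts a whole varlist, wildcards included, and returns an error when a pattern matches nothing. Under that assumption, patterns can be checked in the loop like any other entry. The pattern z* is a made-up example that matches nothing in auto.dta.

            ```stata
            * illustration on auto.dta; A, B, C and z* match no variables there
            sysuse auto, clear

            local masterlist "A B C mpg t* z*"
            local keeplist

            foreach i of local masterlist {
                * a pattern passes only if at least one variable matches it
                capture confirm variable `i'
                if !_rc local keeplist `keeplist' `i'
            }

            * the surviving patterns are expanded by keep itself
            keep `keeplist'
            describe, simple
            ```

            Note that without the exact option, confirm variable also accepts unambiguous abbreviations of variable names, which may or may not be what you want.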



            • #7
              A while back I wrote isvar, which is on SSC. Its job is to split a list of names into those that are names of variables in the dataset in memory and those that are not. Here is how it would work on this problem:


              Code:
              . sysuse auto, clear
              (1978 Automobile Data)
              
              . isvar mpg weight frog toad
              
              variables: mpg weight
              not variables: frog toad
              
              . keep `r(varlist)'
              
              . ds
              mpg     weight
              To install, type

              Code:
              ssc install isvar



              • #8
                Hi statalisters and Dr Cox,

                A quick question for Dr Cox related to isvar: is it possible to re-use r(badlist) stored strings to generate missing variables in the corresponding dataset?

                I am trying to loop over several Excel files in a folder, convert them into .dta files, clean them, and merge them.
                The cleaning part is a problem in the loop because I refer to variables that are present in some of the datasets and absent in others, but I need to keep everything. That is, I cannot define a minimum set of variables, or I will lose information from some of the datasets.
                I would like to generate the missing variables automatically (maybe using r(badlist)?) so that they do not cause problems in the loop.

                Is there a way to do this? I have not been able to figure it out.

                Thank you,

                Maud



                • #9



                  I say "omitted variables" for variables that should be included in a dataset but are not. Unfortunately, it has yet another meaning, but "missing variables" is too close to "missing values", which means something quite different.

                  Yes, you can do something like that. What may bite is that you need to know which should be string and which should be numeric.

                  Code:
                  . clear
                  
                  . set obs 1 
                  number of observations (_N) was 0, now 1
                  
                  . gen foo = 42
                  
                  . 
                  . describe 
                  
                  Contains data
                    obs:             1                          
                   vars:             1                          
                   size:             4                          
                  ----------------------------------------------------------------------------------------
                                storage   display    value
                  variable name   type    format     label      variable label
                  ----------------------------------------------------------------------------------------
                  foo             float   %9.0g                 
                  ----------------------------------------------------------------------------------------
                  Sorted by: 
                       Note: Dataset has changed since last saved.
                  
                  . 
                  . isvar bar 
                  
                  not variable: bar
                  
                  . 
                  . foreach v in `r(badlist)' { 
                    2.    gen `v' = . 
                    3. } 
                  (1 missing value generated)
                  
                  . 
                  . describe 
                  
                  Contains data
                    obs:             1                          
                   vars:             2                          
                   size:             8                          
                  ----------------------------------------------------------------------------------------
                                storage   display    value
                  variable name   type    format     label      variable label
                  ----------------------------------------------------------------------------------------
                  foo             float   %9.0g                 
                  bar             float   %9.0g                 
                  ----------------------------------------------------------------------------------------
                  Sorted by: 
                       Note: Dataset has changed since last saved.
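
                  If the storage type matters, one way to handle it is to keep separate lists for names that should be numeric and names that should be string. The sketch below uses only official commands instead of isvar; the list contents and the name baz are hypothetical.

                  ```stata
                  clear
                  set obs 1
                  gen foo = 42

                  * hypothetical lists of the variables every dataset should end up with
                  local numvars "foo bar"
                  local strvars "baz"

                  foreach v of local numvars {
                      * create the variable as numeric missing only if it is absent
                      capture confirm variable `v', exact
                      if _rc gen `v' = .
                  }

                  foreach v of local strvars {
                      * create the variable as empty string only if it is absent
                      capture confirm variable `v', exact
                      if _rc gen `v' = ""
                  }

                  describe
                  ```

                  Existing variables are left untouched, so the same lines can be run on every dataset before merging or appending.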



                  • #10
                    Great, thank you very much!
                    String versus numeric variables won't be a problem, since I use import excel with the allstring option.
                    I have taken note of the vocabulary. Thank you for your help.
