  • keeping only dummy variables in datasets without manually identifying them

    Hello!

    I want to loop over several datasets, loading each in turn. In each dataset, I want Stata to keep only the dummy variables, without my having to identify their names manually. In this case, these variables take the values 1 and 2 instead of the usual 0 and 1, but I assume that this doesn't matter. The datasets also contain string variables and other numeric variables (categorical variables and shares).

    Something that keeps variables that take only two distinct values across all observations, or variables that have a minimum of 1 and a maximum of 2, would work.

    Thanks in advance for your help.
    Best,
    Hélder


  • #2
    Without a canned program, you will have to loop through all variables and pick out these variables. Sorting and looking at the first and last values may not be ideal here, as a variable taking only the values 1 and 2 will have the same first and last values as a variable taking the values 1, 1.1, 1.5, 1.7 and 2.

    Code:
    * Drop every variable whose distinct non-missing values are not exactly 1 and 2
    foreach var of varlist * {
        qui levelsof `var', clean
        if r(levels) != "1 2" {
            qui drop `var'
        }
    }


    • #3
      findname from Stata Journal can do this.

      Code:
      . sysuse auto, clear
      (1978 automobile data)
      
      . gen foo = 1 + (mpg > 40)
      
      . gen bar = runiformint(1, 2)
      
      . findname , all(inlist(@, 1, 2))
      foo  bar
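
      As a possible follow-up, and assuming findname leaves the names it found in r(varlist), the matched variables can be kept directly. This is a minimal sketch rather than a tested recipe; the guard against an empty list is my addition.

      ```stata
      * Find variables whose values are all 1 or 2, then keep only those
      findname, all(inlist(@, 1, 2))

      * Guard: only call keep when at least one variable matched
      if "`r(varlist)'" != "" {
          keep `r(varlist)'
      }
      ```

      If findname is not yet installed, search findname should locate the Stata Journal package.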


      • #4
        Thank you Andrew and Nick, both solutions worked!


        • #5
          Now that you have identified them, know that (0, 1) indicators are much more useful than (1, 2) indicators. The way forward is to work with indicator MINUS 1 or 2 MINUS indicator depending on what makes more sense.


          • #6
            Thank you Nick.

            Indeed, but I'm loading raw data and this is how the variables are coded there.

            "The way forward is to work with indicator MINUS 1 or 2 MINUS indicator depending on what makes more sense."
            By this, do you mean recoding 1s to 0s and 2s to 1s, for example, or something else?


            • #7
              Suppose unemployed is 1 and employed is 2. If you want the indicator to be 1 if employed, subtract 1. If you want it to be 1 if unemployed, subtract from 2. And so on.
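
              A small sketch of that arithmetic, using a made-up variable empstat (hypothetical name and data) coded 1 = unemployed and 2 = employed:

              ```stata
              * Toy data: empstat is a hypothetical (1, 2) variable
              clear
              input empstat
              1
              2
              2
              1
              end

              * 1 if employed, 0 if unemployed
              generate employed = empstat - 1

              * 1 if unemployed, 0 if employed
              generate unemployed = 2 - empstat

              list
              ```

              Missing values in the original variable stay missing in both new indicators, which is usually what is wanted.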


              • #8
                Got it. Thank you again Nick! (and sorry for taking so long to reply).


                • #9
                  Here is another solution, with negligible overhead, I believe:

                  Code:
                  foreach var of varlist * {
                      capture assert inlist(`var', 1, 2) , fast
                      if ( _rc ) drop `var'
                  }
                  Probably, findname implements something similar for this case.
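
                  To tie this back to the original question about several datasets, the loop above could be wrapped in an outer loop over files. This is only a sketch: the file names and the _dummies suffix are hypothetical placeholders, not anything from the original data.

                  ```stata
                  * Hypothetical dataset names; replace with the actual files
                  local files "survey2019 survey2020 survey2021"

                  foreach f of local files {
                      use "`f'.dta", clear

                      * Drop every variable that is not entirely 1s and 2s
                      foreach var of varlist * {
                          capture assert inlist(`var', 1, 2), fast
                          if ( _rc ) drop `var'
                      }

                      * emptyok lets save succeed even if no variable survived
                      save "`f'_dummies.dta", replace emptyok
                  }
                  ```

                  Note that this test also drops a (1, 2) variable that has any missing values, and keeps a variable that is constant at 1 or at 2, so the condition may need adjusting for data like that.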
