  • Creating dummy variables from a categorical variable and using the variables values as the names

    Hi Statalist community,

    Below is a toy dataset. It contains a categorical variable called foreign, and I want to create dummy variables for it. I then manually rename each dummy variable after the value it represents.

    Code:
    *Call in dataset
    clear all
    webuse auto.dta
    *Identify the variable values for the variable of interest.
    tab foreign, mi
    *Create dummy variables for each level within the categorical variable.
    tab foreign, gen(Dum_foreign)
    *Rename the dummy variables with their variable values
    rename Dum_foreign1 Domestic
    rename Dum_foreign2 Foreign
    The same dataset also has the categorical variable make, which has many more levels. Creating dummy variables for make and then renaming each one by hand after its value would take a long time. Is there a more efficient way to rename these dummy variables?

  • #2
    Well, apart from the number of values in variable make, there is the additional problem that none of them are legal variable names due to the embedded blanks. But you can approximate what you are asking for with:
    Code:
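    sysuse auto, clear
    
    * Collect the distinct values of make and count them
    levelsof make, local(makes)
    local n_makes: word count `makes'
    * Create one indicator per value: make1, make2, ...
    tab make, gen(make)
    * Rename each indicator after its value, made into a legal name by strtoname()
    forvalues i = 1/`n_makes' {
        rename make`i' `=strtoname(`"`:word `i' of `makes''"')'
    }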
    I also note that your original code runs -tab foreign, mi-. As it happens, neither foreign nor make has any missing values, so this has no consequences in these examples. But if you created the indicator ("dummy") variables with the -mi- option for a variable that did have missing values, you would get an extra indicator variable for the missing value. You can't name that variable "" or . because it needs a legal variable name, so you would have to decide what to call it and add one more command to the code to finish the job.
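
    To illustrate the missing-value point, here is a minimal sketch using rep78 from the same dataset, which does have missing values. The stub rep78_ and the name rep78_missing are arbitrary choices made for this illustration. The sketch assumes the missing category ends up as the last indicator, which is what happens for a numeric variable because missing sorts after every number; it is worth checking the variable labels with -describe- before renaming.

    Code:
    sysuse auto, clear
    * rep78 takes the values 1 to 5 and is missing in a few observations;
    * the missing option adds a sixth indicator for the missing category
    tabulate rep78, missing generate(rep78_)
    * Check which indicator belongs to which category before renaming
    describe rep78_*
    * Give the indicator for missing a legal, descriptive name (arbitrary choice)
    rename rep78_6 rep78_missing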



    • #3
      The make variable illustrates a problem lying just ahead, which is that many string values -- and for that matter all numeric values -- would be illegal variable names. Also, usually you only need one fewer indicator variable than there are categories, and in any case factor variable notation takes care of most such needs.
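
      As a minimal sketch of the factor variable route, the regression below uses foreign directly without creating any indicator variables. The choice of price and mpg is only for illustration. Stata omits one category as the base level, which is the "one fewer indicator" point above.

      Code:
      sysuse auto, clear
      * i.foreign expands to indicators on the fly; the base level (Domestic) is omitted
      regress price mpg i.foreign
      * The same model with Foreign (value 1) chosen as the base level instead
      regress price mpg ib1.foreign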

      Otherwise



      Code:
      . ssc desc dummieslab
      
      ------------------------------------------------------------------------------------
      package dummieslab from http://fmwww.bc.edu/repec/bocode/d
      ------------------------------------------------------------------------------------
      
      TITLE
            'DUMMIESLAB': module to convert categorical variable to dummy variables using
              label names
      
      DESCRIPTION/AUTHOR(S)
            
            dummieslab generates a set of dummy variables from a categorical
            variable. One dummy variable is created for each level of the
            original variable. Names for the dummy variables are derived from
            the value labels of the categorical variable.
            
            KW: categorical
            KW: indicator
            KW: dummy variables
            KW: labels
            
            Requires: Stata version 8.0
            
            Distribution-Date: 20100904
            
            Author: Philippe Van Kerm, CEPS/INSTEAD, Differdange
            Support: email [email protected]
            
            Author: Nicholas J. Cox, Durham University
            Support: email [email protected]
            
      
      INSTALLATION FILES                           (type net install dummieslab)
            dummieslab.ado
            dummieslab.hlp
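
      A minimal usage sketch, assuming the simplest call is just the command name followed by the categorical variable (see the help file after installing for the exact syntax and options):

      Code:
      * Install the package from SSC (needed once)
      ssc install dummieslab
      sysuse auto, clear
      * foreign carries the value labels Domestic and Foreign
      dummieslab foreign
      * List the variables to see what the new indicators were called
      describe

      If the package behaves as its description says, the new indicators should take their names from the value labels of foreign.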



      • #4

        @Clyde Schechter Subaru would be fine as a variable name, but otherwise I agree as usual on all fundamentals.



        • #5
          You are right, Nick, of course. Odd that I should have missed that one, considering that I drive a Subaru myself!



          • #6
            @Nick Cox and @Clyde Schechter

            Both approaches work. Thank you so much.
