
  • How to rename dummy variables based on category names?

    Hi, all.

    I have a dataset with a variable "town_id" that holds the town ID (such as 160, 155, 178, etc.).
    I want to create a set of dummy variables from this town variable, and I want to name them in the form "town_id_x", where x is the numeric town ID.

    Currently I use tabulate town_id, gen(town_id), which creates all the dummy variables, but their names are town_id1, town_id2, town_id3, .... I want the names to be "town_id160, town_id155, town_id178".

    Can anyone share their knowledge? I have searched for a while but still do not know how.

    Thank you very much.


  • #2
    Since you don't tell us why you want to create dummy variables from your categorical variable rather than use Stata's "factor variable" notation, I'm going to assume you're unfamiliar with factor variable notation.

    You will find factor-variable notation a powerful tool in your work. If you are not already familiar with it, do read the output of help factor variables and section 11.4.3 of the Stata User's Guide PDF, which is included with your Stata installation and accessible from Stata's Help menu. Your effort will be amply repaid.
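For readers new to factor-variable notation, here is a minimal illustration. The outcome variable y and its values are made up purely for this sketch; the point is that i.town_id lets regress build the town indicators on the fly, with no dummy variables stored in the dataset, and that the ib#. prefix chooses the base category.

```stata
clear
input town_id y
155 2.1
155 2.5
160 3.0
160 3.4
178 1.8
178 2.2
end

* i.town_id tells regress to build the town indicators on the fly;
* nothing is added to the dataset, and the lowest level (155) is the base
regress y i.town_id

* ib160. makes town 160 the base category instead
regress y ib160.town_id
```

With town 160 as the base, each coefficient is that town's mean of y minus the mean for town 160.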



    • #3
      Hi William, thanks!



      • #4
        Although William Lisowski gives the best advice here, note that it can be done. Here's one way to do it.

        Code:
        clear
        input town_id 
        155
        160
        178
        end 
        
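        * store the distinct values of town_id in the local macro towns
        levelsof town_id, local(towns)
        
        * one 0/1 indicator per town, named after the town ID
        foreach t of local towns { 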
            gen town_id_`t' = town_id == `t'
        }
        
        list
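        By the way, for identifiers 155, 160, 178 I am surprised to learn that they get mapped to 2, 1, 3. Order of occurrence in the dataset is immaterial with the generate() option of tabulate, which numbers the new variables in ascending order of the values (here 155, 160, 178 would become 1, 2, 3).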
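Since the question asks specifically about renaming the variables that tabulate creates, here is a minimal sketch of doing that directly. It assumes town_id is numeric and integer-valued, and it relies on tabulate and levelsof both listing the nonmissing values in ascending order, which is what lines up the k-th level with the k-th indicator. The town_id_ prefix follows the form requested in the question.

```stata
clear
input town_id
160
155
178
end

* tabulate numbers the indicators in ascending order of the values,
* so here town_id1 is 155, town_id2 is 160 and town_id3 is 178
tabulate town_id, generate(town_id)

* levelsof also returns the values in ascending order, so the k-th level
* corresponds to the k-th indicator; rename each one after its town ID
levelsof town_id, local(towns)
local k = 0
foreach t of local towns {
    local ++k
    rename town_id`k' town_id_`t'
}

list
```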

        PS: icey ge may be your real name, in which case that's fine. I read it as ICG, and if that's right, I flag here the request to use full real names, as explained at https://www.statalist.org/forums/help#realnames and again at https://www.statalist.org/forums/help#adviceextras



        • #5
          Nick's code works fine if there are no missing values in town_id. But if there are missing values, it will give you zero for each of the "dummy" variables in those observations, whereas what most people want is a missing value. See the difference below:

          Code:
          clear
          input town_id
          155
          160
          178
          .
          end
          
          levelsof town_id, local(towns)
          
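          * Nick's approach: observations with missing town_id get 0
          foreach t of local towns {
              gen nc_town_id_`t' = town_id == `t'
          }
          
          * factor-variable approach: observations with missing town_id get missing
          foreach t of local towns {
              gen cs_town_id_`t' = `t'.town_id
          }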
          
          list, noobs clean abbrev(16)
          Note that my approach (the cs_town_id_* variables) draws on factor-variable notation. So I will also emphasize what both William and Nick have said: using factor-variable notation is usually better than brewing your own indicator ("dummy") variables. You should create your own only if you will need them for purposes that cannot be accomplished using factor-variable notation instead. There aren't very many such purposes in Stata, and most of those are pretty exotic.



          • #6
            Another way to go:
            Code:
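            * one new variable per town, named town_id plus the town value: town_id in its own group, missing elsewhere
            separate town_id, by(town_id)
            * missing -> 0, anything else -> 1; observations with missing town_id stay missing
            recode town_id?* (.=0) (*=1) if !missing(town_id)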
