  • generating a varlist of sequential dummy variables

    Hi All,

    Assume I have two or more nominal variables, and I want to generate individual indicator (dummy) variables for all of their levels, numbered sequentially across the variables. In the example data below, there are two nominal variables (shape and color) with 3 and 4 unique levels, respectively, so I would like the new dummy variables to be named sequentially (v1 - v7).

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str10 shape str6 color
    "square"     "blue"  
    "round"      "blue"  
    "round"      "red"   
    "round"      "red"   
    "round"      "blue"  
    "round"      "green"
    "square"     "green"
    "square"     "green"
    "round"      "red"   
    "round"      "blue"  
    "triangular" "yellow"
    "triangular" "red"   
    end
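    Using tabulate with generate() on the first variable creates the correct dummies in the correct sequence. However, it will not work on the second (or any later) nominal variable: a second call with the same stub tries to create r1 again and stops with an error, while a different stub restarts the numbering at 1.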

    Code:
    local varlist shape color
    foreach v of local varlist {
        tabulate `v', generate(r) nofreq
    }
    The only thing I can think of is keeping track of the number of dummies generated, which tabulate stores in r(r), and using a forvalues loop that adds that count to the suffixes for the second (and later) variables. But that would still be awkward, because I'd first have to rename the variables to a consistent stub (e.g. r) and only then add the numeric offset...

    Thanks in advance!

    Ariel

  • #2
    You could reshape all the variables long, generate the dummies, and then reshape wide. Otherwise, you have to keep a running count of the dummies created so far, as in the code below.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str10 shape str6 color
    "square"     "blue"  
    "round"      "blue"  
    "round"      "red"  
    "round"      "red"  
    "round"      "blue"  
    "round"      "green"
    "square"     "green"
    "square"     "green"
    "round"      "red"  
    "round"      "blue"  
    "triangular" "yellow"
    "triangular" "red"  
    end
    
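    local varlist shape color
    * count holds the suffix of the next dummy to be created
    local count 1
    foreach v of local varlist {
        * one temporary dummy per level: _qvar1, _qvar2, ...
        tabulate `v', generate(_qvar) nofreq
        * save the number of levels before any other command can change r()
        local k = r(r)
        * renumber the temporary dummies, starting at the current count
        rename (_qvar*) (v#), addnumber(`count')
        local count = `count' + `k'
    }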
    Result:

    Code:
    . l, sep(0)
    
         +--------------------------------------------------------+
         |      shape    color   v1   v2   v3   v4   v5   v6   v7 |
         |--------------------------------------------------------|
      1. |     square     blue    0    1    0    1    0    0    0 |
      2. |      round     blue    1    0    0    1    0    0    0 |
      3. |      round      red    1    0    0    0    0    1    0 |
      4. |      round      red    1    0    0    0    0    1    0 |
      5. |      round     blue    1    0    0    1    0    0    0 |
      6. |      round    green    1    0    0    0    1    0    0 |
      7. |     square    green    0    1    0    0    1    0    0 |
      8. |     square    green    0    1    0    0    1    0    0 |
      9. |      round      red    1    0    0    0    0    1    0 |
     10. |      round     blue    1    0    0    1    0    0    0 |
     11. | triangular   yellow    0    0    1    0    0    0    1 |
     12. | triangular      red    0    0    1    0    0    1    0 |
         +--------------------------------------------------------+
    
    .
    Last edited by Andrew Musau; 12 Jul 2023, 13:27.
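
    For comparison, here is a minimal sketch of a variant that skips the temporary stub and the rename. It is not the code posted above: it assumes the variables are strings, it loops over the levels returned by levelsof, and the counter k and the variable labels are illustrative choices only.

    ```stata
    clear
    input str10 shape str6 color
    "square"     "blue"
    "round"      "blue"
    "round"      "red"
    "round"      "red"
    "round"      "blue"
    "round"      "green"
    "square"     "green"
    "square"     "green"
    "round"      "red"
    "round"      "blue"
    "triangular" "yellow"
    "triangular" "red"
    end

    * k is a single running suffix shared by all source variables
    local k = 0
    foreach v of varlist shape color {
        * distinct values of the current variable, in sorted order
        levelsof `v', local(levels)
        foreach lev of local levels {
            local ++k
            generate byte v`k' = (`v' == `"`lev'"')
            label variable v`k' `"`v' == `lev'"'
        }
    }
    describe v*
    ```

    Because levelsof returns the levels in sorted order, the numbering should match the listing above (v1 = round, ..., v7 = yellow). One difference to keep in mind: levelsof skips empty strings, so an observation with a missing value gets 0 in every dummy here, whereas tabulate, generate() would set its dummies to missing.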

    • #3
      Thank you, Andrew!

      I originally tried reshaping long on the two original variables (after renaming them r1 and r2) and then reshaping wide, but that did not produce the correct layout for the dummy variables... Your approach here is very helpful!

      Ariel
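
      For readers who want to try the reshape route anyway, the following is a rough sketch of how it might be made to work. It is not code from this thread, and the names id, src, grp and tmp are placeholders. The step that is easy to miss is copying each dummy to both long rows of an observation before reshaping wide, since reshape wide needs the extra variables to be constant within id.

      ```stata
      clear
      input str10 shape str6 color
      "square"     "blue"
      "round"      "blue"
      "round"      "red"
      "round"      "red"
      "round"      "blue"
      "round"      "green"
      "square"     "green"
      "square"     "green"
      "round"      "red"
      "round"      "blue"
      "triangular" "yellow"
      "triangular" "red"
      end

      * stack the two variables: src is 1 for shape and 2 for color
      generate long id = _n
      rename (shape color) (r1 r2)
      reshape long r, i(id) j(src)

      * one group per (variable, level) pair, ordered by variable first
      egen long grp = group(src r)
      tabulate grp, generate(v) nofreq
      drop grp

      * make every dummy constant within id so that reshape wide accepts it
      foreach x of varlist v* {
          egen byte tmp = max(`x'), by(id)
          replace `x' = tmp
          drop tmp
      }

      reshape wide r, i(id) j(src)
      rename (r1 r2) (shape color)
      drop id
      ```

      This takes noticeably more steps than the counter loop in #2, which is probably the simpler choice when there are only a few variables.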
