  • Replacing missing with zero in yearly dummy variables

    Hello, I am struggling to replace missing values (.) with 0 for yearly dummy variables. I have attached a part of my data below. I have 25 dummy variables so I didn't want to compute it manually. Please could you let me know if there is an easier way to compute it?


    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input long h_pid float c_age1 byte(c_age1_dummy1 c_age1_dummy2)
    101  . . .
    101  . . .
    101  . . .
    101  . . .
    101  . . .
    101  . . .
    101  . . .
    101  . . .
    201  . . .
    201  . . .
    201  . . .
    201  . . .
    201  . . .
    201  . . .
    201  . . .
    201  . . .
    201  . . .
    201  . . .
    201  . . .
    201  . . .
    201  . . .
    201  . . .
    201  . . .
    201  . . .
    302  . . .
    302  . . .
    302  . . .
    302  . . .
    302  . . .
    302  . . .
    302  . . .
    402  . . .
    402  . . .
    402  . . .
    402  . . .
    402  . . .
    402  . . .
    402  . . .
    402  . . .
    402  . . .
    402  . . .
    502 15 0 0
    504  . . .
    602  2 0 0
    602  3 0 0
    602  4 0 0
    602  5 0 0
    602  6 0 0
    602  7 0 0
    602  8 0 0
    602  9 0 0
    602 10 0 0
    602 11 0 0
    602 12 0 0
    602 13 0 0
    602 14 0 0
    602 15 0 0
    602 16 0 0
    602 17 0 0
    603  . . .
    603  . . .
    603  . . .
    603  . . .
    603  . . .
    603  . . .
    603  . . .
    603  . . .
    603  . . .
    603  . . .
    603  . . .
    603  . . .
    603  . . .
    603  . . .
    603  . . .
    603  . . .
    605  . . .
    605  . . .
    605  . . .
    605  . . .
    605  . . .
    605  . . .
    605  . . .
    605  . . .
    605  . . .
    605  . . .
    605  . . .
    605  . . .
    605  . . .
    605  . . .
    605  . . .
    605  . . .
    701  . . .
    701  . . .
    701  . . .
    701  . . .
    701  . . .
    701  . . .
    701  . . .
    701  . . .
    701  . . .
    end

  • #2
    Why are you computing 25 dummy variables? Or any at all? There is essentially never any need to do that in modern Stata.

    First of all, unless you are using a very old version of Stata, you can use factor-variable notation in your commands and Stata will create "virtual" dummy variables to use in the analysis on the fly, without cluttering up your data set with junk that conveys no new information. For example:
    Code:
    * Instead of
    regress y x1 x2 c_age1_dummy1 c_age1_dummy2 ...

    * do this:
    regress y x1 x2 i.c_age1
    Stata will expand i.c_age1 to as many "dummies" as you need for the regression. See -help fvvarlist- for more information about factor-variable notation.

    If you are using a version of Stata that predates factor variable notation, or are using a recent version but have to use a command that won't accept factor variable notation, then use the -xi:- command-prefix.
    Code:
    xi: regress y x1 x2 i.c_age1
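
    If the dummies are nevertheless needed as real variables, the literal request in #1 can be handled with a loop over a wildcard instead of 25 separate commands. This is only a sketch on a few rows shaped like the posted example. It assumes that a missing dummy really should be 0, which is doubtful wherever c_age1 itself is missing.

    ```stata
    * Toy data shaped like the -dataex- example in #1
    clear
    input long h_pid float c_age1 byte(c_age1_dummy1 c_age1_dummy2)
    101  . . .
    502 15 0 0
    602  2 0 0
    603  . . .
    end

    * Loop over every variable whose name matches the wildcard
    * ASSUMPTION: a missing dummy should be recoded to 0
    foreach v of varlist c_age1_dummy* {
        replace `v' = 0 if missing(`v')
    }
    ```

    The wildcard c_age1_dummy* picks up all 25 dummies in the full data set, so nothing has to be typed per variable.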



    • #3
      Thank you for your reply.

      Would the dummies be applied to those missing values as well?



      • #4
        What does it mean to "apply" a dummy to missing values? I don't understand the question. I can tell you that if c_age1 itself has a missing value, then so will all of the "virtual" dummies generated by i.c_age1--which is the appropriate way to handle missing values in the underlying category variable.
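
        A small sketch of this behavior, using made-up data (the outcome y and every value are invented for illustration): observations whose c_age1 is missing simply drop out of the estimation sample.

        ```stata
        * Toy data: 8 observations, 2 of them with missing c_age1
        clear
        input float(y c_age1)
        10 2
        12 2
        15 3
        17 3
        20 4
        22 4
        30 .
        35 .
        end

        regress y i.c_age1

        * e(sample) marks the observations the regression actually used
        count if e(sample)
        * Observations excluded because c_age1 is missing
        count if missing(c_age1)
        ```

        Here the regression uses the 6 observations with a nonmissing c_age1 and leaves out the other 2.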



        • #5
          Thank you, I think I understand it now.

          However, when I run the code, I get this error: c_age1: factor variables may not contain negative values.

          What does this error mean?



          • #6
            Run
            Code:
            tab c_age1 if c_age1 < 0
            to see whether you have any negative values, as Stata is reporting. The values of a factor variable must be nonnegative integers.
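
            A slightly fuller check might look like the sketch below. It assumes, which nothing in this thread confirms, that any negative values of c_age1 are missing-value codes rather than real ages. The toy data and the -9 code are invented for illustration.

            ```stata
            * Toy data standing in for the real file (values are made up)
            clear
            input long h_pid float c_age1
            502 15
            504 -9
            602  2
            602  3
            603  .
            end

            * How many observations are negative, and which values occur?
            count if c_age1 < 0
            tabulate c_age1 if c_age1 < 0

            * Factor variables must also be integers; count any that are not
            count if c_age1 != floor(c_age1) & !missing(c_age1)

            * ASSUMPTION: negatives are missing-value codes, so set them to missing
            replace c_age1 = . if c_age1 < 0

            * Confirm that no negative values remain
            assert c_age1 >= 0
            ```

            If the negative values turn out to be genuine data rather than codes, recoding them to missing would be wrong, so look at the tabulation before running the replace.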

