  • Generate vs. egen for creating a new variable when missing values are present

    Dear All,

    I'm trying to create a new variable in my dataset that is derived from multiple other variables.

    chadsvasc is a score from 0 to 9. Each of the other variables is scored 0 or 1, except chadsage, which can be 0, 1, or 2. chadsvasc is simply the sum of all the components, with strokeorembolism counted double. Under normal circumstances, generate would work perfectly well.

    Code:
    generate chadsvasc = female + chadsage  + congestiveheartfailure + 2*strokeorembolism + diabetes + vasculardisease + hypertension
    However, when some of the component variables have missing data, chadsvasc is set to missing for those observations.

    -egen- would avoid this, but it doesn't seem to allow more complex mathematical expressions. I could of course get around this easily by doing something like:

    Code:
    generate doublestroke= 2*strokeorembolism
    and then use

    Code:
    egen chadsvasc = rowtotal(female  chadsage  congestiveheartfailure doublestroke diabetes  vasculardisease  hypertension), missing
    Another solution would be to create copies of the component variables with the missing values replaced by zeros, and then generate chadsvasc from those copies.
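    A minimal sketch of this zero-replacement approach, run on a few rows of made-up data. The z_ prefix for the zero-filled copies is a hypothetical naming choice; the original variables keep their missing values.

    ```stata
    * Made-up data: row 2 has some missing components, row 3 has all missing
    clear
    input byte(female chadsage congestiveheartfailure strokeorembolism ///
        diabetes vasculardisease hypertension)
    1 2 1 1 1 1 1
    0 1 . 1 . 1 0
    . . . . . . .
    end

    * Zero-filled copies (hypothetical z_ prefix); originals are left untouched
    foreach v of varlist female-hypertension {
        clonevar z_`v' = `v'
        replace z_`v' = 0 if missing(z_`v')
    }

    * Same formula as before, strokeorembolism still counted double
    generate chadsvasc = z_female + z_chadsage + z_congestiveheartfailure ///
        + 2*z_strokeorembolism + z_diabetes + z_vasculardisease + z_hypertension
    list female-hypertension chadsvasc
    ```

    Note that with this approach an observation in which every component is missing gets a score of 0 rather than missing.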

    But I wanted to know whether there is some way to use -egen- with more complex mathematical combinations when some values are missing.

    Thanks for the input.

    Chris

    Using Stata version 13

  • #2
    Well, there is no way I know of to do general algebraic expressions with -egen-. But in this particular case, you can "trick" -egen- into doing what you want:

    Code:
    egen chadsvasc = rowtotal(female chadsage congestiveheartfailure ///
        strokeorembolism strokeorembolism diabetes ///
        vasculardisease hypertension), missing
    The -rowtotal()- egen function does not check to see if there is duplication of any of the variables provided. So it will add in strokeorembolism twice. In principle this could be extended to arbitrary linear combinations with positive integer coefficients, although, in practice, the coefficients need to be small.

    Added: I did this on Stata 15. It also works on 14.2. I no longer have access to version 13, so I can't test it there. But I do notice that the current version of _growtotal is dated 2008, which is even before version 13, so I expect it will work there too.
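    A sketch of how the duplication trick might be extended to other small positive integer weights, assuming the weights are written as "name weight" pairs (the pairs and the local macro name weighted are illustrative, not part of -egen-). Each variable is repeated in the varlist as many times as its weight; the made-up data are only there to check the result.

    ```stata
    * Made-up data for checking the result
    clear
    input byte(female chadsage congestiveheartfailure strokeorembolism ///
        diabetes vasculardisease hypertension)
    1 2 1 1 1 1 1
    0 1 . 1 . 1 0
    . . . . . . .
    end

    * Build a varlist in which each variable appears "weight" times
    local weighted
    foreach pair in "female 1" "chadsage 1" "congestiveheartfailure 1" ///
        "strokeorembolism 2" "diabetes 1" "vasculardisease 1" "hypertension 1" {
        local v : word 1 of `pair'
        local w : word 2 of `pair'
        forvalues i = 1/`w' {
            local weighted `weighted' `v'
        }
    }
    display "`weighted'"

    * rowtotal() adds each listed occurrence, so repeats act as integer weights
    egen chadsvasc = rowtotal(`weighted'), missing
    list
    ```

    As noted above, this is only practical while the weights stay small, since the varlist grows with each repetition.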


    • #3
      If you want missing values to be ignored, use

      Code:
      cond(missing(whatever), 0, whatever)
      as each term in your sum, with whatever replaced by the name of each component variable.
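      A sketch of that suggestion applied to every component, on made-up data. With this form an observation in which every component is missing gets 0; if you would rather have missing there, as rowtotal(..., missing) gives, one option is to count the missing components with -egen-'s rowmiss() function and replace accordingly.

      ```stata
      * Made-up data: row 2 has some missing components, row 3 has all missing
      clear
      input byte(female chadsage congestiveheartfailure strokeorembolism ///
          diabetes vasculardisease hypertension)
      1 2 1 1 1 1 1
      0 1 . 1 . 1 0
      . . . . . . .
      end

      * Each term contributes 0 when missing; strokeorembolism keeps its weight of 2
      generate chadsvasc = cond(missing(female), 0, female)                 ///
          + cond(missing(chadsage), 0, chadsage)                             ///
          + cond(missing(congestiveheartfailure), 0, congestiveheartfailure) ///
          + 2*cond(missing(strokeorembolism), 0, strokeorembolism)           ///
          + cond(missing(diabetes), 0, diabetes)                             ///
          + cond(missing(vasculardisease), 0, vasculardisease)               ///
          + cond(missing(hypertension), 0, hypertension)
      list
      ```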


      • #4
        I was thinking along the same lines as Clyde in #2. I'll just add that if the variables are contiguous in the dataset (in the order you listed them), you can specify them as a range in rowtotal() by putting a hyphen between the first and last variables; you then have to tack the one you want counted twice onto the end.

        Code:
        clear all
        input byte(female chadsage chf strokeorembolism ///
        diabetes vasculardisease hypertension )
          1 1 1 1 1 1 1
          0 0 0 0 0 0 0
          1 2 1 1 1 1 1
          0 1 0 1 0 1 0
          . . . . . . .
          end
        
        egen chadsvasc = rowtotal(female-hypertension strokeorembolism)
        list
        Output:
        . list
        
             +--------------------------------------------------------------------------------+
             | female   chadsage   chf   stroke~m   diabetes   vascul~e   hypert~n   chadsv~c |
             |--------------------------------------------------------------------------------|
          1. |      1          1     1          1          1          1          1          8 |
          2. |      0          0     0          0          0          0          0          0 |
          3. |      1          2     1          1          1          1          1          9 |
          4. |      0          1     0          1          0          1          0          4 |
          5. |      .          .     .          .          .          .          .          0 |
             +--------------------------------------------------------------------------------+
        --
        Bruce Weaver
        Email: bweaver@lakeheadu.ca
        Web: http://sites.google.com/a/lakeheadu.ca/bweaver/
        Stata version: 16.0 IC (Windows)


        • #5
          Many thanks for the helpful responses. Inputting the same variable twice in -rowtotal()- does work in Stata 13 as well.

          Best,

          Chris
