
  • rsum missing option for a proportion of missing?

    Apologies if this is a basic question; I haven't found what I'm looking for in the help files.

    I'm creating a variable as the sum of other variables. Depending on how I want missing values treated, I understand that I have various options, such as 'gen v3 = v1+v2' (which returns missing if either variable is missing) or 'egen v3 = rsum(v1 v2)' (which treats missing as 0). I know that I can add the missing option to rsum() to return a missing value if all of the values in the varlist are missing.

    However, is there a simple solution if I want to return missing when the number or proportion of missing values exceeds a defined threshold? For example, I want to sum 5 variables v1-v5 and treat missing data as 0, but return a missing value if more than 1 or more than 2 values are missing, rather than only when all 5 are missing, as 'egen v6 = rsum(v1 v2 v3 v4 v5), missing' does.
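
The three behaviours described in the question can be checked on a toy dataset. This is only a sketch: the data values and the new variable names (plus, rt, rt_miss) are made up for illustration, and rowtotal() is used in place of rsum(), which the replies describe as the same function.

```stata
* Toy data; values and variable names are illustrative only
clear
input v1 v2
1 2
3 .
. .
end

* Plain addition: missing if either variable is missing (3, ., .)
gen plus = v1 + v2

* Row total: missing values are treated as 0 (3, 3, 0)
egen rt = rowtotal(v1 v2)

* Row total with the missing option: missing only if all are missing (3, 3, .)
egen rt_miss = rowtotal(v1 v2), missing

list, noobs
```

None of the three returns missing at an intermediate threshold, which is the gap the question is about.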


  • #2
    I think the most flexible solution is to count the missing values with egen ... = rowmiss(...) and then replace the result of egen ... = rsum(...) with missing if that count exceeds the threshold you desire.

    Minor note: the official help file for egen no longer lists rsum(), only rowtotal(). I believe rowtotal() does exactly what rsum() does/did, but its name is more consistent with Stata conventions: gen ... = sum() calculates a cumulative (running) sum down the observations, whereas rsum()/rowtotal() calculates the sum across the listed variables within each observation. If your code will ever be shared with others, using rowtotal() instead of rsum() might make it more legible.
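
To make the naming point concrete, here is a small sketch contrasting the two kinds of sum. The data and the variable names running and across are invented for the example.

```stata
* Toy data; values and variable names are illustrative only
clear
input v1 v2
1 10
2 20
3 30
end

* sum() with generate: running sum of v1 down the observations (1, 3, 6)
gen running = sum(v1)

* rowtotal() with egen: sum across v1 and v2 within each observation (11, 22, 33)
egen across = rowtotal(v1 v2)

list, noobs
```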


    • #3
      The egen function rsum() is undocumented as of Stata 9. Since that version rowtotal() is the standard name. But they're the same function.

      Other than writing your own code, I think you need a two-step approach, something like


      Code:
      egen total = rowtotal(v1 v2 v3 v4 v5) 
      egen miss = rowmiss(v1 v2 v3 v4 v5) 
      replace total = . if miss > 2

      Substitute your own variable names and your own threshold in place of 2.
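
The original question also asked about a proportion rather than a count. One possible variant of the same two-step idea is sketched below, assuming a cut-off of "more than half of the variables missing". The toy data, the 0.5 cut-off, the local macro names vars and k, and the variable name nmiss are all assumptions for illustration. Run it from a do-file so that the local macros stay defined.

```stata
* Toy data; values are illustrative only
clear
input v1 v2 v3 v4 v5
1 1 1 1 1
1 . 1 1 1
1 . . 1 1
1 . . . 1
. . . . .
end

* Keep the variable list in one place and count how many variables it holds
local vars v1 v2 v3 v4 v5
local k : word count `vars'

* Step 1: row total (missing treated as 0) and number of missing values per row
egen total = rowtotal(`vars')
egen nmiss = rowmiss(`vars')

* Step 2: set the total to missing if more than half of the variables are missing
replace total = . if nmiss / `k' > 0.5

* total is now 5, 4, 3, ., .
list, noobs
```

Writing the threshold as a proportion means the same line keeps working if variables are added to or removed from the list.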





      • #4
        thank you both - very much appreciated!
