  • How to sum binary variables in Stata but set the sum to missing if all variables are missing?

    Hello everyone,

    I am working with several binary variables (coded 0/1, with possible missing values) in Stata. I want to create a new variable that sums the number of 1s across a group of these indicators.

    However, I need a small adjustment:
    • If all the variables are missing (i.e., no information available), I would like the sum to be recorded as missing (.), not 0.
    • If some variables are 0 and some are 1, the sum should behave normally (counting the number of 1s).
    • If all variables are 0 (no 1s, but observed), the sum should correctly be 0.
    Here is an example of what I tried:

    gen total_strengths = (var1 == 1) + (var2 == 1) + (var3 == 1) + (var4 == 1)

    But this approach gives 0 even if all variables are missing, which is not what I want.
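    A minimal sketch of why this happens, using hypothetical two-variable data: in Stata a comparison such as var1 == 1 evaluates to 0 when var1 is missing, so each term contributes 0 and the sum can never be missing.

    ```stata
    * Hypothetical data; the second observation is entirely missing
    clear
    input float(var1 var2)
    1 0
    . .
    end

    * (. == 1) is false, so it evaluates to 0 rather than to missing
    gen total_strengths = (var1 == 1) + (var2 == 1)

    list
    ```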

    Is there a clean way to sum the 1s but get missing if all input variables are missing? Ideally something efficient if I have 10+ variables.

    Thank you very much for your help!

    Best regards,


  • #2
    Code:
    egen wanted = rowtotal(var1 var2 var3 var4), missing
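    As a rough illustration of what the missing option changes, the sketch below compares rowtotal() with and without it. The data and the name total_default are made up for this example. Note that rowtotal() sums values, so it equals the number of 1s only because the variables are coded 0/1.

    ```stata
    * Hypothetical data: observation 2 is all 0, observation 4 is all missing
    clear
    input float(var1 var2 var3 var4)
    1 1 0 1
    0 0 0 0
    1 . . 0
    . . . .
    end

    * Default behaviour: missing values are treated as 0
    egen total_default = rowtotal(var1 var2 var3 var4)

    * With the missing option, an all-missing observation gets missing
    egen wanted = rowtotal(var1 var2 var3 var4), missing

    list
    ```

    Only the last observation should differ between the two new variables: 0 by default, missing with the option.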


    • #3
      Fire up

      Code:
      viewsource _growtotal.ado
      to see how it is done. If that functionality were not available, there are other ways to do it. For example, egen has another function, rowmiss(), that counts missing values across rows (observations).
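
      A sketch of that idea, assuming three variables and using nmissing as a made-up name for the count of missing values: take the plain row total, then reset it to missing wherever every variable is missing.

      ```stata
      clear
      input float(var1 var2 var3)
      1 1 1
      1 1 0
      1 0 .
      . . .
      end

      * Count the missing values in each observation
      egen nmissing = rowmiss(var1 var2 var3)

      * Without the missing option, rowtotal() treats missing values as 0
      egen wanted = rowtotal(var1 var2 var3)

      * Three variables were summed, so 3 missing values means nothing was observed
      replace wanted = . if nmissing == 3

      list

      drop nmissing
      ```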

      Here is another way to do it that may be easier to follow.

      We initialise a sum at 0 and add each variable to it if and only if that variable is not missing.

      We also start by assuming that all values are missing, but change our mind as soon as we find a variable that isn't missing.

      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input float(var1 var2 var3)
      1 1 1
      1 1 0
      1 0 .
      . . .
      end
      
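      * Start the sum at 0 and assume that every value is missing
      gen wanted = 0
      gen allmissing = 1

      foreach v of varlist var* {
          * Add the variable only where it is not missing
          replace wanted = wanted + `v' if `v' < .
          * One nonmissing value is enough to rule out "all missing"
          replace allmissing = 0 if `v' < .
      }

      list

      * Clean up: the sum should be missing, not 0, if nothing was observed
      replace wanted = . if allmissing

      list

      drop allmissing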
      Here is the result of the second list.

      Code:
      . list 
      
           +----------------------------------------+
           | var1   var2   var3   wanted   allmis~g |
           |----------------------------------------|
        1. |    1      1      1        3          0 |
        2. |    1      1      0        2          0 |
        3. |    1      0      .        1          0 |
        4. |    .      .      .        .          1 |
           +----------------------------------------+

      The final step is the clean-up: wanted is changed from 0 to missing (.) wherever all values are missing.
