Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Summing across variables if observation equals a certain value

    Hi!

    I have the following data set (example seen below) - it contains 66 variables: 8 of these are h_m* (where h_m1 would be the total number of hours in month 1, h_m2 is the total number of hours in month 2 and so on until month 8), 29 are tags d_* (which go from d_1 to d_29) and the remaining 29 variables are number of hours, visit_duration* (these also go from visit_duration1 to visit_duration29).

    Code:
    input float(h_m1 h_m2 h_m8) double visit_duration1 float d_1 double visit_duration3 float d_3 double visit_duration10 float d_10
    . . . 1.5 1 . .   . .
    . . .   . . . .   . .
    . . .   5 1 5 2 5.5 5
    . . .   1 1 4 1 4.5 3
    . . .   3 1 2 5   4 4
    . . .   8 1 5 2   3 4
    . . .   1 1 3 1   3 4
    . . . 2.5 1 4 4   4 3
    end
    I want to sum the total number of hours by month (h_m*) using the 29 visit_duration* variables if the tag's value is equal to the month number. For example, for h_m1 (i.e. month 1) I want to sum the number of hours from any of the visit_duration* variables if its corresponding tag value, d_* is equal to the month number

    I have tried using the following code,

    Code:
     forvalues i=1/8 {
            forvalues h=1/29 {
                replace h_m`i' = h_m`i' + visit_duration`h' if d_`h'==`i'
          }
     }
    but I am not getting the results I want, when I run it all I see is:

    Code:
    0 real changes made

    I tried using egen before, but from what I saw on this forum, it wasn't a good idea at all. Any suggestions are welcome. I apologize if the post is confusing, it is hard to explain something in English when it isn't my first language. I would be happy to try and explain more if it wasn't clear. Thanks!

  • #2
    Wecome to Statalist.

    Thank you for the well-presented code and example data. The example data was particularly important, because the code you show looks fine.

    I think the reason your code fails is because you initialized the values of your h_m variables to missing rather than zero. When you add a number to a missing value, the result remains missing. And for most analyses you will want the total to be 0 for months where there were no visits, because Stata will tend to omit observations with missing values for any of the variables.

    In the code below, I started with your example data but changed the visit numbers to 1, 2, and 3 so I could write a nice forvalues loop. I think it confirms that your code is right, but you need to change the initialization of your total.
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input double visdur1 float d_1 double visdur2 float d_2 double visdur3 float d_3
    1.5 1 . .   . .
      . . . .   . .
      5 1 5 2 5.5 5
      1 1 4 1 4.5 3
      3 1 2 5   4 4
      8 1 5 2   3 4
      1 1 3 1   3 4
    2.5 1 4 4   4 3
    end
    
    forvalues month=1/8 {
        generate h_m`month' = 0
        forvalues visit=1/3 {
            replace h_m`month' = h_m`month' + visdur`visit' if d_`visit' == `month'
            }
        }
    
    list visdur1-d_3, clean
    list h_m*, clean
    Code:
    . list visdur1-d_3, clean
    
           visdur1   d_1   visdur2   d_2   visdur3   d_3  
      1.       1.5     1         .     .         .     .  
      2.         .     .         .     .         .     .  
      3.         5     1         5     2       5.5     5  
      4.         1     1         4     1       4.5     3  
      5.         3     1         2     5         4     4  
      6.         8     1         5     2         3     4  
      7.         1     1         3     1         3     4  
      8.       2.5     1         4     4         4     3  
    
    . list h_m*, clean
    
           h_m1   h_m2   h_m3   h_m4   h_m5   h_m6   h_m7   h_m8  
      1.    1.5      0      0      0      0      0      0      0  
      2.      0      0      0      0      0      0      0      0  
      3.      5      5      0      0    5.5      0      0      0  
      4.      5      0    4.5      0      0      0      0      0  
      5.      3      0      0      4      2      0      0      0  
      6.      8      5      0      3      0      0      0      0  
      7.      4      0      0      3      0      0      0      0  
      8.    2.5      0      4      4      0      0      0      0

    Comment


    • #3
      even is a good guess. You could count the number of instances with anycount() and then multiply by the value sought. You still need one loop.

      Comment

      Working...
      X