  • Calculate rowmeans if certain conditions are fulfilled

    Hello,

    In a dataset I have a number of numeric variables whose arithmetic mean I need to calculate, but only if other variables take a certain value. E.g. x, y should be averaged only if x_1=1, y_1=2. If the if-condition is not fulfilled for a certain variable, this variable should not contribute to the mean (see example below).

    x x_1 y y_1 mean
    4 1 5 2 4.5
    3 2 2 2 2
    7 1 6 1 7
    3 4 4 5 .

    Is there an easy way to code such a task? Thanks a lot for your hints in advance.

    Kind regards,

    Michael

  • #2
    I think all you need is:

    Code:
    egen mean = rowmean(x y) if x1 == 1 & y1 == 2
    Am I misunderstanding?

    -egen- is a wonderful "Swiss Army knife" of data management tools for Stata and you should definitely get familiar with it. You'll use it nearly every time you run Stata. Similarly, if you don't know about -if- conditions, those too are must reading. See -help egen- and -help if-, and the linked manual sections.


    • #3
      I agree with Clyde's advice but note that

      Code:
      gen mean = (x + y)/2 if x1 == 1 & y1 == 2 
      is an alternative.
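
      One difference between the two forms is worth knowing: -rowmean()- averages whichever arguments are nonmissing, whereas arithmetic on a missing value returns missing. A minimal sketch, assuming made-up data in which y is missing in the second row (the names mean_egen and mean_gen are illustrative only):

      ```stata
      clear
      input x y x1 y1
      4 5 1 2
      3 . 1 2
      end

      * rowmean() averages whichever of x and y are nonmissing
      egen mean_egen = rowmean(x y) if x1 == 1 & y1 == 2

      * arithmetic with a missing operand yields missing
      generate mean_gen = (x + y)/2 if x1 == 1 & y1 == 2

      list
      ```

      In the second row mean_egen is 3 while mean_gen is missing, so the two approaches agree only when x and y are both observed.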


      • #4
        Thank you very much for your advice. I am not completely sure whether your suggestions address my analysis problem correctly. If I understand them correctly, the mean of x and y will only be calculated (and stored as the variable mean) if both x1 and y1 fulfill the conditions given above. In my analysis I want to calculate the mean of, let's say, 4 variables a, b, c and d, where a is only included in the mean if it satisfies x1==1, b is only included in the mean if x2==2, etc. Hence each if condition refers to only one of the four variables a, b, c and d, and determines whether the respective variable enters the arithmetic mean or not. In your suggestions, by contrast, the if condition decides whether the overall mean of a, b, c and d is calculated in the first place. Or did I get it completely wrong? Thanks for your hints in advance.

        Best,

        Michael


        • #5
          "In my analysis I want to calculate the mean of, let's say, 4 variables a, b, c and d, where a is only included in the mean if it satisfies x1==1, b is only included in the mean if x2==2, etc. Hence each if condition refers to only one of the four variables a, b, c and d, and determines whether the respective variable enters the arithmetic mean or not."
          Even rereading your original post a few times, I don't think it would have ever occurred to me that this is what you meant.

          So, for what you want, it's a tad more complex:

          Code:
          gen a_include = cond(x1 == 1, a, .)
          gen b_include = cond(x2 == 2, b, .)
          // PRESUMABLY SIMILAR CONDITIONS EXIST FOR c_include & d_include
          // CODE THEM ANALOGOUSLY HERE
          
          egen mean = rowmean(a_include b_include c_include d_include)
          The logic behind the code is this. a_include will be a if x1 == 1, and missing value otherwise. Analogously for b, c, and d. One then applies the -egen- function -rowmean()- to these new variables. In calculating the row mean, -egen- ignores missing values and calculates the mean of whichever variables are not missing. (If all are missing, then the result is also missing value.)
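
          As a check, the same approach can be applied to the four-row example from the original post. This is only a sketch: it assumes the variable names x, x_1, y and y_1 as they appear in that table, and x_include and y_include are illustrative names.

          ```stata
          clear
          input x x_1 y y_1
          4 1 5 2
          3 2 2 2
          7 1 6 1
          3 4 4 5
          end

          * keep each value only when its own condition holds; otherwise set it to missing
          generate x_include = cond(x_1 == 1, x, .)
          generate y_include = cond(y_1 == 2, y, .)

          * rowmean() averages the nonmissing values in each row
          egen mean = rowmean(x_include y_include)

          list x x_1 y y_1 mean
          ```

          The listed means are 4.5, 2, 7 and missing, matching the mean column in the original post.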
