  • Behavior for multiple variables in simple -egen- statement, how to properly separate multiple variables?

    Dear Statalist Forum,

    I am encountering a somewhat confusing issue. Here is a simplified context of the dataset:
    -Several variables for different crops (_mean_crop1, _mean_crop2, _mean_crop3, etc)
    -By year and district in the US (year district)

    Objective of code:
    -calculate mean of all crops by year and district

    Speculated code:

    Code:
    egen agg_crop_mean = mean(_mean_crop1 _mean_crop2 _mean_crop3), by(district year)
    However, I am getting the following error:

    '_mean_crop1_mean_crop2_mean_crop3 not found r(111)'

    For multiple variables, I thought we could just list var1 var2 var3 in -egen- commands, but here they are being forced together. I have tried renaming them so they do not start with '_', but the behavior persists.

    Would a suitable solution be:

    Code:
    egen agg_crop_mean = mean(_mean_crop1 + _mean_crop2 + _mean_crop3), by(district year)
    I thought this might alter the mean() value that is being calculated...

  • #2
    An example of your dataset can help us here. Use dataex (from SSC) to post a short example. I guess that you need something like this.
    Code:
     
     egen agg_crop_mean = mean((_mean_crop1 + _mean_crop2 + _mean_crop3)/3), by(district year)
    However, this is just a guess as I am not sure how your data is structured.
    Last edited by Attaullah Shah; 01 Oct 2018, 16:54.
    Regards
    --------------------------------------------------
    Attaullah Shah, PhD.
    Professor of Finance, Institute of Management Sciences Peshawar, Pakistan
    FinTechProfessor.com
    https://asdocx.com
    Check out my asdoc program, which sends outputs to MS Word.
    For more flexibility, consider using asdocx which can send Stata outputs to MS Word, Excel, LaTeX, or HTML.



    • #3
      Attaullah Shah is correct. You need rowmean(). The syntax is that mean() takes an expression, but a list of two or more variable names is not an expression. The mean of the sum is not the mean of the means.
      Last edited by Nick Cox; 01 Oct 2018, 17:00.
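
      The distinction between a variable list and an expression can be sketched on toy data. This is only an illustration, not the original poster's dataset: the variables a, b, c and the generated names are hypothetical. It also shows one practical difference between the two approaches when a value is missing.

      ```stata
      * Hypothetical toy data; a, b, c are illustrative names only
      clear
      input a b c
      1 2 3
      4 . 6
      end

      * rowmean() takes a variable list and ignores missing values within each row
      egen rmean_abc = rowmean(a b c)

      * mean() takes one expression; (a + b + c)/3 is missing in any row
      * where a term is missing, so that row does not contribute to the mean
      egen mean_expr = mean((a + b + c)/3)

      list
      ```

      Whether ignoring or propagating missing values is preferable depends on the data, so it is worth checking for missings before choosing between the two.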



      • #4
        Dear Attaullah and Nick - thank you for your replies! I have implemented Attaullah's code with success. I was working with rmean() for a bit, but I think I need to read more on the syntax and work with some practice datasets to fully understand how it works.

        Thanks again!



        • #5
          Sorry, I thought that Attaullah Shah would have recommended rowmean() for the row-wise mean of three variables, but he didn't. I didn't read his post carefully enough.

          So, this may still be open. Here is one way to think about it. Suppose you have three variables with values 1, 2, 3 in a given observation. Then their row-wise mean in that observation will be 2. If you want the mean of the total of three variables, that is the mean of (the total), i.e. first calculate the total; then take the mean across the group of observations in question, by default the entire dataset.

          Here it is made childishly simple

          Code:
          clear 
          input x y z 
          1 2 3 
          4 5 6 
          end 
          
          egen rowmean = rowmean(x y z) 
          egen mean = mean(x + y + z) 
          
          list 
          
           
               +----------------------------+
               | x   y   z   rowmean   mean |
               |----------------------------|
            1. | 1   2   3         2   10.5 |
            2. | 4   5   6         5   10.5 |
               +----------------------------+



          • #6
            Hi Nick,

            Thanks for the crystal-clear example. It's my fault for not being articulate enough in the beginning - I am in fact looking for your latter solution. My _mean_cropX variables are to be seen as 'new' observations even though they come from aggregate crop measures (hence the 'mean' name). So I am looking for the mean of the total of the different variables rather than the row-wise mean. Apologies for the lack of clarity on my end!

            I've been playing around with rmean() now for a little bit and I definitely appreciate its power.



            • #7
              OK. Then you're better off with

              Code:
              gen total = x + y + z 
              su total 
              as almost certainly the variability of the total will be of concern and you should think about it.
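
              Applied to the grouping in the original question, the same idea might look like the sketch below. It assumes the variable names from post #1 (_mean_crop1 to _mean_crop3, district, year); the toy values and the names crop_total, agg_crop_mean and agg_crop_sd are made up for illustration.

              ```stata
              * Toy data using the thread's variable names; the values are invented
              clear
              input district year _mean_crop1 _mean_crop2 _mean_crop3
              1 2000 1 2 3
              1 2000 4 5 6
              2 2000 2 2 2
              2 2000 3 3 3
              end

              * First calculate the total in each observation
              gen double crop_total = _mean_crop1 + _mean_crop2 + _mean_crop3

              * Then the mean and SD of the total within each district-year group
              bysort district year: egen agg_crop_mean = mean(crop_total)
              bysort district year: egen agg_crop_sd = sd(crop_total)

              list, sepby(district year)
              ```

              Keeping the total as its own variable makes it easy to look at its spread within groups as well as its mean.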
