  • Taking Average of Multiple variables for multiple observations

    My data set looks something like this:
    input str23 CITY int HOM long POP int YEAR
    "AKRON" 19 230856 1984
    "AKRON" 17 226704 1985
    "ALBUQUERQUE" 28 356366 1984
    "ALBUQUERQUE" 42 357051 1985

    "ALBUQUERQUE" 28 -9 2006

    There are multiple variables, some of them nonnumeric, and each city has observations for the years 1984-2002. I want to take the average of each variable over the observations for each city. For example, the new value of HOM for AKRON would be the mean of HOM across all of the AKRON rows. Is there a way to do this?

  • #2
    If I understand correctly, you need the mean of each variable within each group defined by CITY. You can use asrol, which works both with and without a rolling window. In your case, the calculations do not need a rolling window. asrol can find multiple statistics for multiple variables in one go.

    Code:
    * Install asrol if not already installed
    ssc install asrol
    
    * Find mean values of HOM and POP within each CITY
    bys CITY :  asrol HOM POP, stat(mean)
    
    list
    
         +-----------------------------------------------------------+
         |        CITY   HOM      POP   YEAR    mean_HOM    mean_POP |
         |-----------------------------------------------------------|
      1. |       AKRON    19   230856   1984          18      228780 |
      2. |       AKRON    17   226704   1985          18      228780 |
      3. | ALBUQUERQUE    28   356366   1984   32.666667   237802.67 |
      4. | ALBUQUERQUE    42   357051   1985   32.666667   237802.67 |
      5. | ALBUQUERQUE    28       -9   2006   32.666667   237802.67 |
         +-----------------------------------------------------------+
    If you add YEAR as a grouping variable, then in the given example each CITY-YEAR combination has just one observation, so each mean simply equals that observation's value.

    Code:
    . bys CITY YEAR:  asrol HOM POP, stat(mean)
    
    . list
    
         +---------------------------------------------------------+
         |        CITY   HOM      POP   YEAR   mean_HOM   mean_POP |
         |---------------------------------------------------------|
      1. |       AKRON    19   230856   1984         19     230856 |
      2. |       AKRON    17   226704   1985         17     226704 |
      3. | ALBUQUERQUE    28   356366   1984         28     356366 |
      4. | ALBUQUERQUE    42   357051   1985         42     357051 |
      5. | ALBUQUERQUE    28       -9   2006         28         -9 |
         +---------------------------------------------------------+
    Last edited by Attaullah Shah; 20 Apr 2018, 00:10.
    Regards
    --------------------------------------------------
    Attaullah Shah, PhD.
    Professor of Finance, Institute of Management Sciences Peshawar, Pakistan
    FinTechProfessor.com
    https://asdocx.com
    Check out my asdoc program, which sends outputs to MS Word.
    For more flexibility, consider using asdocx which can send Stata outputs to MS Word, Excel, LaTeX, or HTML.

    • #3
      Another way to do it is just with egen. You don't need to install anything.

      Code:
      clear 
      input str23 CITY int HOM long POP int YEAR
      "AKRON" 19 230856 1984
      "AKRON" 17 226704 1985
      "ALBUQUERQUE" 28 356366 1984
      "ALBUQUERQUE" 42 357051 1985
      "ALBUQUERQUE" 28 -9 2006
      end 
      
      foreach v in HOM POP { 
          egen mean_`v' = mean(`v'), by(CITY)
      }
      
      list, sepby(CITY) 
      
           +---------------------------------------------------------+
           |        CITY   HOM      POP   YEAR   mean_HOM   mean_POP |
           |---------------------------------------------------------|
        1. |       AKRON    19   230856   1984         18     228780 |
        2. |       AKRON    17   226704   1985         18     228780 |
           |---------------------------------------------------------|
        3. | ALBUQUERQUE    28   356366   1984   32.66667   237802.7 |
        4. | ALBUQUERQUE    42   357051   1985   32.66667   237802.7 |
        5. | ALBUQUERQUE    28       -9   2006   32.66667   237802.7 |
           +---------------------------------------------------------+
      See also tabstat
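
      For a quick look at the group means without creating new variables, a minimal sketch with tabstat, assuming the example data above are in memory:

      Code:
      * one row per CITY, one column per variable; reports means only
      tabstat HOM POP, by(CITY) statistics(mean)

      tabstat only displays the table; unlike egen, it does not add variables to the data set.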

      • #4
        But note that -9 for population of Albuquerque in 2006 is presumably some code for missing. No program will know that! Look carefully at your data before doing these kinds of calculations.
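
        A minimal sketch of one way to handle this, assuming that -9 really is a missing-value code for POP (check the data documentation first) and that the example data from #3 are in memory. The variable name mean_POP_clean is only illustrative:

        Code:
        * recode -9 to system missing so that it drops out of the mean
        mvdecode POP, mv(-9)

        * egen's mean() ignores missing values
        egen mean_POP_clean = mean(POP), by(CITY)

        For ALBUQUERQUE this gives (356366 + 357051) / 2 = 356708.5 rather than 237802.67.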

        • #5
          Most user-written programs are meant to make a user's life easier. They eliminate the need for loops, save writing several lines of code, or run faster than the same task done with Stata's built-in commands. The example posted by Luis has few observations. Imagine instead that we have 1000 variables and want the mean, standard deviation, minimum, maximum, median and many more statistics for each of them. Not only would we have to write two nested loops, but egen would also take more time. asrol does the same in less time and with just one line of code. Consider the following example.
          Code:
          clear
          set obs 1000000
          
          * 10,000 groups with 100 observations each
          gen id = mod(_n, 10000) + 1
          bys id: gen year = _n + 1917
          
          * three random variables (uniform() is the older name of runiform())
          gen X1 = uniform()
          gen X2 = uniform()
          gen X3 = uniform()
          
          timer clear
          
          * timer 1: asrol, all statistics for all variables in one call
          timer on 1
          bys id : asrol X1 X2 X3, stat(sd mean median count max)
          timer off 1
          
          * timer 2: egen, one call per variable and statistic
          local stat "sd mean median count max"
          timer on 2
          foreach v of varlist X1 X2 X3 {
              foreach s of local stat {
                  bys id: egen double E`s'_`v' = `s'(`v')
              }
          }
          timer off 2
          
          timer list
          
          1: 18.80 / 1 = 18.7990
          2: 52.12 / 1 = 52.1240

          • #6
            @Attaullah Everything you say is correct, but if you want a debate on different styles here more can be said.

            0. I am positive about user-written programs and have been publicly active on that front in the Stata community since 1994. Nothing below detracts from that stance.

            1. I assume all kinds of readers here from say people who started learning Stata this week to people who have been using Stata much longer than I have, or wrote it in the first place. But I don't want new readers especially to get the impression that you need to install a user-written program to calculate group-wise summaries, for goodness' sake.

            2. Every serious discussion of time needs to factor in the time taken to work out what the code should be (usually trivial for the program authors, but more rarely trivial for anyone else).

            3. A user-written program is fragile as far as other users are concerned. In extreme cases you are dependent on one person who may or may not have been careful with their code or their documentation and may or may not decide that users should support themselves and may or may not decide to quit the scene to become a rock star, a politician or an R user.

            4. egen is slow and that is a problem. I don't care much, but StataCorp know that many people do care much and this will be attacked sooner or later. I can't tell you when because I don't know and if I did know I couldn't tell you anyway.

            5. I really wouldn't over-emphasise how many lines of code something takes. I am as pleased as any user-programmer can be when a program of mine can do something in one line that other solutions take longer to produce. That success is a little empty if that was what the program was specifically designed to do. But rattling off a loop should be a skill that all intermediate and advanced users of Stata should want to acquire. I know that writing that egen loop was immensely faster for me than using asrol because I would have to read your help and understand it. (It's not, I presume, unclear; it's just that reading it does take time too.)

            6. There are other user-written programs for this purpose that can allow one-liner solutions anyway. I mention rangestat (SSC) and declare an interest. I don't want to get into even a friendly fight about one user-written program versus another. I don't use asrol and I don't know enough about asrol to discuss it, especially because most of its code is hidden. Your choice and my choice, too. With any moderately versatile commands in more or less the same territory there are usually off-laps when one program can do one thing but not another and the other way round too.

            7. I have no objection to user-programmers saying "My program could be useful to you". I really could not have -- I do it all the time. But equally it is fair to say "There are other ways to do it" whenever that is true.
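
            As an illustration of the rangestat one-liner mentioned above, a sketch assuming the example data from #3 are in memory. The interval(YEAR . .) syntax, where the two missing values leave the interval open at both ends so that every observation in the by-group is used, is written from memory of the rangestat help and should be checked against it:

            Code:
            * install rangestat if not already installed
            ssc install rangestat

            * mean of HOM and POP over all observations within each CITY
            rangestat (mean) HOM POP, interval(YEAR . .) by(CITY)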

            • #7
              Dear Nick,
              Thanks for your comments.
                  "@Attaullah Everything you say is correct,"
              And yes, I agree that new users should learn loops, and that we should give them the option of using either Stata built-in programs or user-written programs. Then they can decide for themselves which one to use.

              • #8
                As a frequent but not sophisticated user of Stata, let me put in a word for giving beginners easily understood guidance using basic Stata. Experts often have trouble remembering how confusing Stata (or any other serious software) was when first used.

                For beginners, it is often easier to copy and modify a few generate statements than it is to do a loop or to do a simple loop rather than a more complex, elegant one. Sometimes, multiple simple logical conditions are easier for beginners to see than a more elegant solution using a function the beginner doesn't understand.

                In general, experts like to give elegant, fast solutions and I love to see the elegance, but beginners sometimes fare better with baby steps. Only by having success with baby steps do babies (and beginners) get the positive reinforcement to move to dancing. This is a matter of balance - if we never show them dancing, they may never learn to dance, but if we only show them dancing and not walking, they may become discouraged and move on to something else.
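
                As an illustration of the baby-steps style, a sketch of the loop from #3 written out as one statement per variable, assuming the same example data are in memory:

                Code:
                * same result as the foreach loop, one variable at a time
                egen mean_HOM = mean(HOM), by(CITY)
                egen mean_POP = mean(POP), by(CITY)

                To cover a further variable, copy a line and change the variable name in both places.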
