Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Calculation of mean for each industry quarter using egen, any quicker way!!

    Dear Statalist members,

    My dataset is something like this

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte(firm industry qtr) float(var1 var2) byte check
     1 1 1 .55 .28 0
     2 1 1 .55 .28 0
     3 1 1 .55 .28 0
     4 1 1   .   . 1
     5 1 1 .55 .28 0
     1 1 2 .77 .63 0
     2 1 2 .77 .63 0
     3 1 2 .77 .63 0
     4 1 2 .77 .63 0
     5 1 2 .77 .63 0
     6 1 2   .   . 1
    11 2 1 .97 .13 0
    12 2 1 .97 .13 0
    13 2 1   .   . 1
    14 2 1 .97 .13 0
    15 2 1 .97 .13 0
    11 2 2 .78 .44 0
    12 2 2 .78 .44 0
    13 2 2   .   . 1
    14 2 2 .78 .44 0
    15 2 2 .78 .44 0
    end
    I want to generate mean values for each industry and quarter. I used the following code.

    gen newvar1=.
    gen newvar2=.

    forval q=1(1)2{

    forval i=1(1)2{

    egen check1= mean(var1) if `i'==industry & `q'==qtr
    egen check2= mean(var2) if `i'==industry & `q'==qtr

    replace newvar1=check1 if `i'==industry & `q'==qtr
    replace newvar2=check2 if `i'==industry & `q'==qtr

    drop check1
    drop check2
    }
    }

    BUT

    My actual dataset consists of around one million observations with many firms and quarters, so it takes too much time to complete. Maybe because of egen!!

    My Question:

    1. How can i estimate this mean in a quicker way?
    2. Another related question is that if I want to replace the missing values in var1 and var2 with previous values from same industry and quarter, how can I do that? (Missing values are the one against which the check variable is equal to 1).

    Kind Regards,
    Azhar Mughal

  • #2
    You may wish to try this:

    Code:
    bysort industry: replace var2 = var2[_N-1] if var2 ==.
    bysort industry: replace var1 = var1[_N-1] if var1 ==.
    by industry qtr, sort : egen float mymean1 = mean(var1)
    by industry qtr, sort : egen float mymean2 = mean(var2)
    Hopefully that helps
    Best regards,

    Marcos

    Comment


    • #3
      I don't quite follow Marcos' code. I guess that _n - 1 is meant rather than _N - 1.

      The imputation of missing values mentioned ignores the structure of firms as well as industries.

      The code in #1 seems a tortuous way to get at

      Code:
      egen mean1 = mean(var1) , by(industry qtr)
      
      egen mean2 = mean(var2) , by(industry qtr)
      and absolutely no loops are needed there.

      Among other possibilities, rangestat (SSC) could be faster.

      Code:
      rangestat var1 var2 , int(qtr 0 0) by(industry)
      Last edited by Nick Cox; 01 Mar 2019, 06:25.

      Comment


      • #4
        @ Nick Cox and @ Marcos Thank you for your help.Codes for the calculation of mean are working and it is absolutely correct that the code with loops were really complex. Secondly, the correction of "_n-1" also worked BUT with a little concern.

        FOR Example Consider the same data with a little change

        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input byte(industry firm qtr) float(var1 var2) byte check
        1  1 1 .55 .28 0
        1  2 1 .55 .28 0
        1  3 1 .55 .28 0
        1  4 1 .55 .28 0
        1  5 1 .55 .28 0
        1  1 2   .   . 1
        1  2 2 .77 .63 0
        1  3 2 .77 .63 0
        1  4 2 .77 .63 0
        1  5 2 .77 .63 0
        1  6 2 .77   . 1
        2 11 1 .97 .13 0
        2 12 1 .97 .13 0
        2 13 1 .97   . 1
        2 14 1 .97 .13 0
        2 15 1 .97 .13 0
        2 11 2 .78 .44 0
        2 12 2 .78 .44 0
        2 13 2 .78   . 1
        2 14 2 .78 .44 0
        2 15 2 .78 .44 0
        end



        When I applied the code, I noticed that if missing value is the first value in same industry but next quarter (as updated in the given example). That missing value is replaced by the value from previous quarter. What could be the change in the code if I want to replace the missing value from same quarter?(Condition is that the missing value must be replaced from the same industry and quarter)

        Kind Regards,
        Azhar Mughal

        Comment


        • #5
          Nick (as always) is absolutely right. Where I typed _N, I should have typed _n.
          Last edited by Marcos Almeida; 01 Mar 2019, 12:36.
          Best regards,

          Marcos

          Comment


          • #6
            You can also use asrol for getting multiple statistics for multiple variables in one go. The good news is that asrol is faster than egen. asrol can be downloaded from SSC. The following one line of code will calculate the mean for each industry and qtr for the variables var1 and var2.
            Code:
            * Install asrol
            ssc install asrol, replace
            
            * Mean of var1 and var2 by industry and qtr
            bys industry qtr: asrol var1 var2, stat(mean)
            
            list
                 +--------------------------------------------------------------------+
                 | industry   firm   qtr   var1   var2   check   mean_var1   mean_v~2 |
                 |--------------------------------------------------------------------|
              1. |        1      1     1    .55    .28       0   .55000001        .28 |
              2. |        1      2     1    .55    .28       0   .55000001        .28 |
              3. |        1      3     1    .55    .28       0   .55000001        .28 |
              4. |        1      4     1    .55    .28       0   .55000001        .28 |
              5. |        1      5     1    .55    .28       0   .55000001        .28 |
                 |--------------------------------------------------------------------|
              6. |        1      1     2      .      .       1   .76999998        .63 |
              7. |        1      2     2    .77    .63       0   .76999998        .63 |
              8. |        1      3     2    .77    .63       0   .76999998        .63 |
              9. |        1      4     2    .77    .63       0   .76999998        .63 |
             10. |        1      5     2    .77    .63       0   .76999998        .63 |
                 |--------------------------------------------------------------------|
             11. |        1      6     2    .77      .       1   .76999998        .63 |
             12. |        2     11     1    .97    .13       0   .97000003        .13 |
             13. |        2     12     1    .97    .13       0   .97000003        .13 |
             14. |        2     13     1    .97      .       1   .97000003        .13 |
             15. |        2     14     1    .97    .13       0   .97000003        .13 |
                 |--------------------------------------------------------------------|
             16. |        2     15     1    .97    .13       0   .97000003        .13 |
             17. |        2     11     2    .78    .44       0   .77999997        .44 |
             18. |        2     12     2    .78    .44       0   .77999997        .44 |
             19. |        2     13     2    .78      .       1   .77999997        .44 |
             20. |        2     14     2    .78    .44       0   .77999997        .44 |
                 |--------------------------------------------------------------------|
             21. |        2     15     2    .78    .44       0   .77999997        .44 |
                 +--------------------------------------------------------------------+
            Regards
            --------------------------------------------------
            Attaullah Shah, PhD.
            Professor of Finance, Institute of Management Sciences Peshawar, Pakistan
            FinTechProfessor.com
            https://asdocx.com
            Check out my asdoc program, which sends outputs to MS Word.
            For more flexibility, consider using asdocx which can send Stata outputs to MS Word, Excel, LaTeX, or HTML.

            Comment


            • #7
              Marcos Almeida Attaullah Shah Thank you for your help.

              Comment

              Working...
              X