Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Calculating averages based on industry each year

    Dear Stata users,

    I have to calculate average industry values for a variable xyz per year where my data is panel data (firm level annual data)

    I get a code for calculating mean by group

    bys ind: egen ind_xyz = mean(xyz)

    But I have to calculate industry average xyz on yearly basis.

    My sample data is given below


    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int(firmid year) byte ind float xyz
    10006 1975 25 .0145676
    10006 1976 25 .0124178
    10006 1977 25 .0128079
    10006 1978 25 .0174595
    10006 1979 25 .0166106
    10006 1980 25 .0158416
    10006 1981 25 .0220012
    10006 1982 25 .0284683
    10006 1983 25 .0162004
    10007 1989 36 .0988984
    10010 1986 12 .1205143
    10010 1987 12  .076647
    10010 1988 12 .0505933
    10010 1989 12 .0626254
    10010 1990 12 .0190569
    10010 1991 12 .0207386
    10010 1992 12 .0387597
    10010 1993 12 .0966105
    10010 1994 12 .0894412
    10012 1987 37 .1010276
    10012 1988 37 .0664791
    10012 1989 37 .0830875
    10012 1990 37 .1175252
    10012 1991 37 .0143952
    10012 1992 37 .0079708
    10012 1993 37 .0124058
    10012 1994 37 .0182261
    10012 1995 37 .0050087
    10012 1996 37 .0177345
    10012 1997 37   .02295
    10012 1998 37 .0302348
    10012 1999 37  .008036
    10012 2000 37 .0391909
    10012 2001 37 .0288672
    10012 2002 37 .0738338
    10012 2003 37 .0986469
    10015 1983 44        0
    10015 1985 44        0
    10016 1986 12 .0411898
    10016 1987 12 .0379149
    10016 1988 12   .02786
    10016 1989 12 .0390437
    10016 1990 12 .0600927
    10016 1991 12 .0525482
    10016 1992 12 .0476832
    10016 1993 12 .1053467
    10016 1994 12 .1640635
    10016 1995 12 .0736338
    10016 1996 12 .0665109
    10016 1997 12 .0526175
    10016 1998 12 .0103638
    10016 1999 12 .0106731
    10016 2000 12 .0138657
    10017 1986 35 .1767167
    10017 1987 35 .1077084
    10019 1986 38 .0312287
    10019 1987 38 .0461489
    10019 1988 38 .0602429
    10019 1989 38   .04545
    10019 1990 38 .0831592
    10019 1991 38 .1283635
    10019 1992 38 .2123995
    10019 1993 38  .137559
    10019 1994 38 .1842091
    10019 1995 38 .1223869
    10019 1996 38 .1089716
    10019 1997 38  .098212
    10019 1998 38  .201615
    10019 1999 38 .2788843
    10019 2000 38 .4643338
    10025 1986 15   .00998
    10025 1987 15 .0106834
    10025 1988 15 .0076291
    10025 1989 15 .0038563
    10025 1990 15 .0078906
    10025 1991 15 .0059468
    10025 1992 15 .0063166
    10025 1993 15 .0033928
    10025 1994 15 .0028597
    10025 1995 15 .0064576
    10025 1996 15 .0018506
    10025 1997 15 .0095046
    10025 1998 15 .0102379
    10025 1999 15 .0067924
    10025 2000 15 .0057663
    10025 2001 15  .007679
    10025 2002 15 .0122866
    10025 2003 15 .0293506
    10025 2004 15 .0182876
    10025 2005 15 .0080633
    10025 2006 15 .0028885
    10025 2007 15 .0054951
    10025 2008 15 .0083272
    10025 2009 15 .0075315
    10025 2010 15 .0120237
    10025 2011 15 .0141514
    10025 2012 15  .005939
    10025 2013 15 .0069108
    10025 2014 15 .0085589
    10025 2015 15 .0048991
    end

    Regards,

    Shafaq

  • #2
    So, the command should start with two grouping variables, not one:

    Code:
    bys ind year: egen ind_xyz = mean(xyz)

    Comment


    • #3
      Thanks Nick,

      There is one more thing, I have to calculate firm's excess xyz over industry's average xyz (XYZi - XYZind) where industry's average excludes that particular firm's xyz value.

      Comment


      • #4
        FAQ https://www.stata.com/support/faqs/d...ng-properties/

        The mean of others is just (total - this value) / (count - 1) with some care needed for missing values.

        Alternatively, rangestat (SSC) offers a handle for this kind of problem.

        Most of your example data are singletons, so nothing can be done.

        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input int(firmid year) byte ind float xyz
        10006 1975 25 .0145676
        10006 1976 25 .0124178
        10006 1977 25 .0128079
        10006 1978 25 .0174595
        10006 1979 25 .0166106
        10006 1980 25 .0158416
        10006 1981 25 .0220012
        10006 1982 25 .0284683
        10006 1983 25 .0162004
        10007 1989 36 .0988984
        10010 1986 12 .1205143
        10010 1987 12  .076647
        10010 1988 12 .0505933
        10010 1989 12 .0626254
        10010 1990 12 .0190569
        10010 1991 12 .0207386
        10010 1992 12 .0387597
        10010 1993 12 .0966105
        10010 1994 12 .0894412
        10012 1987 37 .1010276
        10012 1988 37 .0664791
        10012 1989 37 .0830875
        10012 1990 37 .1175252
        10012 1991 37 .0143952
        10012 1992 37 .0079708
        10012 1993 37 .0124058
        10012 1994 37 .0182261
        10012 1995 37 .0050087
        10012 1996 37 .0177345
        10012 1997 37   .02295
        10012 1998 37 .0302348
        10012 1999 37  .008036
        10012 2000 37 .0391909
        10012 2001 37 .0288672
        10012 2002 37 .0738338
        10012 2003 37 .0986469
        10015 1983 44        0
        10015 1985 44        0
        10016 1986 12 .0411898
        10016 1987 12 .0379149
        10016 1988 12   .02786
        10016 1989 12 .0390437
        10016 1990 12 .0600927
        10016 1991 12 .0525482
        10016 1992 12 .0476832
        10016 1993 12 .1053467
        10016 1994 12 .1640635
        10016 1995 12 .0736338
        10016 1996 12 .0665109
        10016 1997 12 .0526175
        10016 1998 12 .0103638
        10016 1999 12 .0106731
        10016 2000 12 .0138657
        10017 1986 35 .1767167
        10017 1987 35 .1077084
        10019 1986 38 .0312287
        10019 1987 38 .0461489
        10019 1988 38 .0602429
        10019 1989 38   .04545
        10019 1990 38 .0831592
        10019 1991 38 .1283635
        10019 1992 38 .2123995
        10019 1993 38  .137559
        10019 1994 38 .1842091
        10019 1995 38 .1223869
        10019 1996 38 .1089716
        10019 1997 38  .098212
        10019 1998 38  .201615
        10019 1999 38 .2788843
        10019 2000 38 .4643338
        10025 1986 15   .00998
        10025 1987 15 .0106834
        10025 1988 15 .0076291
        10025 1989 15 .0038563
        10025 1990 15 .0078906
        10025 1991 15 .0059468
        10025 1992 15 .0063166
        10025 1993 15 .0033928
        10025 1994 15 .0028597
        10025 1995 15 .0064576
        10025 1996 15 .0018506
        10025 1997 15 .0095046
        10025 1998 15 .0102379
        10025 1999 15 .0067924
        10025 2000 15 .0057663
        10025 2001 15  .007679
        10025 2002 15 .0122866
        10025 2003 15 .0293506
        10025 2004 15 .0182876
        10025 2005 15 .0080633
        10025 2006 15 .0028885
        10025 2007 15 .0054951
        10025 2008 15 .0083272
        10025 2009 15 .0075315
        10025 2010 15 .0120237
        10025 2011 15 .0141514
        10025 2012 15  .005939
        10025 2013 15 .0069108
        10025 2014 15 .0085589
        10025 2015 15 .0048991
        end
        
        * method 1 
        egen double total = total(xyz), by(ind year) 
        egen count = count(xyz), by(ind year) 
        gen double mean_others = (total - cond(missing(xyz), 0, xyz)) / (count - !missing(xyz)) 
        gen double excess = xyz - mean_others 
        
        * method 2 (must install -rangestat- with -ssc install rangestat-) 
        rangestat mean_others_r=xyz, int(year 0 0) by(ind) excludeself 
        gen double excess_r = xyz - mean_others_r 
        
        * check same results 
        assert excess == excess_r

        Comment

        Working...
        X