Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Calculate weighted means for each year and company

    Dear fellow Stata users,

    I want to calculate weigthed means for companies comprised of monthly values for one year. In my data set, however, their are mutiple observations for each company and different companies.

    I am looking for a command that will sum up the variable sellpct_sum for which I have monthly values at the moment into a yearly total value for that company. In a second step, I need to sum up the number of recommendations from the 12 months for each company to a yearly value. How do I do this best?

    I know that in a last step I need to divide the two values but I don't know how to do the two steps I have described in between.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str8 cusip str6(gvkey gvkeyx) int(from thru) str28 co_conm str6 co_tic str9 co_cusip str4 co_sic str6(ticker oftic) long statpers double(stdev numrec sellpct) float cyear byte _merge float(yfrom insamplefrom esample1 sellpct_sum)
    "02906610" "014565" "000003" 14762 17211 "AMERICAN POWER CONVERSION CP" "APCC." "029066107" "3620" "APCC" "APCC" 15539  .71  8     0 2002 3 2000 1 1      0
    "91290910" "023978" "000003" 11443 19905 "UNITED STATES STEEL CORP"     "X"     "912909108" "3312" "X"    "X"    15413  1.2  9 11.11 2002 3 1991 1 1  99.99
    "92532F10" "024344" "000003" 19624     . "VERTEX PHARMACEUTICALS INC"   "VRTX"  "92532F100" "2834" "VRT1" "VRTX" 15413   .6 11     0 2002 3 2013 0 0      0
    "75281A10" "006788" "000003" 17521     . "RANGE RESOURCES CORP"         "RRC"   "75281A109" "1311" "LOMK" "RRC"  15385    0  2     0 2002 3 2007 0 0      0
    "87966410" "010420" "000003" 12967 18981 "TELLABS INC"                  "TLAB"  "879664100" "3661" "TLAB" "TLAB" 15630  .23 19  5.26 2002 3 1995 1 1  99.94
    "67034610" "008030" "000003"  9251     . "NUCOR CORP"                   "NUE"   "670346105" "3312" "NUE"  "NUE"  15693  .89 12  8.33 2002 3 1985 1 1  99.96
    "45990210" "006097" "000003" 15222 19896 "INTL GAME TECHNOLOGY"         "IGT.1" "459902102" "7990" "IGAM" "IGT"  15693  .96 15 13.33 2002 3 2001 1 1 199.95
    "47836610" "006268" "000003"  9647     . "JOHNSON CONTROLS INC"         "JCI"   "478366107" "2531" "JCI"  "JCI"  15385  .72 16     0 2002 3 1986 1 1      0
    "17290810" "003062" "000003" 15035     . "CINTAS CORP"                  "CTAS"  "172908105" "2320" "CTAS" "CTAS" 15357  .82 10     0 2002 3 2001 1 1      0
    "61536910" "139665" "000003" 14886     . "MOODY'S CORP"                 "MCO"   "615369105" "7323" "MDY"  "MCO"  15357    1  5     0 2002 3 2000 1 1      0
    "11704310" "002444" "000003"  4838 17705 "BRUNSWICK CORP"               "BC"    "117043109" "3510" "BC"   "BC"   15658  .76  8     0 2002 3 1973 1 1      0
    "31996310" "025157" "000003" 12688 17433 "FIRST DATA CORP"              "FDC"   "319963104" "7374" "FDC1" "FDC"  15448  .74 24     0 2002 3 1994 1 1      0
    "21683110" "003502" "000003" 11078 16999 "COOPER TIRE & RUBBER CO"      "CTB"   "216831107" "3011" "CTB"  "CTB"  15630  .84  6     0 2002 3 1990 1 1      0
    "02906610" "014565" "000003" 14762 17211 "AMERICAN POWER CONVERSION CP" "APCC." "029066107" "3620" "APCC" "APCC" 15511  .71  8     0 2002 3 2000 1 1      0
    "16090310" "016751" "000003" 14780 16314 "CHARTER ONE FINANCIAL INC"    "CF.6"  "160903100" "6020" "COFI" "CF"   15448  .73 17     0 2002 3 2000 1 1      0
    "98970110" "011687" "000003" 15151     . "ZIONS BANCORPORATION"         "ZION"  "989701107" "6020" "ZION" "ZION" 15476   .9 13     0 2002 3 2001 1 1      0
    "65129010" "029173" "000003" 18616     . "NEWFIELD EXPLORATION CO"      "NFX"   "651290108" "1311" "NFX"  "NFX"  15567  .69 25     0 2002 3 2010 0 0      0
    "29476L10" "028733" "000003" 15312     . "EQUITY RESIDENTIAL"           "EQR"   "29476L107" "6798" "EQR"  "EQR"  15448  .81 16  6.25 2002 3 2001 1 1    100
    "30231G10" "004503" "000003"  1551     . "EXXON MOBIL CORP"             "XOM"   "30231G102" "2911" "XON"  "XOM"  15448  .88 24  4.17 2002 3 1964 1 1 100.08
    "01741R10" "010405" "000003"  6939 20270 "ALLEGHENY TECHNOLOGIES INC"   "ATI"   "01741R102" "3350" "ALS1" "ATI"  15476 1.18 10    10 2002 3 1978 1 1    100
    end
    format %td from
    format %tdnn/dd/CCYY thru
    format %d statpers
    format %ty cyear
    format %ty yfrom
    label values _merge _merge
    label def _merge 3 "matched (3)", modify
    Thanks in advance!

    Viktoria

  • #2
    Thanks for the data example, but for anyone not using this kind of data it's hard to hack through to work out exactly what you're talking about. This is not a forum just for economists and people who work with business or finance data! (Listing people by number of posts, for example, I have it that only 1 of the 10 most frequent posters on Statalist is (obviously) an economist and I think even he [Carlo Lazzaro] works with different data.)

    First, on "company" I see these variables that look like identifiers

    Code:
    cusip gvkey gvkeyx co_conm co_tic co_cusip co_sic ticker oftic
    Which, singly or jointly, define "company"?

    Second, on "month" and "year" I see these variables as indicating time. Which indicates the month and year for calculations?

    Code:
    . l from thru statpers cyear yfrom sellpct
    
         +--------------------------------------------------------------+
         |      from         thru    statpers   cyear   yfrom   sellpct |
         |--------------------------------------------------------------|
      1. | 01jun2000    2/14/2007   18jul2002    2002    2000         0 |
      2. | 01may1991     7/1/2014   14mar2002    2002    1991     11.11 |
      3. | 23sep2013            .   14mar2002    2002    2013         0 |
      4. | 21dec2007            .   14feb2002    2002    2007         0 |
      5. | 03jul1995   12/20/2011   17oct2002    2002    1995      5.26 |
         |--------------------------------------------------------------|
      6. | 30apr1985            .   19dec2002    2002    1985      8.33 |
      7. | 04sep2001    6/22/2014   19dec2002    2002    2001     13.33 |
      8. | 31may1986            .   14feb2002    2002    1986         0 |
      9. | 01mar2001            .   17jan2002    2002    2001         0 |
     10. | 03oct2000            .   17jan2002    2002    2000         0 |
         |--------------------------------------------------------------|
     11. | 31mar1973    6/22/2008   14nov2002    2002    1973         0 |
     12. | 27sep1994    9/24/2007   18apr2002    2002    1994         0 |
     13. | 01may1990    7/17/2006   17oct2002    2002    1990         0 |
     14. | 01jun2000    2/14/2007   20jun2002    2002    2000         0 |
     15. | 19jun2000    8/31/2004   18apr2002    2002    2000         0 |
         |--------------------------------------------------------------|
     16. | 25jun2001            .   16may2002    2002    2001         0 |
     17. | 20dec2010            .   15aug2002    2002    2010         0 |
     18. | 03dec2001            .   18apr2002    2002    2001      6.25 |
     19. | 31mar1964            .   18apr2002    2002    1964      4.17 |
     20. | 31dec1978     7/1/2015   16may2002    2002    1978        10 |
         +--------------------------------------------------------------+
    When it comes to the details, know that weighted means are in essence three steps. You know everything except that egen, total() is your friend.

    Code:
     
    bysort <whatever> : egen num = total(weight * x) 
    by <whatever>: egen den = total(weight) 
    gen wtmean = num/den

    Comment


    • #3
      Nick is right:
      I've never worked with data like the ones posted by Viktoria (and I do not think I will have the chance to in the future).
      Kind regards,
      Carlo
      (Stata 19.0)

      Comment

      Working...
      X