ASGEN : A Stata module for weighted average mean

Attaullah Shah

Join Date: Aug 2014

Posts: 1669
#1

ASGEN : A Stata module for weighted average mean

10 Oct 2017, 22:02

Thanks to Kit Baum, I have shared a new program asgen on SSC that computes weighted average mean. The program can be installed by

Code:

ssc install asgen

Following are related details:

Syntax

[bysort varlist]: asgen newvar = exp [if] [in] [, weights(varname) by(varlist)]

Description

asgen creates a new variable from an existing variable or an expression. The new variable contains weighted average mean. The existing variable is supplied to asgen by the =exp. The weights are based on the values of an existing numeric variable, that is specified in the weights(varname) option.

asgen accepts both the [if] [in] qualifiers for performing the required calculations on a subset of data. Further, the use of =exp can come handy in many situations where we want to make changes on the fly before we find the weighted average mean. For example, we might want to multiply the value of 100 to a variable before we find its weighted average. In such a case, the expression will look like:

Code:

asgen wmX = X * 100, w(Y)

where wmX is a new variable to be created and shall contain the weighted average; X and Y are existing variables in the data set. Options

weight(varname) is an optional option. Therefore, without this option, asgen works like egen command and finds simple mean

Example 1: Weighted average mean for kstock using the variable mvalue as a weight

Code:

webuse grunfeld asgen WM_kstock = kstock, w(mvalue)

Example 2: Weighted average mean using an expression

.

Code:

asgen WM_kstock = kstock / invest, w(mvalue)

This will divide the variable kstock on the variable invest before finding the weighted average mean.

Example 3: Avoiding the use of [if] using an expression

We use the [if] qualifier to perform calculation on a subset of the data. Using [if] condition, the resulting variable will have missing values where the condition is not true. There might be circumstance where we want to use [if] without encountering missing values in the new variable. With asgen, the use of an expression can come very handy in such a s situation. To proceed with an example, let us use astile [can be downloaded from ssc by: ssc install astile] to make five groups of firms based on the variable mvalue and five groups based on kstock.

Code:

astile size5 = mvalue, nq(5) astile kstock5 = kstock, nq(5)

Now, let's find weighted average mean in each year for a combination of firms that are in the first quantile of mvalue (size == 1) and third quantile of kstock (kstock5 == 3) using mvalue as a weighting variable

Code:

bys year: asgen WM_invest = (kstock / (size5 ==1 & kstock5==3)), w(mvalue)

Off-course, we could have done that using [if] qualifier, but that will generate missing values where the condition is not true

Code:

bys year: asgen WM_invest_IF = kstock if size5 ==1 & kstock5==3 , w(mvalue) sort id year

The difference between WM_invest and WM_invest_IF is that the former spreads the results within years, and the later does not. The use of this trick is borrowed from the behavior of egen. Further details related to this trick can be read here in Nick Cox's column.
*

Comments for improvement of asgen are greatly appreciated.

Last edited by Attaullah Shah; 10 Oct 2017, 22:06.

Regards
--------------------------------------------------
Attaullah Shah, PhD.
Professor of Finance, Institute of Management Sciences Peshawar, Pakistan
FinTechProfessor.com
https://asdocx.com
Check out my asdoc program, which sends outputs to MS Word.
For more flexibility, consider using asdocx which can send Stata outputs to MS Word, Excel, LaTeX, or HTML.
Tags: None

2 likes
Kate Lussy

Join Date: Apr 2019

Posts: 42
#2

28 Apr 2019, 15:02

Dear Attaullah,

Asgen has been really great additional STATA module. I was wondering though if it is possible to create equal-weighted portfolios through the use of asgen? Thank you in advance.

Best regards,
Kate
Comment
Attaullah Shah

Join Date: Aug 2014

Posts: 1669
#3

28 Apr 2019, 23:13

If you do not use the weight option, asgen then uses equal weights, just like egen function.

Regards
--------------------------------------------------
Attaullah Shah, PhD.
Professor of Finance, Institute of Management Sciences Peshawar, Pakistan
FinTechProfessor.com
https://asdocx.com
Check out my asdoc program, which sends outputs to MS Word.
For more flexibility, consider using asdocx which can send Stata outputs to MS Word, Excel, LaTeX, or HTML.
Comment
Kate Lussy

Join Date: Apr 2019

Posts: 42
#4

01 May 2019, 05:19

I was wondering if asgen can be used to derive the long/short portfolio average excess returns, in which I would need to explicitly spread the means because there will be missing values for the other portfolio due to the portfolio sort.

I gathered this code from other posts, as well as my own but I don't know how to actually make it work.
bys ymdate : asgen ExUSD = F_Excess_USD_w(cond(IVOL_w_5 == 1, F_Excess_USD_w, .)), w(MV_USD_w)

The F_Excess_USD_w before the parenthesis was mean, and I tried to replace it, but it doesnt work...

Would really appreciate your feedback! Thank you!
Comment
Attaullah Shah

Join Date: Aug 2014

Posts: 1669
#5

01 May 2019, 09:23

If you want to use if qualifier type statement and still do nto want have missing values where the condition is not met, then you can use this trick

Code:

bys ymdate : asgen ExUSD = (F_Excess_USD_w / (IVOL_w_5 == 1)), w(MV_USD_w)

The help file of asgen provides the following material

Example 3: Avoiding the use of [if] using an expression

We use the [if] qualifier to perform calculations on a subset of the data. Using [if] condition, the resulting variable will have missing values where the condition is not true.
There might be circumstances where we want to use [if] without encountering missing values in the new variable. The use of expression can come very handy in such as situation.
To proceed with an example, let us use astile [can be downloaded from ssc by clicking here install astile] to make five groups of firms based on mvalue and five groups based
kstock.

Code:

astile size5 = mvalue, nq(5) . astile kstock5 = kstock, nq(5)

Now, let's find weighted average mean in each year for a combination of firms that are in the first quantile of mvalue (size == 1) and third quantile of kstock (kstock5 == 3)
using mvalue as a weighting variable

Code:

. bys year: asgen WM_invest = (kstock / (size5 ==1 & kstock5==3)), w(mvalue)

Off-course, we could have done that using [if] qualifier, but that will generate missing values where the condition is not true

Code:

. bys year: asgen WM_invest_IF = kstock if size5 ==1 & kstock5==3 , w(mvalue)

The difference between WM_invest and WM_invest_IF is that the former spreads the results within years, and the later does not. The use of this trick is borrowed from the
behavior of egen. Further details related to this trick can be read here https://ideas.repec.org/a/tsj/stataj...2p305-314.html in Nick Cox's column.
*

Regards
--------------------------------------------------
Attaullah Shah, PhD.
Professor of Finance, Institute of Management Sciences Peshawar, Pakistan
FinTechProfessor.com
https://asdocx.com
Check out my asdoc program, which sends outputs to MS Word.
For more flexibility, consider using asdocx which can send Stata outputs to MS Word, Excel, LaTeX, or HTML.
Comment
Nick Cox

Join Date: Mar 2014

Posts: 36054
#6

01 May 2019, 11:04

Would this work too?

Code:

bys year : asgen WM_Invest_IF = cond(size5 == 1 & kstock5 == 3, kstock .), w(mvalue)

This and the other trick are discussed in Sections 9 and 10 of the paer cited in #5.
Comment
Kate Lussy

Join Date: Apr 2019

Posts: 42
#7

02 May 2019, 07:34

Thank you to you both! It worked perfectly. I very much appreciate your help
Comment
Nick Cox

Join Date: Mar 2014

Posts: 36054
#8

02 May 2019, 08:05

https://journals.sagepub.com/doi/pdf...867X1101100210 takes you straight to a .pdf (public access) of the paper mentioned in #5.
Comment
Kate Lussy

Join Date: Apr 2019

Posts: 42
#9

08 May 2019, 16:37

Originally posted by Attaullah Shah View Post

If you do not use the weight option, asgen then uses equal weights, just like egen function.

I have tried this, but the result is actually the same in both cases. Would I have to add anything to the code to make it work for equally-weighted average excess returns?

bys ymdate : asgen ExUSD_1_VW_1_Size = (F_Excess_USD_w / (nq_Size_w ==1 & IVOL_Size ==1)), w(MV_USD_w)
bys ymdate : asgen ExUSD_1_VW_5_Size = (F_Excess_USD_w / (nq_Size_w ==1 & IVOL_Size ==5)), w(MV_USD_w)
bys ymdate : asgen EXUSD_1_VW_5_1_Size = ExUSD_1_VW_5 - ExUSD_1_VW_1

and

bys ymdate : asgen ExUSD_1_VW_1_Size = (F_Excess_USD_w / (nq_Size_w ==1 & IVOL_Size ==1))
bys ymdate : asgen ExUSD_1_VW_5_Size = (F_Excess_USD_w / (nq_Size_w ==1 & IVOL_Size ==5))
bys ymdate : asgen EXUSD_1_VW_5_1_Size = ExUSD_1_VW_5 - ExUSD_1_VW_1

Thank you kindly.
Comment
Kate Lussy

Join Date: Apr 2019

Posts: 42
#10

09 May 2019, 01:03

Originally posted by Kate Lussy View Post

I have tried this, but the result is actually the same in both cases. Would I have to add anything to the code to make it work for equally-weighted average excess returns?

bys ymdate : asgen ExUSD_1_VW_1_Size = (F_Excess_USD_w / (nq_Size_w ==1 & IVOL_Size ==1)), w(MV_USD_w)
bys ymdate : asgen ExUSD_1_VW_5_Size = (F_Excess_USD_w / (nq_Size_w ==1 & IVOL_Size ==5)), w(MV_USD_w)
bys ymdate : asgen EXUSD_1_VW_5_1_Size = ExUSD_1_VW_5 - ExUSD_1_VW_1

and

bys ymdate : asgen ExUSD_1_VW_1_Size = (F_Excess_USD_w / (nq_Size_w ==1 & IVOL_Size ==1))
bys ymdate : asgen ExUSD_1_VW_5_Size = (F_Excess_USD_w / (nq_Size_w ==1 & IVOL_Size ==5))
bys ymdate : asgen EXUSD_1_VW_5_1_Size = ExUSD_1_VW_5 - ExUSD_1_VW_1

Thank you kindly.

I want to add upon my post. This was also the case when I used _gwtmean.

My procedure is to:
1. Create portfolios
2. Limit each portfolio to one observation per month
3. To generate the value-weighted excess returns for each portfolio
4. To find t-statistics

I am wondering if it makes sense to move 2. to after 3.? If I do so, I then generate different values for the for equal and value weighting. I tried to look at the data to check if it is right, but I cannot make sense of it. So, it would be great to hear someone's feedback!

*Generate Future Excess Returns
tsset DSCode1 ymdate
gen F_Excess_USD_w=F.Excess_USD_w
drop if F_Excess_USD_w==.

*Univariate Portfolio Analysis
* Portfolio sorting according to IVOL
ssc install astile
astile IVOL_w_5 = IVOL_w, nq(5)

sort ymdate IVOL_w_5 DSCode1
by ymdate IVOL_w_5: drop if _n >1

* Value-weighted average excess returns of each portfolio in each month (ymdate)
bys ymdate: asgen ExUSD_1_VW = (F_Excess_USD_w / (IVOL_w_5 == 1)), w(MV_USD_w)

*Equal-weighted average excess returns of each portfolio in each month (ymdate)
bys ymdate: asgen ExUSD_1_EW = (F_Excess_USD_w / (IVOL_w_5 == 1))

sort ymdate IVOL_w_5 DSCode1
by ymdate IVOL_w_5: drop if _n >1

tsset DSCode1 ymdate
sort DSCode1 ymdate

*Value-weighted average excess returns & t-statistics of each portfolio
newey ExUSD_1_VW, lag(6) force
newey ExUSD_1_EW, lag(6) force

[CODE]
* Example generated by -dataex-. To install: ssc install dataex
clear
input long DSCode1 float(ymdate MV_USD_w F_Excess_USD_w IVOL_w_5 ExUSD_1_EW ExUSD_1_VW)
1 559 151.45 .07484736 1 .020173784 .034087826
1 560 163.49 .0005981706 4 .006313993 .003041933
1 561 164.27 .012566957 4 .005149428 -.01948735
1 562 329.46 .019797264 3 .0251439 .0203678
1 563 337.62 .05277631 3 .03560889 .02851063
1 564 356.64 .07170374 3 -.005529334 -.03746695
1 565 383.81 .03556859 4 .016249549 .015531148
1 566 399.09 -.026278155 3 .033919852 .05264597
1 567 390.26 -.012818905 3 .011069474 -.005305402
1 568 380.41 -.012739185 3 -.009749626 -.0354712
1 569 377.02 -.04112721 3 -.00739424 -.04183678
1 570 363.43 -.04535 3 -.018767476 -.009455224
1 571 348.14 .02038562 3 -.012077605 -.005213041
1 572 356.64 -.07103478 3 .008945227 -.01197008
1 573 461.65 -.1154449 3 -.03793382 -.025115095
1 574 459.08 .0089275865 3 -.0042141736 -.04437515
1 575 464.68 -.2218975 2 -.03883289 -.09056226
1 576 362.99 .1751152 3 -.003365763 -.04764247
1 577 427.05 -.0969147 4 -.012286625 -.033523023
1 578 386.48 -.03043021 4 .017590337 .0431491
1 579 375 .0028491446 4 .013230603 .02295336
1 580 376.34 -.083706 4 -.025947643 -.045251
1 581 341.64 -.0954448 3 -.006803245 .035611454

Last edited by Kate Lussy; 09 May 2019, 02:02.
Comment
mehak basit

Join Date: Aug 2020

Posts: 4
#11

13 Sep 2020, 15:13

Hello Kate, were you able to calculate value-weighted returns? Can you help me in understanding that how do we exactly assign weights to the portfolios we created?
Comment
Sam Volpe

Join Date: Jun 2021

Posts: 36
#12

13 Jul 2021, 13:14

Dear Attaullah Shah

How does asgen deal with missing values?
Comment

Attaullah Shah

Join Date: Aug 2014
Posts: 1669

#13

13 Jul 2021, 20:36

See the following example to understand how asgen handles missing values.

Code:

webuse grunfeld
keep if company == 1

* Without missing values either in the given variable or the weight variables
asgen nomiss_mean = invest, weight(mvalue)

* Add a missing value to the given variable
replace invest = . in 1
asgen miss1_mean = invest, weight(mvalue)

* When the the weighting variable has a missing value
replace mvalue = .  in 10
asgen miss2_mean = invest, weight(mvalue)

list

     +-------------------------------------------------------------------------------------+
     | company   year   invest   mvalue   kstock   time   nomiss_~n   miss1_~n   miss2_m~n |
     |-------------------------------------------------------------------------------------|
  1. |       1   1935        .   3078.5      2.8      1   649.50206   661.7243   668.03872 |
  2. |       1   1936    391.8   4661.7     52.6      2   649.50206   661.7243   668.03872 |
  3. |       1   1937    410.6   5387.1    156.9      3   649.50206   661.7243   668.03872 |
  4. |       1   1938    257.7   2792.2    209.2      4   649.50206   661.7243   668.03872 |
  5. |       1   1939    330.8   4313.2    203.4      5   649.50206   661.7243   668.03872 |
     |-------------------------------------------------------------------------------------|
  6. |       1   1940    461.2   4643.9    207.2      6   649.50206   661.7243   668.03872 |
  7. |       1   1941      512   4551.2    255.2      7   649.50206   661.7243   668.03872 |
  8. |       1   1942      448   3244.1    303.7      8   649.50206   661.7243   668.03872 |
  9. |       1   1943    499.6   4053.7    264.1      9   649.50206   661.7243   668.03872 |
 10. |       1   1944    547.5        .    201.6     10   649.50206   661.7243   668.03872 |
     |-------------------------------------------------------------------------------------|
 11. |       1   1945    561.2   4840.9      265     11   649.50206   661.7243   668.03872 |
 12. |       1   1946    688.1   4900.9    402.2     12   649.50206   661.7243   668.03872 |
 13. |       1   1947    568.9   3526.5    761.5     13   649.50206   661.7243   668.03872 |
 14. |       1   1948    529.2   3254.7    922.4     14   649.50206   661.7243   668.03872 |
 15. |       1   1949    555.1   3700.2   1020.1     15   649.50206   661.7243   668.03872 |
     |-------------------------------------------------------------------------------------|
 16. |       1   1950    642.9   3755.6     1099     16   649.50206   661.7243   668.03872 |
 17. |       1   1951    755.9     4833   1207.7     17   649.50206   661.7243   668.03872 |
 18. |       1   1952    891.2   4924.9   1430.5     18   649.50206   661.7243   668.03872 |
 19. |       1   1953   1304.4   6241.7   1777.3     19   649.50206   661.7243   668.03872 |
 20. |       1   1954   1486.7   5593.6   2226.3     20   649.50206   661.7243   668.03872 |
     +-------------------------------------------------------------------------------------+

Findings:

1. If the variable for which mean is calculated (call it focal variable) has missing values, rows having missing values are dropped from the calculation. However, the newly generated variable reports the mean values even for observations with missing values in the focal variable, just like Stata's egen command.

2. Similarly, if the weighting variable has missing values, rows having missing values are dropped from the calculation. However, the newly generated variable reports the mean values even for observations with missing values in the weighting variable.

Last edited by Attaullah Shah; 13 Jul 2021, 20:41.

Regards
--------------------------------------------------
Attaullah Shah, PhD.
Professor of Finance, Institute of Management Sciences Peshawar, Pakistan
FinTechProfessor.com
https://asdocx.com
Check out my asdoc program, which sends outputs to MS Word.
For more flexibility, consider using asdocx which can send Stata outputs to MS Word, Excel, LaTeX, or HTML.

Comment

kashif mansoor

Join Date: Jul 2020

Posts: 39
#14

07 Feb 2024, 19:36

I have a doubt on asgen command. If my weight variable is varying, how to then use the asgen command?
Comment
kashif mansoor

Join Date: Jul 2020

Posts: 39
#15

07 Feb 2024, 19:40

I have sorted it. Thank you. Your command has really come of great help, professor.
Comment

Announcement