Constructing a weighted variable

Val Eggers

#1

23 Apr 2020, 09:26

I am trying to replicate a study by Ulrike Malmendier and Stefan Nagel ("Depression babies: do macroeconomic experiences affect risk taking?", The Quarterly Journal of Economics 126.1 (2011): 373-416). In short, the paper estimates the effect of macroeconomic experiences (proxied by annual S&P 500 returns) on financial decision making, and it is considered seminal in its field. To replicate it, I need to reconstruct a weighted explanatory variable, but I am having difficulty figuring out the best approach.

In their paper, the authors construct a weighted explanatory variable:

Specifically, for each household i in year t, we calculate the following weighted average of past asset returns,
[Equation image: the weighted average A_it(λ) of past returns, written out in LaTeX in post #2]
where R_t−k is the return in year t−k. The weights (w_it) depend on the age of the household head at time t (age_it), how many years ago the return was realized (k), and a parameter λ, which controls the shape of the weighting function.
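As a worked illustration of how λ shapes the weights (a hand calculation for illustration, not taken from the paper), take an artificially young household head with age_it = 4, so that only R_{t−1}, R_{t−2} and R_{t−3} enter. Using the weighting function written out in LaTeX in post #2:

\[
w_{it}(k,\lambda) = \frac{(4-k)^\lambda}{3^\lambda + 2^\lambda + 1^\lambda}, \qquad k = 1, 2, 3
\]
\[
\lambda = 1:\quad w_{it}(k,1) = \tfrac{3}{6},\ \tfrac{2}{6},\ \tfrac{1}{6} \approx 0.500,\ 0.333,\ 0.167
\]
\[
\lambda = 1.5:\quad w_{it}(k,1.5) = \tfrac{5.196}{9.025},\ \tfrac{2.828}{9.025},\ \tfrac{1}{9.025} \approx 0.576,\ 0.313,\ 0.111
\]

With λ = 0 every past year would get the same weight of 1/3; larger values of λ shift weight towards the most recent returns.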

I have recreated a fake example of my two datasets:

Code:

clear all

// First data set: annual returns
frame create annual_return
frame change annual_return
set seed 1405
set obs 139
gen year = _n + 1870
gen ar_pct = (rnormal()*20 + 8)/100
dataex year ar_pct in 1/10

// Second data set: household-year observations
frame create fr_exp_return
frame change fr_exp_return
set obs 10000
set seed 1405
gen year = floor(runiform()*40) + 1980
sort year
gen id = _n
gen lambda = 1.5
gen age = floor(rnormal(0, 1)*40) + 40
drop if age < 25 | age > 75
dataex id year age lambda in 1/10

Notice that λ has been set to 1.5, as Malmendier and Nagel estimate in their paper. The id variable corresponds to the unit index i, year to t, ar_pct to the return R_t (which is the same for all households), and age to age_it. The desired output is the weighted average of past returns described above. How would you proceed with constructing such a variable? Is there a program that can handle such operations? I am using Stata 16.1 on Windows 10.

I appreciate your help, and as a novice on this forum, I welcome any criticism of my post. Thanks for your time.

Last edited by Val Eggers; 23 Apr 2020, 09:36.
Tags: average return, economics, macroeconomics, weight
Val Eggers

#2

28 Apr 2020, 03:04

So, I realized that posting a screenshot of the equation in question might not be ideal. I then read that the forum supports standard LaTeX, so here is the equation:
\[
A_{it}(\lambda) = \sum_{k=1}^{\text{age}_{it}-1} w_{it}(k,\lambda)\,R_{t-k}
\]
where
\[
w_{it}(k,\lambda)=\frac{(\text{age}_{it}-k)^\lambda}{\sum_{j=1}^{\text{age}_{it}-1}(\text{age}_{it}-j)^\lambda}
\]
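For checking results outside Stata, here is a minimal Python sketch of this formula for a single household. The function name `experienced_return` and the dict-of-returns input are illustrative assumptions, not part of the paper or of Nagel's code; it assumes age_it ≥ 2 and that every year from t−1 back to t−age_it+1 is present in the dict.

```python
def experienced_return(returns_by_year, t, age, lam):
    """Weighted average A_it(lambda) of past returns R_{t-k}, k = 1, ..., age-1.

    returns_by_year: dict mapping calendar year -> annual return (as a fraction).
    t: survey year; age: age of the household head in year t; lam: lambda.
    """
    num = 0.0  # sum over k of (age - k)^lam * R_{t-k}
    den = 0.0  # sum over k of (age - k)^lam
    for k in range(1, age):  # k = 1, ..., age - 1
        weight = (age - k) ** lam
        num += weight * returns_by_year[t - k]
        den += weight
    return num / den
```

For example, with the 2004-2006 returns from the data posted later in this thread ({2004: 0.0733, 2005: 0.0133, 2006: 0.1287}), t = 2007, an artificially young age of 4 and λ = 1.5, it returns about 0.0864.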

I hope this inspires someone to reply to my question, or at least to give feedback on how to write questions that are more likely to get a response in the future. Thanks again.

Last edited by Val Eggers; 28 Apr 2020, 03:17.
Sven-Kristjan Bormann

#3

28 Apr 2020, 14:50

I think that your question is fine, but sometimes people are just not interested in a question, don't see an easy solution, don't have time, etc. You could perhaps have provided code showing how you tried to create this variable. It would also have been helpful to post your example datasets, not just the instructions for creating fake example datasets.
For example, I don't have Stata 16, so I could not recreate your example datasets (your code uses frames, which were introduced in Stata 16) or the situation you are facing. I would have to find a way to combine both datasets to do example calculations.
That being said, you could try to literally write out the formula in Stata in the form of several forvalues-loops. While there is a sum()-function in Stata, I am not sure how to use it in your case.
I might find the time later to post some example codes.

Val Eggers


03 May 2020, 12:04

Thank you for your reply, Mr. Bormann. Your comments regarding example datasets are duly noted. I actually considered using dataex, but decided the output would be too long for anyone to read. In any case, I have since found the do-file that Malmendier & Nagel (2011) used. It was written and published by Stefan Nagel here: https://voices.uchicago.edu/stefannagel/code-and-data/. For good measure, I'll post the solution below, along with example data in dataex format. The code is somewhat cumbersome for a novice Stata user, but I managed to figure it out eventually. I hope you will be pleased to know that the solution does indeed involve forvalues loops.

Code:

clear all
set memory 512m
set more off
set matsize 800

/* GENERATE DATA SET 1 - ANNUAL RETURNS ON SP500 */
input int yryear float yrret
1871  .1382
1872  .0944
1873   .031
1874  .1109
1875  .1124
1876 -.1336
1877  .1248
1878  .3473
1879  .2627
1880   .293
1881 -.0661
1882  .0568
1883  .0273
1884 -.0281
1885  .3165
1886  .1772
1887 -.0663
1888  .0334
1889  .1395
1890 -.0735
1891  .2522
1892  .0474
1893 -.1183
1894  .0991
1895  .0192
1896  .0479
1897  .2037
1898  .2742
1899 -.1077
1900  .2561
1901  .1348
1902  .0073
1903 -.1197
1904  .2594
1905  .2129
1906 -.0388
1907 -.2335
1908  .3637
1909  .0454
1910  .0501
1911  .0582
1912 -.0055
1913 -.0759
1914 -.0633
1915  .2865
1916   -.04
1917 -.3109
1918 -.0185
1919  .0447
1920 -.1617
1921  .2352
1922  .3212
1923  .0301
1924   .271
1925  .2161
1926  .1277
1927  .4027
1928   .493
1929 -.0999
1930 -.1744
1931 -.3847
1932  .0498
1933   .556
1934 -.0938
1935  .5044
1936  .3066
1937   -.34
1938  .2086
1939  .0298
1940 -.0956
1941  -.173
1942  .1166
1943  .2005
1944  .1698
1945  .3629
1946 -.2555
1947 -.0577
1948  .0633
1949  .1842
1950  .2676
1951  .1613
1952  .1746
1953 -.0154
1954  .5716
1955  .2774
1956   .033
1957 -.1185
1958  .4092
1959  .0969
1960 -.0207
1961  .2765
1962 -.1039
1963  .2105
1964  .1547
1965  .1033
1966 -.1336
1967  .2078
1968  .0603
1969 -.1396
1970 -.0187
1971  .1092
1972  .1523
1973 -.2183
1974 -.3497
1975  .2948
1976  .1844
1977 -.1357
1978 -.0239
1979  .0476
1980  .1799
1981 -.1308
1982  .1675
1983  .1863
1984  .0193
1985   .274
1986  .1777
1987   .012
1988   .117
1989  .2614
1990 -.0898
1991  .2706
1992  .0457
1993  .0722
1994 -.0145
1995   .346
1996   .191
1997  .3143
1998  .2669
1999  .1794
2000 -.1209
2001 -.1332
2002 -.2407
2003  .2635
2004  .0733
2005  .0133
2006  .1287
2007  .0134
2008 -.3728
2009  .2375
2010  .1314
2011 -.0087
2012  .1391
2013   .305
2014  .1294
2015  .0058
2016  .0966
2017  .1942
2018  -.062
2019  .2814
end

mkmat yryear
mkmat yrret

local myrs = rowsof(yrret)
local rowyrs ""

forvalues i=1/`myrs'  {
   local addyr = yryear[`i',1]
   local rowyrs "`rowyrs' `addyr' "
   }

matrix rownames yrret = `rowyrs'
global yroffset = yryear[`myrs',1]-`myrs'
matrix yrs = (2007,2010,2013,2016)
global nyrs = colsof(yrs)

drop _all
/* GENERATE DATA SET 2 - EXAMPLE OF SCF DATA USED IN THIS CONTEXT*/
input int(yy1) float(year age)
 36 2007 45
 79 2007 59
137 2007 29
197 2007 59
 36 2010 53
 79 2010 64
137 2010 27
197 2010 32
 36 2013 70
 79 2013 45
137 2013 42
197 2013 27
 36 2016 59
 79 2016 40
137 2016 63
end

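/* CREATE WEIGHTED AVERAGE STOCK RETURNS WITH DIFFERENT WEIGHTING PARAMETERS */

* Five values of the weighting parameter lambda
local k1 1
local k2 1.433
local k3 1.325
local k4 1.166
local k5 1.50

* f = weighted sum of past returns, w = sum of weights,
* v = weighted sum of squared returns (computed but not used below)
forvalues j=1/5 {
   qui gen f`j' = 0
   qui gen w`j' = 0
   qui gen v`j' = 0
   }
quietly gen lret = .
* Row of the return matrix that belongs to the survey year t
quietly gen yri = rownumb(yrret,string(year))

* Outer loop over lags i = 1,...,79: a lag counts only if age > i,
* so i runs from 1 to age-1 (this assumes no one is older than 80)
forvalues i=1/79 {
     * Return realized i years before the survey year, R(t-i)
     qui replace lret = yrret[yri-`i',1] if age > `i'
     * Inner loop over the five values of lambda; the weights use
     * (age-i)/age instead of age-i, and the common factor 1/age cancels
     forvalues j=1/5 {
       qui replace f`j' = f`j' + lret*((age-`i')/age)^`k`j'' if age > `i'
       qui replace v`j' = v`j' + (lret^2)*((age-`i')/age)^`k`j'' if age > `i'
       qui replace w`j' = w`j' + ((age-`i')/age)^`k`j'' if age > `i'
     }
}

* Weighted average = weighted sum of returns / sum of weights
qui gen retave1 = f1/w1
qui gen retave1433 = f2/w2
qui gen retave1325 = f3/w3
qui gen retave1166 = f4/w4
qui gen retave150 = f5/w5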

drop f1 f2 f3 f4 f5
drop w1 w2 w3 w4 w5
drop v1 v2 v3 v4 v5
drop lret
drop yri

KEY TAKEAWAY
So, for the general purpose of constructing variables (such as weighted averages) from two datasets of different sizes, one approach is to store the smaller dataset (here, the annual returns) in a matrix whose row names are the years, and then generate the new variables in the second dataset by looking up elements of that matrix (via rownumb()) inside forvalues loops. Notice that Nagel uses five different values of λ (denoted k1-k5 in the code), which makes the loops more complicated. I'm sure experienced Stata users will find the double-loop solution inefficient, but hey, it works.
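The same structure (returns stored once and looked up by year, an outer loop over lags, an inner loop over λ, and the scaled weights ((age−k)/age)^λ) can be mirrored in Python as a minimal sketch. The function name, the (id, year, age) tuple layout and the dict output are hypothetical. Unlike the do-file, which finds the row of year t and steps back k rows (assuming consecutive years), this sketch looks up year t−k directly. It also assumes each (id, year) pair is unique and every age is at least 2.

```python
def experienced_returns(returns_by_year, households, lambdas, max_lag=79):
    """Weighted average past return A_it(lambda) for every household and lambda.

    returns_by_year: dict mapping calendar year -> annual return (as a fraction),
        the counterpart of the matrix yrret with years as row names.
    households: list of (id, survey_year, age) tuples.
    lambdas: weighting parameters (k1-k5 in the do-file).
    Returns a dict {(id, survey_year, lam): A_it(lam)}.
    """
    f = {(hid, t, lam): 0.0 for hid, t, _ in households for lam in lambdas}
    w = dict(f)  # running sums of weights, also starting at 0
    for k in range(1, max_lag + 1):  # outer loop over lags, like forvalues i=1/79
        for hid, t, age in households:
            if age <= k:  # only lags k = 1, ..., age-1 enter (the "if age > i" condition)
                continue
            lret = returns_by_year[t - k]  # R_{t-k}; KeyError if that year is missing
            for lam in lambdas:  # inner loop over lambda, like forvalues j=1/5
                weight = ((age - k) / age) ** lam  # scaled weight, as in the do-file
                f[(hid, t, lam)] += lret * weight
                w[(hid, t, lam)] += weight
    return {key: f[key] / w[key] for key in f}
```

Looping over lags on the outside and households on the inside matches the do-file, where each replace statement processes all observations at once for a given lag.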

Last edited by Val Eggers; 03 May 2020, 12:13.


Sven-Kristjan Bormann

#5

04 May 2020, 06:16

I am glad you were able to solve your problem.
The double-loop solution looks fine to me. However, the code differs somewhat from the formula you showed earlier.
Instead of

Code:

((age-`i')/age)^`k`j''

I would have expected

Code:

((age-`i'))^`k`j''

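but it does not matter in the end: the factor 1/age^λ multiplies every term in both the numerator and the denominator of w_it(k,λ), so it cancels.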
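A quick numerical check of this cancellation, as a hedged Python sketch (the helper `weights` is hypothetical, not part of Nagel's code): it computes the normalized weights both ways for a few ages and the λ values used in the do-file, and compares them.

```python
def weights(age, lam, scaled):
    """Normalized weights w_it(k, lambda) for k = 1, ..., age-1.

    scaled=False uses (age - k)^lam, as in the formula from post #2;
    scaled=True uses ((age - k)/age)^lam, as in the do-file.
    """
    raw = [((age - k) / age if scaled else (age - k)) ** lam for k in range(1, age)]
    total = sum(raw)
    return [x / total for x in raw]


# Compare both versions for a few ages and the lambda values from the do-file
for age in (27, 45, 79):
    for lam in (1, 1.433, 1.325, 1.166, 1.5):
        a = weights(age, lam, scaled=False)
        b = weights(age, lam, scaled=True)
        assert max(abs(x - y) for x, y in zip(a, b)) < 1e-12
print("identical weights up to rounding error")
```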