  • Constructing a weighted variable

    I am trying to replicate a study by Ulrike Malmendier and Stefan Nagel ("Depression Babies: Do Macroeconomic Experiences Affect Risk Taking?", The Quarterly Journal of Economics 126(1), 2011, 373-416). In short, the paper estimates the effect of macroeconomic experiences (proxied by annual returns on the S&P 500) on financial decision making, and it is considered seminal in its field. To this end, I am trying to reconstruct a weighted variable, but I am having difficulty figuring out the best approach.

    In their paper, the authors construct a weighted explanatory variable:
    Specifically, for each household i in year t, we calculate the following weighted average of past asset returns,
    [Attachment: screenshot of the equation]
    where R_{t-k} is the return in year t-k. The weights (w_{it}) depend on the age of the household head at time t (age_{it}), how many years ago the return was realized (k), and a parameter λ, which controls the shape of the weighting function.
    I have created a fake example of my two datasets:

    Code:
    clear all
    
    // First data set
    frame create annual_return
    frame change annual_return
    set seed 1405
    set obs 139
    gen year = _n+1870
    gen ar_pct = (rnormal()*20+8)/100
    dataex year ar_pct in 1/10
    
    // Second data set
    frame create fr_exp_return
    frame change fr_exp_return
    set obs 10000
    set seed 1405
    gen year = floor(runiform()*40)+1980
    sort year
    gen id = _n
    gen lambda = 1.5
    gen age = floor(rnormal(0, 1)*40)+40
    drop if age < 25 | age > 75
    dataex id year age lambda in 1/10
    Notice that λ has been set to 1.5, as Malmendier and Nagel estimate in their paper. The id variable corresponds to the unit index i, the year variable to t, ar_pct to R_t, and age to age_{it}. The desired output is the weighted experienced annual return described above. How would you proceed with constructing such a variable? Is there a program that can handle such operations? I am using Stata 16.1 on Windows 10.

    I appreciate your help and, being a novice on this forum, I welcome any criticism of my post. Thanks for your time.
    Last edited by Val Eggers; 23 Apr 2020, 09:36.

  • #2
    So, I realized that posting a screenshot of the equation in question might not be ideal. I then read that you could post using standard LaTeX format, so I decided to post the equation as such:
    \[
    A_{it}(\lambda) = \sum_{k=1}^{age_{it}-1}w_{it}(k,\lambda)R_{t-k}
    \]
    where
    \[
    w_{it}(k,\lambda)=\frac{(age_{it}-k)^\lambda}{\sum_{k=1}^{age_{it}-1}(age_{it}-k)^\lambda}
    \]

    I hope this inspires someone to post a reply to my question, or at least give feedback on how one can better write a question that is likely to get a response for future posts. Thanks again.
    Last edited by Val Eggers; 28 Apr 2020, 03:17.

    • #3
      I think that your question is fine, but sometimes people are just not interested in the question, don't see an easy solution, don't have time, etc. You could perhaps have provided code showing how you have tried to create this variable. It would also have been helpful to post your example dataset and not just the instructions for creating a fake one.
      For example, I don't have Stata 16, so I could not recreate your example datasets or the situation that you are facing. I would have to find a way to combine both datasets to do example calculations.
      That being said, you could try to literally write out the formula in Stata in the form of several forvalues loops. While there is a sum() function in Stata, I am not sure how to use it in your case.
      I might find the time later to post some example code.

      • #4
        Thank you for your reply, Mr. Bormann. Your comments regarding the example dataset are duly noted. I actually considered using dataex, but decided the output would be too long for anyone to read in full. In any case, I found the do-file that Malmendier & Nagel (2011) used. It was written and published by Stefan Nagel here: https://voices.uchicago.edu/stefannagel/code-and-data/. For good measure, I'll post the solution below, along with a dataex example. The code is somewhat cumbersome for a novice Stata user, but I managed to figure it out eventually. I hope you will be pleased to know that the solution does indeed involve forvalues loops.

        Code:
        clear all
        set memory 512m
        set more off
        set matsize 800
        
        /* GENERATE DATA SET 1 - ANNUAL RETURNS ON SP500 */
        input int yryear float yrret
        1871  .1382
        1872  .0944
        1873   .031
        1874  .1109
        1875  .1124
        1876 -.1336
        1877  .1248
        1878  .3473
        1879  .2627
        1880   .293
        1881 -.0661
        1882  .0568
        1883  .0273
        1884 -.0281
        1885  .3165
        1886  .1772
        1887 -.0663
        1888  .0334
        1889  .1395
        1890 -.0735
        1891  .2522
        1892  .0474
        1893 -.1183
        1894  .0991
        1895  .0192
        1896  .0479
        1897  .2037
        1898  .2742
        1899 -.1077
        1900  .2561
        1901  .1348
        1902  .0073
        1903 -.1197
        1904  .2594
        1905  .2129
        1906 -.0388
        1907 -.2335
        1908  .3637
        1909  .0454
        1910  .0501
        1911  .0582
        1912 -.0055
        1913 -.0759
        1914 -.0633
        1915  .2865
        1916   -.04
        1917 -.3109
        1918 -.0185
        1919  .0447
        1920 -.1617
        1921  .2352
        1922  .3212
        1923  .0301
        1924   .271
        1925  .2161
        1926  .1277
        1927  .4027
        1928   .493
        1929 -.0999
        1930 -.1744
        1931 -.3847
        1932  .0498
        1933   .556
        1934 -.0938
        1935  .5044
        1936  .3066
        1937   -.34
        1938  .2086
        1939  .0298
        1940 -.0956
        1941  -.173
        1942  .1166
        1943  .2005
        1944  .1698
        1945  .3629
        1946 -.2555
        1947 -.0577
        1948  .0633
        1949  .1842
        1950  .2676
        1951  .1613
        1952  .1746
        1953 -.0154
        1954  .5716
        1955  .2774
        1956   .033
        1957 -.1185
        1958  .4092
        1959  .0969
        1960 -.0207
        1961  .2765
        1962 -.1039
        1963  .2105
        1964  .1547
        1965  .1033
        1966 -.1336
        1967  .2078
        1968  .0603
        1969 -.1396
        1970 -.0187
        1971  .1092
        1972  .1523
        1973 -.2183
        1974 -.3497
        1975  .2948
        1976  .1844
        1977 -.1357
        1978 -.0239
        1979  .0476
        1980  .1799
        1981 -.1308
        1982  .1675
        1983  .1863
        1984  .0193
        1985   .274
        1986  .1777
        1987   .012
        1988   .117
        1989  .2614
        1990 -.0898
        1991  .2706
        1992  .0457
        1993  .0722
        1994 -.0145
        1995   .346
        1996   .191
        1997  .3143
        1998  .2669
        1999  .1794
        2000 -.1209
        2001 -.1332
        2002 -.2407
        2003  .2635
        2004  .0733
        2005  .0133
        2006  .1287
        2007  .0134
        2008 -.3728
        2009  .2375
        2010  .1314
        2011 -.0087
        2012  .1391
        2013   .305
        2014  .1294
        2015  .0058
        2016  .0966
        2017  .1942
        2018  -.062
        2019  .2814
        end
        
        mkmat yryear
        mkmat yrret
        
        local myrs = rowsof(yrret)
        local rowyrs ""
        
        forvalues i=1/`myrs'  {
           local addyr = yryear[`i',1]
           local rowyrs "`rowyrs' `addyr' "
           }
        
        matrix rownames yrret = `rowyrs'
        global yroffset = yryear[`myrs',1]-`myrs'
        matrix yrs = (2007,2010,2013,2016)
        global nyrs = colsof(yrs)
        
        drop _all
        /* GENERATE DATA SET 2 - EXAMPLE OF SCF DATA USED IN THIS CONTEXT*/
        input int(yy1) float(year age)
         36 2007 45
         79 2007 59
        137 2007 29
        197 2007 59
         36 2010 53
         79 2010 64
        137 2010 27
        197 2010 32
         36 2013 70
         79 2013 45
        137 2013 42
        197 2013 27
         36 2016 59
         79 2016 40
        137 2016 63
        end
        
        /* CREATE WEIGHTED AVERAGE STOCK RETURNS WITH DIFFERENT WEIGHTING PARAMETERS */
        
        local k1 1
        local k2 1.433
        local k3 1.325
        local k4 1.166
        local k5 1.50
        forvalues j=1/5 {
           qui gen f`j' = 0
           qui gen w`j' = 0
           qui gen v`j' = 0
           }
        quietly gen lret = .
        quietly gen yri = rownumb(yrret,string(year))
        forvalues i=1/79 {
             qui replace lret = yrret[yri-`i',1] if age > `i'
             forvalues j=1/5 {
               qui replace f`j' = f`j' + lret*((age-`i')/age)^`k`j'' if age > `i'
               qui replace v`j' = v`j' + (lret^2)*((age-`i')/age)^`k`j'' if age > `i'
               qui replace w`j' = w`j' + ((age-`i')/age)^`k`j'' if age > `i'
             }
        }
        qui gen retave1 = f1/w1
        qui gen retave1433 = f2/w2
        qui gen retave1325 = f3/w3
        qui gen retave1166 = f4/w4
        qui gen retave150 = f5/w5
        
        drop f1 f2 f3 f4 f5
        drop w1 w2 w3 w4 w5
        drop v1 v2 v3 v4 v5
        drop lret
        drop yri
        KEY TAKEAWAY
        So for the general purpose of constructing variables (such as weighted averages) from two datasets of different sizes, one approach is to store the first dataset in a matrix whose rows are named by year, look up each observation's row with rownumb(), and then build the new variable in the second dataset by accumulating terms in forvalues loops. Notice that Nagel uses five different values of lambda (denoted k1-k5 in the code), which makes the loops more complicated. Knowing experienced Stata users, I'm sure some will find the double-loop solution inefficient, but hey, it works.
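        For anyone who wants the idea without the five-lambda bookkeeping, here is a minimal single-lambda sketch of the same matrix-plus-loop technique. It is my own simplification, not Nagel's code: the returns and households are made-up toy numbers, and the names R, num, den and A are just illustrative choices.

        ```stata
        clear all

        // Toy return series (made-up numbers, NOT the S&P 500 data)
        input int yryear double yrret
        2000 .10
        2001 .20
        2002 .05
        2003 .30
        end

        // Store the returns in a matrix whose row names are the years
        mkmat yrret, matrix(R) rownames(yryear)

        // Toy household data (also made up)
        drop _all
        input int(id year age)
        1 2003 3
        2 2003 2
        end

        local lambda 1
        gen double num = 0      // running sum of weight * return
        gen double den = 0      // running sum of weights
        gen double lret = .
        gen yri = rownumb(R, string(year))   // row of R holding year t

        // k runs from 1 to age-1, so the loop needs max(age)-1 passes
        summarize age, meanonly
        local maxk = r(max) - 1
        forvalues k = 1/`maxk' {
            quietly replace lret = R[yri - `k', 1] if age > `k'
            quietly replace num = num + lret * (age - `k')^`lambda' if age > `k'
            quietly replace den = den + (age - `k')^`lambda' if age > `k'
        }
        gen double A = num / den
        list id year age A
        ```

        With lambda = 1, household 1 (age 3 in 2003) puts weights 2/3 and 1/3 on the 2002 and 2001 returns, so A = (2*.05 + 1*.20)/3 = .10. Household 2 (age 2) puts all its weight on 2002, so A = .05.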
        Last edited by Val Eggers; 03 May 2020, 12:13.

        • #5
          I am glad that you could solve your problems.
          The double-loop solution is fine for me. The code is also somewhat different from the formulas that you showed earlier.
          Instead of
          Code:
          ((age-`i')/age)^`k`j''
          I would have expected
          Code:
          ((age-`i'))^`k`j''
          but it should not matter in the end.
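          To spell out why: the final variable in the code is the ratio f/w, and every term in both sums carries the same factor age^{-λ}, which cancels. In the notation of #2 (this is my own derivation, not taken from the paper or the do-file):
          \[
          \frac{f}{w}
          = \frac{\sum_{k=1}^{age_{it}-1}\left(\frac{age_{it}-k}{age_{it}}\right)^\lambda R_{t-k}}{\sum_{k=1}^{age_{it}-1}\left(\frac{age_{it}-k}{age_{it}}\right)^\lambda}
          = \frac{age_{it}^{-\lambda}\sum_{k=1}^{age_{it}-1}(age_{it}-k)^\lambda R_{t-k}}{age_{it}^{-\lambda}\sum_{k=1}^{age_{it}-1}(age_{it}-k)^\lambda}
          = A_{it}(\lambda)
          \]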
