No Obs for rolling regression: Eliminate funds that had less than the 3 years of prior return history required for the estimation process.

Jessica Guo

Join Date: Feb 2015

Posts: 31
#1


27 Feb 2015, 14:29

Monthly mutual fund returns, called raw net returns, can be obtained directly from the CRSP mutual fund dataset.

In the literature, however, researchers usually use risk-adjusted returns in their analysis.

We need to run a rolling regression with a 36-month moving window:

at the beginning of each calendar year, for each fund, estimate the following Carhart model using the fund's returns for the previous 36 months:

R - Rf = a + b1*mktrf + b2*smb + b3*hml + b4*umd + error term

In Stata, my code is:

qui levelsof wficn, local(ids)
foreach id of local ids {

quietly: rolling, window(36) saving(`stats', replace) nodots: reg mret_rf mktrf smb hml umd if month==1 & wficn == `id'
merge 1:1 wficn end using "`stats'", update replace
drop _merge

}

That is, we have at most 36 observations to estimate each regression (there may be many missing values in returns or in the other regression variables during the previous 36 months). How can we make sure there are enough observations to run this rolling regression? How can we eliminate funds that have less than the 3 years of prior return history required for the estimation process?

But when I run the above code, I get the following error message:

no observations
an error occurred when rolling executed regress
r(2000);

I also tried the following two alternatives, and both also gave error messages:

1.
//1. USE rollreg
tsfill,full //neither tsfill nor (tsfill, full) works
rollreg mret_rf mktrf smb hml umd if month==1, move(36) stub(retM36)

2.
//2. USE rolling
tsfill, full
qui rolling _b _se, window(36) saving(betas, replace) keep (yrm): reg mret_rf mktrf smb hml umd if month==1, r

Could you please help me?
Clyde Schechter

Join Date: Apr 2014

Posts: 28603
#2

27 Feb 2015, 15:22

I don't use -rolling- myself, so I don't necessarily know what requirements must be met in order to avoid your error message. But if what you need is to ensure that only values of wficn with at least 36 observations are retained, you can do this:

Code:

by wficn, sort: drop if _N < 36

If you need something more complicated, like at least 36 observations that all have complete non-missing data on a certain list of variables, or 36 consecutive observations, that, too, can be done with some additional complications to the code. But you need to be explicit about what you need. (Or perhaps somebody who is familiar with using -rolling- will understand your implicit request and will answer.)
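
As an illustration of the "complete non-missing data" variant mentioned above, here is a minimal sketch on made-up data. The toy panel, the seed, and the helper variables complete and n_complete are assumptions for the example; only the variable names wficn, yrm, mret_rf, mktrf, smb, hml and umd come from the original question. It counts complete observations over a fund's whole history, not over 36 consecutive months.

```stata
// TOY PANEL (ASSUMED): 3 FUNDS, 40 MONTHS EACH
clear
set seed 2015
set obs 3
gen wficn = _n
expand 40
by wficn, sort: gen yrm = ym(2010, 1) + _n - 1
format yrm %tm
foreach v in mret_rf mktrf smb hml umd {
    gen `v' = rnormal()
}
// FUND 3 LOSES ITS FIRST 10 MONTHLY RETURNS
replace mret_rf = . if wficn == 3 & yrm < ym(2010, 11)

// FLAG OBSERVATIONS WITH COMPLETE DATA ON ALL REGRESSION VARIABLES
gen byte complete = !missing(mret_rf, mktrf, smb, hml, umd)

// COUNT COMPLETE OBSERVATIONS PER FUND AND DROP FUNDS WITH FEWER THAN 36
by wficn, sort: egen n_complete = total(complete)
drop if n_complete < 36
```

In this toy panel fund 3 has only 30 complete observations, so it is dropped and the other two funds remain.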
Jessica Guo

Join Date: Feb 2015

Posts: 31
#3

02 Mar 2015, 09:31

I do need at least 36 observations before each January, all with complete non-missing data on the variables mret_rf mktrf smb hml umd. Although I have monthly data, I run this regression only once per year, and I will assign the coefficients obtained from the regression to every month of that particular year.
Clyde Schechter

Join Date: Apr 2014

Posts: 28603
#4

02 Mar 2015, 10:49

So, for a given id, when you are looking at 1999, are Jan 1999 through Dec 1999 part of the 36 months you need, or is the regression based on Jan1996 through Dec1998, but the results are to be put in the observations for (every month of) 1999?
Jessica Guo

Join Date: Feb 2015

Posts: 31
#5

02 Mar 2015, 11:42

So, for a given id, the regression is based on Jan1996 through Dec1998, the results are to be put in the observations for (every month of) 1999.

Clyde Schechter

Join Date: Apr 2014

Posts: 28603
#6

02 Mar 2015, 12:32

OK. I found it difficult to use your variable names--I kept getting them confused and making typos. So I did this with some toy data. You can extract the working part of the code and then replace my variable names with yours:

Code:

// SET UP SOME TOY DATA TO TEST THIS CODE
clear*
set obs 5
gen int id = _n
expand 2
by id, sort: gen year = 1990 if _n == 1
by id: replace year = 2010 if _n == _N
xtset id year
tsfill
expand 12
by id year, sort: gen month = _n
gen date = ym(year, month)
format date %tm
xtset id date
// FILL IN A DEPENDENT VARIABLE AND FOUR INDEPENDENT
// VARIABLES WITH JUST SOME RANDOM VALUES
// SCATTER IN SOME MISSING VALUES
foreach v in dv x1 x2 x3 x4 {
    gen `v' = rnormal()
    replace `v' = . if runiform() < 0.01
}
misstable summarize

// OK, NOW THAT WE HAVE SOME TOY DATA
// HERE'S THE WORKING PART OF THE CODE:

// IDENTIFY OBSERVATIONS WITH COMPLETE DATA ON REGRESSION VARIABLES
gen complete = !missing(dv, x1, x2, x3, x4)

// AND TAKE A RUNNING SUM OF THAT
by id (date), sort: replace complete = sum(complete)

// IDENTIFY OBSERVATIONS WHERE THE PRECEDING 36
// OBSERVATIONS HAVE COMPLETE DATA
gen complete_36 = (L1.complete-L37.complete == 36)

// AND MARK A YEAR AS USABLE IF JANUARY OF THAT YEAR
// HAS 36 PRECEDING MONTHS OF COMPLETE DATA
by id year (date), sort: gen byte usable = complete_36[1]

// CREATE VARIABLES TO HOLD THE REGRESSION COEFFICIENTS
forvalues j = 1/4 {
    gen b`j' = .
}

// NOW DO THE REGRESSIONS FOR THOSE YEARS THAT ARE USABLE
levelsof id, local(ids)
levelsof year, local(years)
foreach i of local ids {
    foreach y of local years {
        display `i', `y'
        quietly summ usable if id == `i' & year == `y'
        if `r(mean)' == 1 { // DETERMINE IF THIS YEAR IS USABLE FOR THIS ID
            regress dv x1 x2 x3 x4 if inrange(year, `=`y'-3', `=`y'-1') & id == `i'
            assert e(N) == 36
            forvalues j = 1/4 {
                // RESTRICT TO THIS ID SO OTHER IDS ARE NOT OVERWRITTEN
                replace b`j' = _b[x`j'] if year == `y' & id == `i'
            }
        }
    }
}

NOTE: You may want to throw some more -quietly-'s into the loop to suppress some of the output.


Jessica Guo

Join Date: Feb 2015

Posts: 31
#7

02 Mar 2015, 14:26

Thank you so much!

After I changed the code with my variables, I got the following error message:
==1 invalid name
r(198);

I guess something is wrong with the following line?

if `r(mean)' == 1

My code with actual variable names:

// IDENTIFY OBSERVATIONS WITH COMPLETE DATA ON REGRESSION VARIABLES
gen complete = !missing(mret_rf, mktrf, smb, hml, umd)

// AND TAKE A RUNNING SUM OF THAT
by wficn (yrm), sort: replace complete = sum(complete)

// IDENTIFY OBSERVATIONS WHERE THE PRECEDING 36
// OBSERVATIONS HAVE COMPLETE DATA
gen complete_36 = (L1.complete-L37.complete == 36)

// AND MARK A YEAR AS USABLE IF JANUARY OF THAT YEAR
// HAS 36 PRECEDING MONTHS OF COMPLETE DATA
by wficn year (yrm), sort: gen byte usable = complete_36[1]

// CREATE VARIABLES TO HOLD THE REGRESSION COEFFICIENTS
forvalues j = 1/4 {
    gen b`j' = .
}

// NOW DO THE REGRESSIONS FOR THOSE YEARS THAT ARE USABLE
qui levelsof wficn, local(ids)
qui levelsof year, local(years)
g x1 = mktrf
g x2 = smb
g x3 = hml
g x4 = umd
foreach i of local ids {
    foreach y of local years {
        display `i', `y'
        qui summ usable if wficn == `i' & year == `y'
        if `r(mean)' == 1 { // DETERMINE IF THIS YEAR IS USABLE FOR THIS ID
            reg mret_rf mktrf smb hml umd if inrange(year, `=`y'-3', `=`y'-1') & wficn == `i'
            assert e(N) == 36
            forvalues j = 1/4 {
                replace b`j' = _b[x`j'] if year == `y'
            }
        }
    }
}
Clyde Schechter

Join Date: Apr 2014

Posts: 28603
#8

02 Mar 2015, 15:35

In the toy data that I created, every combination of values of year and id actually occurs. I assumed that was true in your data as well. But if there is some combination of values of id and year that is not instantiated in your data, when we get to that combination, the -sum usable- command will not return any `r(mean)', and the statement -if `r(mean)' == 1- will become -if == 1-, which will give the error you got.

The simplest way to solve this problem is to assure that every id occurs with every year. The simplest way to do that is

Code:

xtset id yrm
tsfill, full

before the code you show in your most recent post. That will expand your data set so that every id occurs with every year--and missing values of everything else if there was no such observation previously in the data set.
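
Separately from filling in the panel, the check itself can be written so that an empty id-year combination does not break the loop. The following is only a sketch on made-up data (the three-row dataset and the id value 99 are assumptions). It relies on -summarize- returning r(N) == 0 when no observations match, and on r(mean) evaluating to missing, instead of expanding to nothing, when it is referenced without macro quotes.

```stata
// TOY DATA (ASSUMED): THREE IDS, ALL MARKED USABLE
clear
set obs 3
gen id = _n
gen byte usable = 1

// NO OBSERVATION HAS id == 99, SO -summarize- FINDS NOTHING
quietly summarize usable if id == 99

// TEST r(N) FIRST AND REFER TO r(mean) WITHOUT MACRO QUOTES
if r(N) > 0 & r(mean) == 1 {
    display "usable"
}
else {
    display "skipped: no observations for this combination"
}
```

Inside the double loop, the same condition could stand in for -if `r(mean)' == 1-, so that combinations of id and year that are absent from the data are simply skipped.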
Jessica Guo

Join Date: Feb 2015

Posts: 31
#9

02 Mar 2015, 16:09

Hi Clyde,
After running

tsfill, full

before the working code, I still get the following error message:

==1 invalid name
r(198);

I am so confused now. Any help would be greatly appreciated.
Clyde Schechter

Join Date: Apr 2014

Posts: 28603
#10

02 Mar 2015, 16:23

I don't see the problem, and it worked with my artificial data. Take the -quietly- off of the -summ usable- command, run it, and post the output leading up to the error message so I can try to figure out what's going on.

I do see one other problem you will hit when you get past this one. -replace b`j' = _b[x`j'] if year == `y'- will break, because your variables x1 through x4 are not variables in the regression. You created them with values equal to the values of mktrf, etc., but the -regress- command knows nothing about that, and it creates _b[mktrf], not _b[x1]. So you will need to use the names x1 x2 x3 x4 in the -regress- statement.
Clyde Schechter

Join Date: Apr 2014

Posts: 28603
#11

02 Mar 2015, 16:38

Wait, I think I see the problem.

When you run -xtset id yrm- and then -tsfill, full-, it creates an observation for every combination of id and yrm, with all other variables (including year) set to missing if the corresponding observation did not already exist. But the code within the loop needs the corresponding values of year to be there! So, after the -tsfill, full-, put -replace year = year(dofm(yrm)) if missing(year)-, and that should fill that in. Then it should work.

If it doesn't then please do as I suggested in #10 above so I can try to figure it out.
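
The sequence described in this post can be tried on a small panel with gaps. This is a sketch only; the three-row dataset is made up, and the monthly dates 480 to 482 correspond to 2000m1 through 2000m3.

```stata
// TOY PANEL (ASSUMED) WITH GAPS: ID 1 LACKS 2000m2, ID 2 HAS ONLY 2000m2
clear
input id yrm year
1 480 2000
1 482 2000
2 481 2000
end
format yrm %tm

// CREATE AN OBSERVATION FOR EVERY COMBINATION OF ID AND MONTH
xtset id yrm
tsfill, full

// -tsfill- LEAVES year MISSING IN THE NEW ROWS, SO REBUILD IT FROM yrm
replace year = year(dofm(yrm)) if missing(year)
list, sepby(id)
```

After this, each id has one row per month from 2000m1 to 2000m3, and year is filled in on every row.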
Jessica Guo

Join Date: Feb 2015

Posts: 31
#12

11 May 2015, 10:23

Hi Clyde, I think the code works, but it takes a very long time to execute. I have never actually succeeded in running it on my whole sample, which covers 1980-2013; it takes forever. When I restricted the sample to a much shorter period, say 2005-2013, it worked! I am wondering whether there is any way in Stata to speed up the double loop over all years and all mutual funds.
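
One possible direction, offered only as a sketch and not as something tested on the CRSP sample, is to let a single command handle all the windows. The example below assumes the community-contributed rangestat package is installed (ssc install rangestat); it is not part of official Stata. The toy panel and the seed are made up, and the output names reg_nobs, b_* and se_* are the ones rangestat creates for its (reg) statistic.

```stata
// TOY PANEL (ASSUMED): 2 FUNDS, 60 MONTHS EACH, STARTING 2000m1
clear
set seed 2015
set obs 2
gen wficn = _n
expand 60
by wficn, sort: gen yrm = ym(2000, 1) + _n - 1
format yrm %tm
foreach v in mret_rf mktrf smb hml umd {
    gen `v' = rnormal()
}

// FOR EACH FUND-MONTH, REGRESS OVER THE 36 MONTHS ENDING ONE MONTH EARLIER
rangestat (reg) mret_rf mktrf smb hml umd, interval(yrm -36 -1) by(wficn)

// DISCARD ESTIMATES BASED ON FEWER THAN 36 COMPLETE OBSERVATIONS
foreach v of varlist b_* se_* {
    replace `v' = . if reg_nobs < 36
}
```

With this window, the January row of each year holds the estimates from the previous 36 calendar months, and those values could then be copied to the other months of that year.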