  • No Obs for rolling regression: Eliminate funds that had less than the 3 years of prior return history required for the estimation process.

    Monthly mutual fund returns can be obtained directly from the CRSP mutual fund dataset; these are the raw net returns.

    In the literature, however, researchers usually use risk-adjusted returns in their analysis.

    We need to run a rolling regression with a 36-month moving window:

    at the beginning of each calendar year, for each fund, estimate the following Carhart model using the fund's returns for the previous 36 months:


    R - Rf = a + b1*mktrf + b2*smb + b3*hml + b4*umd + error term

    In Stata, my code is:

    qui levelsof wficn, local(ids)
    foreach id of local ids {
        quietly: rolling, window(36) saving(`stats', replace) nodots: reg mret_rf mktrf smb hml umd if month==1 & wficn == `id'
        merge 1:1 wficn end using "`stats'", update replace
        drop _merge
    }

    That is, each regression has at most 36 observations, and possibly fewer, because returns or other regression variables may be missing during the previous 36 months. How can we make sure there are enough observations for this rolling regression? And how can we eliminate funds that had less than the 3 years of prior return history required for the estimation process?

    When I run the code above, I get the following error message:


    no observations
    an error occurred when rolling executed regress
    r(2000);



    I also tried the following two alternatives, and both produced error messages as well:

    1.
    //1. USE rollreg
    tsfill,full //neither tsfill nor (tsfill, full) works
    rollreg mret_rf mktrf smb hml umd if month==1, move(36) stub(retM36)

    2.
    //2. USE
    tsfill, full
    qui rolling _b _se, window(36) saving(betas, replace) keep (yrm): reg mret_rf mktrf smb hml umd if month==1, r



    Could you please help me?



  • #2
    I don't use -rolling- myself, so I don't necessarily know what requirements must be met to avoid your error message. But if what you need is to ensure that only wficn's with at least 36 observations are retained, you can do this:

    Code:
    by wficn, sort: drop if _N < 36
    If you need something more complicated, like at least 36 observations that all have complete non-missing data on a certain list of variables, or 36 consecutive observations, that, too, can be done with some additional complications to the code. But you need to be explicit about what you need. (Or perhaps somebody who is familiar with using -rolling- will understand your implicit request and will answer.)
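
    For illustration, here is a minimal sketch of the "complete non-missing data" variant mentioned above. It assumes the variable names from the first post and uses made-up toy data (the seed, the two funds, and the ten deleted returns are all hypothetical). It counts complete observations over a fund's whole history, not within each 36-month window, so it is only a coarse first filter.

```stata
* Toy data: 2 hypothetical funds with 40 monthly observations each
clear
set seed 2024
set obs 2
gen wficn = _n
expand 40
by wficn, sort: gen yrm = ym(2000, 1) + _n - 1
format yrm %tm
foreach v in mret_rf mktrf smb hml umd {
    gen `v' = rnormal()
}
* Hypothetical gap: fund 2 loses its first ten monthly returns
replace mret_rf = . if wficn == 2 & yrm < ym(2000, 11)

* Flag observations that are complete on every regression variable
gen byte complete = !missing(mret_rf, mktrf, smb, hml, umd)

* Count complete observations per fund and drop funds with fewer than 36
by wficn, sort: egen n_complete = total(complete)
drop if n_complete < 36

* Only fund 1 (40 complete observations) should remain
assert wficn == 1
assert _N == 40
```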

    • #3
      I do need at least 36 observations before each January with complete, non-missing data on these variables: mret_rf mktrf smb hml umd. Although I have monthly data, I run this regression only once per year, and I will assign the resulting coefficients to every month of that year.

      • #4
        So, for a given id, when you are looking at 1999, are Jan 1999 through Dec 1999 part of the 36 months you need, or is the regression based on Jan1996 through Dec1998, but the results are to be put in the observations for (every month of) 1999?

        • #5
          For a given id, the regression is based on Jan 1996 through Dec 1998, and the results are to be put in the observations for (every month of) 1999.

          • #6
            OK. I found it difficult to use your variable names--I kept getting them confused and making typos. So I did this with some toy data. You can extract the working part of the code and then replace my variable names with yours:

            Code:
            // SET UP SOME TOY DATA TO TEST THIS CODE
            clear*
            set obs 5
            gen int id = _n
            expand 2
            by id, sort: gen year = 1990 if _n == 1
            by id: replace year = 2010 if _n == _N
            xtset id year
            tsfill
            expand 12
            by id year, sort: gen month = _n
            gen date = ym(year, month)
            format date %tm
            xtset id date
            // FILL IN A DEPENDENT VARIABLE AND FOUR INDEPENDENT
            // VARIABLES WITH JUST SOME RANDOM VALUES
            // SCATTER IN SOME MISSING VALUES
            foreach v in dv x1 x2 x3 x4 {
             gen `v' = rnormal()
             replace `v' = . if runiform() < 0.01
            }
            misstable summarize
            
            // OK, NOW THAT WE HAVE SOME TOY DATA
            // HERE'S THE WORKING PART OF THE CODE:
            
            // IDENTIFY OBSERVATIONS WITH COMPLETE DATA ON REGRESSION VARIABLES
            gen complete = !missing(dv, x1, x2, x3, x4)
            
            // AND TAKE A RUNNING SUM OF THAT
            by id (date), sort: replace complete = sum(complete)
            
            // IDENTIFY OBSERVATIONS WHERE THE PRECEDING 36
            // OBSERVATIONS HAVE COMPLETE DATA
            gen complete_36 = (L1.complete-L37.complete == 36)
            
            // AND MARK A YEAR AS USABLE IF JANUARY OF THAT YEAR
            // HAS 36 PRECEDING MONTHS OF COMPLETE DATA
            by id year (date), sort: gen byte usable = complete_36[1]
            
            // CREATE VARIABLES TO HOLD THE REGRESSION COEFFICIENTS
            forvalues j = 1/4 {
                gen b`j' = .
            }
            
            // NOW DO THE REGRESSIONS FOR THOSE YEARS THAT ARE USABLE
            levelsof id, local(ids)
            levelsof year, local(years)
            foreach i of local ids {
                foreach y of local years {
                    display `i', `y'
                    quietly summ usable if id == `i' & year == `y'
                    if `r(mean)' == 1 { // DETERMINE IF THIS YEAR IS USABLE FOR THIS ID
                        regress dv x1 x2 x3 x4 if inrange(year, `=`y'-3', `=`y'-1') & id == `i'
                        assert e(N) == 36
                        forvalues j = 1/4 {
                            // RESTRICT TO THIS ID SO OTHER IDS ARE NOT OVERWRITTEN
                            replace b`j' = _b[x`j'] if year == `y' & id == `i'
                        }
                    }
                }
            }
            NOTE: You may want to throw some more -quietly-'s into the loop to suppress some of the output.

            • #7
              Thank you so much!

              After I changed the code with my variables, I got the following error message:
              ==1 invalid name
              r(198);

              I guess something is wrong with the following line?

              if `r(mean)' == 1



              My code with actual variable names:

              // IDENTIFY OBSERVATIONS WITH COMPLETE DATA ON REGRESSION VARIABLES
              gen complete = !missing(mret_rf, mktrf, smb, hml, umd)

              // AND TAKE A RUNNING SUM OF THAT
              by wficn (yrm), sort: replace complete = sum(complete)

              // IDENTIFY OBSERVATIONS WHERE THE PRECEDING 36
              // OBSERVATIONS HAVE COMPLETE DATA
              gen complete_36 = (L1.complete-L37.complete == 36)

              // AND MARK A YEAR AS USABLE IF JANUARY OF THAT YEAR
              // HAS 36 PRECEDING MONTHS OF COMPLETE DATA
              by wficn year (yrm), sort: gen byte usable = complete_36[1]

              // CREATE VARIABLES TO HOLD THE REGRESSION COEFFICIENTS
              forvalues j = 1/4 {
                  gen b`j' = .
              }

              // NOW DO THE REGRESSIONS FOR THOSE YEARS THAT ARE USABLE
              qui levelsof wficn, local(ids)
              qui levelsof year, local(years)
              g x1 = mktrf
              g x2 = smb
              g x3 = hml
              g x4 = umd
              foreach i of local ids {
                  foreach y of local years {
                      display `i', `y'
                      qui summ usable if wficn == `i' & year == `y'
                      if `r(mean)' == 1 { // DETERMINE IF THIS YEAR IS USABLE FOR THIS ID
                          reg mret_rf mktrf smb hml umd if inrange(year, `=`y'-3', `=`y'-1') & wficn == `i'
                          assert e(N) == 36
                          forvalues j = 1/4 {
                              // RESTRICT TO THIS FUND SO OTHER FUNDS ARE NOT OVERWRITTEN
                              replace b`j' = _b[x`j'] if year == `y' & wficn == `i'
                          }
                      }
                  }
              }




              • #8
                In the toy data that I created, every combination of values of year and id actually occurs. I assumed that was true in your data as well. But if some combination of id and year does not occur in your data, then when the loop reaches that combination, the -summ usable- command finds no observations and does not return any `r(mean)'. The statement -if `r(mean)' == 1- then becomes -if == 1-, which gives the error you got.

                The simplest way to solve this problem is to assure that every id occurs with every year. The simplest way to do that is

                Code:
                xtset id yrm
                tsfill, full
                before the code you show in your most recent post. That will expand your data set so that every id occurs with every year--and missing values of everything else if there was no such observation previously in the data set.
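
                As a complement, the test inside the loop can be written so that an empty id-year cell is skipped and never reaches an unfilled `r(mean)'. The following is a minimal sketch on hypothetical toy data, not the code from this thread: it replaces -summ- with -count-, which always returns r(N), even when no observation matches.

```stata
* Toy data: id 2 has no observation for 2000, id 1 is not usable in 2001
clear
input id year usable
1 2000 1
1 2001 0
2 2001 1
end

levelsof id, local(ids)
levelsof year, local(years)
foreach i of local ids {
    foreach y of local years {
        * r(N) is 0 (not undefined) when the cell is empty or not usable
        quietly count if usable == 1 & id == `i' & year == `y'
        if r(N) > 0 {
            display "id `i', year `y': usable, run the regression here"
        }
        else {
            display "id `i', year `y': skipped"
        }
    }
}
```

                With this guard the loop no longer depends on every id occurring with every year, although filling the panel may still be useful for the lag operators.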

                • #9
                  Hi, Clyde,
                  I ran

                  tsfill, full

                  before running the working code, but I still got the following error message:

                  ==1 invalid name
                  r(198);

                  I am so confused now. Any help would be greatly appreciated.

                  • #10
                    I don't see the problem, and it worked with my artificial data. Take the -quietly- off of the -summ usable- command, run it, and post the output leading up to the error message so I can try to figure out what's going on.

                    I do see one other problem you will hit when you get past this one. -replace b`j' = _b[x`j'] if year == `y'- will break, because your variables x1 through x4 are not variables in the regression. You created them with values equal to the values of mktrf, etc., but the -regress- command knows nothing about that, and it creates _b[mktrf], not _b[x1]. So you will need to use the names x1 x2 x3 x4 in the -regress- statement.

                    • #11
                      Wait, I think I see the problem.

                      When you run -xtset id yrm- and then -tsfill, full-, it creates an observation for every combination of id and yrm, with all other variables (including year) set to missing if the corresponding observation did not already exist. But the code within the loop needs the corresponding values of year to be there! So, after the -tsfill, full-, put -replace year = year(dofm(yrm)) if missing(year)-, and that should fill that in. Then it should work.

                      If it doesn't, then please do as I suggested in #10 above so I can try to figure it out.
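
                      Put together, the preparation steps described above might look like the following minimal sketch. The four toy observations are hypothetical; wficn and yrm are the variable names from the earlier posts.

```stata
* Toy panel with gaps: fund 1 has 2000m1-2000m2, fund 2 has 2000m2 and 2000m4
clear
input wficn yrm
1 480
1 481
2 481
2 483
end
format yrm %tm
gen year = year(dofm(yrm))

* Create an observation for every combination of fund and month
xtset wficn yrm
tsfill, full

* tsfill leaves year missing in the new observations, so rebuild it from yrm
replace year = year(dofm(yrm)) if missing(year)

* Every fund now has all four months, and year is never missing
assert _N == 8
assert !missing(year)
```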

                      • #12
                        Hi, Clyde, I think the code works, but it takes a very long time to execute. I never actually succeeded in running it on my full sample, which runs from 1980 to 2013; it takes forever. So I restricted the sample to a much shorter period, say 2005-2013, and found that it worked! I am wondering whether there is any way in Stata to speed up the double loop, which runs through all years and all mutual funds.
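
                        One possible way to reduce the work, sketched here on toy data in the style of #6, is to loop only over the fund-years already marked usable, so that no -summ- or regression attempt is spent on the others. The toy panel, the seed, and the variable names are assumptions for illustration, and no timings were measured; each -regress ... if- still scans the whole data set, so this may help only partly.

```stata
* Toy panel: 3 hypothetical funds, monthly data for 1990-1995
clear
set seed 12345
set obs 3
gen int id = _n
expand 72
by id, sort: gen date = ym(1990, 1) + _n - 1
format date %tm
gen year = year(dofm(date))
xtset id date
foreach v in dv x1 x2 x3 x4 {
    gen `v' = rnormal()
}

* Usable flag built as in #6
gen complete = !missing(dv, x1, x2, x3, x4)
by id (date), sort: replace complete = sum(complete)
gen complete_36 = (L1.complete - L37.complete == 36)
by id year (date), sort: gen byte usable = complete_36[1]

forvalues j = 1/4 {
    gen b`j' = .
}

* Visit only the usable years of each fund
quietly levelsof id, local(ids)
foreach i of local ids {
    quietly levelsof year if usable == 1 & id == `i', local(years)
    foreach y of local years {
        quietly regress dv x1 x2 x3 x4 if inrange(year, `y' - 3, `y' - 1) & id == `i'
        assert e(N) == 36
        forvalues j = 1/4 {
            quietly replace b`j' = _b[x`j'] if year == `y' & id == `i'
        }
    }
}
```

                        If a fund has no usable year, the inner list is empty and the fund is skipped without error.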
