  • Rolling: extra condition for minimal number of observations?

    Dear all,

    I need some help in the following.

    I need to run rolling regressions on monthly data, using a window of 36 periods, but I have to have at least 12 observations for each estimation, and those do not need to be consecutive.
    I was thinking of having some sort of "if" condition, which would skip a step when it is not satisfied. For example, if in period 1 I have at least 12 observations, the regression will run and I will get the coefficients; in period 2 the condition is not satisfied, so I would have no coefficients (is it possible to have missing values?); and in period 3, if I have enough observations, the regression will run again. Any idea how to implement it?

    Thank you in advance,
    Natalia

    PS Something similar has been asked in the link below, however the answer is not exactly what I want to have...
    There are monthly mutual fund returns directly obtained from CRSP mutual fund dataset, called the raw net return. But in the literature, researchers usually

  • #2
    What you could do much more easily is just use rolling to do regressions, saving the number of values used as well as other results, and then set what you don't trust to missing afterwards. I'm suspicious of rules like "at least 12" and would advise just looking at the results anyway.

    Code:
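    * set up sandbox to play
    set seed 2803
    webuse lutkepohl2, clear
    * set about 20% of the response to missing at random
    replace dln_inv = . if runiform() < .2
    tsset qtr
    save sandbox, replace
    
    * keep the coefficients and the number of observations used in each window
    rolling _b N=e(N), window(30): regress dln_inv dln_inc dln_consump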
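
    The "set to missing afterwards" step could look something like the sketch below. It assumes the -rolling- call above has just been run without -saving()-, so that the results (start, end, N and the _b_* coefficient variables) are now the data in memory, and it takes the threshold of 12 from the original question rather than from any recommendation.

    ```stata
    * assumption: results of -rolling- are in memory, with N holding e(N)
    * assumption: 12 is the minimum count wanted, as in the question
    foreach v of varlist _b_* {
        replace `v' = . if N < 12
    }

    * inspect the first few windows
    list start end N _b_* in 1/5
    ```

    In the sandbox data few if any windows are likely to fall below 12 observations, so the -replace- may change nothing there; the point is only the pattern.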




    • #3
      Thanks for the answer. I guess adding N=e(N) to the command is what will help me do what I want.



      • #4
        I have an additional follow-up question: my panel consists of over 3m observations, and I find -rolling- to be extremely slow (it took 3 days to go through 3,000 distinct panels out of 41,000). I am trying to optimise it, but I am struggling to create the local macro appropriately. My proposed code is (modified slightly from a version proposed in one of the forums: http://hsphsun3.harvard.edu/cgi-bin/...icle-1241.html ):
        tsset newid month
        gen end=month //for merging

        forvalues i=1/33750 {
        keep if newid==`i'
        quietly: rolling _b N=e(N), window(36) saving(betas, replace) nodots: regress ex_ret_t ex_m_ret_t lag_ex_m_ret_t
        merge newid end using "betas", sort update replace nokeep
        drop _merge
        }

        What I am trying to do is to create the loop to do rolling regressions for individual panels one by one. Above, I realise that in order to use -keep- inside the loop I need to make `i' local. The command -levelsof- does not work, since it hits the memory limit (I receive an error r(1000) ).

        Do you have any idea how to do this?



        • #5
          Well, what you've coded will run super-quick, because it's wrong. On the first iteration of the loop, you keep only observations where newid == 1. On the second run through the loop you then attempt to keep only those observations where newid == 2, but since the only observations still around have newid == 1, you will end up with an empty data set. So your regress command will fail and that will be the end of that. To make this work you need a -preserve- at the top of the loop and a -restore- at the end. But including those will make this run very, very slowly as Stata constantly thrashes between memory and disk. You can avoid that by eliminating the -preserve-, -keep if- and -restore- commands and adding an -if newid == `i'- condition to the -regress- command. This, too, will be slow, but probably better than thrashing.

          Also, your -merge- command is very old syntax. It is safer to specify -merge 1:1 newid end ...-. Finally, instead of -drop-ing _merge at the end of the loop, you could just add the -nogenerate- option to the -merge- command itself.

          When all is said and done, I think no matter what you do this is going to be a long slog. No matter how you slice it you are asking Stata to identify a few million data subsets, perform regressions on them, and save the results. That's a lot of computing.
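
          A minimal sketch of the -preserve- / -restore- variant described above, assuming the variable names from #4. The per-panel file names betas_`i' are hypothetical, and -capture- is an added guard so that an id with no observations does not stop the loop. It is not a speed fix, only a version of the loop that does not destroy the data.

          ```stata
          * assumption: data are already declared as a panel on newid and month
          forvalues i = 1/33750 {
              preserve
              keep if newid == `i'
              * one results file per panel (hypothetical file names)
              capture quietly rolling _b N=e(N), window(36) ///
                  saving(betas_`i', replace) nodots:        ///
                  regress ex_ret_t ex_m_ret_t lag_ex_m_ret_t
              restore
          }
          ```

          Trying this on the first few hundred panels before committing to the full run would show whether the disk traffic from -preserve- is tolerable.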



          • #6
            Thank you for the answer. I was originally trying to write the code as suggested in the link above by adding a -local- outside the loop, consisting of the distinct values of the newid variable, and that is where I got stuck. I believe such a local is pretty straightforward to create using -levelsof-, but that gives me an error since it hits the memory limit (I use Stata 12). So I did not find an alternative way to store the distinct "newid" values in a list to use inside the local macro. Having the local should also solve the above problem with -keep- ...

            I have tried using the -if- within the loop, and it is indeed quite inefficient.

            To my understanding, the whole problem with -rolling- being slow (very slow, actually) is exactly due to the fact that it takes too long to search through subsets when I have a panel. Running rolling regressions separately for each id is much faster than doing so in a bunch. Hence the problem above.

            I was thinking to reshape my data into wide to run the loops, but again, since I have too many ids and thus too many panels, I get an error when I try re-shaping.
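
            One possible way around the -levelsof- limit mentioned above, offered as a sketch rather than a tested solution for this data set: number the panels consecutively with -egen, group()- and loop with -forvalues-, so no list of ids ever has to be stored in a macro. The variable name panel is hypothetical.

            ```stata
            * assumption: newid identifies panels; panel is a new, hypothetical variable
            egen long panel = group(newid)   // consecutive integers 1, 2, ...
            summarize panel, meanonly
            local npanels = r(max)
            display "number of panels: `npanels'"

            forvalues i = 1/`npanels' {
                * work on one panel here, selecting it with panel == `i'
            }
            ```

            Because panel runs from 1 to the number of panels without gaps, the loop never lands on an id that does not exist.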



            • #7
              Just to clarify, this is the piece of code someone else has suggested in a similar problem to mine:

              tsset id date
              gen end=date // for later merging
              tempfile stats
              levelsof id, local(ids)
              foreach id of local ids {
              keep if id==`id'
              quietly: rolling, window(`window') saving(`stats', replace) ///
              nodots: regress y x
              merge id end using "`stats'", sort update replace nokeep
              drop _merge
              }

              And somewhat similar code, which I was also trying to use for my problem:
              Code:
                tsset newid month
                gen end=month //for merging
                
                local newids newid
                tempfile stats
                foreach newid of local newids{
                           keep if newid==`newid'
                        quietly: rolling _b N=e(N), window(36) saving(`stats', replace) nodots: regress ex_ret_t ex_m_ret_t lag_ex_m_ret_t
                        merge 1:1 newid end using "`stats'", sort update replace nokeep nogenerate
                }


              Notably, unlike my wrong code above, this no longer deletes almost all the observations, but I am unsure how the local stores the list of values.

              Last edited by Natalia Kovaleva; 20 Apr 2015, 17:04. Reason: some clarifications



              • #8
                This won't work. -local newids newid- just puts the character string "newid" into local macro newids. When you then get to the -foreach newid...- loop, there will be only a single iteration, with `newid' set to "newid". When you then go inside the loop to the -keep if- statement you will get a syntax error since -newid- must be a numeric variable and you are comparing it to a literal character string. (Aside: your -tsset- statement is also incorrect: you mean -xtset-.) The approach with the -forvalues- loop in #4 does not have those problems.
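
                To see what the macro actually holds, a small sketch that needs no data in memory (the counter n is only for illustration):

                ```stata
                * the macro contains the single word "newid", not the values of the variable
                local newids newid
                display "`newids'"

                * so the loop body runs exactly once
                local n = 0
                foreach newid of local newids {
                    local ++n
                    display "iteration `n': `newid'"
                }
                ```

                The loop displays one line, with `newid' expanding to the text newid rather than to any id value.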

                I would go back to that approach. Just make sure you either include -preserve- and -restore- commands so you don't eliminate all your data in the first two rounds, or, use an -if- qualifier on your -regress- command instead of the -keep if- approach. I have a hard time believing that the -if- qualifier approach will be slower than constantly thrashing a large data set to disk, but I suppose you could try both on the first few hundred panels to see.



                • #9
                  Just a small observation for those who are struggling with a similar issue: it helps to keep the -merge- command outside of the loop. I am saving the coefficients into "betas" files, in order to merge them with the data afterwards.
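
                  A rough sketch of what merging after the loop might look like. It assumes one results file per panel named betas_1, betas_2, ... (hypothetical names), that each file contains newid and end, and that the original data live in a file called maindata (also hypothetical). The last step applies the minimum of 12 observations from the original question.

                  ```stata
                  * stack the per-panel results into one file
                  clear
                  forvalues i = 1/33750 {
                      capture append using betas_`i'   // skip ids with no results file
                  }
                  save allbetas, replace

                  * merge once, outside the loop
                  use maindata, clear
                  gen end = month                      // window end date, for merging
                  merge 1:1 newid end using allbetas, keep(master match) nogenerate

                  * blank out coefficients from windows with too few observations
                  foreach v of varlist _b_* {
                      replace `v' = . if N < 12
                  }
                  ```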



                  • #10
                    Dear all,

                    I have the same issue using "rolling" with a condition on the minimum number of observations. However, I cannot see exactly how the problem was solved in the code above.
                    Could you please say what is "newid"? I guess it helps to run rolling regression only when there are minimum 12 observations, is that right?

                    Thanks a lot.

