  • Rolling regression every 6 months

    For my project, I have a set of search terms with daily observations for 18 years from 1-1-2004 to 29-11-2022. I want to regress each word on the Fama-French factor variable 'rmrf' over this time, to form an index of search terms. As each word will be searched in different volumes throughout the 18 years, I want to do rolling regressions to extract the words with the most negative t-statistic with the market every 6 months.

    This is the current code I am running:
    Code:
    . foreach var of varlist ldiffcost_w - ldiffexpense_w {
          tsset date, daily
          rolling _b, window(6/12*`=_N'): regress `var' rmrf, robust if date >= "1-1-2004" & time <= "29-11-2022"
          if e(b)[1,1] < 0 local negvars `negvars' `var'
      }
    
    Time variable: date, 1/1/2004 to 11/29/2022
            Delta: 1 day
    window() invalid -- single number required
    r(123);
    I am not sure how to correct this.

  • #2
    This code is very puzzling. It reads as a series of guesses at what might work in the hope that Stata will understand your intentions.

    1.

    Code:
    tsset date, daily
    can't be consistent with treating date on the next code line as a string variable. A repeated tsset does no harm but is pointless in this case.

    2. Any if condition should appear before options, not as part of them.

    3. The condition refers to two different variables, date and time:
    Code:
     if date >= "1-1-2004" & time <= "29-11-2022"
    This perhaps should be interpreted in terms of a string variable time, but dates and strings don't usually mix. For example, the dates "30-1-2004", "31-1-2004" and many others such as "31-10-2022" will, as strings, qualify as greater than (meaning, sorting later than) "29-11-2022" and so will be excluded by your condition. Your intended meaning seems clear: to limit comparisons to the period between 1 January 2004 and 29 November 2022. But that condition is pointless, as your tsset result indicates that you only have dates in that interval anyway.
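
    As a minimal sketch of the string comparison problem, with made-up values and hypothetical variable names (sdate, ddate, wrong, right), the following compares dates held as strings with the same dates converted to Stata daily dates. It assumes day-month-year strings like those in #1.

    ```stata
    clear
    input str10 sdate
    "30-1-2004"
    "15-6-2010"
    "31-10-2022"
    end

    * Strings compare character by character, so "30-1-2004" sorts after "29-11-2022"
    gen byte wrong = sdate <= "29-11-2022"

    * Convert to a numeric daily date, then compare with date literals
    gen ddate = daily(sdate, "DMY")
    format ddate %td
    gen byte right = inrange(ddate, td(1jan2004), td(29nov2022))

    list
    ```

    Only the 2010 date passes the string test, although all three dates lie inside the period; the numeric comparison keeps all three.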

    So let's cut out what is at best irrelevant and at worst illegal. The code now looks like

    Code:
    tsset date, daily
    
    foreach var of varlist ldiffcost_w - ldiffexpense_w {
          rolling _b, window(6/12*`=_N'): regress `var' rmrf, robust        
          if e(b)[1,1] < 0 local negvars `negvars' `var'  
    }
    It's the window() call that bites you before any of these other problems. But at best 6/12 * _N is half the number of observations in the dataset, which has no connection at all to 6-monthly intervals for daily data.

    But what do you mean by 6-monthly intervals? Here are some interpretations.

    1. You have daily data and want the window to be say 182 or 183 days long at most. Make sure you have thought through weekends, holidays and leap years as well.

    2. You have daily data but want the windows each to include all days for, say, (January, June), ..., (June, November), ..., (December, May).

    3. There are probably other interpretations, reason enough to stop there and refer the question back.
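
    For the first interpretation, a rough sketch on simulated data would pass a single integer to window(). Here y is a hypothetical stand-in for one of the search-term variables and t_rmrf is a made-up name. With clear, rolling leaves its results as variables in a new dataset, from which a t statistic can be calculated.

    ```stata
    clear
    set seed 2022
    set obs 400
    gen date = td(1jan2004) + _n - 1
    format date %td
    gen rmrf = rnormal()
    gen y = -0.5 * rmrf + rnormal()
    tsset date

    * window() takes one integer giving the length of each window
    rolling _b _se, window(182) clear: regress y rmrf, vce(robust)

    * Results are now variables (start, end, _b_rmrf, _se_rmrf, ...), one observation per window
    gen t_rmrf = _b_rmrf / _se_rmrf
    summarize t_rmrf
    ```

    Real data with gaps for weekends and holidays would put fewer observations in each 182-day window, so the window length needs thought before this is applied.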

    We haven't finished yet, as rolling will of necessity carry out many regressions for data like this -- and which coefficient out of typically several do you wish to use? Note that, contrary to your invocation of e(b)[1,1], and as explained in its help:

    rolling sets no r- or e-class macros.
    Although I advised earlier against including all predictors just because their coefficient is negative, it seems that your intention will imply that different predictors will qualify for each regression, but the code does not seem to be even trying to do that.
    Last edited by Nick Cox; 28 Dec 2022, 08:20.


    • #3
      Thank you, Nick. By 6-month windows I mean your second interpretation: I want one window for January-June, then one for July-December, each year.

      For the negative relationship I want to only identify those that have a negative t-statistic, and use this second code to filter those that do. The reason for this is to collect a daily average of all negative t-statistic words in that window to form my index. Just to be clear, are you saying the rolling function does not coincide with the e(b) function I am using?


      • #4
        As already pointed out -- do read its help -- the rolling command (not function) does not set e(b). I don't know why that's unclear to you. It is prominently documented.

        I understand only some of what you're trying to do, and it does strike me that your problems will include not just doing it but also explaining it and in particular explaining why you did what you did and not something else. Is this your Master's or PhD thesis? Do you not have a supervisor or committee to advise?

        Assuming -- it's a big assumption as some of your code contradicts it -- that date is a Stata daily date variable then it seems that your first big step is a series of regressions for which rangestat from SSC is immensely better suited than rolling -- as you want to mix daily data and monthly intervals.

        Code:
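        * monthly date from the daily date
        gen mdate = mofd(date)

        foreach v of var ldiffcost_w - ldiffexpense_w {
            * regress `v' on rmrf over the current month and the next five
            rangestat (reg) `v' rmrf, interval(mdate 0 5)

            * keep the number of observations and the slope under names specific to `v'
            rename reg_nobs nobs_`v'
            rename b_rmrf coef_`v'

            * drop the remaining results so that the next pass can recreate them
            drop reg_* b_* se_*
        }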
        This is just a sketch and in no sense tested.

        That runs your regressions in 6-month rolling windows. Then and only then you can look at your more than 200 regressions and select those predictors for which the coefficients are negative -- or possibly much better select which coefficients are most negative.
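
        If the windows are meant to be the non-overlapping half-years described in #3 rather than rolling windows, one alternative (a different technique from rangestat) is the official command statsby with a half-year identifier from hofd(). This is only a sketch on simulated data: y is a hypothetical stand-in for one search-term variable, and hdate and t_rmrf are made-up names. It keeps standard errors so that a t statistic can be calculated.

        ```stata
        clear
        set seed 12345
        set obs 730
        gen date = td(1jan2004) + _n - 1
        format date %td
        gen rmrf = rnormal()
        gen y = -0.5 * rmrf + rnormal()

        * hofd() maps each daily date to its half-year (January-June or July-December)
        gen hdate = hofd(date)
        format hdate %th

        * One regression per half-year; the results replace the data in memory
        statsby _b _se, by(hdate) clear: regress y rmrf, vce(robust)

        * t statistic for the slope on rmrf
        gen t_rmrf = _b_rmrf / _se_rmrf
        list hdate _b_rmrf _se_rmrf t_rmrf
        ```

        The simulated dates run from 1 January 2004 to 30 December 2005, so this gives one row for each of four half-years.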


        • #5
          Thanks a lot Nick, this seems to have worked.

          If I then want to average the observations on each day, using only the words with negative coefficients, is this the appropriate code?
          Code:
             if e(b)[1,1] < 0 local negvars `negvars' `var' 
          egen UKIS = rowmean(`negvars')


          • #6
            I can't see what code precedes that but it has absolutely nothing to do with the code I am suggesting. rangestat does not call up regress and it leaves nothing in e(b). The first fact is hidden in the code but the second fact is something that you could check quickly by experiment.

            If you are determined to collect a list of predictors for which coefficients are negative, then in general

            1. The list will be different for each regression.

            2. So it is futile to imagine that you can hold such a list usefully in a local macro.

            Hence you must hold it in a variable.


            Code:
            gen negvars = " " 
            
            foreach v of var <varlist> { 
                 replace negvars = negvars + "`v' " if b_`v' < 0 
            }
            is the flavour I imagine.


            • #7
              Thanks Nick!

              Applying your code gives me this error. Would I need to take out the 'b' as I have already defined the coefficient?

              Code:
              . foreach v of var coef_ldiffcost_w - coef_ldiffrate_w {
                    replace negvars = negvars + "`v' " if b_`v' < 0
                }
              b_coef_ldiffcost_w not found
              r(111);


              • #8
                Your typical names imply something more like

                Code:
                 
                 foreach v of var ldiffcost_w - ldiffrate_w {
                     replace negvars = negvars + "`v' " if coef_`v' < 0
                 }


                • #9
                  Thanks Nick, that has shown me those with negative coefficients. To average each of these values to gain my index for each day, would I need to
                  Code:
                  egen UKIS = rowmean (negvars)
                  ?

                  I tried this and it still gives me a value that is positive, when in theory they should all be negative.

                  For reference this is the negvars values that are displayed after using your above code:
                  Code:
                  negvars
                  ldiffdonation_w ldiffasset_w ldiffCompetitiveadvantage_w ldiffGold_w ldiffHoliday_w ldiffmoney_w ldiffrecession_w ldifftax_w ldiffworkerscompensation_w ldiffbudget_w ldiffcompany_w ldiffentrepreneur_w ldiffcredit_w ldiffdebt_w ldifflawyer_w ldiffdefault_w ldiffrate_w
                  ldiffdonation_w ldiffasset_w ldiffCompetitiveadvantage_w ldiffGold_w ldiffHoliday_w ldiffmoney_w ldiffrecession_w ldifftax_w ldiffworkerscompensation_w ldiffbudget_w ldiffcompany_w ldiffentrepreneur_w ldiffcredit_w ldiffdebt_w ldifflawyer_w ldiffdefault_w ldiffrate_w
                  ldiffdonation_w ldiffasset_w ldiffCompetitiveadvantage_w ldiffGold_w ldiffHoliday_w ldiffmoney_w ldiffrecession_w ldifftax_w ldiffworkerscompensation_w ldiffbudget_w ldiffcompany_w ldiffentrepreneur_w ldiffcredit_w ldiffdebt_w ldifflawyer_w ldiffdefault_w ldiffrate_w
                  ldiffdonation_w ldiffasset_w ldiffCompetitiveadvantage_w ldiffGold_w ldiffHoliday_w ldiffmoney_w ldiffrecession_w ldifftax_w ldiffworkerscompensation_w ldiffbudget_w ldiffcompany_w ldiffentrepreneur_w ldiffcredit_w ldiffdebt_w ldifflawyer_w ldiffdefault_w ldiffrate_w
                  ldiffdonation_w ldiffasset_w ldiffCompetitiveadvantage_w ldiffGold_w ldiffHoliday_w ldiffmoney_w ldiffrecession_w ldifftax_w ldiffworkerscompensation_w ldiffbudget_w ldiffcompany_w ldiffentrepreneur_w ldiffcredit_w ldiffdebt_w ldifflawyer_w ldiffdefault_w ldiffrate_w
                  This is showing all variables for each day that have negative coefficients.


                  • #10
                    Also for each word, the coefficient is the same throughout the 6 months, even though 'rmrf' changes each day in the 6 months and so does the x variable. Not sure if the code only uses the 'rmrf' value for the first day of the 6 month window.


                    • #11
                      Sorry, but #9 once again is very confused.

                      negvars as calculated by say #8 is a string variable with a set of variable names as values so an attempt at calculating a row mean should fail. I don't know what you did there.

                      Besides if I understand correctly, it is the coefficients associated with those variables, not the variables themselves, that should be negative.
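
                      One way to average only the series whose coefficient is negative in each observation is to copy each series where its coefficient is negative and take the row mean of the copies. This is a sketch with toy data and hypothetical names: x1-x3 stand for the search-term variables, coef_x1-coef_x3 for their coefficients, and neg_* and index are new variables. No list of names in a string variable is needed for this.

                      ```stata
                      clear
                      input x1 x2 x3 coef_x1 coef_x2 coef_x3
                      1 2 3 -0.5  0.2 -0.1
                      4 5 6  0.3 -0.4 -0.2
                      end

                      * Copy each series only where its coefficient is negative; otherwise missing
                      foreach v of var x1 x2 x3 {
                          gen double neg_`v' = cond(coef_`v' < 0, `v', .)
                      }

                      * rowmean() ignores missing values, so only the qualifying series are averaged
                      egen index = rowmean(neg_x1 neg_x2 neg_x3)
                      list
                      ```

                      In the first observation the mean is taken over x1 and x3, in the second over x2 and x3. The index itself can still be positive, as it is the coefficients, not the series, that are negative.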

                      This isn't working out, unfortunately. Compound this uncertainty with your inability to run one thread at a time, and it's far too hard to follow what you are doing and what you should be doing next. It seems that you have chosen a project that you have no chance of completing unless someone else does all the heavy lifting. That is not a good situation. It is not a fair expectation of Statalist.
