  • choosing x in " rangestat (reg) y x " as the x specified in a variable

    hello

    I wonder whether rangestat (reg) allows the x in " rangestat (reg) y x " to be chosen according to a variable that holds the name of the regressor to use. I hope the following example is self-explanatory.

    Code:
    clear *
    set obs 100
    gen day=_n
    gen y=runiform(1,10)
    gen x1=runiform(1,10)
    gen x2=runiform(1,10)
    gen xvar="x1" if mod(day,2)==0
    replace xvar="x2" if mod(day,2)>0
    
    rangestat (reg) y x1 , interval(day -30 -1)
    rename (reg_* b_* se_*) (x1reg_* x1b_* x1se_*)
    
    rangestat (reg) y x2 , interval(day -30 -1)
    rename (reg_* b_* se_*) (x2reg_* x2b_* x2se_*)
    
    gen reg_r2=.
    replace reg_r2=x1reg_r2 if xvar=="x1"
    replace reg_r2=x2reg_r2 if xvar=="x2"  //can reg_r2 like this be obtained with one single rangestat? or any better solution?
    Thanks in advance for your advice.

  • #2
    I understand the code but I am not clear that I understand the desire here.

    At its simplest, there is no need to manipulate variable names here. You just calculate what you want to use in advance of calling rangestat (SSC, as you are asked to explain).

    Alternatively, two different regressions are -- two different regressions.



    • #3
      In my actual case, there are many x's, and if I run a separate rangestat for each x, I need to run many of them. So I wonder if there is a way to do it with one rangestat. I guess cases similar to my example may arise in, for example, international trade, when country Y's largest trading partner (X) varies over the years and one is interested in the impact of the largest trading partner on Y.



      • #4
        Well, there is a discrepancy between what you say you want in words and what your code does. To do what you said you want in words you can run:

        Code:
        clear *
        set obs 100
        gen day=_n
        gen y=runiform(1,10)
        gen x1=runiform(1,10)
        gen x2=runiform(1,10)
        gen xvar="x1" if mod(day,2)==0
        replace xvar="x2" if mod(day,2)>0
        
        levelsof xvar, local(xx)
        gen real_x = .
        foreach x of local xx {
            replace real_x = `x' if xvar == "`x'"
        }
        
        rangestat (reg) y real_x, interval(day -30 -1)
        This creates a new variable, real_x, which is equal to the value of the variable designated by xvar and then runs the rolling regressions with a 30 day window using that.

        The code you show in #1 is different: it regresses the entire data set against x1, and then the entire data set against x2, and then selects the regression coefficient corresponding to the regression using the variable designated by xvar. That is very different.

        Only you know which one you really want.
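
        The difference between the two approaches can be checked by hand for a single observation using plain regress. This is only a sketch: the seed and the scalar names r2_x1 and r2_mixed are arbitrary choices for illustration, and day 50 is used because it is even, so xvar would be "x1" for that observation.

        ```stata
        clear *
        set seed 2017
        set obs 100
        gen day = _n
        gen y  = runiform(1,10)
        gen x1 = runiform(1,10)
        gen x2 = runiform(1,10)
        * same content as real_x above: x1 on even days, x2 on odd days
        gen real_x = cond(mod(day,2)==0, x1, x2)

        * the window for observation 50 is days 20 through 49
        * (a) what #1 keeps for day 50: y on x1 over the whole window
        quietly regress y x1 if inrange(day, day[50]-30, day[50]-1)
        scalar r2_x1 = e(r2)

        * (b) the real_x approach: the regressor switches from day to day
        quietly regress y real_x if inrange(day, day[50]-30, day[50]-1)
        scalar r2_mixed = e(r2)

        display "x1 only: " scalar(r2_x1) "   mixed regressor: " scalar(r2_mixed)
        ```

        Both regressions use the same 30 observations, but only (a) uses a single regressor throughout the window, so the two R-squared values will generally differ.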



        • #5
          Clyde Schechter, thanks for looking into this. What I really want is what the code in #1 does.



          • #6
            Perhaps like Clyde, I remain fuzzy on what you want and what you are failing to achieve with your real problem.

            You can fire up different regressions with rangestat (SSC). You just need to spell out different variable names for the results.

            Conversely, if the code in #1 does what you want, I really don't know what the question is.



            • #7
              Nick Cox , my question is whether there is a more efficient way to achieve what the code in #1 does. My understanding is that in rangestat, for the iteration for each observation, the data in memory is cleared and replaced with the set of observations in range for the current observation. I wonder if at this point, it is possible to instruct rangestat to look for the x of interest (the varname of which being specified in another variable). If this is possible, it seems to me more efficient than running rangestat multiple times and then combining the results to get what is wanted (as the code in #1 does).



              • #8
                The (reg) statistic can only be used once per rangestat call because of variable renaming issues. That means that it is limited to one regression per call. It seems that you want to be able to perform different regressions depending on a characteristic of the current observation. You could program your own Mata routine to do this with rangestat. Otherwise, you can do this with rangerun (also from SSC):

                Code:
                clear all
                set seed 1234
                set obs 100
                gen day=_n
                gen y=runiform(1,10)
                gen x1=runiform(1,10)
                gen x2=runiform(1,10)
                gen nxvar=1 if mod(day,2)==0
                replace nxvar=2 if mod(day,2)>0
                
                program switch_reg
                    local n = nxvar[_N]
                    local last = _N - 1
                    reg y x`n' in 1/`last'
                    gen reg_nobs = e(N)
                    gen reg_r2 = e(r2)
                end
                rangerun switch_reg, interval(day -30 0)
                
                * spot check with obs 50 and 51
                list in 50/51
                reg y x1 if inrange(day, day[50]-30, day[50]-1)
                dis e(r2)
                reg y x2 if inrange(day, day[51]-30, day[51]-1)
                dis e(r2)
                and the spot check results:
                Code:
                . * spot check with obs 50 and 51
                . list in 50/51
                
                     +--------------------------------------------------------------------+
                     | day          y         x1         x2   nxvar   reg_nobs     reg_r2 |
                     |--------------------------------------------------------------------|
                 50. |  50    8.99813   4.776833   5.852485       1         30   .0010042 |
                 51. |  51   9.214911    2.01923   9.878716       2         30   .0001987 |
                     +--------------------------------------------------------------------+
                
                . reg y x1 if inrange(day, day[50]-30, day[50]-1)
                
                      Source |       SS           df       MS      Number of obs   =        30
                -------------+----------------------------------   F(1, 28)        =      0.03
                       Model |  .206867688         1  .206867688   Prob > F        =    0.8680
                    Residual |  205.794637        28  7.34980847   R-squared       =    0.0010
                -------------+----------------------------------   Adj R-squared   =   -0.0347
                       Total |  206.001505        29  7.10350017   Root MSE        =    2.7111
                
                ------------------------------------------------------------------------------
                           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                -------------+----------------------------------------------------------------
                          x1 |   .0345631   .2060177     0.17   0.868     -.387445    .4565712
                       _cons |   5.462059   1.224887     4.46   0.000     2.952991    7.971127
                ------------------------------------------------------------------------------
                
                . dis e(r2)
                .0010042
                
                . reg y x2 if inrange(day, day[51]-30, day[51]-1)
                
                      Source |       SS           df       MS      Number of obs   =        30
                -------------+----------------------------------   F(1, 28)        =      0.01
                       Model |    .0427904         1    .0427904   Prob > F        =    0.9411
                    Residual |  215.327863        28  7.69028082   R-squared       =    0.0002
                -------------+----------------------------------   Adj R-squared   =   -0.0355
                       Total |  215.370653        29  7.42657425   Root MSE        =    2.7731
                
                ------------------------------------------------------------------------------
                           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                -------------+----------------------------------------------------------------
                          x2 |   .0158605   .2126251     0.07   0.941    -.4196823    .4514032
                       _cons |   5.633254   1.245702     4.52   0.000      3.08155    8.184958
                ------------------------------------------------------------------------------
                
                . dis e(r2)
                .00019868



                • #9
                  Quote from #7: "I wonder if at this point, it is possible to instruct rangestat to look for the x of interest (the varname of which being specified in another variable)."
                  This does not make sense in light of your example. For a given range of values (day from -30 to -1), there is no such thing as "the" x of interest--different x's are designated for different observations within that range.

                  Be that as it may, I don't see any way to get a single run of -rangestat- to use different variables for the regression within a single range of data.

                  You can avoid having to code each regression separately by putting them in a loop.

                  Code:
                  levelsof xvar, local(xx)
                  gen nobs = .
                  gen r2 = .
                  gen adj_r2 = .
                  gen b = .
                  gen se = .
                  foreach x of local xx {
                      rangestat (reg) y `x', interval(day -30 -1)
                      replace r2 = reg_r2 if xvar == "`x'"
                      replace adj_r2 = reg_adj_r2 if xvar == "`x'"
                      replace b = b_`x' if xvar == "`x'"
                      replace se = se_`x' if xvar == "`x'"
                  }
                  That makes the coding more efficient, but doesn't materially alter the execution.

                  Added: Crossed with #8.
                  Last edited by Clyde Schechter; 04 Dec 2017, 09:49.



                  • #10
                    Robert Picard , thank you very much. This is exactly what I am looking for.

                    You mention this could also be programmed with a Mata routine (which unfortunately I am not familiar with). May I ask if it would be faster than the rangerun approach you showed?

                    Thanks again.



                    • #11
                      Clyde Schechter , thank you for looking into this!



                      • #12
                        It would definitely be faster to do it in a custom Mata function because the Mata regression code does not have to perform a lot of the overhead that comes with the regress command. But you have to balance the gain in execution speed with the time it would take you to learn Mata so that you can code your special case.
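
                        To give a rough idea of how little code a bare regression needs in Mata, here is a minimal sketch that computes OLS coefficients from cross-products and compares the slope with regress. It is not rangestat's internal code and it does not solve the variable-switching problem; the function name ols_b and the scalar b_mata are made up for illustration.

                        ```stata
                        clear all
                        set seed 1234
                        set obs 30
                        gen y  = runiform(1,10)
                        gen x1 = runiform(1,10)

                        mata:
                        // OLS coefficients (slope, then constant) from cross-products;
                        // none of the parsing and reporting overhead of regress
                        real rowvector ols_b(real colvector y, real colvector x)
                        {
                            real matrix X
                            X = x, J(rows(x), 1, 1)
                            return((invsym(quadcross(X, X)) * quadcross(X, y))')
                        }
                        b = ols_b(st_data(., "y"), st_data(., "x1"))
                        st_numscalar("b_mata", b[1])
                        end

                        quietly regress y x1
                        display "Mata slope: " scalar(b_mata) "   regress slope: " _b[x1]
                        ```

                        The two slopes should agree up to rounding error. Hooking such a function into rangestat, and making it pick the regressor per observation, is the part that takes real Mata knowledge.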



                        • #13
                          Thanks for the reply, Robert!



                          • #14
                            It just dawned on me that you can speed up the whole task by looping over trading partners, as suggested by Clyde in #9, and using an invalid interval bound for all observations with different partners. You still have to call rangestat as many times as there are partners but it will only perform regressions for the observations with the current partner.

                            The following example has 100,000 observations (arranged as a panel) and 50 partners. I first do a dry run to see how long it takes to do a single call (100K regressions). Then I use a loop over each partner using a valid upper bound only for the current partner. Finally, I repeat by adapting Clyde's code in #9:

                            Code:
                            * demonstration dataset
                            clear all
                            set seed 1234
                            set obs 100
                            gen long id = _n
                            expand 1000
                            bysort id: gen day = mdy(1,1,1980) + _n
                            format %td day
                            gen y=runiform()
                            gen partner = runiformint(1,50)
                            forvalues i=1/50 {
                                gen x`i' = runiform()
                            }
                            
                            * a single call that performs all regressions
                            timer on 1
                            rangestat (reg) y x1 , interval(day -30 -1) by(id)
                            drop reg_* b_* se_*
                            timer off 1
                            
                            * multiple calls, one for each partner
                            * use an invalid interval bound to limit regression for target partner
                            timer on 2
                            gen rs_nobs = .
                            gen rs_r2   = .
                            gen rs_b    = .
                            
                            qui forvalues i=1/50 {
                                gen high = cond(partner == `i', day-1,-999)
                                rangestat (reg) y x`i' , interval(day -30 high) by(id)
                                replace rs_nobs = reg_nobs if partner == `i'
                                replace rs_r2   = reg_r2 if partner == `i'
                                replace rs_b    = b_x`i' if partner == `i'
                                drop reg_* b_* se_* high
                            }
                            timer off 2
                            
                            * compare with repeating calls by adapting Clyde's code
                            levelsof partner, local(xx)
                            timer on 3
                            gen nobs = .
                            gen r2 = .
                            gen b = .
                            qui foreach x of local xx {
                                rangestat (reg) y x`x', interval(day -30 -1) by(id)
                                replace nobs = reg_nobs if partner == `x'
                                replace r2   = reg_r2 if partner == `x'
                                replace b    = b_x`x' if partner == `x'
                                drop reg_* b_* se_*
                            }
                            timer off 3
                            
                            timer list
                            
                            * show that results match
                            assert nobs  == rs_nobs
                            assert rs_r2 == r2
                            assert rs_b  == b
                            The timing results on my computer are:
                            Code:
                            . timer list
                               1:      2.04 /        1 =       2.0450
                               2:     15.20 /        1 =      15.1960
                               3:     98.89 /        1 =      98.8890
                            I should add a disclaimer that I only do data management and I offer solutions to the questions asked but I make no claims or representations regarding the statistical issues involved.
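
                            The invalid-bound trick used in this post can be seen in isolation on a tiny dataset. This is a sketch that assumes rangestat is installed from SSC; the 10-observation data, the 3-day window and the use of (count) instead of (reg) are simplifications chosen only to make the result easy to read.

                            ```stata
                            * assumes rangestat is installed: ssc install rangestat
                            clear all
                            set obs 10
                            gen day = _n
                            gen y = day
                            gen partner = cond(day <= 5, 1, 2)

                            * valid upper bound (day-1) only where partner == 1; elsewhere the
                            * upper bound -999 lies below the lower bound day-3, so the
                            * interval is empty and nothing is computed for that observation
                            gen high = cond(partner == 1, day - 1, -999)
                            rangestat (count) y, interval(day -3 high)

                            list day partner high y_count
                            ```

                            Observations with partner == 2 should come back with a missing y_count, which is why the loop over partners in the timing example only pays for the regressions it actually needs.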



                            • #15
                              Robert Picard , thank you very much again. And your disclaimer is duly noted!

                              May I ask whether there is any particular reason you added the panel structure to the data, apart from making the data bigger?

