Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • #16
    Some speed-ups are certainly possible.

    * Using egen to get maximum, minimum, totals is very inefficient compared with using summarize, meanonly.

    * There's no need to scale analytic weights. Stata does that for you.

    There are some puzzles.

    * This won't work well unless your firms are already numbered 1 up, in which case you can find out directly how many there are by looking at the values of firm

    * You probably don't need to replace the weights nearly so often.

    Code is completely untested. Check very very carefully!

    Code:
    clear all
    cd "C:\Users\md_kh\Dropbox\ECF\Codes"
    use "stock_inflation_data.dta", clear
    
    * create "time" variable which is consecutive date of observations
    bysort permno (date): gen time = _n
    order permno firm date time dur shrout price exret inflation
    
    * variable related to the regressions
    gen a = .
    gen b = .
    gen w = .
    gen n = .
    gen w1 = .
    
    * finding numbers of firms
    by permno, sort: gen nvals = _n == 1 
    replace nvals = sum(nvals) /* the last value is sum of the distinct permnos */
    local firm_number = nvals[_N]
    drop nvals
    
    quietly forval j = 1/`firm_number' {
        * finding max and min of time for each firm j (min for all of them is 1)
        su time if firm == `j', meanonly 
    
        forval T = `r(min)'/`r(max)' {
            count if exret < . (time < `T') & (time >= `T'-12) & firm ==`j'
            if r(N) < 4 continue
                
            replace w1 = exp(-(`T' - time)* log(2)/60) if `T' > time & firm == `j'   
    
            capture {
                regress exret inflation [aweight=w] if firm == `j'
                replace a = _b[_cons] if firm == `j' & time == `T'
                replace b = _b[inflation] if firm == `j' & time == `T'
                replace n = e(N) if firm == `j' & time == `T'
            }
        }
    }
    
    save "beta.dta"
    Code:
    
    


    Comment


    • #17
      Dear Nick,

      Thanks a lot for all the help, time, and support. It means a lot to me since I am a beginner in stata. As you mentioned, the number of firms starts from 1. Also, I don't need "a" from the regression, so I can drop it as well to speed up the code.

      Thanks again,
      Mohammad

      Comment


      • #18
        Code:
         
         count if exret < . & (time < `T') & (time >= `T'-12) & firm ==`j'

        Comment


        • #19
          I think the first order of business is to get certainty on the question of how to calculate the weights. I'm not sure that Nick's does it right because it exclude the current observation. This make the regression run on past observations up to but not including the current observation. If I'm right, the weight of the current observation is always exp(0) which is 1 (the denominator is a constant and can be excluded) and goes down the further away in the past.

          Let me also say that this is a pretty complex thing to want to do for a Stata newbie so I'll remind everyone that a rolling regression generates results for each observation in the data. This means that you can manually compute results for any observation using standard Stata commands. What's missing in this case is a realistic dataset that has the same structure as the one in question. So the following example constructs data that appears to have the same characteristics as the one that Mohammad uses, with 5 permno and 60 periods.

          In the example below, I set up the problem for two test observations. See help subscripting to get an explanation of how to retrieve the value of a particular variable for a given observation. With the data setup, observation 40 contains the 40th time period of the first permno. The 100th observation in the data refers to the 40th time period for the second permno.

          Once satisfied that the two test cases are correctly constructed and calculate the desired results, you can set up a loop to go over each observation. I assume that time is coded as starting from 1 and skip calculations until we reach the 4th period. The regress command is captured just in case something goes wrong (no enough observations due to missing values or collinearity issues).

          Finally, on this small test run, you can check that the results generated by the loop match those calculated manually.

          Code:
          * create fake data with the same characteristics
          clear all
          set seed 312312
          set obs 5
          gen long permno = runiformint(1111111,9999999)
          expand 60
          bysort permno: gen time = _n
          gen exret = runiform()
          gen inflation = runiform()
          
          * generate results for a specific observation of the first permno
          local obs 40
          list in `obs'
          gen double ww = exp(-(time[`obs'] - time) * log(2)/60) if permno == permno[`obs']
          regress exret inflation [aweight=ww] if permno == permno[`obs'] & time <= time[`obs']
          
          * generate results for a specific observation of the second permno
          drop ww
          local obs 100
          list in `obs'
          gen double ww = exp(-(time[`obs'] - time) * log(2)/60) if permno == permno[`obs']
          regress exret inflation [aweight=ww] if permno == permno[`obs'] & time <= time[`obs']
          
          * do all weighted regressions in a loop, skip if there are less than 4 obs
          gen a = .
          gen b = .
          gen n = .
          gen double w = .
          
          local nobs = _N
          
          quietly forvalues i = 1/`nobs' {
          
              local p = permno[`i']
              local t = time[`i']
              
              if `t' >= 4 {
                  replace w = exp(-(`t' - time) * log(2)/60) if permno == `p' 
                  cap regress exret inflation [aweight=w] if permno == `p' & time <= `t'
                  if _rc == 0 {
                      replace a = _b[_cons] in `i'
                      replace b = _b[inflation] in `i'
                      replace n = e(N) in `i'
                  }
              }
          }
          drop w ww 
          
          * verify results for a specific observation
          list in 40
          list in 100

          Comment


          • #20
            Dear Nick,

            Thanks a lot. Th last code which you posted is much faster than what I wrote. :-) For example, for 20 firms, the first one takes 350 seconds but the second one takes only 30 seconds. :-) Excellent improvements.

            Thanks. :-)

            Comment


            • #21
              Dear Robert,

              Thanks a lot for your help. (Robert Picard). I understood your logic and it seems that your code is the fastest one (20 seconds for 20 firms). However, there is a problem: in your code, only the first 4 observations are ignored, but I want to ensure that there are at least 4 non-missing observations of exret in the last 12 periods. Otherwise, I will leave b as missing value. To this end, I changed your code as follows, but now it is not as efficient as before. (take 50 seconds for 20 firms). Is my change not efficient enough or this speed reduction is inevitable?


              quietly forvalues i = 1/`nobs' {

              local p = permno[`i']
              local t = time[`i']

              count if (exret < .) & (time < `t') & (time >= `t'-12) & (permno ==`p')
              if r(N) < 4 continue

              replace w = exp(-(`t' - time) * log(2)/60) if permno == `p'

              capture regress exret inflation [aweight=w] if permno == `p' & time < `t'
              if _rc == 0 {
              replace b = _b[inflation] in `i'

              }

              }

              Comment


              • #22
                I can't say I understand the logic of wanting at least 4 non-missing observations of exret since this is a recursive window. For each regression, the sample includes all the observations of the same company from time 1 to the current time period. Once you reach 4 non-missing observations, all later regressions will have at least 4 observations.

                If you really want to reject a regression, say in time period 72, because there's less than 4 non-missing observations from period 61 to 72 periods, even if there are no missing values from period 1 to 60, you can use rangestat to generate such sample size counts. In the example below, I added random missing values for exret and missing time periods as well.

                In terms of efficiency, for 4000 companies with 400 time periods, I think the fastest solution is to loop over each time period and use rangestat with a streamlined Weighted Least Squares Mata function to perform the regressions. At each time period, you compute the appropriate weights. You then run rangestat to compute the WLS, but only on the observations of that time period, based on an interval that includes all observations from time 1 to the period in question.

                As with my previous example, you can check to your heart's content that the results are computed correctly by generating results using standard Stata commands. At the end of the example, I spot check for observations 50, 500, and 5000. The following example includes 100 companies with 400 time periods. With the missing values I introduced, this runs in 42 seconds on my computer. You can expect that rangestat's execution times will be linear, so with 40 times more companies, the whole thing could run in 30 minutes.

                Code:
                * create fake data with the same characteristics
                clear all
                set seed 312312
                set obs 100
                gen long permno = runiformint(1111111,9999999)
                expand 400
                bysort permno: gen time = _n
                gen exret = runiform()
                gen inflation = runiform()
                
                * insert some missing values and missing time periods
                replace exret = . if runiform() < .1
                drop if runiform() < .1
                
                * precompute number of non-missing observations in the past 12 periods
                rangestat (count) exret, interval(time -11 0) by(permno)
                
                * define a Mata function for Weighted Least Squares, original OLS Mata code derived from
                * http://blog.stata.com/2016/01/12/programming-an-estimation-command-in-stata-an-ols-command-using-mata/
                * and adapted using the information on
                * https://www.mathworks.com/help/curvefit/least-squares-fitting.html?requestedDomain=www.mathworks.com
                mata:
                mata clear
                real rowvector wls(real matrix Xall)
                {
                    real colvector y, b, w, wy
                    real matrix X, XpX, XpXi, wX
                    real scalar n, k
                    
                    y = Xall[.,1]                // dependent var is first
                    X = Xall[.,2::cols(Xall)-1]  // independent variables
                    w = Xall[.,cols(Xall)]         // weights
                    n = rows(X)                  // the number of observations
                    X = X,J(n,1,1)               // add a constant
                    k = cols(X)                      // number of independent variables
                    
                    if (n > k) {    // need more than k obs to estimate model and standard errors
                        wX = w :* X
                        wy = w :* y
                        XpX  = quadcross(X, wX)
                        XpXi = invsym(XpX)
                        if (diag0cnt(XpXi) == 0) {
                            b = XpXi * quadcross(X, wy)            
                        }
                    }
                    
                    return(rows(X), b')
                }
                end
                
                * initialize results
                gen double rs_a = .
                gen double rs_b = .
                gen rs_n = .
                
                * loop over each time period, do all companies at the same target time period
                qui forvalues i = 4/400 {
                
                    // weights are relative to the target time period
                    gen double w = exp(-(`i' - time) * log(2)/60)
                    
                    // identify observation in current period with enough sample
                    gen byte doit = time == `i' & exret_count >= 4
                
                    // define a valid interval only for obs at the target time period
                    gen high = cond(doit, `i', 0) 
                    rangestat (wls) exret inflation w, interval(time . high) by(permno) casewise
                    
                    // copy results for the target time period
                    replace rs_a = wls3 if doit
                    replace rs_b = wls2 if doit
                    replace rs_n = wls1 if doit
                    
                    drop w doit high wls*
                }
                
                * run regressions for some observations
                foreach obs in 50 500 5000 {
                    gen double ww = exp(-(time[`obs'] - time) * log(2)/60) if permno == permno[`obs']
                    regress exret inflation [aweight=ww] if permno == permno[`obs'] & time <= time[`obs']
                    drop ww
                    list in `obs'
                }

                Comment


                • #23
                  Or better yet, compute the weights on the fly in the Mata function. The following runs the full-sized problem in 1 minute, that's 1.4 million regressions using a recursive window.

                  Code:
                  * create fake data with the same characteristics
                  clear all
                  set seed 312312
                  set obs 4000
                  gen long permno = runiformint(1111111,9999999)
                  expand 400
                  bysort permno: gen time = _n
                  gen exret = runiform()
                  gen inflation = runiform()
                  
                  * insert some missing values and missing time periods
                  replace exret = . if runiform() < .1
                  drop if runiform() < .1
                  
                  * precompute number of non-missing observations in the past 12 periods
                  rangestat (count) exret, interval(time -11 0) by(permno)
                  
                  * define a Mata function for Weighted Least Squares, original OLS Mata code derived from
                  * http://blog.stata.com/2016/01/12/programming-an-estimation-command-in-stata-an-ols-command-using-mata/
                  * and adapted using the information on
                  * https://www.mathworks.com/help/curvefit/least-squares-fitting.html?requestedDomain=www.mathworks.com
                  mata:
                  mata clear
                  real rowvector mywls(real matrix Xall)
                  {
                      real colvector y, b, w, wy, t
                      real matrix X, XpX, XpXi, wX
                      real scalar n, k, tau
                      
                      y = Xall[.,1]                // dependent var is first
                      X = Xall[.,2::cols(Xall)-1]  // independent variables
                      t = Xall[.,cols(Xall)]         // time variable
                      n = rows(X)                  // the number of observations
                      X = X,J(n,1,1)               // add a constant
                      k = cols(X)                      // number of independent variables
                      
                      if (n > k) {    // need more than k obs to estimate model and standard errors
                          tau = t[rows(t)]
                          w = exp(-(tau :- t) * log(2)/60)
                          wX = w :* X
                          wy = w :* y
                          XpX  = quadcross(X, wX)
                          XpXi = invsym(XpX)
                          if (diag0cnt(XpXi) == 0) {
                              b = XpXi * quadcross(X, wy)            
                          }
                      }
                      
                      return(rows(X), b')
                  }
                  end
                  
                  gen high = cond(exret_count >= 4, time, 0) 
                  rangestat (mywls) exret inflation time, interval(time . high) by(permno) casewise
                  
                  rename mywls1 reg_n
                  rename mywls2 reg_b
                  rename mywls3 reg_a
                  
                  * run regressions for some observations
                  foreach obs in 50 500 5000 {
                      gen double ww = exp(-(time[`obs'] - time) * log(2)/60) if permno == permno[`obs']
                      regress exret inflation [aweight=ww] if permno == permno[`obs'] & time <= time[`obs']
                      drop ww
                      list in `obs'
                  }

                  Comment


                  • #24
                    Woooooow! It computes all the regressions at once! Thanks, Robert. :-)

                    The last point: In my calculations, as Nick correctly did, I don't want to consider the current observation. For example, if the current observation is obs=50, I only want to consider obs 1 to 49 etc. Your code considers the current obs as well. I changed the following line to modify it but it does not work for missing observations properly. Could you plz help me?


                    * precompute number of non-missing observations in the past 12 periods
                    rangestat (count) exret, interval(time -12 0) excludeself by(permno)

                    gen high = cond(exret_count >= 4, time[_n-1], 0)


                    Excluding the current observation, I want to run a wls only if there are at least 4 observations during the 12 past periods.
                    Last edited by Mohammad Khodadadi; 18 May 2017, 07:10.

                    Comment


                    • #25
                      Dear Robert,

                      Regarding your notes at #22:

                      1) About non-missing values, which is not clear to you: I agree that each regression sample consists data from period 1 but the idea is that stocks have at least 4 out of the last 12 months of returns available. This requirement is for ensuring that coefficients are also informative about recent information of stock returns.

                      2) About excluding the current observation, the idea is to use only past information in the estimation.
                      Last edited by Mohammad Khodadadi; 18 May 2017, 09:01.

                      Comment


                      • #26
                        To avoid further confusion about what you are trying to do, why don't you post, using the following data example, code that calculates what you want for the last 3 observations of the first permno (observations 21, 22, and 23) using only standard Stata commands. Note that there are missing time periods as well as a missing value for exret.

                        Code:
                        * Example generated by -dataex-. To install: ssc install dataex
                        clear
                        input long(obs permno) float(time exret inflation)
                         1 1470874  1  .9122462 .51331943
                         2 1470874  2  .5948181  .3065938
                         3 1470874  3 .25645664  .4921637
                         4 1470874  4   .930954  .3914094
                         5 1470874  5  .9577237  .8338873
                         6 1470874  6  .5671181  .8355323
                         7 1470874  7  .2708475 .26582384
                         8 1470874  8 .34942105   .672545
                         9 1470874  9  .4345023  .5868091
                        10 1470874 10  .8107689 .23440893
                        11 1470874 11  .5516335 .29357833
                        12 1470874 12  .6561857  .9301422
                        13 1470874 13  .7671857 .04822804
                        14 1470874 14  .4452506 .09964211
                        15 1470874 15 .51762706  .8284212
                        16 1470874 16  .2166194 .13313739
                        17 1470874 17 .19157566  .3796502
                        18 1470874 18  .9459305  .7317557
                        19 1470874 19  .7275512  .0852162
                        20 1470874 20   .927551  .6083384
                        21 1470874 21  .4164995  .4719198
                        22 1470874 29         .  .4488523
                        23 1470874 30  .4370895  .1835875
                        24 6879153  1  .6476775  .3955551
                        25 6879153  2  .4472069 .02203239
                        end

                        Comment


                        • #27
                          Dear Robert,

                          Bellow is the code according to what Nick helped me to write. In order to show you what I mean, instead of using number 4, I introduced a local variable "threshold" that represents the minimum required non-missing observations of exret .

                          According to my logic, If I set threshold equal to 8 (as I did below), then the regression betas related to observations 27, 28, 29, and 30 of firm 1 must be missing (.). I think running the following code would be illustrative.

                          Code:

                          Code:
                          * Example generated by -dataex-. To install: ssc install dataex
                          clear
                          input long(obs permno) float(time exret inflation)
                           1 1470874  1  .9122462 .51331943
                           2 1470874  2  .5948181  .3065938
                           3 1470874  3 .25645664  .4921637
                           4 1470874  4   .930954  .3914094
                           5 1470874  5  .9577237  .8338873
                           6 1470874  6  .5671181  .8355323
                           7 1470874  7  .2708475 .26582384
                           8 1470874  8 .34942105   .672545
                           9 1470874  9  .4345023  .5868091
                          10 1470874 10  .8107689 .23440893
                          11 1470874 11  .5516335 .29357833
                          12 1470874 12  .6561857  .9301422
                          13 1470874 13  .7671857 .04822804
                          14 1470874 14  .4452506 .09964211
                          15 1470874 15 .51762706  .8284212
                          16 1470874 16  .2166194 .13313739
                          17 1470874 17 .19157566  .3796502
                          18 1470874 18  .9459305  .7317557
                          19 1470874 19  .7275512  .0852162
                          20 1470874 20   .927551  .6083384
                          21 1470874 21  .4164995  .4719198
                          22 1470874 22         .  .3700192
                          23 1470874 23         .  .2700196
                          24 1470874 24         .  .1845193
                          25 1470874 25         .  .4300197
                          26 1470874 26         .  .8103007
                          27 1470874 27         .  .0300368
                          28 1470874 28         .  .1964691
                          29 1470874 29         .  .4488523
                          30 1470874 30  .4370895  .1835875
                          31 6879153  1  .6476775  .3955551
                          32 6879153  2  .4472069 .02203239
                          end
                          
                          * firm number
                          egen firm = group(permno)
                          
                          * minimum requiered recent obs in the last 12 obs
                          local threshold = 8
                          
                          
                          * variable related to the regressions
                          gen b = .
                          gen w = .
                          
                          quietly forval j = 1/1 {   
                          
                              * finding max and min of time for each firm j (min for all of them is 1)
                              su time if firm == `j', meanonly 
                          
                              forval T = `r(min)'/`r(max)' {
                                  count if exret < . & (time < `T') & (time >= `T'-12) & firm ==`j' 
                                  if r(N) < `threshold' continue   
                                      
                                  replace w = exp(-(`T' - time)* log(2)/60) if `T' > time & firm == `j'   
                          
                                  capture {
                                      regress exret inflation [aweight=w] if firm == `j'
                                      replace b = _b[inflation] if firm == `j' & time == `T'
                                      
                                  }
                              }
                          }

                          Comment


                          • #28
                            I'm sorry but this is not what I was hoping for. I wanted you to show how you would calculate results for individual observations. No loops, no local macros, no grouping of permno, no storage of results. Also, I didn't ask you to change the data example.

                            I want you to calculate results separately so that you can eyeball the data, in particular the weights you are creating with respect to a specific observation. In the example below, I generate results for observation 23 (time==30). I use the same model I used in my previous posts. I want you to see that you have 22 observations in the regression but that with your current threshold, you would reject the results. If this does not target the window you want, make changes and post the new code.

                            Code:
                            * Example generated by -dataex-. To install: ssc install dataex
                            clear
                            input long(obs permno) float(time exret inflation)
                             1 1470874  1  .9122462 .51331943
                             2 1470874  2  .5948181  .3065938
                             3 1470874  3 .25645664  .4921637
                             4 1470874  4   .930954  .3914094
                             5 1470874  5  .9577237  .8338873
                             6 1470874  6  .5671181  .8355323
                             7 1470874  7  .2708475 .26582384
                             8 1470874  8 .34942105   .672545
                             9 1470874  9  .4345023  .5868091
                            10 1470874 10  .8107689 .23440893
                            11 1470874 11  .5516335 .29357833
                            12 1470874 12  .6561857  .9301422
                            13 1470874 13  .7671857 .04822804
                            14 1470874 14  .4452506 .09964211
                            15 1470874 15 .51762706  .8284212
                            16 1470874 16  .2166194 .13313739
                            17 1470874 17 .19157566  .3796502
                            18 1470874 18  .9459305  .7317557
                            19 1470874 19  .7275512  .0852162
                            20 1470874 20   .927551  .6083384
                            21 1470874 21  .4164995  .4719198
                            22 1470874 29         .  .4488523
                            23 1470874 30  .4370895  .1835875
                            24 6879153  1  .6476775  .3955551
                            25 6879153  2  .4472069 .02203239
                            end
                            
                            * weights to use in a regression for observation 23
                            gen double ww = exp(-(time[23] - time) * log(2)/60) if permno == permno[23]
                            
                            * regression for observation 23
                            regress exret inflation [aweight=ww] if permno == permno[23] & time <= time[23]
                            
                            * the number of non-missing values of exret for 12 most recent periods
                            count if !mi(exret) & permno == permno[23] & inrange(time, time[23]-11, time[23])

                            Comment


                            • #29
                              Oh, sorry for the misunderstanding. Bellow is modified line that I mean for observation 23, time 30:

                              Code:
                              * weights to use in a regression for observation 23
                              gen double ww = exp(-(time[23] - time) * log(2)/60) if permno == permno[23] & time < time[23]
                              
                              * regression for observation 23
                              regress exret inflation [aweight=ww] if permno == permno[23] & time <time[23]
                              
                              * the number of non-missing values of exret for 12 most recent periods
                              count if !mi(exret) & permno == permno[23] & inrange(time, time[23]-12, time[23]-1)

                              And I will reject this regression for observation 23, according to my logic.
                              Last edited by Mohammad Khodadadi; 18 May 2017, 12:08.

                              Comment


                              • #30
                                Alright, you seem intent on a recursive window that excludes the current observation but I would like to note that this does not seem to conform to the equations you posted in #4.

                                In terms of how to adjust the rangestat solution to do it this way, the weights are relative to the period of the current observation. If you want the results based on observations prior to the current observation, just use the lag of the results (assuming that there are no missing time periods). No other change is needed. Here's an example that demonstrates this:

                                Code:
                                * Example generated by -dataex-. To install: ssc install dataex
                                clear
                                input long(obs permno) float(time exret inflation)
                                 1 1470874  1  .9122462 .51331943
                                 2 1470874  2  .5948181  .3065938
                                 3 1470874  3 .25645664  .4921637
                                 4 1470874  4   .930954  .3914094
                                 5 1470874  5  .9577237  .8338873
                                 6 1470874  6  .5671181  .8355323
                                 7 1470874  7  .2708475 .26582384
                                 8 1470874  8 .34942105   .672545
                                 9 1470874  9  .4345023  .5868091
                                10 1470874 10  .8107689 .23440893
                                11 1470874 11  .5516335 .29357833
                                12 1470874 12  .6561857  .9301422
                                13 1470874 13  .7671857 .04822804
                                14 1470874 14  .4452506 .09964211
                                15 1470874 15 .51762706  .8284212
                                16 1470874 16  .2166194 .13313739
                                17 1470874 17 .19157566  .3796502
                                18 1470874 18  .9459305  .7317557
                                19 1470874 19  .7275512  .0852162
                                20 1470874 20   .927551  .6083384
                                21 1470874 21  .4164995  .4719198
                                22 1470874 22         .  .3700192
                                23 1470874 23         .  .2700196
                                24 1470874 24         .  .1845193
                                25 1470874 25         .  .4300197
                                26 1470874 26         .  .8103007
                                27 1470874 27         .  .0300368
                                28 1470874 28         .  .1964691
                                29 1470874 29         .  .4488523
                                30 1470874 30  .4370895  .1835875
                                31 6879153  1  .6476775  .3955551
                                32 6879153  2  .4472069 .02203239
                                end
                                
                                * verify that there are no duplicates or missing time periods
                                tsset permno time
                                by permno: assert time == L.time + 1 if _n > 1
                                
                                * loop, based on Mohammad's single case
                                gen b = .
                                gen n = .
                                local nobs = _N
                                qui forvalues i = 1/`nobs' {
                                    count if !mi(exret) & permno == permno[`i'] ///
                                        & inrange(time, time[`i']-12, time[`i']-1)
                                    if r(N) >= 8 {
                                        gen double ww = exp(-(time[`i'] - time) * log(2)/60) ///
                                            if permno == permno[`i'] & time < time[`i']
                                        regress exret inflation [aweight=ww] ///
                                            if permno == permno[`i'] & time <time[`i']
                                        drop ww
                                        replace n = e(N) in `i'
                                        replace b = _b[inflation] in `i'
                                    }
                                }
                                
                                
                                * --------------- redo using rangestat ----------------------------------------
                                
                                * precompute number of non-missing observations in the past 12 periods
                                rangestat (count) exret, interval(time -11 0) by(permno)
                                
                                * define a Mata function for Weighted Least Squares, original OLS Mata code derived from
                                * http://blog.stata.com/2016/01/12/programming-an-estimation-command-in-stata-an-ols-command-using-mata/
                                * and adapted using the information on
                                * https://www.mathworks.com/help/curvefit/least-squares-fitting.html?requestedDomain=www.mathworks.com
                                mata:
                                mata clear
                                real rowvector mywls(real matrix Xall)
                                {
                                    real colvector y, b, w, wy, t
                                    real matrix X, XpX, XpXi, wX
                                    real scalar n, k, tau
                                    
                                    y = Xall[.,1]                // dependent var is first
                                    X = Xall[.,2::cols(Xall)-1]  // independent variables
                                    t = Xall[.,cols(Xall)]         // time variable
                                    n = rows(X)                  // the number of observations
                                    X = X,J(n,1,1)               // add a constant
                                    k = cols(X)                      // number of independent variables
                                    
                                    if (n > k) {    // need more than k obs to estimate model and standard errors
                                        tau = t[rows(t)]
                                        w = exp(-(tau :- t) * log(2)/60)
                                        wX = w :* X
                                        wy = w :* y
                                        XpX  = quadcross(X, wX)
                                        XpXi = invsym(XpX)
                                        if (diag0cnt(XpXi) == 0) {
                                            b = XpXi * quadcross(X, wy)            
                                        }
                                    }
                                    
                                    return(rows(X), b')
                                }
                                end
                                
                                gen high = cond(exret_count >= 8, time, 0) 
                                rangestat (mywls) exret inflation time, interval(time . high) by(permno) casewise
                                
                                gen reg_n = L.mywls1
                                gen reg_b = L.mywls2
                                assert n == reg_n
                                assert b == reg_b

                                Comment

                                Working...
                                X