
  • r(2001) insufficient observations

    Dear all,

    I have a weird problem when looping regressions over a time-series dataset. My sample consists of 94 stocks. For each stock I have 3 liquidity measures and its squared returns. I also have 12 aggregated market measures: 3 for each liquidity measure, and the remaining 3 are fixed. For each stock's liquidity measure, I need to run a regression of the form:
    y= b0 + b1ML1 + b2ML2 + b3ML3 + b4M4 + b5M5 + b6M6 + e
    I also need to store (eststo) the estimates. Unfortunately, some stocks' measures are very limited.

    My code is the following:
    Code:
    local N = 94
    forvalues i = 1/`N' {
    regress dqspr`i' Dav_qspr lagDav_qspr leadDav_qspr irelandind lagirelandind leadirelandind Dsqrtret`i'
    }
    eststo
    If I prefix the regression with -qui- (-qui regress-), Stata gives me error 2001, "Insufficient observations", from the first regression, whereas with plain -regress- Stata gives me error 2001 from the 32nd regression, which is actually the first stock with limited observations (seven, to be precise).
    How can I overcome this problem without running each regression manually? Is there any way to "ignore" regressions with only limited observations?

    Thanks for your help

    Stefano Grillini

  • #2
    Something like this:

    Code:
    local N = 94
    forvalues i = 1/`N' {
        capture regress dqspr`i' Dav_qspr lagDav_qspr leadDav_qspr irelandind ///
            lagirelandind leadirelandind Dsqrtret`i'
        if c(rc) == 0 {
            eststo results`i'
        }
        else if c(rc) == 2001 {
            display "Insufficient observations for i == `i': moving on."
        }
        else {
            display "Unanticipated error in regression with i = `i'"
            exit `c(rc)'
        }
    }
    Notes:

    1. The -capture- command, in addition to blocking error conditions, also suppresses output. If you want to see the regression results, insert -noisily- between -capture- and -regress-.

    2. This code will give a pass to regressions with insufficient observations, and display a notification in each instance. But it will still break on any other unanticipated error condition.
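
    A minimal sketch of note 1, assuming Stata's bundled auto dataset (loaded with -sysuse auto-) as a stand-in for the stock data; the variables price, mpg and weight belong to that example dataset, not to the original problem:

    Code:
    sysuse auto, clear
    * -capture noisily- still traps any error, but lets the output through
    capture noisily regress price mpg weight
    * c(rc) holds the return code of the captured command (0 means success)
    display "Return code: " c(rc)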



    • #3
      Thank you very much Clyde, it works (hopefully, as I have to do the same with another dataset with 4000 stocks).

      Best Wishes

      Stefano Grillini



      • #4
        Hi all, the eststo command can store a maximum of 300 estimates. As I'm now working with a larger dataset, do you have any tips on how to overcome this limitation? (My code is the one above, but for more than 300 regressions.)

        Thanks

        Stefano



        • #5
          I was thinking about storing the estimates using "statsby", as I then need to do further (simple) calculations with the results, so the code I have in mind is something like:
          Code:
          local N = 351
          forvalues i = 1/`N' {
          statsby _b e(r2) _se df=e(df_r), by(date) saving("N:\...\myreg1.dta", replace): regress ///
          dqspr`i' Dav_qspr lagDav_qspr leadDav_qspr irelandind lagirelandind leadirelandind Dsqrtret`i'
          gen t = _b/_se
          gen p = 2*ttail(df,abs(t))
          }
          As you can see, for each regression I need the coefficients, t-statistics, p-values and r2 (actually I need the adjusted r2, but I don't know how to get it). If I run this command, I run into two problems. Firstly, Stata does not recognise _b. In addition, it runs one regression for every day (the date variable). To overcome at least the second problem, I thought of generating a variable equal to 1 in every observation to use as the by-group. Any other ideas or suggestions are appreciated.

          Thanks

          Stefano



          • #6
            Well, to be honest, I'm always skeptical of projects that run hundreds of regression analyses and somehow try to synthesize the results. I'm inclined to believe that in the end, the results are either incomprehensible or grossly oversimplified to make them digestible. So maybe Stata's limit is a warning to not do things this way. But at this point, I won't pursue that issue and just credit you with having a reasonable plan.

            So, you won't be able to store these estimates all in memory at one time. -help limits- reveals that the maximum of 300 estimates stored in active memory is built into Stata and is not a quirk of -eststo-. But you can save the estimates to disk files instead. So if you replace -eststo results`i'- with -estimates save results`i'-, you will end up with a bunch of files called results1.ster through resultsBIGNUMBER.ster in your working directory. When you need to work with them, you can invoke the -estimates use- command.
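
            A minimal sketch of that save-then-reload pattern, assuming Stata's bundled auto dataset as a stand-in for the stock data (the variables price, mpg, weight and rep78 and the subsetting by rep78 are purely illustrative). Note that it writes results3.ster through results5.ster to the working directory:

            Code:
            sysuse auto, clear
            forvalues i = 3/5 {
                capture regress price mpg weight if rep78 == `i'
                if c(rc) == 0 {
                    * one .ster file per successful regression
                    estimates save results`i', replace
                }
            }
            * later: reload the estimates one at a time
            forvalues i = 3/5 {
                capture confirm file "results`i'.ster"
                if c(rc) == 0 {
                    estimates use results`i'
                    display "i = `i': adjusted R2 = " e(r2_a)
                }
            }

            Only one set of estimates is active at any moment, so the 300-estimate limit never comes into play.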

            At the end of the day, though, if you have a plan that requires having thousands of estimates in memory at the same time, it isn't going to happen. Since the limit of 300 estimates in memory at once applies throughout Stata, if you need to work with more than 300 sets of estimates, you need to find a way to process them serially, or in batches of fewer than 300. (For example, maybe you don't really need the full estimation results for your end product. Maybe you just need specific statistics which can be pulled out of them one at a time and stored in local macros or in matrices, etc.)
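
            One way to pull out specific statistics serially is to post them to a results dataset with -postfile- as each regression finishes. This is a sketch of a related technique rather than the local-macro or matrix approach mentioned above, and it again assumes the bundled auto dataset; the names handle, stats, group, b_mpg and se_mpg are hypothetical:

            Code:
            sysuse auto, clear
            tempname handle
            tempfile stats
            postfile `handle' int group double(b_mpg se_mpg r2_a df_r) using `stats'
            forvalues g = 3/5 {
                capture regress price mpg weight if rep78 == `g'
                if c(rc) == 0 {
                    * keep only the statistics needed, not the full estimation results
                    post `handle' (`g') (_b[mpg]) (_se[mpg]) (e(r2_a)) (e(df_r))
                }
            }
            postclose `handle'
            use `stats', clear
            list

            The resulting dataset holds one row per successful regression, so the number of regressions is bounded by dataset size rather than by the limit on stored estimates.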



            • #7
              Well Clyde, I appreciate your comment and I personally agree with you. However, at this stage, I'm partly replicating another study, so the estimates from all these regressions are mainly used to construct tables. In my field, empirical finance, it does often happen that models are replicated for a considerable number of stocks in a market, as in this case. Generally, I agree with you regarding the increasing complexity of "sometimes meaningless" aggregated measures, but it is often necessary to have a broad overview.

              I'll try to follow your suggestion regarding the replacement of -eststo- with -estimates save-.

              In any case, do you think the alternative code, using statsby, is in this case a valid solution? If so, any ideas why it does not recognise _b?

              I really appreciated your comment

              Thanks

              Stefano



              • #8
                Stefano wrote:
                ...(actually I need the adjusted r2, but I don't know how to get it)...
                Adjusted R2 is stored as -e(r2_a)-.
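
                For instance, a minimal sketch using Stata's bundled auto dataset (an assumption for illustration; any -regress- call works the same way):

                Code:
                sysuse auto, clear
                regress price mpg weight
                display "Adjusted R2 = " e(r2_a)
                * -ereturn list- shows everything else that -regress- leaves behind
                ereturn list
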
                Kind regards,
                Carlo
                (Stata 19.0)



                • #9
                  Thanks Carlo,

                  At least I solved this issue.

                  Stefano



                  • #10
                    Well, the approach using -statsby- can be made to work, but it requires a different data structure from what you have there. It is doing one regression for each date because that is exactly what you told Stata to do when you specified -by(date)-. But your original problem calls for doing 94 regressions based on 94 different pairs of outcome variable and predictor. So you need to get -statsby- to do that. That, in turn, requires going to long layout:

                    Code:
                    gen long obs_no = _n
                    reshape long dqspr Dsqrtret, i(obs_no) j(_j)
                    
                    statsby _b _se e(r2_a) e(df_r), saving(results, replace) by(_j): ///
                        regress dqspr Dav_qspr lagDav_qspr leadDav_qspr irelandind ///
                        lagirelandind leadirelandind Dsqrtret
                    should get you the results you want. The code you had about calculating t from _b and _se does not make sense because _b and _se do not exist: they are not variables in your data set. They are the stubs of a series of variables in the data set results that you are creating, but they are not accessible from within your data set. To do that and other calculations with the results you need next -use results, clear-. That will bring all your regression output into active memory. Then you can start generating new variables. You still won't get to write -gen t = _b/_se- because now there is a whole suite of _b* and _se* variables and you will have to specify which one you want (or construct a loop to get a t statistic for each variable, or whatever it is you need).



                    • #11
                      Wait, there's something that's confusing me. Your original request was a loop from 1 through 94. By #5, we're up to 351 iterations and you exceeded the limits of stored estimates. But I also notice that in #5 you are doing 351 iterations, and each iteration is a -statsby, by(date)-, which could involve any number of regressions, depending on how many different dates you have. In my response in #10, I just presumed that the -by(date)- was some kind of misunderstanding of how -statsby- works. You'll notice, by the way, that my response in #10 does not include any explicit loop: the iterating is done within -statsby-. But now I'm wondering if your problem is more complex than you originally made it appear. Do you want to iterate over both dates and 351 variables? If you do, then the -by(_j)- option to -statsby- should be -by(_j date)-. (Still no explicit loop needed.)



                      • #12
                        Actually, your point is correct, Clyde. I could get the information I was looking for with the loop, storing the results with eststo (exactly the code you provided in #2). However, as I also specified, I need to replicate the same code for bigger samples (351 is not even the biggest). This is where I get stuck, because eststo cannot store all these estimates. I just thought of statsby as an alternative to eststo, whose limit seems impossible to overcome with so many estimates.

                        You are also right in #11, where you say
                        each iteration is a -statsby, by(date)-, which could involve any number of regressions, depending on how many different dates you have
                        In fact, Stata runs a regression for each date, with obviously one observation for each variable, which is statistically wrong and meaningless. I could overcome this issue, creating a new variable:
                        Code:
                        generate group = 1
                        This solves the problem, as it runs one regression for the whole time period. However, each iteration overwrites the single row in the new .dta file. So if I have 351 regressions, the new .dta file contains only the estimates for the 351st, as each run replaces the previous ones.

                        Thanks

                        Stefano



                        • #13
                          I can't follow what you're describing. I think you need to show a small representative sample of your data (please use -dataex- to do that)*, and then a hand-worked example of what you want the results to be. At this point, I don't understand what regressions you want to do.

                          *If you do not already have the -dataex- command, you get it by running -ssc install dataex-. Then read -help dataex- for the simple instructions on how to use it.



                          • #14
                            Dear Clyde, I copied a small sample of the dataset for only the last observations of the time series and for only five stocks.

                            Code:
                            * Example generated by -dataex-. To install: ssc install dataex
                            clear
                            input int date float(dqspr1 dqspr2 dqspr3 dqspr4 dqspr5 Dav_qspr lagDav_qspr leadDav_qspr irelandind lagirelandind leadirelandind Dsqrtret1 Dsqrtret2 Dsqrtret3 Dsqrtret4 Dsqrtret5)
                            20426            .5   .25000006          .1            3           1     .14946306    -.05095674             .    -.3014796    -1.159085            .    .08694334    -.9833107    .3165502    -.4738849   -.9660669
                            20429     -.3333333         -.4   -.7272727    -.1666667           0    -.08122023             .      .1716124      2.91223            .    -2.595253            .    212.88423   15.291967     .2820318 -.011661846
                            20430             0   2.0000002    3.111111           .4         -.5      .1716124    -.08122023   -.005136395    -2.595253      2.91223   -1.3584417            .     -.920926   -.8880337    1.1364436    19.67324
                            20431            .5    .5555555    .3783783    -.7857143           0   -.005136395      .1716124    -.06889795   -1.3584417    -2.595253   -1.4172257    -.5012603     .6805121   2.2384634    -.9792513           .
                            20432             0   -.7857143   -.8431373    1.6666666           1    -.06889795   -.005136395    -.10478014   -1.4172257   -1.3584417   -1.6713814    -.7312129    -.9875518   -.4125474     6.902411           .
                            20433     -.3333333           1         4.5        -.875           0    -.10478014    -.06889795             .   -1.6713814   -1.4172257            .    3.3150625    35.187714   -.6777869     10.07242    8.054161
                            20436           -.5   1.8333334  -.20454547            3         -.5    .000213613             .    -.14328423   -2.0021513            .       .89406     1.570544    -.8879591   -.8308975    -.9980708           0
                            20437             1   -.7058824  -.25714287            3           1    -.14328423    .000213613      -.216351       .89406   -2.0021513    .29542577   -.54340386    117.52133   28.039536     153.7277 -.017751353
                            20438            .5         -.4   -.2307692         -.75         -.5      -.216351    -.14328423     .11927483    .29542577       .89406     .3793121    -.7568971     -.880347    -.995514     2.008746   -.8895476
                            20439      .3333333    .3333333         -.8    .50000006           2     .11927483      -.216351      .4606423     .3793121    .29542577    -.7395194     7.534437     .8116891    475.6776    -.2791414  .005952462
                            20440           .25        2.75        -.25    -.1666667   -.3333333      .4606423     .11927483             .    -.7395194     .3793121            .    -.8999519     2.281612  -.10682856    -.8475372    35.76267
                            20443          -.48         -.4    3.333333         10.2           0     .25564197             .     -.3662845   -4.4447746            .    1.6510344     40.20555    -.6940711   -.7143504     5.478612   -.8875475
                            20444      .2307693    .5555555     .923077    -.9464286         1.5     -.3662845     .25564197    -.14107579    1.6510344   -4.4447746     .5135787    -.9785348    -.7191756   -.5166548    -.9145004    10.92438
                            20445    -.26375002   -.7857143        -.96            1         -.6    -.14107579     -.3662845      .1520068     .5135787    1.6510344   -1.8510345     1.390468     -.220939   1.4038823            0   -.8216918
                            20446     -.5925297   1.6666666          70    -.8333333         -.5      .1520068    -.14107579             .   -1.8510345     .5135787            .    -.3154395     2.252932   -.9268817     .3978528   -.5620139
                            20447             .           .           .            .           .             .      .1520068             .            .   -1.8510345            .            .            .           .            .           .
                            20450             .           .           .            .           .             .             .             .            .            .   -1.5234817            .            .           .            .           .
                            20451             .           .           .            .           .             .             .      .6462319   -1.5234817            .    2.8659816            .            .           .            .           .
                            20452     -.3162393         2.5        18.4    -.8333333           0      .6462319             .    -.11682374    2.8659816   -1.5234817    .08692798            .     -.929072   -.8248055            .     8.10583
                            20453           .35   -.2142857   -.7783505           12   -.3333333    -.11682374      .6462319             .    .08692798    2.8659816            .            .    22.306507   1.1994845            .  -.54897934
                            end
                            format %tdnn/dd/CCYY date
                            Using the command discussed above for the five stocks in the sample:

                            Code:
                            local N = 5
                            forvalues i = 1/`N' {
                                capture regress dqspr`i' Dav_qspr lagDav_qspr leadDav_qspr irelandind lagirelandind leadirelandind Dsqrtret`i'
                                if c(rc) == 0 {
                                    eststo results`i'
                                }
                                else if c(rc) == 2001 {
                                    display "Insufficient results for i == `i': moving on."
                                }
                                else if c(rc) == 2000 {
                                    display "Insufficient results for i == `i': moving on."
                                }
                                else {
                                    display "Unanticipated error in regression with i = `i'"
                                    exit `c(rc)'
                                }
                            }
                            
                            esttab using "N:\...irelandmodel1.csv", ar2
                            eststo clear
                            I obtain exactly what I want. The total sample in this case is made of 94 stocks (local N = 94), so I don't have any problem in using eststo.
                            However, the present analysis has to be replicated for 3 other markets, which contain more than 400 stocks (actually, one of them has 3000 stocks). This is the problem: eststo cannot store all these results, so I need to find an alternative way to do this.

                            All the estimates obtained are then used to construct a table indicating:
                            - How many positive coefficients;
                            - How many coefficients are significant;
                            - What's the average ar2;
                            - and so on.

                            So, as you can see, these estimates can also be done in Stata, so I believe it is not necessary to store them in an Excel file. That's why I thought about statsby.

                            Hope this clarifies the issue

                            Thanks

                            Stefano



                            • #15
                              Yes, -statsby- is your friend here. The following code will get you all of the coefficients and p-values.

                              Code:
                              * Example generated by -dataex-. To install: ssc install dataex
                              clear
                              input int date float(dqspr1 dqspr2 dqspr3 dqspr4 dqspr5 Dav_qspr lagDav_qspr leadDav_qspr irelandind lagirelandind leadirelandind Dsqrtret1 Dsqrtret2 Dsqrtret3 Dsqrtret4 Dsqrtret5)
                              20426            .5   .25000006          .1            3           1     .14946306    -.05095674             .    -.3014796    -1.159085            .    .08694334    -.9833107    .3165502    -.4738849   -.9660669
                              20429     -.3333333         -.4   -.7272727    -.1666667           0    -.08122023             .      .1716124      2.91223            .    -2.595253            .    212.88423   15.291967     .2820318 -.011661846
                              20430             0   2.0000002    3.111111           .4         -.5      .1716124    -.08122023   -.005136395    -2.595253      2.91223   -1.3584417            .     -.920926   -.8880337    1.1364436    19.67324
                              20431            .5    .5555555    .3783783    -.7857143           0   -.005136395      .1716124    -.06889795   -1.3584417    -2.595253   -1.4172257    -.5012603     .6805121   2.2384634    -.9792513           .
                              20432             0   -.7857143   -.8431373    1.6666666           1    -.06889795   -.005136395    -.10478014   -1.4172257   -1.3584417   -1.6713814    -.7312129    -.9875518   -.4125474     6.902411           .
                              20433     -.3333333           1         4.5        -.875           0    -.10478014    -.06889795             .   -1.6713814   -1.4172257            .    3.3150625    35.187714   -.6777869     10.07242    8.054161
                              20436           -.5   1.8333334  -.20454547            3         -.5    .000213613             .    -.14328423   -2.0021513            .       .89406     1.570544    -.8879591   -.8308975    -.9980708           0
                              20437             1   -.7058824  -.25714287            3           1    -.14328423    .000213613      -.216351       .89406   -2.0021513    .29542577   -.54340386    117.52133   28.039536     153.7277 -.017751353
                              20438            .5         -.4   -.2307692         -.75         -.5      -.216351    -.14328423     .11927483    .29542577       .89406     .3793121    -.7568971     -.880347    -.995514     2.008746   -.8895476
                              20439      .3333333    .3333333         -.8    .50000006           2     .11927483      -.216351      .4606423     .3793121    .29542577    -.7395194     7.534437     .8116891    475.6776    -.2791414  .005952462
                              20440           .25        2.75        -.25    -.1666667   -.3333333      .4606423     .11927483             .    -.7395194     .3793121            .    -.8999519     2.281612  -.10682856    -.8475372    35.76267
                              20443          -.48         -.4    3.333333         10.2           0     .25564197             .     -.3662845   -4.4447746            .    1.6510344     40.20555    -.6940711   -.7143504     5.478612   -.8875475
                              20444      .2307693    .5555555     .923077    -.9464286         1.5     -.3662845     .25564197    -.14107579    1.6510344   -4.4447746     .5135787    -.9785348    -.7191756   -.5166548    -.9145004    10.92438
                              20445    -.26375002   -.7857143        -.96            1         -.6    -.14107579     -.3662845      .1520068     .5135787    1.6510344   -1.8510345     1.390468     -.220939   1.4038823            0   -.8216918
                              20446     -.5925297   1.6666666          70    -.8333333         -.5      .1520068    -.14107579             .   -1.8510345     .5135787            .    -.3154395     2.252932   -.9268817     .3978528   -.5620139
                              20447             .           .           .            .           .             .      .1520068             .            .   -1.8510345            .            .            .           .            .           .
                              20450             .           .           .            .           .             .             .             .            .            .   -1.5234817            .            .           .            .           .
                              20451             .           .           .            .           .             .             .      .6462319   -1.5234817            .    2.8659816            .            .           .            .           .
                              20452     -.3162393         2.5        18.4    -.8333333           0      .6462319             .    -.11682374    2.8659816   -1.5234817    .08692798            .     -.929072   -.8248055            .     8.10583
                              20453           .35   -.2142857   -.7783505           12   -.3333333    -.11682374      .6462319             .    .08692798    2.8659816            .            .    22.306507   1.1994845            .  -.54897934
                              end
                              format %tdnn/dd/CCYY date
                              
                              isid date
                              reshape long dqspr Dsqrtret, i(date) j(_j)
                              tempfile results
                              statsby _b _se e(r2_a) e(df_r), saving(`results') by(_j): regress dqspr Dav_qspr lagDav_qspr leadDav_qspr ///
                                  irelandind lagirelandind leadirelandind Dsqrtret
                                  
                              use `results', clear
                              rename _eq2_stat_1 adjusted_r2
                              rename _eq2_stat_2 df_r
                              ds _b*
                              local predictors `r(varlist)'
                              local predictors: subinstr local predictors "_b_" "", all
                              foreach p of local predictors {
                                  gen t_`p' = _b_`p'/_se_`p'
                                  gen p_`p' = 2*ttail(df_r, abs(t_`p'))
                              }
                              As I do not understand what you mean by "how many coefficients are positive" and "how many are significant" I will leave it to you to take it from here. It is likely that -egen- functions will play a role in finishing the job, though not knowing where we are going here, I can't be more specific than that.
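
                              As one possible reading of those summary counts, here is a minimal sketch. It assumes Stata's bundled auto dataset grouped by rep78 in place of the stock data, a 5% significance threshold, and the hypothetical names r2_a and df_r for the collected statistics; none of these come from the original problem:

                              Code:
                              sysuse auto, clear
                              keep if inrange(rep78, 3, 5)
                              statsby _b _se r2_a=e(r2_a) df_r=e(df_r), by(rep78) clear: ///
                                  regress price mpg weight
                              gen t_mpg = _b_mpg/_se_mpg
                              gen p_mpg = 2*ttail(df_r, abs(t_mpg))
                              * number of regressions with a positive coefficient on mpg
                              count if _b_mpg > 0 & !missing(_b_mpg)
                              display "Positive mpg coefficients: " r(N)
                              * number significant at the assumed 5% level (missing p-values never qualify)
                              count if p_mpg < 0.05
                              display "Significant mpg coefficients: " r(N)
                              * average adjusted R2 across regressions
                              summarize r2_a, meanonly
                              display "Average adjusted R2: " r(mean)

                              The same -count- and -summarize- steps can be wrapped in the -foreach p of local predictors- loop to cover every coefficient.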

                              Note that your example data, while very useful for setting up the code (and I thank you for it), does not contain sufficiently many observations with non-missing values to actually calculate standard errors for your coefficients (note that df_r is always zero), but with a realistic size data set that should not be a problem.

                              Note also that this code assumes that variable date uniquely identifies observations in your data. If that is not the case, you will have to create a unique identifier for observations and use that variable, rather than date, in the -i()- option of the -reshape- command.

                              So, as you can see, these estimates can also be done in Stata, so I believe it is not necessary to store them in an Excel file.
                              In my very strongly held opinion Excel should NEVER (is that shouted loud enough?) play any intermediate role in data analysis. Excel should be used only to send final results and receive original data sets from other people. Data analysis should not include any steps that involve Excel along the way, because you have no way of assuring the integrity of data in an Excel file, and it leaves no audit trail of any modifications or calculations made in it. Reserve Excel for the beginning and the end only (and even that only if your colleagues prefer it).
