  • Run more than one regression with the same independent variables

    Hello all,

    For my thesis I have to run several regressions with the factors of the Fama-French (F&F) 5-factor model as independent variables. However, I need to run around 1400 regressions on the same factors, because I have that many different dependent variables. Here is an example of how my data are currently organized. All numbers are returns.
    Date        Mkt-rf  SMB     HML     RMW     CMA     Bank A  Bank B  Bank C
    31-01-1986  0,020   0,11    0,34    0,004   0,33    .       0,045   .
    28-02-1986  0,012   0,03    0,04    0,45    0,01    .       -0,09   .
    31-03-1986  -0,01   -0,12   0,55    0,04    0,04    .       -0,07   0,07
    30-04-1986  0,01    0,035   -0,09   0,03    0,04    .       .       0,09
    31-05-1986  0,003   0,001   0,004   0,01    0,02    0,045   .       0,05
    30-06-1986  0,002   0,11    -0,56   -0,09   -0,01   0,022   .       0,07
    31-07-1986  0,02    -0,02   -0,04   -0,03   -0,07   0,045   .       -0,08
    31-08-1986  -0,004  0,01    -0,09   -0,03   0,10    0,03    .       -0,09
    30-09-1986  -0,1    0,22    0,06    -0,02   0,04    0,008   .       -0,01
    NOTE: these numbers are fictional; the example is just to show what I mean. The factors have returns at all dates, while Bank A, for example, may have only 25 returns and Bank B 50.

    What I need is something like this:
    reg Bank A Mkt-rf SMB HML RMW CMA
    reg Bank B Mkt-rf SMB HML RMW CMA

    However, running these manually would simply take too much time, and I can't figure out how to run more than one regression at the same time. Is there an alternative?

    This question is my priority, but if someone also knows how to easily save the betas and p-values of the regressions, that would also be nice.

    I'm using Stata version 12. I know I'm asking a lot, but hopefully someone can help me with this.

    Best regards,
    Ernest Adler

  • #2
    "Bank A", "Bank B", "Bank C" and "Mkt-rf" are not valid variable names, so let's assume that your variables are called Bank_A, Bank_B, Bank_C and Mkt_rf. Assume further that all "Bank" variables are next to each other in your dataset, so that we can get a list of them with ds. To run more than one regression with one command, you can loop over the dependent variables.
    Code:
    ds Bank_A - Bank_C
    local banks `r(varlist)'
    foreach bank of local banks {
      di "Bank: `bank'"
      reg `bank' Mkt_rf SMB HML RMW CMA
    }
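With around 1400 regressions, earlier output scrolls out of the Results window's limited scrollback buffer. A sketch of the same loop that also writes every regression to a text log file, so nothing is lost. To make it runnable on its own, it borrows the auto dataset as stand-in data (an assumption, renamed to the variable names assumed above); the log file name all_regressions.log is hypothetical.

```stata
* Stand-in data (assumption): the auto dataset, renamed to the
* variable names assumed in this post, so the sketch runs as-is
sysuse auto, clear
rename (price mpg rep78 headroom trunk) (Mkt_rf SMB HML RMW CMA)
rename (weight length turn) (Bank_A Bank_B Bank_C)

* close any open unnamed log, then send all output to a text file
capture log close
log using all_regressions.log, text replace

ds Bank_A - Bank_C
local banks `r(varlist)'
foreach bank of local banks {
    display "Bank: `bank'"
    regress `bank' Mkt_rf SMB HML RMW CMA
}

log close
```

The log keeps the full output, but copying coefficients out of it by hand is still tedious; saving the results to a dataset scales better.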



    • #3
      Friedrich's solution does not address the original poster's second request, an easy way to save the betas and p-values of the regressions. Also, since the original post states that there are about 1400 different outcomes, the output from that loop of regressions will be quite unwieldy.

      With 1400 outcomes (all of them banks?), a naming scheme like Bank_A, Bank_B, etc. will also be unwieldy, so I'm going to assume that Ernest will rename these variables to Bank1 through Bank1400. As for the regression p-values, it isn't clear whether Ernest wants the p-values associated with each coefficient in each regression, or the overall p-value for each regression model. Once the variables are renamed, I would recommend the following:

      Code:
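      * one identifier per observation (date), needed by reshape
      gen long obsno = _n
      * long layout: one observation per date and bank
      reshape long Bank, i(obsno) j(bank_num)
      rename Bank outcome
      * one regression per bank; results are saved to regression_results.dta
      statsby _b _se e(F) e(df_m) e(df_r), saving(regression_results, replace) by(bank_num): reg outcome Mkt_rf SMB HML RMW CMA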
      The file regression_results.dta will now contain all of the coefficients, their standard errors, the F statistic, and the model and residual degrees of freedom for each model. (The last three will have names that look like _eq2_stat1 through _eq2_stat3: I recommend renaming them immediately.) Ernest can then calculate the p-values for the overall regression using the Ftail() function, and those for the individual variables using the ttail() function.
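A runnable sketch of the approach above, extended with the p-value step. Two assumptions: the auto dataset stands in for the real returns (renamed to Bank1-Bank3 and the factor names), and each statistic is named explicitly in the statsby call (newvar = (exp)) rather than collected with _b _se, so the saved variable names (b_SMB, se_SMB, F, df_r, ...) are known in advance and no renaming is needed. Coefficient p-values are two-sided, 2*ttail(df_r, |b/se|), which is what regress reports as P>|t|.

```stata
* Stand-in data (assumption): the auto dataset renamed to the
* factor names and to Bank1-Bank3, so the sketch runs on its own
sysuse auto, clear
rename (price mpg rep78 headroom trunk) (Mkt_rf SMB HML RMW CMA)
rename (weight length turn) (Bank1 Bank2 Bank3)
keep Mkt_rf SMB HML RMW CMA Bank1 Bank2 Bank3

* long layout: one observation per date and bank
gen long obsno = _n
reshape long Bank, i(obsno) j(bank_num)
rename Bank outcome

* name every statistic explicitly so the saved variable names are known
local stats F=(e(F)) df_m=(e(df_m)) df_r=(e(df_r))
foreach v in Mkt_rf SMB HML RMW CMA {
    local stats `stats' b_`v'=(_b[`v']) se_`v'=(_se[`v'])
}
local stats `stats' b_cons=(_b[_cons]) se_cons=(_se[_cons])

statsby `stats', saving(regression_results, replace) by(bank_num): ///
    regress outcome Mkt_rf SMB HML RMW CMA

use regression_results, clear
* p-value of the overall F test for each model
gen p_model = Ftail(df_m, df_r, F)
* two-sided p-value for each coefficient: t = b/se on df_r d.f.
foreach v in Mkt_rf SMB HML RMW CMA cons {
    gen p_`v' = 2*ttail(df_r, abs(b_`v'/se_`v'))
}
list bank_num F p_model p_SMB, noobs
```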




      • #4
        First of all, thanks for replying to my questions.

        I see now that I was a little careless in my opening post. The Mkt-rf variable is actually called MktRF. Furthermore, the banks all have their original names (what I wrote as Bank A is, for example, AMERICANPIONEER in reality). Having said that, I tried Friedrich's solution first. It works, but I can only see the last 20 or so regressions (probably because too many regressions are run at once?). Also, as Clyde mentioned, it is harder to copy all the necessary results this way.

        The results I need are the coefficients of all the variables, their standard errors, t-values and p-values. I also need the R-squared and the F-value and p-value of each model. I tried Clyde's solution, but I can't get it to work.

        Sorry, I'm still a beginner. Again, thanks for replying; I appreciate the help from both of you.



        • #5
          I tried Clyde's solution, but I can't get that one to work.
          If the problem is that your variable names, being bank names, are unsystematic, so you can't easily make them look like Bank1-Bank1400, then try this (I assume your bank-name variables are all together at the end of your data set, and there is nothing else interspersed among them. Let's say the first one is AMERICANPIONEER and the last one is ZEROBANK):
          Code:
          unab bank_names: AMERICANPIONEER-ZEROBANK
          rename (`bank_names') Bank#, addnumber
          That will convert the names to the form Bank1-Bank1400 needed for my code in #3.
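A small self-contained illustration of what the two lines above do. The toy data, including the bank name MIDBANK and the variable month, are hypothetical and only there so the example runs.

```stata
* Toy data (hypothetical): a date variable followed by three banks
* with unsystematic names; MIDBANK is made up for illustration
clear
set obs 4
generate month = _n
generate AMERICANPIONEER = rnormal()
generate MIDBANK = rnormal()
generate ZEROBANK = rnormal()

* collect the bank names in dataset order, then number them
unab bank_names: AMERICANPIONEER-ZEROBANK
rename (`bank_names') Bank#, addnumber
describe, simple
```

The numbering follows the order of the variables in the dataset, so AMERICANPIONEER becomes Bank1 and ZEROBANK becomes Bank3; the original names are gone afterwards, so it is worth saving the list in bank_names if you need to map numbers back to banks.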

          If that isn't the problem (or if it still doesn't work after you do this), please show exactly what code you ran and exactly how Stata responded by pasting (do not retype anything) directly from your Results window or log-file into a code block here on the forum. (See the FAQ for how to create a code block.) I'll try to figure out how to fix it for you.

          By the way, you didn't originally mention that you needed the R2 statistics. You can capture those by adding e(r2) to the list of statistics in the -statsby- command.
          Last edited by Clyde Schechter; 28 May 2015, 14:58.



          • #6
            You can store the results with the regsave module from SSC.
            Code:
            ssc install regsave
            The example below uses the auto data to demonstrate how this works.
            Code:
            sysuse auto, clear
            ren price MktRF
            ren mpg SMB
            ren rep78 HML
            ren headroom RMW
            ren trunk CMA
            ren weight JPMorgan_Chase
            ren length Bank_of_America
            ren turn Citigroup
            
            ds JPMorgan_Chase - Citigroup
            local banks `r(varlist)'
            local i = 0
            foreach bank of local banks {
              local i = `i' + 1
              di "Bank: `bank'"
              reg `bank' MktRF SMB HML RMW CMA
              local f = e(F)
              local p =  Ftail(`e(df_m)',`e(df_r)',`e(F)')
              if `i' == 1 {
                regsave using results.dta, tstat pval table(`bank') addlabel(F, `f', p, `p') replace
              }
              else {
                regsave using results.dta, tstat pval table(`bank') addlabel(F, `f', p, `p') append
              }
            }
            The file results.dta contains these variables:
            Code:
            +-----------------------------------------------------------+
            |          var   JPMorgan_C~e   Bank_of_Am~a      Citigroup |
            |-----------------------------------------------------------|
            |   MktRF_coef    .0705303177    .0007755397    .0000657132 |
            | MktRF_stderr    .0191870164    .0005565662    .0001337312 |
            |  MktRF_tstat    3.675939798     1.39343667    .4913825989 |
            |   MktRF_pval    .0004916819    .1683845222    .6248634458 |
            |     SMB_coef   -59.57649612   -1.868157387   -.3305487931 |
            |   SMB_stderr    11.85547352    .3438969553    .0826312527 |
            |    SMB_tstat   -5.025231361   -5.432317257    -4.00028801 |
            |     SMB_pval    4.42246e-06    9.54228e-07    .0001688887 |
            |     HML_coef   -138.7820282   -2.313806534    -1.22577095 |
            |   HML_stderr    54.31847763    1.575639844    .3785933256 |
            |    HML_tstat   -2.554969072   -1.468486905   -3.237698317 |
            |     HML_pval     .013047494    .1469475925    .0019236589 |
            |     RMW_coef      80.710289    1.783444047    .2113008946 |
            |   RMW_stderr    75.52352142    2.190743923    .5263900161 |
            |    RMW_tstat    1.068677545      .81408149    .4014150798 |
            |     RMW_pval    .2892927527    .4186647236    .6894729733 |
            |     CMA_coef    44.75694656    1.891320944    .2698600888 |
            |   CMA_stderr     16.7351799     .485444665    .1166422144 |
            |    CMA_tstat    2.674422741    3.896058798    2.313571453 |
            |     CMA_pval    .0095242811    .0002392972    .0239671711 |
            |   _cons_coef    3474.100098    199.4847412    46.21289825 |
            | _cons_stderr    421.6842651    12.23198032    2.939089298 |
            |  _cons_tstat    8.238628387    16.30845833    15.72354317 |
            |   _cons_pval    1.36572e-11    2.71718e-24    1.73519e-23 |
            |            N             69             69             69 |
            |           r2    .7720876932    .7670263648    .6471169591 |
            |            F    42.68441391    41.48337173    23.10588074 |
            |            p    5.70477e-19    1.12876e-18    4.23521e-13 |
            +-----------------------------------------------------------+
            Edit 1: Table showing results.dta replaced by output of list, noobs sep(0).
            Edit 2: p-value for F-test added to code and results.
            Last edited by Friedrich Huebler; 28 May 2015, 15:37.
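If installing regsave from SSC is not an option, Stata's built-in postfile can collect similar results, one row per bank. This is a swapped-in technique, not what the post above uses; it is a sketch on the same renamed auto data, and the file name postresults and the choice of stored statistics (only the MktRF coefficient and standard error, to keep it short) are assumptions.

```stata
* Stand-in data, as in the regsave example above
sysuse auto, clear
rename (price mpg rep78 headroom trunk) (MktRF SMB HML RMW CMA)
rename (weight length turn) (JPMorgan_Chase Bank_of_America Citigroup)

* one row per bank: name, N, R2, F, model p-value, MktRF coef and SE
tempname h
postfile `h' str32 bank N r2 F p b_MktRF se_MktRF ///
    using postresults, replace
foreach bank of varlist JPMorgan_Chase-Citigroup {
    quietly regress `bank' MktRF SMB HML RMW CMA
    post `h' ("`bank'") (e(N)) (e(r2)) (e(F)) ///
        (Ftail(e(df_m), e(df_r), e(F))) (_b[MktRF]) (_se[MktRF])
}
postclose `h'

use postresults, clear
list, noobs
```

The result already has one observation per bank, so no transposing is needed; the trade-off is that every statistic to be kept has to be listed by hand in both postfile and post.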



            • #7
              For some reason the regression results are saved as strings in the example above. To convert the data to numeric variables, run the commands below.
              Code:
              use results, clear
              destring JPMorgan_Chase - Citigroup, replace
              Finally, you can rearrange the data to have one observation per bank.
              Code:
              count
              local n = `r(N)'
              forval i = 1/`n' {
                local v`i' = var[`i']
              }
              drop var
              xpose, clear varname
              forval i = 1/`n' {
                ren v`i' `v`i''
              }
              ren _varname bank
              order bank
              Here is a subset of the variables after xpose.
              Code:
              list bank - MktRF_pval N - p, noobs sep(0)
              
              +---------------------------------------------------------------------------------------------------+
              |            bank   MktRF_~f   MktRF_~r   MktRF_~t   MktRF_~l    N         r2          F          p |
              |---------------------------------------------------------------------------------------------------|
              |  JPMorgan_Chase   .0705303    .019187    3.67594   .0004917   69   .7720877   42.68441   5.70e-19 |
              | Bank_of_America   .0007755   .0005566   1.393437   .1683845   69   .7670264   41.48337   1.13e-18 |
              |       Citigroup   .0000657   .0001337   .4913826   .6248634   69    .647117   23.10588   4.24e-13 |
              +---------------------------------------------------------------------------------------------------+



              • #8
                I have tried Friedrich's code and it works like a charm; this is exactly what I needed. Thank you so much for your help; with this code I can make a lot of progress in a short amount of time. Clyde, your help is also really appreciated. Thanks for all the replies, you two have really helped me a lot.



                • #9
                  Originally posted by Friedrich Huebler
                  For some reason the regression results are saved as strings in the example above. To convert the data to numeric variables, run the commands below.
                  Code:
                  use results, clear
                  destring JPMorgan_Chase - Citigroup, replace
                  The previous version of regsave had a bug: when the addlabel() option was used the results were saved as strings. The author of regsave submitted an updated version to SSC (distribution date 20150530) that no longer has this bug. With the new version it is not necessary to destring the results because they are saved as numeric variables.



                  • #10
                    Dear all,

                    I have a similar problem but I need to perform weighted regressions where weights are variables, say w_JPMorgan_Chase - w_Citigroup.

                    Could someone please help me adapt the code to include weights in each regression?

                    Thanks a lot!
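A minimal sketch of one way to do this, assuming analytic weights (aweight) are the right weight type (pweight or fweight may fit better, depending on what the weights represent) and that each weight variable is named w_ followed by the bank name, as in the question. The weight values generated below are hypothetical random numbers, only there so the example runs on the renamed auto data.

```stata
sysuse auto, clear
rename (price mpg rep78 headroom trunk) (MktRF SMB HML RMW CMA)
rename (weight length turn) (JPMorgan_Chase Bank_of_America Citigroup)

* hypothetical weight variables w_<bank>: random positive values
* purely so the example runs
set seed 2015
foreach bank of varlist JPMorgan_Chase-Citigroup {
    generate w_`bank' = 0.5 + runiform()
}

* one weighted regression per bank, each with its own weight variable
foreach bank of varlist JPMorgan_Chase-Citigroup {
    quietly regress `bank' MktRF SMB HML RMW CMA [aweight = w_`bank']
    display "`bank': N = " e(N) ", weight `e(wexp)'"
}
```

The same change, appending [aweight = w_`bank'] to the regress line, should carry over to the regsave loop in #6, since regsave stores whatever the last estimation command left in e().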

