Calculating the Number of times a significant result occurred

Sagnik Bagchi

Join Date: Mar 2015

Posts: 26
#1


17 Aug 2019, 01:55

Hello,

I am using the command "bysort var: reg y x", which gives me many sets of regression results.

However, I am only interested in whether the regression coefficient of x is statistically significant (p-value less than or equal to 0.10). Is there a command in Stata that tells me how many times the coefficient of x is statistically significant across all the output produced by the "bysort var:" command?

Many thanks in advance!

Sagnik
Tags: regression, Suggestion, syntax
Clyde Schechter

Join Date: Apr 2014

Posts: 30089
#2

17 Aug 2019, 03:09

You can't do this with -by-. You need a loop and you need to extract the p-value from the results stored by regress. Something like this:

Code:

levelsof var, local(var_levels)
local counter = 0
foreach v of local var_levels {
    regress y x if var == `v'
    matrix M = r(table)
    if M[4, 1] < 0.05 {
        local ++counter
    }
}
display `counter'
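A possible alternative to the explicit loop is -statsby-, which runs the regression once per level of var and replaces the data with one observation per level holding the slope, its standard error and the residual degrees of freedom. The p-value is then rebuilt from the t statistic. This is only a sketch: it uses Stata's shipped auto data renamed so that rep78 is var, price is y and mpg is x, and the 0.10 cutoff follows the 10% level asked about in #1 (the loop above uses 0.05).

```stata
* Sketch: one regression per level of var via -statsby-, then count
* slopes with p < 0.10. Uses the auto data shipped with Stata.
sysuse auto, clear
rename (rep78 price mpg) (var y x)

* Replace the data in memory with one observation per level of var,
* holding the slope, its standard error and the residual df
statsby b = (_b[x]) se = (_se[x]) df = (e(df_r)), by(var) clear: regress y x

* Two-sided p-value for the slope; it is missing when df is 0,
* as for the level that has only two observations
generate p = 2 * ttail(df, abs(b / se))

* Stata treats missing as larger than any number, so missing
* p-values are not counted
count if p < 0.10
display as text "Slopes with p < 0.10: " as result r(N)
```

Keeping the per-level results as data also makes it easy to -list- them afterward instead of scrolling through the regression output.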

That said, you should not do this. The American Statistical Association has recommended that the use of statistical significance testing be abandoned. Read https://www.tandfonline.com/doi/full...5.2019.1583913. If it is meaningless to talk about whether a result is "statistically significant," it is particularly abusive to count up the number of times it happens.
Sagnik Bagchi

Join Date: Mar 2015

Posts: 26
#3

17 Aug 2019, 03:26

Thank you for this code and, of course, for the link to the paper you mentioned.

However, after executing the code (Ctrl+D), I cannot see the number of times the coefficient is statistically significant. Could you help me with that?

Clyde Schechter

Join Date: Apr 2014
Posts: 30089

17 Aug 2019, 03:40

When I run this code, it works and produces a correct result, -display-ed at the end:

Code:

. sysuse auto, clear
(1978 Automobile Data)

. rename rep78 var

. rename price y

. rename mpg x

.
. levelsof var, local(var_levels)
1 2 3 4 5

. local counter = 0

. foreach v of local var_levels {
  2.     regress y x if var == `v'
  3.     matrix M = r(table)
  4.     if M[4, 1] < 0.05 {
  5.         local ++counter
  6.     }
  7. }

      Source |       SS           df       MS      Number of obs   =         2
-------------+----------------------------------   F(1, 0)         =         .
       Model |    273060.5         1    273060.5   Prob > F        =         .
    Residual |           0         0           .   R-squared       =    1.0000
-------------+----------------------------------   Adj R-squared   =         .
       Total |    273060.5         1    273060.5   Root MSE        =         0

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |  -123.1667          .        .       .            .           .
       _cons |       7151          .        .       .            .           .
------------------------------------------------------------------------------

      Source |       SS           df       MS      Number of obs   =         8
-------------+----------------------------------   F(1, 6)         =      4.69
       Model |    39346087         1    39346087   Prob > F        =    0.0735
    Residual |  50336476.9         6  8389412.82   R-squared       =    0.4387
-------------+----------------------------------   Adj R-squared   =    0.3452
       Total |  89682563.9         7  12811794.8   Root MSE        =    2896.4

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   -630.823    291.288    -2.17   0.074    -1343.579    81.93295
       _cons |   18032.12   5664.222     3.18   0.019     4172.264    31891.97
------------------------------------------------------------------------------

      Source |       SS           df       MS      Number of obs   =        30
-------------+----------------------------------   F(1, 28)        =     14.65
       Model |   123788982         1   123788982   Prob > F        =    0.0007
    Residual |   236582733        28  8449383.33   R-squared       =    0.3435
-------------+----------------------------------   Adj R-squared   =    0.3201
       Total |   360371715        29  12426610.9   Root MSE        =    2906.8

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |  -498.8875   130.3389    -3.83   0.001    -765.8747   -231.9003
       _cons |   16124.28    2587.92     6.23   0.000     10823.17    21425.39
------------------------------------------------------------------------------

      Source |       SS           df       MS      Number of obs   =        18
-------------+----------------------------------   F(1, 16)        =      0.90
       Model |  2643203.86         1  2643203.86   Prob > F        =    0.3572
    Residual |  47043724.6        16  2940232.79   R-squared       =    0.0532
-------------+----------------------------------   Adj R-squared   =   -0.0060
       Total |  49686928.5        17   2922760.5   Root MSE        =    1714.7

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |  -79.90338   84.27343    -0.95   0.357    -258.5551    98.74832
       _cons |    7802.74   1870.119     4.17   0.001     3838.264    11767.22
------------------------------------------------------------------------------

      Source |       SS           df       MS      Number of obs   =        11
-------------+----------------------------------   F(1, 9)         =      7.88
       Model |  31950588.4         1  31950588.4   Prob > F        =    0.0204
    Residual |  36471559.6         9  4052395.52   R-squared       =    0.4670
-------------+----------------------------------   Adj R-squared   =    0.4077
       Total |    68422148        10   6842214.8   Root MSE        =    2013.1

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |  -204.6947   72.89925    -2.81   0.020    -369.6042   -39.78513
       _cons |   11514.19   2085.085     5.52   0.000       6797.4    16230.98
------------------------------------------------------------------------------

. display `counter'
2

If it is not working for you, please post example data so I can troubleshoot it. Be sure to use the -dataex- command to show your example data.


Sagnik Bagchi

Join Date: Mar 2015

Posts: 26
#5

19 Aug 2019, 04:28

Yes, it worked.

I would also like to know whether there is a way to count how many times the estimated coefficient is positive or negative and statistically significant across the set of regressions.

Thanks,

Last edited by Sagnik Bagchi; 19 Aug 2019, 04:31.

Clyde Schechter

Join Date: Apr 2014
Posts: 30089

19 Aug 2019, 07:30

Code:

levelsof var, local(var_levels)
local counter = 0
local counter_pos = 0
local counter_neg = 0
foreach v of local var_levels {
    regress y x if var == `v'
    matrix M = r(table)
    if M[4, 1] < 0.05 {
        local ++counter
        if _b[x] < 0 {
            local ++counter_neg
        }
        else {
            local ++counter_pos
        }
    }
}
display as text "Number of positive significant: " as result `counter_pos'
display as text "Number of negative significant: " as result `counter_neg'
display as text "Total of significant results: " as result `counter'
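The same -statsby- idea can split the significant slopes by sign. This sketch again uses the auto data renamed as in the demonstration above and keeps the 0.05 cutoff of the loop. One small difference: in the loop, a coefficient of exactly zero would fall into the else branch and be counted as positive, whereas the sketch counts strictly positive and strictly negative slopes.

```stata
* Sketch: count significant slopes by sign, one regression per level
* of var. Uses the auto data shipped with Stata.
sysuse auto, clear
rename (rep78 price mpg) (var y x)
statsby b = (_b[x]) se = (_se[x]) df = (e(df_r)), by(var) clear: regress y x
generate p = 2 * ttail(df, abs(b / se))

* Strictly positive and strictly negative slopes with p < 0.05
count if p < 0.05 & b > 0
local n_pos = r(N)
count if p < 0.05 & b < 0
local n_neg = r(N)
display as text "Number of positive significant: " as result `n_pos'
display as text "Number of negative significant: " as result `n_neg'
display as text "Total of significant results: " as result `n_pos' + `n_neg'
```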


Nikifor Naumov

Join Date: Jun 2022
Posts: 9

11 Jul 2022, 14:25

Hello Clyde, I am trying to use your code on my dataset, a panel of 3816 mutual funds, and I want to run a regression for each fund. The coefficient of interest is that of mktrf2. After running the code I receive the error "insufficient observations". Knowing that the observations are actually enough for the regression where am I making a mistake?
Thank you in advance

Code:

. levelsof fundnr, local(var_levels)

. local counter = 0

. local counter_pos = 0

. local counter_neg = 0

. foreach v of local var_levels {
  2.     xtreg fundperf mktrf mktrf2 if fundnr == `v'
  3.     matrix M = r(table)
  4.     if M[4, 1] < 0.05 {
  5.         local ++counter
  6.         if _b[mktrf2] < 0 {
  7.             local ++counter_neg
  8.         }
  9.         else {
 10.             local ++counter_pos
 11.         }
 12.     }
 13. }

Output:

Code:

insufficient observations
r(2001);

.
. display as text "Number of positive significant: " as result `counter_pos'
Number of positive significant: 0

.
. display as text "Number of negative significant: " as result `counter_neg'
Number of negative significant: 0

.
. display as text "Total of significant results: " as result `counter'"
Total of significant results: too few quotes
r(132);

Last edited by Nikifor Naumov; 11 Jul 2022, 14:27.


Joro Kolev

Join Date: Aug 2018

Posts: 3050
#8

11 Jul 2022, 15:02

Most probably it is just what the error message says: for some funds you do not have enough observations. Count how many observations you have by fund, and decide what to do in the cases where there are not enough observations. This will give you the observations:

Code:

egen observationsbyfund = count(fundperf + mktrf + mktrf2), by(fundnr)
summ observationsbyfund, detail
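The sum inside count() is missing whenever any of the three variables is missing, so the command above counts complete cases per fund. An equivalent and perhaps more explicit version uses -missing()-. It is sketched here on a small hypothetical dataset: the values are invented, only the variable names come from this thread, and the cutoff min_obs is a hypothetical choice rather than a Stata rule.

```stata
* Hypothetical toy data; the values are invented for illustration
clear
input fundnr fundperf mktrf
1  .02   .01
1  .01  -.02
1  .     .03
2  .03   .02
2  .     .
end
generate mktrf2 = mktrf^2

* 1 if the observation can enter -regress fundperf mktrf mktrf2-
generate byte complete = !missing(fundperf, mktrf, mktrf2)
bysort fundnr: egen n_complete = total(complete)

* List each fund once if it has fewer complete cases than min_obs
local min_obs = 3   // hypothetical cutoff, choose your own
egen byte first = tag(fundnr)
list fundnr n_complete if first & n_complete < `min_obs', noobs
```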
Clyde Schechter

Join Date: Apr 2014

Posts: 30089
#9

11 Jul 2022, 15:17

Knowing that the observations are actually enough for the regression where am I making a mistake?

Your mistake is in believing that you know something that, apparently, isn't true. I have never known Stata to be wrong when it says there are insufficient observations for a regression.

Bear in mind that when a regression is carried out, any observation that contains a missing value on any variable mentioned in the regression command is excluded, so only complete cases count. Stata is telling you that it has encountered a fundnr for which, because of incomplete data, it cannot do the requested regression. I suggest you revise your code to identify these fundnr's and move on to the next one; you can then investigate the problematic fundnr's afterward.

Code:

levelsof fundnr, local(var_levels)
local counter = 0
local counter_pos = 0
local counter_neg = 0
local bad_levels
foreach v of local var_levels {
    capture noisily xtreg fundperf mktrf mktrf2 if fundnr == `v'
    if c(rc) == 0 {
        matrix M = r(table)
        if M[4, 1] < 0.05 {
            local ++counter
            if _b[mktrf2] < 0 {
                local ++counter_neg
            }
            else {
                local ++counter_pos
            }
        }
    }
    else if inlist(c(rc), 2000, 2001) {
        local bad_levels `bad_levels' `v'
    }
    else {
        display in red "Unanticipated error at fundnr == `v' -- Execution terminated"
        exit c(rc)
    }
}
display as text "fundnr's with insufficient data: " as result `"`bad_levels'"'

Concerning the "too few quotes" error message from your final command, the problem is actually, in this case, too many quotes. The one after `counter' should be removed.

Now, I have alarm bells going off in my head about the regression itself. I have the terrible feeling that the variable mktrf2 is the square of mktrf. If that is correct, then testing for statistical significance of the coefficient of mktrf, as you are doing by looking at M[4,1], is completely and utterly meaningless. If you are thinking it will tell you whether there is a statistically significant association between mktrf and fundperf, that is absolutely mistaken. When you have a linear and quadratic term in the model, it is meaningless to test either in isolation. You must test them jointly. The statistical significance of the coefficient of mktrf alone tells you nothing useful: it only tells you that if there is a U or inverse-U relationship, then the data are not very compatible with its turning point being located at mktrf = 0.
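To test the linear and quadratic terms jointly, -test- after the regression reports a single F test of both coefficients and stores its p-value in r(p). A minimal sketch with the auto data, where price, mpg, its square mpg2 and foreign stand in for fundperf, mktrf, mktrf2 and fundnr; the pairing is purely illustrative.

```stata
* Sketch: per-group joint test of a linear and a quadratic term.
* price, mpg, mpg2 and foreign stand in for fundperf, mktrf, mktrf2
* and fundnr; the pairing is purely illustrative.
sysuse auto, clear
generate mpg2 = mpg^2
levelsof foreign, local(levels)
foreach v of local levels {
    quietly regress price mpg mpg2 if foreign == `v'
    * Joint F test that both coefficients are zero
    quietly test mpg mpg2
    display as text "foreign == `v': joint p = " as result %6.4f r(p)
}
```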

Added: Crossed with #8.

Last edited by Clyde Schechter; 11 Jul 2022, 15:24.
Nikifor Naumov

Join Date: Jun 2022

Posts: 9
#10

12 Jul 2022, 04:46

Thank you both, Clyde and Joro. I have fixed my problem, although I need the significant results for mktrf2, not mktrf. As you mention above, by looking at M[4,1] I am testing the significance of the coefficient of mktrf. How should I change the code so that it gives me the significant results for mktrf2?
Jared Greathouse

Join Date: Sep 2021

Posts: 2170
#11

12 Jul 2022, 06:13

You don't show the code you used, but either way, my point is really the first one Clyde made above: abandon ship. Turn away. Why are you specifically searching for "significant" results when the American Statistical Association advises against this practice? You shouldn't base your analysis on an arbitrary threshold, so even if we get you the code to test the significance of a coefficient, my question is: why would you want to do this?

Nikifor Naumov

Join Date: Jun 2022
Posts: 9

#12

12 Jul 2022, 07:24

Hello Jared, I am using a model developed by Treynor and Mazuy which evaluates the fund managers' market-timing ability.

Code:

levelsof fundnr, local(var_levels)
local counter = 0
local counter_pos = 0
local counter_neg = 0
foreach v of local var_levels {
       xtreg fundperf mktrf mktrf2 if fundnr == `v', fe
       matrix M = r(table)
       if M[4, 1] < 0.05 {
             local ++counter
             if _b[mktrf2] < 0 {  
                   local ++counter_neg
             }
             else {  
                    local ++counter_pos
             }
       }
}
display as text "Number of positive significant: " as result `counter_pos'
display as text "Number of negative significant: " as result `counter_neg'
display as text "Total of significant results: " as result `counter'

Unfortunately, the output gives me the number of times the first coefficient (mktrf) was significant, not the second (mktrf2).

Last edited by Nikifor Naumov; 12 Jul 2022, 08:08.


Jared Greathouse

Join Date: Sep 2021

Posts: 2170
#13

12 Jul 2022, 08:39

This doesn't address my question, though. Why do you want to do this when good statistical practice suggests otherwise? Why is the number of times a variable is significant a relevant metric to be concerned with?

As Clyde wrote above,

you should not do this. The American Statistical Association has recommended that the use of statistical significance testing be abandoned. Read https://www.tandfonline.com/doi/full...5.2019.1583913
Nikifor Naumov

Join Date: Jun 2022

Posts: 9
#14

12 Jul 2022, 09:02

The coefficient reflects market-timing ability, that is, the ability of investment managers to adjust the asset portfolio in anticipation of general market price movements. If the coefficient is positive and significant, it indicates that the investment manager has the ability to market timing. Likewise, if it is negative and significant, it indicates that the investment manager does not have the ability to market timing.
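For reference, the Treynor and Mazuy market-timing regression is usually written as below, with R_{p,t} the fund return, R_{f,t} the risk-free rate and R_{m,t} the market return in period t. Mapping it onto this thread assumes that fundperf is the fund's excess return (the thread does not say so) and that mktrf and mktrf2 are the market excess return and its square, so the coefficient being discussed here is \gamma_p.

```latex
R_{p,t} - R_{f,t} = \alpha_p + \beta_p \left( R_{m,t} - R_{f,t} \right)
  + \gamma_p \left( R_{m,t} - R_{f,t} \right)^2 + \varepsilon_{p,t}
```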
Clyde Schechter

Join Date: Apr 2014

Posts: 30089
#15

12 Jul 2022, 09:31

If the coefficient is positive and significant, it indicates that the investment manager has the ability to market timing. Likewise, if it is negative and significant, it indicates that the investment manager does not have the ability to market timing.

This kind of nonsense is a good example of why the American Statistical Association recommended abandonment of the use of statistical significance testing.

That said, the practice is entrenched in the literature and it is going to be a decades-long struggle to bring about change. Meanwhile, while not condoning the overall project, it is reasonable to point out that the statistics applying to the coefficient of mktrf2 will be found in column 2 of the matrix M = r(table). I suppose I am in the business of selling rope and denying responsibility when the purchaser uses it to hang himself.
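Rather than hard-coding the column, the p-value can be looked up in r(table) by row and column name with rownumb() and colnumb(). The sketch uses the auto data, with mpg2 playing the role of mktrf2. In the loops above one would use colnumb(M, "mktrf2"), assuming that the column of r(table) after -xtreg- is named plainly mktrf2; that is worth checking with -matrix list M- first.

```stata
* Sketch: look up a coefficient's p-value in r(table) by name
sysuse auto, clear
generate mpg2 = mpg^2
regress price mpg mpg2
matrix M = r(table)

* Row "pvalue", column "mpg2" (here the same cell as M[4, 2])
scalar p_mpg2 = M[rownumb(M, "pvalue"), colnumb(M, "mpg2")]
display as text "p-value for mpg2: " as result scalar(p_mpg2)
```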