Loop, one-tailed p value, two-tailed p value

Mike Kraft

Join Date: Dec 2014
Posts: 328

19 Apr 2019, 00:47

Hi,
I am running regressions by industry-year group, restricted to groups with more than 25 firms.
I want to calculate the coefficients, t statistics, and one-sided p-values for all coefficients except the intercept.
The coefficients on atoW and lag_accW are predicted to be negative, whereas the coefficients on all other variables are predicted to be positive.
I did the following (note that I was not sure what to do with the variables whose coefficients are predicted to be negative, so my p-value code for them is the same as for the others):

Code:

forval j = 0/6 {
   gen b`j'=.
   gen t_stat`j'=.
   gen p_value`j'=.
}
gen adjr2=.
gen unce=.

levelsof sic_2, local(levels)

foreach x of local levels {
    foreach z of numlist 1990/2016 {

        capture reg ceW lag_ceW atoW lag_accW accW dsaleW ndsaleW if N_firms_sic_2_yr>25 & sic_2==`x' & yr==`z'

        if _rc == 0 {
            predict residual, res
            replace unce=residual if e(sample)
            drop residual

            replace b0 = _b[_cons] if e(sample)
            replace t_stat0 = b0/_se[_cons] if e(sample)
            replace p_value0= 2*ttail(e(df_r),abs(t_stat0)) if e(sample)

            replace b1 = _b[lag_ceW] if e(sample)
            replace t_stat1 = b1/_se[lag_ceW] if e(sample)
            replace p_value1= ttail(e(df_r),abs(t_stat1)) if e(sample)

            replace b2 = _b[atoW] if e(sample)
            replace t_stat2 = b2/_se[atoW] if e(sample)
            replace p_value2= ttail(e(df_r),abs(t_stat2)) if e(sample)

            replace b3 = _b[lag_accW] if e(sample)
            replace t_stat3 = b3/_se[lag_accW] if e(sample)
            replace p_value3= ttail(e(df_r),abs(t_stat3)) if e(sample)

            replace b4 = _b[accW] if e(sample)
            replace t_stat4 = b4/_se[accW] if e(sample)
            replace p_value4= ttail(e(df_r),abs(t_stat4)) if e(sample)

            replace b5 = _b[dsaleW] if e(sample)
            replace t_stat5 = b5/_se[dsaleW] if e(sample)
            replace p_value5= ttail(e(df_r),abs(t_stat5)) if e(sample)

            replace b6 = _b[ndsaleW] if e(sample)
            replace t_stat6 = b6/_se[ndsaleW] if e(sample)
            replace p_value6= ttail(e(df_r),abs(t_stat6)) if e(sample)

            replace adjr2=e(r2_a) if e(sample)
        }
    }
}

The code runs and produces sensible coefficients. However, I am not sure about the t statistics and p-values. Is my code for them correct? Do I have to use e(sample)? Note that the sample differs across industry-year regressions. Is using t statistics correct here as well?

Thanks


William Lisowski

Join Date: Dec 2014

Posts: 10150
#2

19 Apr 2019, 08:29

Do I have to use e(sample)?

By that, I assume you mean to ask if, in the following code for example,

Code:

replace b0 = _b[_cons] if e(sample)
replace t_stat0 = b0/_se[_cons] if e(sample)
replace p_value0= 2*ttail(e(df_r),abs(t_stat0)) if e(sample)

the three occurrences of "if e(sample)" are necessary?

The answer is yes, they are required. If you omit them, the results of each regression will overwrite the values of b0, t_stat0, and p_value0 in every observation, not just the observations that were used in the regression, and ultimately all observations will contain the values from the final regression - the last value of sic_2 in the year 2016. That is certainly not what you want. For each observation, you want those three variables to reflect the results of the single regression in which that observation was included.
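
The overwriting behaviour described above can be seen in a minimal sketch. It assumes Stata's bundled auto dataset rather than the data from post #1, and the names b_good and b_bad are purely illustrative:

```stata
* Sketch on Stata's bundled auto data (not the poster's data).
* b_good and b_bad are illustrative variable names.
sysuse auto, clear
generate b_good = .
generate b_bad  = .

forvalues f = 0/1 {
    quietly regress price mpg if foreign == `f'
    replace b_good = _b[mpg] if e(sample)   // only this regression's observations
    replace b_bad  = _b[mpg]                // every observation, overwritten on each pass
}

* b_good differs by group; b_bad holds the last regression's slope everywhere
tabulate foreign, summarize(b_good)
tabulate foreign, summarize(b_bad)
```

With the qualifier, each observation keeps the slope from the one regression that used it; without it, only the final regression survives.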
Mike Kraft

Join Date: Dec 2014

Posts: 328
#3

19 Apr 2019, 08:47

Thanks for answering my question regarding e(sample).

Can anyone also assist with the calculation of the one-tailed and two-tailed p-values? Note that some variables are predicted to have negative coefficients, as per my original question in post #1.
William Lisowski

Join Date: Dec 2014

Posts: 10150
#4

19 Apr 2019, 09:21

I do not believe your calculation of the one-sided t-test p-values is correct.

I believe that for coefficients you expect to be positive the correct calculation is

Code:

replace p_value1= ttail(e(df_r),t_stat1) if e(sample)

I believe that for coefficients you expect to be negative the correct calculation is

Code:

replace p_value2= ttail(e(df_r),-t_stat2) if e(sample)

However, I would welcome it if someone else would confirm my understanding.
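
As a rough check of the two formulas, here is a sketch on Stata's bundled auto dataset, which is an assumption made for illustration. The predicted signs (weight positive, mpg negative) and all scalar names are hypothetical and are not taken from the original model:

```stata
* Sketch on Stata's bundled auto data; predicted signs are hypothetical.
sysuse auto, clear
quietly regress price mpg weight

scalar t_weight = _b[weight] / _se[weight]
scalar t_mpg    = _b[mpg] / _se[mpg]

* One-sided p-values: upper tail for a predicted positive coefficient,
* upper tail of the negated t statistic for a predicted negative one
scalar p_pos_weight = ttail(e(df_r), t_weight)
scalar p_neg_mpg    = ttail(e(df_r), -t_mpg)

* Two-sided p-value for comparison
scalar p_two_mpg    = 2 * ttail(e(df_r), abs(t_mpg))

display "weight, H1: b > 0, one-sided p = " p_pos_weight
display "mpg,    H1: b < 0, one-sided p = " p_neg_mpg
display "mpg,    two-sided p            = " p_two_mpg
```

When the estimate has the predicted sign, the one-sided p-value is half the two-sided one; when it has the opposite sign, it is one minus half the two-sided value.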
Sebastian Kripfganz

Join Date: May 2014

Posts: 2593
#5

19 Apr 2019, 10:31

William is right.

Potentially helpful:
Stata | FAQ: One-sided tests for coefficients
(The calculation in this FAQ is done slightly differently, but the result is the same.)

https://www.kripfganz.de/stata/
Mike Kraft

Join Date: Dec 2014

Posts: 328
#6

19 Apr 2019, 15:37

For the coefficients expected to be negative you changed

Code:

abs(t_stat2)

to

Code:

-t_stat2

Why is that different? If t_stat2 is expected to be negative, shouldn't the two give the same result?

Also, in the FAQ Sebastian referred to, I can see "1-" in the calculation. Should I also do that?
William Lisowski

Join Date: Dec 2014

Posts: 10150
#7

19 Apr 2019, 19:39

Although you expect t_stat2 to be negative, this does not mean that it will be negative. If t_stat2 were computed to be highly positive, using

Code:

abs(t_stat2)

will incorrectly yield a small p_value2 and you will incorrectly conclude that the coefficient is significantly negative. Whereas using

Code:

-t_stat2

ensures that a positive t_stat2 will yield a large p_value2.
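
A small numeric sketch of this point, with made-up values (20 degrees of freedom and a t statistic of 2.5; the scalar names are illustrative). It also shows that, because the t distribution is symmetric, ttail(df, -t) equals 1 - ttail(df, t), so a calculation written with "1 -" should give the same one-sided p-value:

```stata
* Hypothetical values: a positive t statistic for a coefficient predicted negative
scalar df_demo = 20
scalar t_demo  = 2.5

* abs() discards the sign, so the wrong-signed estimate looks significant
display "with abs(t): " ttail(df_demo, abs(t_demo))

* Negating t keeps the sign information, so the p-value is large
display "with -t:     " ttail(df_demo, -t_demo)

* Equivalent form, by symmetry of the t distribution
display "1 - ttail:   " 1 - ttail(df_demo, t_demo)
```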
Mike Kraft

Join Date: Dec 2014

Posts: 328
#8

20 Apr 2019, 05:20

OK. Thanks William.
Given that the p-value depends on the t statistic, if I cluster the standard errors by industry, do I need to change anything in my code other than adding cluster(sic_2), as follows (note that sic_2 is my industry variable)?

Code:

capture reg ceW lag_ceW atoW lag_accW accW dsaleW ndsaleW if N_firms_sic_2_yr>25 & sic_2==`x' & yr==`z', cluster(sic_2)

In other words, would the parts of the code for the t statistics and p-values, which depend on the standard errors, also need to be calculated differently if I cluster the standard errors?
Mike Kraft

Join Date: Dec 2014

Posts: 328
#9

20 Apr 2019, 08:37

Sorry, I meant clustering by firm, i.e. replacing cluster(sic_2) with cluster(permno), where permno is the firm identifier.

I look forward to hearing from you all.
William Lisowski

Join Date: Dec 2014

Posts: 10150
#10

20 Apr 2019, 09:08

It appears that no adjustment to the code is necessary.

The values of _b[] and _se[] will be those reported in the output of regress, and the value of e(df_r) will be 1 less than the number of clusters, as used in the regress output for the denominator degrees of freedom. Using these values in your formulas reproduces the t statistics and two-sided p-values reported in the output of regress.
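
This can be checked with a short sketch. It assumes Stata's bundled auto dataset, with rep78 standing in for the firm identifier purely for illustration, and it reads the reported p-value from the r(table) matrix that regress leaves behind:

```stata
* Sketch on Stata's bundled auto data; rep78 is a stand-in cluster variable.
sysuse auto, clear
regress price mpg weight, vce(cluster rep78)

* Denominator degrees of freedom versus number of clusters
display "clusters = " e(N_clust) ", e(df_r) = " e(df_r)

* Two-sided p-value for mpg computed by hand from _b[], _se[] and e(df_r)
scalar t_manual = _b[mpg] / _se[mpg]
scalar p_manual = 2 * ttail(e(df_r), abs(t_manual))

* The same p-value as reported by regress, taken from r(table)
matrix RT = r(table)
scalar p_regress = el(RT, rownumb(RT, "pvalue"), colnumb(RT, "mpg"))
display "manual p = " p_manual ", regress p = " p_regress
```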