  • Loop, one-tailed p value, two-tailed p value

    Hi,
    I am running regressions by industry-year group, restricted to groups with more than 25 firms.
    I want to calculate the coefficients, t statistics, and one-sided p-values for all coefficients except the intercept.
    The coefficients on atoW and lag_accW are predicted to be negative, whereas the coefficients on all other variables are predicted to be positive.
    I did the following (note that I was not sure what to do with the variables whose coefficients are predicted to be negative, so my p-value code for them is the same as for the others):


    Code:
    forval j = 0/6 {
        gen b`j' = .
        gen t_stat`j' = .
        gen p_value`j' = .
    }
    gen adjr2 = .
    gen unce = .

    levelsof sic_2, local(levels)

    foreach x of local levels {
        foreach z of numlist 1990/2016 {

            capture reg ceW lag_ceW atoW lag_accW accW dsaleW ndsaleW if N_firms_sic_2_yr>25 & sic_2==`x' & yr==`z'

            if _rc == 0 {
                * store the residuals of this industry-year regression
                predict residual, res
                replace unce = residual if e(sample)
                drop residual

                * intercept (two-sided p-value)
                replace b0 = _b[_cons] if e(sample)
                replace t_stat0 = b0/_se[_cons] if e(sample)
                replace p_value0 = 2*ttail(e(df_r),abs(t_stat0)) if e(sample)

                * slope coefficients
                replace b1 = _b[lag_ceW] if e(sample)
                replace t_stat1 = b1/_se[lag_ceW] if e(sample)
                replace p_value1 = ttail(e(df_r),abs(t_stat1)) if e(sample)

                replace b2 = _b[atoW] if e(sample)
                replace t_stat2 = b2/_se[atoW] if e(sample)
                replace p_value2 = ttail(e(df_r),abs(t_stat2)) if e(sample)

                replace b3 = _b[lag_accW] if e(sample)
                replace t_stat3 = b3/_se[lag_accW] if e(sample)
                replace p_value3 = ttail(e(df_r),abs(t_stat3)) if e(sample)

                replace b4 = _b[accW] if e(sample)
                replace t_stat4 = b4/_se[accW] if e(sample)
                replace p_value4 = ttail(e(df_r),abs(t_stat4)) if e(sample)

                replace b5 = _b[dsaleW] if e(sample)
                replace t_stat5 = b5/_se[dsaleW] if e(sample)
                replace p_value5 = ttail(e(df_r),abs(t_stat5)) if e(sample)

                replace b6 = _b[ndsaleW] if e(sample)
                replace t_stat6 = b6/_se[ndsaleW] if e(sample)
                replace p_value6 = ttail(e(df_r),abs(t_stat6)) if e(sample)

                * adjusted R-squared of this regression
                replace adjr2 = e(r2_a) if e(sample)
            }
        }
    }
    The code runs and produces sensible coefficients. However, I am not sure about the t statistics and p-values. Is my code for them correct? Do I have to use e(sample), given that the sample differs for each industry-year regression? Is using t statistics correct here as well?

    Thanks

  • #2
    Do I have to use e(sample)?
    By that, I assume you mean to ask if, in the following code for example,
    Code:
     replace b0 = _b[_cons] if e(sample)
     replace t_stat0 = b0/_se[_cons] if e(sample)
     replace p_value0= 2*ttail(e(df_r),abs(t_stat0)) if e(sample)
    the three occurrences of "if e(sample)" are necessary?

    The answer is yes, they are required. If you omit them, the results of each regression will overwrite the values of b0, t_stat0, and p_value0 in every observation, not just the observations that were used in the regression, and ultimately all observations will contain the values from the final regression - the last value of sic_2 in the year 2016. That is certainly not what you want. For each observation, you want those three variables to reflect the results of the single regression in which that observation was included.
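A minimal sketch of the point above, using Stata's shipped auto data as a stand-in: the two levels of foreign play the role of the industry-year groups, and b_mpg and b_all are hypothetical variable names. The slope stored with "if e(sample)" differs by group, while the one stored without it ends up holding the last regression's slope in every observation.

```stata
* Sketch on Stata's auto data: foreign (0/1) stands in for the
* industry-year groups; b_mpg and b_all are hypothetical names.
sysuse auto, clear
generate double b_mpg = .    // filled only within each regression's sample
generate double b_all = .    // filled without "if e(sample)"

foreach g in 0 1 {
    quietly regress price mpg if foreign == `g'
    replace b_mpg = _b[mpg] if e(sample)
    replace b_all = _b[mpg]
}

* b_mpg differs by group; b_all holds the last (foreign == 1) slope everywhere
tabulate foreign, summarize(b_mpg)
tabulate foreign, summarize(b_all)
```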

    • #3
      Thanks for answering my question regarding e(sample).

      Can anyone also help with the calculation of the one-tailed and two-tailed p-values? Note that some variables are predicted to have negative coefficients, as stated in post #1.

      • #4
        I do not believe your calculation of the one-sided p-values is correct.

        I believe that for coefficients you expect to be positive the correct calculation is
        Code:
        replace p_value1 = ttail(e(df_r),t_stat1) if e(sample)
        I believe that for coefficients you expect to be negative the correct calculation is
        Code:
        replace p_value2= ttail(e(df_r),-t_stat2) if e(sample)
        However, I would welcome it if someone else would confirm my understanding.
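The two formulas above follow from the symmetry of Student's t distribution. Writing $t$ for the t statistic, $\nu$ for e(df_r), and using that ttail($\nu$, $x$) returns $\Pr(T_\nu > x)$, a sketch of the derivation (the two-sided case from post #1 is included for comparison):

```latex
\begin{aligned}
H_a\colon \beta > 0 &:\quad p = \Pr(T_\nu \ge t) = \mathtt{ttail}(\nu,\, t) \\
H_a\colon \beta < 0 &:\quad p = \Pr(T_\nu \le t) = \Pr(T_\nu \ge -t) = \mathtt{ttail}(\nu,\, -t) = 1 - \mathtt{ttail}(\nu,\, t) \\
H_a\colon \beta \ne 0 &:\quad p = \Pr(|T_\nu| \ge |t|) = 2\,\mathtt{ttail}(\nu,\, |t|)
\end{aligned}
```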

        • #5
          William is right.

          Potentially helpful:
          Stata | FAQ: One-sided tests for coefficients
          (The calculation in this FAQ is done slightly differently, but the result is the same.)
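A quick check of that equivalence, as a sketch on Stata's auto data (price, mpg, and weight are stand-ins for the post's variables). The test-based lines follow my recollection of the FAQ's approach, recovering |t| as sqrt(F) and restoring its sign from the coefficient; treat them as an approximation of the FAQ rather than a copy of it. Both routes should print the same numbers, and the last test-based line shows where a "1 -" can appear.

```stata
* Sketch on Stata's auto data; weight's coefficient plays the role of
* a coefficient whose one-sided p-values we want.
sysuse auto, clear
quietly regress price mpg weight

* Route from #4: t statistic straight from _b[] and _se[]
scalar t_wt    = _b[weight] / _se[weight]
scalar p_upper = ttail(e(df_r), t_wt)     // Ha: coef > 0
scalar p_lower = ttail(e(df_r), -t_wt)    // Ha: coef < 0

* Route via -test-: sqrt(F) is |t|, and the sign comes from the coefficient
quietly test _b[weight] = 0
local sgn = sign(_b[weight])
scalar p_upper_faq = ttail(r(df_r), `sgn' * sqrt(r(F)))
scalar p_lower_faq = 1 - ttail(r(df_r), `sgn' * sqrt(r(F)))

display "Ha: coef > 0   " p_upper "   " p_upper_faq
display "Ha: coef < 0   " p_lower "   " p_lower_faq
```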
          https://twitter.com/Kripfganz

          • #6
            For the coefficients expected to be negative, you changed
            Code:
            abs(t_stat2)
            to
            Code:
            -t_stat2
            Why is that different? If t_stat2 is expected to be negative, shouldn't the result be the same?

            Also, in the FAQ Sebastian referred to, I see a "1-" in the calculation. Should I also do that?

            • #7
              Although you expect t_stat2 to be negative, this does not mean that it will be negative. If t_stat2 were computed to be highly positive, using
              Code:
              abs(t_stat2)
              will incorrectly yield a small p_value2 and you will incorrectly conclude that the coefficient is significantly negative. Whereas using
              Code:
              -t_stat2
              ensures that a positive t_stat2 will yield a large p_value2.
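To see the difference numerically, a sketch with hypothetical numbers: a coefficient predicted to be negative comes out with t = 3 on 20 residual degrees of freedom.

```stata
* Hypothetical numbers: predicted-negative coefficient with t = 3, df = 20
local t  = 3
local df = 20
display "with abs(t): p = " ttail(`df', abs(`t'))   // small: wrongly "significantly negative"
display "with -t:     p = " ttail(`df', -`t')       // near 1: no evidence of a negative coefficient
```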

              • #8
                OK. Thanks, William.
                Given that the p-value depends on my t statistic, if I cluster the standard errors by industry, do I need to change anything in my code other than adding cluster(sic_2) as follows (note that sic_2 is my industry identifier)?
                Code:
                capture reg ceW lag_ceW atoW lag_accW accW dsaleW ndsaleW if N_firms_sic_2_yr>25 & sic_2==`x' & yr==`z', cluster(sic_2)
                In other words, would the t statistics and p-values, which depend on the standard errors, need to be calculated differently if I cluster the standard errors?

                • #9
                  Sorry, I meant clustering by firm, that is, replacing cluster(sic_2) with cluster(permno), where permno is the firm identifier.

                  I look forward to hearing from you all.

                  • #10
                    It appears that no adjustment to the code is necessary.

                    The values of _b[] and _se[] will be those reported in the output of regress, and the value of e(df_r) will be 1 less than the number of clusters, which is what the regress output uses as the denominator degrees of freedom. Using these values in your formulas reproduces the t statistics and p-values reported in the output of regress.
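Putting the thread together, a sketch of the loop from #1 with the seven coefficient blocks collapsed into one loop, the sign-aware one-sided p-values from #4 and #7, and firm-clustered standard errors from #9. It runs on Stata's auto data so it can be checked. price, mpg weight, foreign, and the generated firm variable are stand-ins for ceW, the regressors, the sic_2/yr groups, and permno. The signs local holds hypothetical predicted signs. As #10 notes, the t and p formulas are unchanged under clustering because _se[] and e(df_r) already reflect it.

```stata
* Sketch on Stata's auto data; names below stand in for the post's:
*   price -> ceW, mpg weight -> regressors, foreign -> sic_2/yr groups,
*   firm -> permno
sysuse auto, clear
generate firm = ceil(_n / 4)             // hypothetical firm id, 19 clusters

local depvar price
local xvars  mpg weight                  // regressors, excluding the intercept
local signs  -1 1                        // hypothetical predicted signs, same order
local k : word count `xvars'

forvalues j = 1/`k' {
    generate double b`j'       = .
    generate double t_stat`j'  = .
    generate double p_value`j' = .
}

levelsof foreign, local(groups)
foreach g of local groups {
    capture regress `depvar' `xvars' if foreign == `g', vce(cluster firm)
    if _rc == 0 {
        forvalues j = 1/`k' {
            local v : word `j' of `xvars'
            local s : word `j' of `signs'
            * _se[] is the clustered SE and e(df_r) = clusters - 1,
            * so the formulas match the unclustered version
            replace b`j'       = _b[`v'] if e(sample)
            replace t_stat`j'  = _b[`v'] / _se[`v'] if e(sample)
            * one-sided p: Ha is coef > 0 when s = 1, coef < 0 when s = -1
            replace p_value`j' = ttail(e(df_r), `s' * t_stat`j') if e(sample)
        }
    }
}
```

To adapt it to #1, set xvars to lag_ceW atoW lag_accW accW dsaleW ndsaleW with signs 1 -1 -1 1 1 1, and nest the loop over yr together with the N_firms_sic_2_yr>25 condition.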
