  • Understanding Stata's post estimation test command

    I have a dataset with 25 observations. The variables are y, k and l.
    I run the following regression:
    regress y k l
    I want to test that the coefficients on k and l sum to 10, so I type:

    test k+l=10

     ( 1)  k + l = 10

            F(  1,    22) =  107.68
                 Prob > F =    0.0000
    My issue is that when I estimate the restricted regression and calculate the F-statistic using the usual formula \(F = \dfrac{(R^2_{\text{unrestricted}} - R^2_{\text{restricted}})/q}{(1 - R^2_{\text{unrestricted}})/(n-k)}\), I get a much smaller number (still rejecting the null hypothesis, though).

    My restricted model imposes \(\beta_k + \beta_l = 10\), i.e. \(\beta_l = 10 - \beta_k\), which gives \(y - 10l = \beta_0 + \beta_k(k-l) + \epsilon\):
    . gen Y_10L = y-10*l

    . gen k_l=k-l

    . regress Y_10L k_l

    With these commands and the formula above, I do not get a number equal to 107; I get around 47.

    I am just curious about what is going on. Is it a sample size issue?
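
    For concreteness, here is a minimal sketch of that computation in Stata, reusing the variables Y_10L and k_l generated above and the saved results e(r2) and e(df_r) after regress (the sketch is illustrative, not part of the original post):

    Code:
    * unrestricted regression: save R-squared and residual degrees of freedom
    qui regress y k l
    sca R2u = e(r2)
    sca DFu = e(df_r)

    * restricted regression (note that the dependent variable changes)
    qui regress Y_10L k_l
    sca R2r = e(r2)

    * R-squared version of the F-statistic, q = 1 restriction
    di ((R2u - R2r)/1) / ((1 - R2u)/DFu)

    Assuming this is the computation intended, it should reproduce the value of around 47 reported above.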



    My data is here:


    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(y k l)
    68.24103 9.313 45.0961
    69.15226 10.6264 43.9693
    70.05865 11.5423 41.8166
    79.74423 11.9624 44.4985
    89.43562 12.2972 48.7602
    93.72334 13.045 51.1402
    102.4281 13.6777 54.4577
    95.21549 14.2198 51.2944
    108.347 14.7225 54.0984
    108.1351 15.1736 55.7854
    107.1986 16.0311 55.9122
    100.4691 16.8214 52.6973
    109.6668 16.9557 56.4288
    115.2529 16.9042 56.9827
    116.6837 17.1108 56.0163
    129.3971 17.2227 58.5997
    132.359 17.4505 59.6128
    147.1149 17.8079 61.1658
    159.5804 18.4595 64.6947
    173.8529 19.6165 69.2726
    175.291 21.2163 70.161
    184.5142 22.4894 72.3024
    196.5472 23.5281 74.2756
    183.8358 24.7325 71.2039
    177.0066 25.6062 68.9305
    end

  • #2
    Perhaps the difference is because the test command performs a Wald test, while the F-statistic you are computing appears to be a likelihood ratio test statistic. (I'm not familiar with "the usual formula" you give, but that's what it looks like to me.)
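
    For reference, here is a sketch in standard textbook notation of the Wald-type F-statistic for \(q\) linear restrictions \(R\beta = r\) (general background, not quoted from any post):

    \[
    F \;=\; \frac{(R\hat{\beta} - r)'\,\bigl[R\,\widehat{\text{Var}}(\hat{\beta})\,R'\bigr]^{-1}\,(R\hat{\beta} - r)}{q}
    \]

    With the coefficient order (k, l, _cons), the restriction here corresponds to \(R = (1, 1, 0)\), \(r = 10\) and \(q = 1\), so the statistic reduces to \((\hat{\beta}_k + \hat{\beta}_l - 10)^2 / \widehat{\text{Var}}(\hat{\beta}_k + \hat{\beta}_l)\), which is exactly the quantity computed by hand in #3 below.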



    • #3
      I agree with William Lisowski: you do not have nested models with which to perform a likelihood ratio test.

      test k+l=10
      This is just a test of a linear combination of coefficients and a constant, namely k+l-10. Using lincom, you can get the same result as test, albeit with a t-statistic instead of an F-statistic, noting that \(\text{t}=\sqrt{\text{F}}\).

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input float(y k l)
      68.24103   9.313 45.0961
      69.15226 10.6264 43.9693
      70.05865 11.5423 41.8166
      79.74423 11.9624 44.4985
      89.43562 12.2972 48.7602
      93.72334  13.045 51.1402
      102.4281 13.6777 54.4577
      95.21549 14.2198 51.2944
       108.347 14.7225 54.0984
      108.1351 15.1736 55.7854
      107.1986 16.0311 55.9122
      100.4691 16.8214 52.6973
      109.6668 16.9557 56.4288
      115.2529 16.9042 56.9827
      116.6837 17.1108 56.0163
      129.3971 17.2227 58.5997
       132.359 17.4505 59.6128
      147.1149 17.8079 61.1658
      159.5804 18.4595 64.6947
      173.8529 19.6165 69.2726
       175.291 21.2163  70.161
      184.5142 22.4894 72.3024
      196.5472 23.5281 74.2756
      183.8358 24.7325 71.2039
      177.0066 25.6062 68.9305
      end
      
      regress y k l
      test  k+l=10
      lincom k+l-10
      di (-10.38)^2
      Res.:

      Code:
      
      .
      . test  k+l=10
      
       ( 1)  k + l = 10
      
             F(  1,    22) =  107.68
                  Prob > F =    0.0000
      
      .
      . lincom k+l-10
      
       ( 1)  k + l = 10
      
      ------------------------------------------------------------------------------
                 y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
               (1) |  -5.156307   .4968966   -10.38   0.000    -6.186807   -4.125806
      ------------------------------------------------------------------------------
      
      .
      . di (-10.38)^2
      107.7444
      Now the question is how lincom computes the t-statistic. As \(t = \text{coefficient}/\text{standard error}\), the coefficient here is simply the sum of the estimated coefficients minus 10:

      Code:
      di _b[k]+_b[l]-10
      Res.:

      Code:
      . di _b[k]+_b[l]-10
      -5.1563067
      The standard error is obtained using the usual formula \(\text{Var}(a+b) = \text{Var}(a) + \text{Var}(b) + 2\,\text{Cov}(a,b)\). The variance of the constant \(-10\) is 0, so it drops out of the equation.

      Code:
      mat l e(V)
      di sqrt((e(V)[1,1]) + (e(V)[2,2]) + (2*e(V)[1,2]))
      Res.:

      Code:
      symmetric e(V)[3,3]
                      k           l       _cons
          k   .72222981
          l  -.31387297    .1524224
      _cons   5.9776296  -3.5198448   103.93004
      
      .
      . di sqrt((e(V)[1,1]) + (e(V)[2,2]) + (2*e(V)[1,2]))
      .49689665
      Last edited by Andrew Musau; 24 Sep 2020, 10:09.
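
      Putting the two pieces together, a minimal check (an illustrative addition, not part of the post above) reproduces the F-statistic reported by test directly from the coefficients and the variance-covariance matrix:

      Code:
      * after -regress y k l-: squared t-ratio of the linear combination k + l - 10
      di ((_b[k] + _b[l] - 10) / sqrt(e(V)[1,1] + e(V)[2,2] + 2*e(V)[1,2]))^2

      This should return the same F of 107.68 (up to rounding) that test reports as F(1, 22).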



      • #4
        The way you impose the constraint is correct.

        You should not be using the R-squared version of the test when your dependent variable changes under the new reparametrisation. However, I am also not getting the same result when I use

        \(F = \dfrac{\text{RSS}_r - \text{RSS}_u}{\text{RSS}_u/(n-k)}\) (with a single restriction, \(q = 1\), so there is no division by \(q\) in the numerator).

        Here is what happens when I use the Residual Sum of Squares version of the test:

        Code:
        . gen kl = k - l
        
        . gen yy = y - 10*l
        
        . qui reg y k l
        
        . test k+l=10
        
         ( 1)  k + l = 10
        
               F(  1,    22) =  107.68
                    Prob > F =    0.0000
        
        . sca SSRu = e(rss)
        
        . sca DF = e(df_r)
        
        . qui reg yy kl
        
        . dis (e(rss)-SSRu )/(SSRu/e(df_r))
        112.57722
        I do not know what is going on here. I know where your problem is (you should not use the R-squared version of the test), but I cannot see where my problem is.



        • #5
          @Joro Kolev: the degrees of freedom in your last line are taken from the restricted regression. They should come from the unrestricted regression, which you saved as DF.
          On edit:
          Code:
          . display (4463.4569 - 757.20324)/(757.20324 / 22)
          107.68256
          Last edited by Eric de Souza; 24 Sep 2020, 11:08.



          • #6
            You are right, Eric. Of course that is why I saved the scalar DF after the unrestricted regression in command 6 of my code; having saved it, I should also use it afterwards. It does not help if I only save it and keep it :-).

            So everything is fine when we set up the formula correctly with RSS:

            Code:
            . dis (e(rss)-SSRu )/(SSRu/DF)
            107.68256
            The two tests are numerically equivalent.
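
            A brief note on why this holds (standard OLS algebra, not spelled out in the thread): for linear restrictions on an OLS model, the increase in the residual sum of squares equals the quadratic form in the Wald statistic,

            \[
            \text{RSS}_r - \text{RSS}_u = (R\hat{\beta} - r)'\bigl[R(X'X)^{-1}R'\bigr]^{-1}(R\hat{\beta} - r),
            \]

            and dividing by \(\hat{\sigma}^2 = \text{RSS}_u/(n-k)\) turns \(\hat{\sigma}^2 (X'X)^{-1}\) into the estimated variance-covariance matrix used by test, so the two F-statistics coincide exactly.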




            • #7
              Thanks for all the help. Much appreciated.
