Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • prais with rhotype(nagar) option

    The manual for -prais- says that the 'rhotype(nagar)' option computes rho by the formula (1) rho_nagar = (rho_dw*N^2 + k^2)/(N^2 - k^2). I cannot verify this. I run two Prais-Winsten regressions for the same model with the same data using rhotype(dw) and rhotype(nagar). Both are two-step estimators so rho is estimated one step using the same OLS residuals for both commands. Therefore, (1) should hold if the Manual is correct. We have

    Code:
    . webuse qsales, clear
    . prais csales isales, two rho(dw)
    . di e(rho)
    .63263622
    
    . prais csales isales, two rho(nagar)
    . di e(rho)
    .6702161
    For this data set and model, we have N = 20 and k = 2. I have checked it (1) is satisfied. It isn't:

    Code:
    . prais csales isales, two rho(dw)
    . scalar rho_dw = e(rho)
    . di (rho_dw*20^2 + 2^2)/(20^2 - 2^2)
    .64912749
    Instead, the equality holds if k is replaced with k+1 = 3

    Code:
    . di (rho_dw*20^2 + 3^2)/(20^2 - 3^2) /* note 3^2 instead of 2^2 */
    .67021608
    How can I make sense of the discrepancy between the Manual and the experiment results?

  • #2
    It depends of how you define k. In linear regression estimated using OLS, you normally refer to k as the number of slope parameters. In AR1 regression from

    Code:
    help arima
    you have

    arima stores the following in e():

    Scalars
    e(N) number of observations
    e(N_gaps) number of gaps
    e(k) number of parameters
    So you would count the constant term as well.

    Comment


    • #3
      Thanks for your reply.

      I would have thought the same way you did if it was k-1 (i.e., 1), but the problem is `rhotype(nagar)` uses 3 (which is 2 + # of slope parameters) when there is one non-constant variable (with the intercept included). I think it's either the manual or the code that is buggy.

      Comment


      • #4
        The autocorrelation coefficient (rho) is a parameter. So estimated parameters in your example are \(\widehat{\beta}_0,\; \widehat{\beta}_1\) and \(\hat{\rho}\).
        Last edited by Andrew Musau; 05 Jan 2024, 01:27.

        Comment


        • #5
          I've checked prais.ado, where k is erroneously computed as the number of all variables including the dependent variable, not just the regressors. I think it's a bug. Stata Corp. should attend it. The relevant lines of -prais.ado- are lines 111, 147-149, and 195-196.

          Comment


          • #6
            You should contact Technical Services at https://www.stata.com/support/tech-support/contact/ and inform them of your doubt. Regardless of how you interpret the command's ado file, my point in #2 remains. If you define k as the total number of parameters, then you have 3 as I show in #4. But there is no explicit definition in the prais documentation, which leaves it open to interpretation. Let's consider an example from arima where k is explicitly defined as the total number of parameters.

            Code:
            webuse friedman2, clear
            arima consump m2, ar(1) nolog
            display e(k)
            Res.:


            Code:
            . webuse friedman2, clear
            
            . arima consump m2, ar(1) nolog
            
            ARIMA regression
            
            Sample: 1959q1 thru 1998q3                      Number of obs     =        159
                                                            Wald chi2(2)      =   36787.61
            Log likelihood = -732.6621                      Prob > chi2       =     0.0000
            
            ------------------------------------------------------------------------------
                         |                 OPG
                 consump | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
            -------------+----------------------------------------------------------------
            consump      |
                      m2 |    1.12214   .0464611    24.15   0.000     1.031078    1.213202
                   _cons |   468.3314   567.6004     0.83   0.409    -644.1449    1580.808
            -------------+----------------------------------------------------------------
            ARMA         |
                      ar |
                     L1. |   .9989295   .0053841   185.53   0.000     .9883769    1.009482
            -------------+----------------------------------------------------------------
                  /sigma |    23.7996   .9726719    24.47   0.000      21.8932      25.706
            ------------------------------------------------------------------------------
            Note: The test of the variance against zero is one sided, and the two-sided
                  confidence interval is truncated at zero.
            
            . display e(k)
            4

            Here, you have k=4, i.e., _b[consump:m2], _b[consump:_cons], _b[ARMA:L.ar] and _b[sigma:_cons].

            Comment


            • #7
              You mean rho is counted as a parameter, in which case k = 3 for simple linear regression. Thanks for the point, and sorry I misunderstood you.

              I've checked the literature carefully as now I have access to my copy of Judge et al. (1985). TLDR: The "k" is the number of columns of X, so rho is not counted in.

              Page 275, beginning of Section 8.1: "For the model y=Xbeta+e, where ... X is a (TxK) nonstochastic design metrix..."
              Page 287, Section 8.2.1b: "Theil and Nagar (1961) suggest that the estimator rho^* = [T^2 (1-d/2) + K^2] / [T^2 - K^2] will be ..."

              The definition of K didn't change between pages 275 and 287.

              The Stata Manual is OK because it is unclear. The -prais- code is buggy if Judge et al. (1985) is correct. The -prais- ado file is coded so that the dependent variable is counted in, which is the source of the problem.

              I hope that this list is attended by Stata Corp and the bug is fixed soon.

              Comment


              • #8
                Page 275, beginning of Section 8.1: "For the model y=Xbeta+e, where ... X is a (TxK) nonstochastic design metrix..."
                Page 287, Section 8.2.1b: "Theil and Nagar (1961) suggest that the estimator rho^* = [T^2 (1-d/2) + K^2] / [T^2 - K^2] will be ..."
                Agreed, this suggests that \(K\) equals 2 in your example. It is not guaranteed that StataCorp will come across this thread, so what is recommended is to contact Technical Services. I will nevertheless tag Gustavo Sanchez (StataCorp).

                Comment


                • #9
                  Thanks.

                  Comment

                  Working...
                  X