  • #16
    I think the breezy way in which Rich Goldstein and Nick Cox treat the reparametrisation does not really explain what is going on here, and it seems to me that it falsely suggests that what is going on is rather trivial. I do not think it is trivial, and if I did not have my own version of what is going on here, I would have ended up even more confused after reading Rich and Nick's explanations.

    Many people in statistics and econometrics treat reparametrisations as if they were something trivial. E.g., Nick in #15 wrote that "A quadratic is a quadratic; you are just parameterising it differently -- with a side-effect on the intercept, which lacks inherent interest any way, so far as I can imagine." Rich in #12 wrote "yes, centering changes the constant - without the centering, the constant is meaningless (assume year was your only predictor; without centering the constant is the mean when year = 0 - does anyone really care?)"

    I think these statements are misleading on two counts:

    1. Reparametrisations have serious consequences: some or all of your parameters change their meaning.
    2. No, the effect is not only on the intercept. If you are interested in the partial derivative with respect to year, as you presumably should be, the reparametrisation changes the meaning of both the slope on year and the slope on year^2.

    Therefore I will go ahead with my own version of what is happening here. As soon as we leave the realm of linear models, even in a trivial way such as by including a quadratic, the marginal effects are no longer constant.

    E.g., if we are estimating E(y|year) = a + b*year + c*year^2, then d[E(y|year)]/d[year] = b + 2*c*year, so this marginal effect clearly depends on the level of year at which we measure it.

    When Rich subtracted the minimum value of year in the sample, he reparametrised the model so that the new parameters are easier to interpret. In particular, if you evaluate the derivative d[E(y|year)]/d[year] = b + 2*c*year at the minimum value of year, 1935, then in Rich's reparametrisation it evaluates to b + 2*c*0 = b. Hence in his reparametrisation the estimated slope on year (taken on its own, disregarding the estimated slope on year^2) is the marginal effect d[E(y|year)]/d[year] evaluated at year 1935.
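
    For what it is worth, the same marginal effects can be obtained directly with -margins- and factor-variable notation, so no manual reparametrisation is needed for interpretation. A minimal sketch, assuming the dataset used in this thread (ftheft, year); -regress- keeps both terms here, as #17 below shows:

    Code:
    * -margins- evaluates d[E(y|year)]/d[year] = b + 2*c*year at the chosen years
    regress ftheft c.year##c.year
    margins, dydx(year) at(year=(1935 1968))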

    And as to why you cannot estimate the original model without reparametrising it, you can see that from the following picture:

    Code:
    twoway function xsq = x^2, range(-1968 1968) xline(1935 1968)
    [Image: Quadratic.png, the graph produced by the command above]



    What we see from this picture is that the relationship between year and year^2 in the range above 1935 is basically non-stochastic and linear: the two variables are basically perfectly collinear there. Notice that this is not so for the range close to 0, where the relationship is visibly curved.
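
    This near-collinearity can also be checked numerically. A minimal sketch, assuming (as elsewhere in this thread) 34 yearly observations running from 1935 to 1968:

    Code:
    clear
    set obs 34
    gen year  = 1934 + _n
    gen year2 = year^2
    correlate year year2            // essentially 1 over this range
    gen year1935   = year - 1935
    gen year1935sq = year1935^2
    correlate year1935 year1935sq   // noticeably below 1 once year is centered at its minimum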



    • #17
      The statement that subtracting a constant from year changes only the intercept is generally incorrect in nonlinear models, and it is incorrect in the model under consideration, which has both year and year^2. The statement is correct only in linear models. Here is an illustration:

      Code:
      . gen year1935 = year - 1935
      
      . reg ftheft year, noheader
      ------------------------------------------------------------------------------
            ftheft |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
              year |   1.070374   .2232162     4.80   0.000     .6156978    1.525051
             _cons |  -2059.706   435.6119    -4.73   0.000    -2947.019   -1172.394
      ------------------------------------------------------------------------------
      
      . reg ftheft year1935, noheader
      ------------------------------------------------------------------------------
            ftheft |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
          year1935 |   1.070374   .2232162     4.80   0.000     .6156978    1.525051
             _cons |   11.46824   4.284937     2.68   0.012     2.740104    20.19637
      ------------------------------------------------------------------------------
      So yes, in the linear model subtracting the constant 1935 changes only the intercept.
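
      The algebra behind this: substituting year = year1935 + 1935 into a + b*year gives (a + 1935*b) + b*year1935, so the slope is unchanged and only the intercept moves. A quick check against the displayed (rounded) coefficients:

      Code:
      display -2059.706 + 1935*1.070374   // about 11.468, the _cons of the second regression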

      This is no longer true in the nonlinear model with the quadratic in year:

      Code:
      . reg ftheft year c.year#c.year, noheader
      -------------------------------------------------------------------------------
             ftheft |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      --------------+----------------------------------------------------------------
               year |   -466.228    56.1931    -8.30   0.000    -580.8346   -351.6214
                    |
      c.year#c.year |    .119728   .0143974     8.32   0.000     .0903644    .1490916
                    |
              _cons |   453895.2   54829.45     8.28   0.000     342069.8    565720.6
      -------------------------------------------------------------------------------
      
      . reg ftheft year1935 c.year1935#c.year1935, noheader
      ---------------------------------------------------------------------------------------
                     ftheft |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      ----------------------+----------------------------------------------------------------
                   year1935 |   -2.88065   .4915813    -5.86   0.000    -3.883237   -1.878063
                            |
      c.year1935#c.year1935 |    .119728   .0143974     8.32   0.000     .0903644    .1490916
                            |
                      _cons |   32.54036   3.505304     9.28   0.000     25.39125    39.68948
      ---------------------------------------------------------------------------------------
      Here the intercept changes, and the slope on year changes too.
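
      The same substitution shows why: b*year + c*year^2 with year = year1935 + 1935 expands to (1935*b + 1935^2*c) + (b + 2*1935*c)*year1935 + c*year1935^2, so the quadratic coefficient is untouched while the linear coefficient becomes b + 2*1935*c. A check against the displayed (rounded) coefficients:

      Code:
      display -466.228 + 2*1935*.119728   // about -2.8806, the year1935 slope above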



      • #18
        I can't see that anything in #16 and #17 undermines or even, read carefully, contradicts anything said by Rich Goldstein or myself. We are just making standard points, though somehow they don't seem common in the textbooks I know about. This may be a case of a point being too complicated for authors of elementary textbooks and too obvious for authors of advanced textbooks. I plead guilty, as would Rich, to writing short posts where longer posts were needed.

        Sometimes long posts can be confusing too. Joro contributes his own share of confusion by calling some regressions linear and some nonlinear, whereas all the regressions in this thread are as linear (in the parameters and even in the variables) as they can be. If Joro wants to call a regression on year and its square nonlinear, I think that is at odds with mainstream usage.

        In principle year squared and year are related by squaring -- no disagreement there, I hope -- but for values as observed in these data (conventional calendar years) the correlation is so near 1 that Stata won't allow both to be included in a model. I don't find it helpful to call this situation "basically perfectly collinear": collinear doesn't need perfectly as a qualifier, and if that is an intensifier it is undone by basically. Let's forget the adverbs and just talk about correlations equal to 1 (which we don't have) and almost 1 (which we do). There could be a side discussion on whether the exclusion by regress of one predictor is oversensitive here, but Stata's behaviour is designed in researchers' best interests.

        Once year and its square are both included as predictors, the only effect of real interest is their joint effect as expressed in predicted values. The two coefficients don't have inherent distinct meanings to be interpreted. Joro's points about marginal effects and partial derivatives seem to be the same point expressed differently.
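
        For instance, the joint effect can be inspected along the following lines. A sketch, with variable names as used elsewhere in this thread:

        Code:
        quietly regress ftheft c.year1935##c.year1935
        test year1935 c.year1935#c.year1935    // joint test of the two trend terms
        margins, at(year1935=(0(11)33))        // the trend expressed as predicted values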





        • #19
          I generally agree with what Nick Cox writes in #18; I generally do write short posts, as I do not see myself as an unpaid tutor; I do note that I think that both the intercept and the linear part of the quadratic can be, and sometimes should be, interpreted, and centering will affect each; I have previously written about uses of the intercept on Statalist but can't immediately find it; if I do, I will post a link.



          • #20
            Rich and Nick, you both said that the reparametrisation Rich suggested (subtracting some meaningful constant from year) affects only the intercept, and this is wrong in this nonlinear model. And I know why you are committing this error: because you are trivialising the issue of reparametrising regression models (which in general is quite complicated), and you are mechanically extrapolating things which are true for linear models to nonlinear models without much thought. Such extrapolations typically do not work. The linear model is quite particular in many respects, and intuitions learnt on it typically do not carry over to nonlinear models.

            I am not making a big deal of you being wrong. If I compiled a list of occasions on which I have been wrong, it would be as thick as the Bible. I am making waves because you are trivialising a non-trivial issue: reparametrising regression models has consequences for interpreting the parameters, and also statistical consequences which are not always easy to predict or to anticipate from intuition gained on linear models.

            As to the new topics that Nick injected: the Original Poster's model is linear in the parameters and nonlinear in the variables. So no, Nick's statement that the model is linear "even in the variables" is incorrect.

            With the "nearly perfectly collinear" I was trying to express the idea that Stata has some notion that something is so close to perfectly collinear, that it is dropped. Curiously enough here -regress- judges r(rho) = .9999974792343064 not to be perfectly collinear, but -prais- judges it to be perfectly collinear. This is the interesting thing. Terminology that we use is not really important.

            To summarise, I totally agree with the course of action (centering the year variable) that you proposed. What I disagreed with was you stating that the issue is trivial, and then carrying that over into incorrect statements about the consequences of the transformation.



            • #21
              no, I did not say that; in fact, in #19 I said just the opposite



              • #22
                I didn't say it or imply it either. What I said is on record and I doubt that repeating it will help anybody.



                • #23
                  Note that the following will also work, because the numbers in the variable time are small:
                  Code:
                  gen time = _n
                  tsset time
                  prais ftheft tfr partic degrees c.time##c.time
                  
                  Prais-Winsten AR(1) regression -- iterated estimates
                  
                        Source |       SS           df       MS      Number of obs   =        34
                  -------------+----------------------------------   F(5, 28)        =     26.80
                         Model |  1061.34611         5  212.269222   Prob > F        =    0.0000
                      Residual |  221.787888        28    7.920996   R-squared       =    0.8272
                  -------------+----------------------------------   Adj R-squared   =    0.7963
                         Total |    1283.134        33  38.8828484   Root MSE        =    2.8144
                  
                  ---------------------------------------------------------------------------------
                           ftheft |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                  ----------------+----------------------------------------------------------------
                              tfr |  -.0154238     .00489    -3.15   0.004    -.0254405   -.0054072
                           partic |   .0338571   .0307243     1.10   0.280    -.0290787     .096793
                          degrees |  -.0392367    .114543    -0.34   0.734    -.2738673     .195394
                             time |   -.242029   .8313634    -0.29   0.773       -1.945    1.460942
                                  |
                  c.time#c.time   |   .0471802   .0228427     2.07   0.048     .0003891    .0939713
                                  |
                            _cons |   56.99951   18.11466     3.15   0.004     19.89331     94.1057
                  ----------------+----------------------------------------------------------------
                              rho |   .7500145
                  ---------------------------------------------------------------------------------
                  Durbin-Watson statistic (original)    1.036600
                  Durbin-Watson statistic (transformed) 1.571137



                  • #24
                    I took the inclusion of year and its square to act as controls for trends, not as terms of intrinsic interest in themselves. Therefore, one should include the terms in a way such that Stata can tell they aren't perfectly collinear. I always do what Eric proposed, but starting at zero is, of course, perfectly fine. What we call the first time period is arbitrary, anyway.



                    • #25
                      There is something fishy going on with -prais- here. Even when I do the most basic two-step Cochrane-Orcutt, it drops year squared:

                      Code:
                      . gen double year2 = year^2
                      
                      . prais ftheft tfr partic degrees year year2, corc twostep
                      note: year2 omitted because of collinearity
                      
                      Iteration 0:  rho = 0.0000
                      Iteration 1:  rho = 0.5056
                      
                      Cochrane-Orcutt AR(1) regression -- twostep estimates
                      
                            Source |       SS           df       MS      Number of obs   =        33
                      -------------+----------------------------------   F(4, 28)        =     70.01
                             Model |  2620.10889         4  655.027222   Prob > F        =    0.0000
                          Residual |  261.983085        28  9.35653875   R-squared       =    0.9091
                      -------------+----------------------------------   Adj R-squared   =    0.8961
                             Total |  2882.09197        32  90.0653742   Root MSE        =    3.0588
                      
                      ------------------------------------------------------------------------------
                            ftheft |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                      -------------+----------------------------------------------------------------
                               tfr |  -.0222059   .0045192    -4.91   0.000    -.0314631   -.0129488
                            partic |   .0454232   .0316107     1.44   0.162    -.0193284    .1101748
                           degrees |   .0198292   .1343606     0.15   0.884     -.255396    .2950544
                              year |    1.44666   .3601646     4.02   0.000      .708896    2.184424
                             year2 |          0  (omitted)
                             _cons |  -2733.971   682.7058    -4.00   0.000     -4132.43   -1335.511
                      -------------+----------------------------------------------------------------
                               rho |   .5056467
                      ------------------------------------------------------------------------------
                      Durbin-Watson statistic (original)    0.972504
                      Durbin-Watson statistic (transformed) 1.400370
                      But when I implement the same procedure manually through -regress-, regress does not drop anything:

                      Code:
                      . qui reg ftheft tfr partic degrees year year2
                      
                      . predict double e, resid
                      
                      . reg e l.e, noheader
                      ------------------------------------------------------------------------------
                                 e |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                      -------------+----------------------------------------------------------------
                                 e |
                               L1. |   .4788784   .1683823     2.84   0.008     .1354604    .8222963
                                   |
                             _cons |  -.0512279   .5085066    -0.10   0.920    -1.088334    .9858781
                      ------------------------------------------------------------------------------
                      
                      . sca Rho = _b[l.e]
                      
                      . qui for var ftheft tfr partic degrees year year2: gen double rX = X - Rho*l.X
                      
                      . reg r*
                      
                            Source |       SS           df       MS      Number of obs   =        33
                      -------------+----------------------------------   F(5, 27)        =     65.72
                             Model |  2874.35493         5  574.870987   Prob > F        =    0.0000
                          Residual |  236.182606        27  8.74750392   R-squared       =    0.9241
                      -------------+----------------------------------   Adj R-squared   =    0.9100
                             Total |  3110.53754        32  97.2042981   Root MSE        =    2.9576
                      
                      ------------------------------------------------------------------------------
                           rftheft |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                      -------------+----------------------------------------------------------------
                              rtfr |  -.0152054   .0058174    -2.61   0.014    -.0271417    -.003269
                           rpartic |   .0556742    .030561     1.82   0.080    -.0070318    .1183801
                          rdegrees |   .0395174   .1309233     0.30   0.765     -.229115    .3081497
                             ryear |  -175.9907   97.08065    -1.81   0.081    -375.1838    23.20232
                            ryear2 |   .0453915   .0248367     1.83   0.079    -.0055691    .0963522
                             _cons |   88923.09   49428.84     1.80   0.083     -12496.5    190342.7
                      ------------------------------------------------------------------------------
                      I do replicate, to the second digit after the decimal point, the estimated derivative with respect to year that would be obtained by subtracting 1935:

                      Code:
                      . dis -175.9907 + 2*.0453915*1935
                      -.325595
                      
                      . gen year1935 = year - 1935
                      
                      . prais ftheft tfr partic degrees year1935 c.year1935#c.year1935, corc twostep
                      
                      Iteration 0:  rho = 0.0000
                      Iteration 1:  rho = 0.4779
                      
                      Cochrane-Orcutt AR(1) regression -- twostep estimates
                      
                            Source |       SS           df       MS      Number of obs   =        33
                      -------------+----------------------------------   F(5, 27)        =     65.88
                             Model |  2882.71645         5  576.543289   Prob > F        =    0.0000
                          Residual |  236.304507        27  8.75201877   R-squared       =    0.9242
                      -------------+----------------------------------   Adj R-squared   =    0.9102
                             Total |  3119.02095        32  97.4694047   Root MSE        =    2.9584
                      
                      ---------------------------------------------------------------------------------------
                                     ftheft |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                      ----------------------+----------------------------------------------------------------
                                        tfr |  -.0152107   .0058167    -2.62   0.014    -.0271456   -.0032758
                                     partic |   .0557503   .0305565     1.82   0.079    -.0069464    .1184471
                                    degrees |   .0397882   .1309858     0.30   0.764    -.2289725    .3085489
                                   year1935 |  -.3231441   1.021618    -0.32   0.754    -2.419331    1.773043
                                            |
                      c.year1935#c.year1935 |    .045313   .0247967     1.83   0.079    -.0055657    .0961917
                                            |
                                      _cons |    51.9431   19.35248     2.68   0.012     12.23509    91.65111
                      ----------------------+----------------------------------------------------------------
                                        rho |   .4779064
                      ---------------------------------------------------------------------------------------
                      Durbin-Watson statistic (original)    1.036600
                      Durbin-Watson statistic (transformed) 1.365337

