
  • under the -reg- hood


    Apologies if this is a basic question, but can someone explain what -reg- is doing? I had been assuming that the estimates that -reg- reports are simply the result of matrix multiplication, but that seems not to be the case.

    Under what conditions do the estimates reported by -reg- match matrix multiplication? I realize this is an odd hand-picked example, but I'm just trying to understand what -reg- does.

    Code:
    * make a data set
    clear
    set seed 999
    set obs 400
    gen y  = rnormal()
    gen x1 = ln(_n + 1000)
    gen x2 = x1^2
    gen x3 = x1^3
    noi corr *    /* yes, all RHS vars are very highly correlated, as we'd expect */
    * matrix multiplication method of getting betas
    gen con = 1
    mkmat con x1 x2 x3, matrix(x)
    mkmat y
    matrix B = inv(x'*x)*(x'*y)
    matrix list B    /* compare this ... */
    * ols
    reg y x1 x2 x3   /* ... with this */



  • #2
    gen x1 = ln(_n + 1000)
    gen x2 = x1^2
    gen x3 = x1^3
    OLS inverts a matrix, so highly collinear variables are problematic for a homemade implementation. regress handles the collinearity in some way that allows you to include quadratic, cubic, and higher-order terms. Some other estimators, e.g., glm, will drop a variable to break the collinearity.

    Under what conditions do the estimates reported by -reg- match matrix multiplication?
    If the design matrix has full column rank, that is, if its columns are linearly independent. The sketch below shows such a case.
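    A minimal sketch of a case where the two match (my assumption: replace the collinear polynomials from #1 with independent draws, so the full-rank condition holds comfortably):

    Code:
    * sketch: independent regressors, so x'x is well conditioned
    clear
    set seed 999
    set obs 400
    gen y  = rnormal()
    gen x1 = rnormal()
    gen x2 = rnormal()
    gen x3 = rnormal()
    gen con = 1
    mkmat con x1 x2 x3, matrix(x)
    mkmat y
    matrix B = inv(x'*x)*(x'*y)
    matrix list B    /* these betas ... */
    reg y x1 x2 x3   /* ... should match these to reporting precision */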
    Last edited by Andrew Musau; 21 Feb 2021, 15:50.



    • #3
      It is the very high correlations among the x variables that are causing the discrepancy. Consequently, the X'X matrix is almost singular. There are different ways to invert a matrix, and when the matrix is nearly singular, they can produce different results, due both to rounding/truncation and to the numerical instability of some inversion algorithms in this situation. Internal to the -reg- command, Stata sends these matrices to Mata and uses Mata's matrix inversion method, which is of higher precision than Stata's inv() function.

      So yes, under the hood, Stata is using matrix inversion, but not the same matrix inversion algorithm you used.
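      One way to see just how close to singular X'X is here is its condition number; a small Mata sketch using the data from #1 (cond() is Mata's condition-number function):

      Code:
      mata
      X   = st_data(., "x1 x2 x3")
      X   = X, J(rows(X), 1, 1)    // append the constant
      XpX = quadcross(X, X)
      cond(XpX)                    // enormous: X'X is numerically near-singular
      end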



      • #4
        Andrew and Clyde have pointed out that there are different algorithms to invert a matrix. There are also different ways to get the textbook X'X. Mata has a quadcross() function that does this in quad (i.e., double-double) precision. Also, regress (probably) applies these calculations to the de-meaned data. You can read more about that (and plenty of other interesting stuff) in Bill Gould's The Mata Book. The supplementary material has a class-based Mata implementation of linear regression (linreg2.mata) that is probably close to how regress works internally. I do not think that regress technically uses Mata. The underlying _regress is listed as built-in; the code is probably implemented in C.
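        A rough sketch of the de-meaning point (an assumed illustration, not regress's actual code path): compute the slopes from mean-deviated data, where the cross-product is much better conditioned, then back out the intercept.

        Code:
        mata
        y  = st_data(., "y")
        X  = st_data(., "x1 x2 x3")
        my = mean(y)                 // mean of y
        mX = mean(X)                 // row vector of column means
        Xd = X :- mX                 // de-meaned regressors
        b  = invsym(quadcross(Xd, Xd)) * quadcross(Xd, y :- my)
        a  = my - mX*b               // recover the intercept
        b \ a
        end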
        Last edited by daniel klein; 22 Feb 2021, 01:40.



        • #5
          Ah, that makes sense: -reg- uses Mata, which is of higher precision than Stata's inv() matrix inversion function.



          • #6
            I'm a little confused by this question and this discussion. Notwithstanding the "Methods and formulas" section of the Stata manual, I don't believe that Stata actually inverts the matrix; some version of the "sweep" algorithm is much more likely. If you want to read about regression algorithms, here are some citations (the third is the least theoretical and most practice-oriented):

            Kennedy, W. J., Jr., and J. E. Gentle (1980), Statistical Computing, Marcel Dekker.
            Thisted, R. A. (1988), Elements of Statistical Computing, Chapman and Hall.
            Press, W. H., et al. (2007), Numerical Recipes, 3rd ed., Cambridge University Press.
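            For the curious, here is a toy Mata sketch of the sweep operator (my own illustration, not a claim about what regress actually runs): sweeping the first p pivots of the augmented matrix [X'X, X'y \ y'X, y'y] leaves the coefficient vector in the top-right block and the residual sum of squares in the bottom-right cell. Note there are no pivot-tolerance checks, so on the collinear data from #1 it is as fragile as inv(); real implementations guard the pivots.

            Code:
            mata
            real matrix sweep(real matrix A, real scalar k)
            {
                real matrix B
                real scalar d, bik, i

                B = A
                d = B[k,k]
                B[k,.] = B[k,.] :/ d
                for (i=1; i<=rows(B); i++) {
                    if (i != k) {
                        bik = B[i,k]
                        B[i,.] = B[i,.] - bik :* B[k,.]
                        B[i,k] = -bik / d
                    }
                }
                B[k,k] = 1/d
                return(B)
            }

            y = st_data(., "y")
            X = st_data(., "x1 x2 x3")
            X = X, J(rows(X), 1, 1)
            M = quadcross((X, y), (X, y))      // [X'X, X'y \ y'X, y'y]
            for (k=1; k<=cols(X); k++) M = sweep(M, k)
            M[1..cols(X), cols(M)]             // OLS coefficients
            end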



            • #7
              I do not know what regress does, as StataCorp has not made the source code public; that is why I chose to be vague in #2.

              regress handles the collinearity in some way to allow you to have quadratic, cubic and higher order terms.
              However, I tend to agree with you, Rich, that the divergence in estimates is likely not due to the precision of the inversion method. Using David Drukker's implementation of the OLS algorithm from this Stata blog with the example in #1, you see that the algorithm drops one of the variables to break the collinearity. I would be interested if anyone has a Mata algorithm that uses inversion and produces exactly the same results as regress.

              Code:
              * make a data set
              clear
              set seed 999
              set obs 400
              gen y  = rnormal()
              gen x1 = ln(_n + 1000)
              gen x2 = x1^2
              gen x3 = x1^3
              
              mata
              y    = st_data(., "y")           // outcome
              X    = st_data(., "x1 x2 x3")    // regressors
              n    = rows(X)
              X    = X,J(n,1,1)                // append the constant
              XpX  = quadcross(X, X)           // X'X in quad precision
              XpXi = invsym(XpX)               // generalized inverse: zeroes out collinear columns
              b    = XpXi*quadcross(X, y)
              b
              end
              
              regress y x1 x2 x3
              Result:

              Code:
              : b    = XpXi*quadcross(X, y)
              
              : b
                                1
                  +----------------+
                1 |   25.37120317  |
                2 |  -7.012076157  |
                3 |   .4843253213  |
                4 |             0  |
                  +----------------+
              
              : end
              --------------------------------------------------------------------------------------
              
              .
              . regress y x1 x2 x3
              
                    Source |       SS           df       MS      Number of obs   =       400
              -------------+----------------------------------   F(3, 396)       =      1.44
                     Model |  4.67519272         3  1.55839757   Prob > F        =    0.2315
                  Residual |  429.449514       396  1.08446847   R-squared       =    0.0108
              -------------+----------------------------------   Adj R-squared   =    0.0033
                     Total |  434.124707       399  1.08803185   Root MSE        =    1.0414
              
              ------------------------------------------------------------------------------
                         y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
              -------------+----------------------------------------------------------------
                        x1 |  -1298.321   10929.67    -0.12   0.906    -22785.75     20189.1
                        x2 |    179.992   1544.121     0.12   0.907    -2855.707    3215.691
                        x3 |  -8.320985   72.70972    -0.11   0.909    -151.2663    134.6243
                     _cons |   3122.863   25785.13     0.12   0.904    -47569.99    53815.72
              ------------------------------------------------------------------------------
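              For comparison, one way to sidestep explicit inversion entirely is Mata's qrsolve(), which returns the least-squares solution via QR decomposition (again just a sketch; whether regress does anything similar is not public):

              Code:
              mata
              y = st_data(., "y")
              X = st_data(., "x1 x2 x3")
              X = X, J(rows(X), 1, 1)
              b_qr = qrsolve(X, y)    // least squares via QR, no explicit inverse
              b_qr
              end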
              Last edited by Andrew Musau; 22 Feb 2021, 13:31.
