Standard errors are nonmissing when they should be missing

Nils Enevoldsen

Join Date: Oct 2014

Posts: 296
#1

Standard errors are nonmissing when they should be missing

12 May 2015, 10:54

Code:

clear input y x 1 0 0 1 1 1 0 0 end reg y x, vce(cl x) // SE of x is explicitly omitted drop if _n == 4 reg y x, vce(cl x) // SE of x is nonmissing drop if _n == 3 reg y x, vce(cl x) // SE of x is nonexistent

I don't understand the difference between the explicitly omitted SE and the nonexistent SE, but I especially don't like that nonmissing SE. I suggest that it should be omitted, because I believe it is undefined.
Tags: bug
Andrew Musau

Join Date: Oct 2014

Posts: 10214
#2

12 May 2015, 14:54

The output depends on what elements are in the VCE matrix. If the element is 0, then the standard error is omitted. If on the other hand the element is not a real number (or not defined such as division by zero), Stata outputs -1.#IND ("IND" for "indeterminate"). In the case of non-missing, then there is a positive element in the VCE. Therefore, you should look at the algorithm that is used to construct the VCE.

Code:

. mat list V // SE Omitted symmetric V[2,2] x _cons x 0 _cons 0 0 . mat list V //non-missing x _cons x 1.233e-32 _cons 0 0 . mat list V //Nonexistent symmetric V[2,2] x _cons x -1.#IND _cons -1.#IND -1.#IND
Comment
Nils Enevoldsen

Join Date: Oct 2014

Posts: 296
#3

12 May 2015, 15:37

Unless I am mistaken (which I am astonishingly often), the magic happens in -_regress-, which is a built-in black box of compiled code. I don't know how to back out the algorithm.
Comment

Andrew Musau

Join Date: Oct 2014
Posts: 10214

13 May 2015, 10:59

It is just the usual OLS algorithm. I would not consider it difficult to construct the VCE but rather time-consuming. Let us use your example and taking into account the following formulae (somehow I still recall these off the top of my head)

Code:

From the model Y= Xβ + e

1. The OLS estimate is βhat= (X'X)^-1(X'Y)
2. The variance estimate is σhat= ((Y-Xβ)' (Y-Xβ))/ (N-K)

Run your regression and retrieve the VCE matrix

Code:

input y x
1 0
0 1
1 1
0 0
end

reg y x

. reg y x

      Source |       SS       df       MS              Number of obs =       4
-------------+------------------------------           F(  1,     2) =    0.00
       Model |           0     1           0           Prob > F      =  1.0000
    Residual |           1     2          .5           R-squared     =  0.0000
-------------+------------------------------           Adj R-squared = -0.5000
       Total |           1     3  .333333333           Root MSE      =  .70711

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |          0   .7071068     0.00   1.000    -3.042435    3.042435
       _cons |         .5         .5     1.00   0.423    -1.651326    2.651326
------------------------------------------------------------------------------

Code:



. mat V= e(V)

. mat list V

symmetric V[2,2]
           x  _cons
    x     .5
_cons   -.25    .25

Now, we will reconstruct this VCE matrix in bold. Recall that the X matrix has a column of ones for the intercept, and the second column is our "x" variable.

Code:

. mat X= (1,0\ 1,1\ 1,1\ 1,0)

. mat list X

X[4,2]
    c1  c2
r1   1   0
r2   1   1
r3   1   1
r4   1   0

Take the transpose of X, multiply by X and invert (usually difficulties arise if the determinant of the matrix is 0 or close to 0, causing problems with getting an inverse (such matrices are referred to as "ill conditioned"). This is why multicollinearity is a big deal! ).

Code:

. mat XP= X'

. mat XPX= XP*X

. mat XPXI = inv(XPX)

. mat list XPXI

symmetric XPXI[2,2]
     c1   c2
c1   .5
c2  -.5    1

Now we can get the Beta matrix containing our estimates

Code:

. mat Y= (1\0\1\0)

. mat list Y

Y[4,1]
    c1
r1   1
r2   0
r3   1
r4   0


. mat XPY= XP*Y


. mat B= XPXI*XPY

. mat list B

B[2,1]
    c1
c1  .5
c2   0

The elements in matrix B (in bold) match our estimates (only reversed in order where the first element is the intercept). Now, let us compute the variance noting that N=4 (4 observations), K=2 (intercept+ 1 variable)

Code:

. mat YLXB= Y-XB

. mat YLXBP = YLXB'

. mat S= (YLXBP*YLXB)*(1/(4-2))


. mat VCE= S*XPXI

. mat list VCE

symmetric VCE[2,2]
      c1    c2
c1   .25
c2  -.25    .5

And there you have the VCE (in bold) where the diagonal element in the second row is the variance of x.

Last edited by Andrew Musau; 13 May 2015, 11:05.

Comment

Nils Enevoldsen

Join Date: Oct 2014
Posts: 296

13 May 2015, 12:04

I don't think this is the method that _regress uses, because it yields a different VCE matrix for the example that had a nonmissing SE.

Code:

mat X= (1,0\ 0,1\ 1,1)
mat XP= X'
mat XPX= XP*X
mat XPXI = inv(XPX)
mat Y= (0\1\1)
mat XPY= XP*Y
mat B= XPXI*XPY
mat XB= X*B
mat YLXB= Y-XB
mat YLXBP = YLXB'
mat S= (YLXBP*YLXB)*(1/(3-2))
mat VCE= S*XPXI
mat list VCE

Result:

Code:

symmetric VCE[2,2]
    c1  c2
c1   0
c2   0   0

Comment

Nils Enevoldsen

Join Date: Oct 2014

Posts: 296
#6

13 May 2015, 12:16

Ooh, in Mata, I can maybe get these results (?) if I use cholinv(). Invsym() seems to "fix" the problem, though.

Code:

mata X= (1,0\ 0,1\ 1,1) Y= (0\ 1\ 1) XPXI = cholinv(cross(X,X)) B= XPXI*cross(X,Y) VCE= cross(Y-X*B,Y-X*B)*(1/(3-2))*XPXI XPXI B VCE end

Last edited by Nils Enevoldsen; 13 May 2015, 12:25.
Comment
Andrew Musau

Join Date: Oct 2014

Posts: 10214
#7

13 May 2015, 13:48

The inconsistency arises in how you specify your X matrix (using the steps I outline). Recall that the first column must be a row of ones (for the intercept), and the "x" variable being the second column. So in the case of non-missing

Code:

Data | y x | |-------| 1. | 1 0 | 2. | 0 1 | 3. | 1 1 | The X matrix is | 1 0 | | 1 1 | | 1 1 |

and the code should work. Yes, what you obtain from MATA is a shortcut to implement the algorithm, and what actually Stata uses.
Comment
Nils Enevoldsen

Join Date: Oct 2014

Posts: 296
#8

13 May 2015, 18:04

Oh, I see. But I'm still unable to replicate using Stata and inv(). Can you? Maybe I'm doing it wrong.
Comment

Andrew Musau

Join Date: Oct 2014
Posts: 10214

13 May 2015, 19:35

Code:

. reg y x

      Source |       SS       df       MS              Number of obs =       3
-------------+------------------------------           F(  1,     1) =    0.33
       Model |  .166666667     1  .166666667           Prob > F      =  0.6667
    Residual |          .5     1          .5           R-squared     =  0.2500
-------------+------------------------------           Adj R-squared = -0.5000
       Total |  .666666667     2  .333333333           Root MSE      =  .70711

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |        -.5   .8660254    -0.58   0.667     -11.5039     10.5039
       _cons |          1   .7071068     1.41   0.392    -7.984644    9.984644
------------------------------------------------------------------------------

. mat V= e(V)

. mat list V

symmetric V[2,2]
           x  _cons
    x    .75
_cons    -.5     .5

In Mata using Stata's cross command

Code:

mata

: X= (1,0\ 1,1\ 1,1)

: X
       1   2
    +---------+
  1 |  1   0  |
  2 |  1   1  |
  3 |  1   1  |
    +---------+

: Y= (1\0\ 1)

: Y
       1
    +-----+
  1 |  1  |
  2 |  0  |
  3 |  1  |
    +-----+

: XX = cross(X, X)

: XXI = cholinv(XX)

: B= XXI*cross(X,Y)

: B
         1
    +-------+
  1 |    1  |
  2 |  -.5  |
    +-------+

: VCE= cross(Y-X*B,Y-X*B)*(1/(3-2))*XXI

: VCE
[symmetric]
         1     2
    +-------------+
  1 |   .5        |
  2 |  -.5   .75  |
    +-------------+

end

Note that the above is the OLS variance estimator. If you want to know the formula for the robust cluster variance estimator, refer to the following link and adjust the code accordingly

http://www.stata.com/support/faqs/st...luster-option/

Comment

Nils Enevoldsen

Join Date: Oct 2014

Posts: 296
#10

14 May 2015, 10:38

Right, but the issue doesn't manifest without clustering, AFAICT. I tried to modify the code to produce the cluster variance estimator, but I lack the skill. In any case, I don't know what it would tell us.
Comment
Andrew Musau

Join Date: Oct 2014

Posts: 10214
#11

14 May 2015, 14:57

There are algorithms that are used to determine the optimal number of clusters and what observations are put into what cluster. In the two-cluster case, it may be straight forward to replicate, but once the number of clusters grows large, it is far from straight forward. The algorithms that Stata implements should be some of the most efficient (with the available technology). Otherwise, the robust (unclustered) variances are easy to calculate. As long as you understand the formula, I see no added benefit in replicating (if there was a bug in the implementation, I am sure that someone would have identified it).

Some common clustering techniques

http://www.stat.berkeley.edu/~s133/Cluster2a.html
Comment
Nils Enevoldsen

Join Date: Oct 2014

Posts: 296
#12

14 May 2015, 15:40

I disagree that the implementation is bug-free. It's clear that there is a bug. The SE of the trivial example I gave is not, in fact, 1.11e-16. A human might look at that result and realize that it's obviously a spurious result, but it's still incorrect. I usually wouldn't consider that inaccuracy to be a bug for any number other than exact zero, but 1.11e-16 and 0 are importantly different.

I have no intention of building my own estimator. I thought it might be possible to work out the nature of the error that Stata makes under the hood, but I am not up to the task.
Comment
Andrew Musau

Join Date: Oct 2014

Posts: 10214
#13

14 May 2015, 16:46

Disregarding Stata's algorithm for the moment,.can you show this numerically that it is incorrect?
Comment
Nils Enevoldsen

Join Date: Oct 2014

Posts: 296
#14

14 May 2015, 18:45

I can't, but I will eat my hat if 1.11e-16 is correct and 0 is incorrect. I'm sure some kind person can demonstrate it deftly.

If you think that 1.11e-16 is not an obviously spurious result, then that's all the more reason why it should be changed.
Comment
Andrew Musau

Join Date: Oct 2014

Posts: 10214
#15

15 May 2015, 12:31

Since you fail to prove your proposition, I will accept pro-tempore that it is correct. Thus, I would consider 1.11e-16 to be a spurious result if you can show me how a digital computer while performing arithmetic operations can precisely represent an irrational number.

Code:

*DEFINE "a" AS THE SQUARE ROOT OF 2 . scalar a= sqrt(2) . di a 1.4142136 *TAKE THE VALUE OF "a" AND SQUARE IT (THE ANSWER SHOULD BE 2) . scalar b= 1.4142136*1.4142136 . di b 2.0000001 *TAKE AWAY 2 (THE ANSWER SHOULD BE 0) . scalar c= b-2 *THE ANSWER . di c 1.064e-07
Comment

Announcement