Good afternoon,
If we look at the formula for the OLS estimator, b = inv(X'X)*X'Y, we see that the estimator is invariant to dividing all the variables (the dependent variable and every independent variable) by the same nonzero constant.
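To spell out the algebra behind that claim (the constant c and the tilde notation for the rescaled data are just my own labels, not anything Stata reports):

$$
\tilde{X} = \frac{X}{c}, \qquad \tilde{Y} = \frac{Y}{c}, \qquad c \neq 0
$$

$$
\tilde{b} = (\tilde{X}'\tilde{X})^{-1}\tilde{X}'\tilde{Y}
          = \left(\frac{1}{c^{2}}\,X'X\right)^{-1}\frac{1}{c^{2}}\,X'Y
          = c^{2}\,(X'X)^{-1}\,\frac{1}{c^{2}}\,X'Y
          = (X'X)^{-1}X'Y = b
$$

So in exact arithmetic the two sets of coefficients should coincide.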
But when I actually do this, the OLS estimates differ in the fifth digit after the decimal point; see the code below. In the original regression I obtain
. dis _b[mpg]
-56.194159
. dis _b[headroom]
-675.59623
vs
. dis _b[mpg]
-56.194129
. dis _b[headroom]
-675.59625
after I divide throughout by a constant. The estimates are close, but they are not the same.
My questions are:
1) Is this a precision issue?
2) How do I control this issue and make it disappear?
3) Should we be worried about this?
Code:
. clear
. sysuse auto
(1978 automobile data)
. gen ones = 1
. reg price mpg headroom weight ones, hascons
Source | SS df MS Number of obs = 74
-------------+---------------------------------- F(3, 70) = 11.09
Model | 204556469 3 68185489.6 Prob > F = 0.0000
Residual | 430508927 70 6150127.53 R-squared = 0.3221
-------------+---------------------------------- Adj R-squared = 0.2931
Total | 635065396 73 8699525.97 Root MSE = 2479.9
------------------------------------------------------------------------------
price | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
mpg | -56.19416 85.07654 -0.66 0.511 -225.874 113.4856
headroom | -675.5962 392.3504 -1.72 0.090 -1458.115 106.922
weight | 2.061945 .6586383 3.13 0.003 .748332 3.375557
ones | 3158.306 3617.449 0.87 0.386 -4056.468 10373.08
------------------------------------------------------------------------------
. dis _b[mpg]
-56.194159
. dis _b[headroom]
-675.59623
. predict double e, resid
. summ e
Variable | Obs Mean Std. dev. Min Max
-------------+---------------------------------------------------------
e | 74 -6.15e-14 2428.453 -3354.365 7108.818
. sca SD = r(sd)
. for var price mpg headroom weight ones: replace X = X/SD
-> replace price = price/SD
variable price was int now float
(74 real changes made)
-> replace mpg = mpg/SD
variable mpg was int now float
(74 real changes made)
-> replace headroom = headroom/SD
(74 real changes made)
-> replace weight = weight/SD
variable weight was int now float
(74 real changes made)
-> replace ones = ones/SD
(74 real changes made)
. reg price mpg headroom weight ones, hascons
Source | SS df MS Number of obs = 74
-------------+---------------------------------- F(3, 70) = 11.09
Model | 34.6859748 3 11.5619916 Prob > F = 0.0000
Residual | 72.9999978 70 1.04285711 R-squared = 0.3221
-------------+---------------------------------- Adj R-squared = 0.2931
Total | 107.685973 73 1.47515031 Root MSE = 1.0212
------------------------------------------------------------------------------
price | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
mpg | -56.19413 85.07654 -0.66 0.511 -225.8739 113.4857
headroom | -675.5963 392.3504 -1.72 0.090 -1458.115 106.922
weight | 2.061945 .6586383 3.13 0.003 .7483322 3.375557
ones | 3158.305 3617.449 0.87 0.386 -4056.469 10373.08
------------------------------------------------------------------------------
. dis _b[mpg]
-56.194129
. dis _b[headroom]
-675.59625
