
  • Why is the constant not zero when standardizing both Y and X in regression? And other inconsistencies.

    I have a multiple regression, and before running it I standardize my dependent variable Y and my predictors X1, X2, and X3 with the commands:

    Code:
    egen zY = std(Y)
    egen zX1 = std(X1)
    egen zX2 = std(X2)
    egen zX3 = std(X3)
    Now if I run the multiple regression, three questions arise:

    First, why do I get a constant that is different from zero (0.8 in my case) when I run the command below, regressing the standardized dependent variable on the standardized predictors?

    Code:
    reg zY zX1 zX2 zX3
    Second, why do the betas from the command below differ from the coefficients in the regression above?

    Code:
    reg Y X1 X2 X3, beta
    Third, why do I get slightly different coefficients when I compare the first regression model from command A with the bStdX column that listcoef reports after command B?

    Code:
    // command A
    reg Y zX1 zX2 zX3

    // command B
    reg Y X1 X2 X3
    listcoef
    I'm using Stata 13.
    Thanks a lot!

  • #2
    I guess you have missing values on some of your variables: you standardize X1 over everyone observed on X1, but not all of these observations are actually used in the regression. Instead, you need to standardize within the sample that will be used in the model.

    Code:
    // open example data
    sysuse auto, clear
    
    gen byte touse = !missing(mpg, price, rep78)
    egen double zmpg   = std(mpg)   if touse
    egen double zprice = std(price) if touse
    egen double zrep78 = std(rep78) if touse
    reg zprice zmpg zrep78
    reg price mpg rep78, beta
    (For more on examples I sent to the Statalist see: http://www.maartenbuis.nl/example_faq )
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------



    • #3
      All of the reported issues would have benefited from a reproducible example or at least the output you got.

      I guess all of these arise because of differences in precision and/or differences in the samples being used. egen, by default, uses all observations of a given variable, while regress will, by default, be based on the sub-sample with no missing values on any of the variables included in the model. Try

      Code:
      // mark the estimation sample
      quietly regress Y X1 X2 X3
      generate byte mysample = e(sample)
      
      // standardize the variables with double precision
      foreach x of varlist Y X1 X2 X3 {
          quietly summarize `x' if mysample
          generate double z`x' = (`x' - r(mean))/r(sd)
      }
      
      // run the regression models
      regress zY zX1 zX2 zX3
      regress Y X1 X2 X3 , beta
      regress Y zX1 zX2 zX3
      listcoef
      Also note that listcoef is user-written; per the Statalist FAQ, you are asked to explain where user-written commands come from.
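
      A minimal sketch of how to check where a user-written command such as listcoef comes from, using only built-in Stata tools:

      Code:
      * show the path and version line of the installed ado-file
      which listcoef
      * search official help and user-written packages for the command
      findit listcoef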

      Best
      Daniel



      • #4
        Great, thanks! Removing the missing data solved all the issues.



        • #5
          By the way, do you think it is better to report bStdX or bStdXY?



          • #6
            Originally posted by Andrea Arancio:
            By the way, do you think it is better to report bStdX or bStdXY?

            Better in what sense?

            Best
            Daniel



            • #7
              With the aim of seeing the relative importance of the predictors.
              I know this is not 100% correct, though.



              • #8
                While I am not crazy about standardized variables in general, x standardization alone is usually sufficient for assessing the relative importance of predictors.
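
                Since bStdX is simply the raw coefficient multiplied by the standard deviation of x, taken over the estimation sample, you can sketch it by hand without listcoef. A minimal example with the auto data (variable names here are just for illustration):

                Code:
                sysuse auto, clear
                regress price mpg weight
                * x-standardized coefficient for mpg: b * SD(mpg),
                * with the SD computed over the estimation sample
                quietly summarize mpg if e(sample)
                display _b[mpg]*r(sd)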
                -------------------------------------------
                Richard Williams, Notre Dame Dept of Sociology
                StataNow Version: 19.5 MP (2 processor)

                EMAIL: [email protected]
                WWW: https://www3.nd.edu/~rwilliam



                • #9
                  I would state that neither is better in that sense, as relative importance cannot be judged from statistical analysis in a meaningful way without substantial context.

                  First, I think that people have a limited understanding of what a standard deviation means. Perhaps this is why so many want to report them. Take me as an example: I cannot tell you what a one-standard-deviation change in age means substantively. Only after you tell me that in your data one standard deviation of age corresponds to 10 years can I make sense of it. The point is, I would try to report on scales whose interpretation is as natural and intuitive as possible. A standard deviation does not qualify, in my view.

                  Secondly, suppose you are a political decision maker and want to reduce crime rates. Suppose a political researcher tells you that per prison built [substitute "one SD of prisons built" here, to make interpretation even more complicated], you reduce crime rates by a factor of 5 [again, substitute "by 5 SDs", to make interpretation even more complicated]. The researcher then tells you that having a cop walk up and down the streets every night [substitute "one SD of cops", ... I think you get the point] reduces crime rates by a factor of 2.5. What do you make of this as a politician? Should you build a new prison, or should you have cops walk up and down the street? Aside from the fact that the causal mechanism underlying the former association is questionable, the answer might very well depend on how much each intervention would cost. If the cost of one prison is more than 100 times that of a cop on the streets, then you might want to invest your money in cops, despite the smaller (standardized) coefficient reported by the researcher.

                  Best
                  Daniel
                  Last edited by daniel klein; 11 Nov 2015, 07:41. Reason: typos



                  • #10
                    Here is my own handout on the evils of standardization. Just skip to the last page if you don't want to wade through the examples and math.

                    http://www3.nd.edu/~rwilliam/stats2/l71.pdf

                    You can also see this for a brief discussion of standardized coefficients in logistic regression.
                    -------------------------------------------
                    Richard Williams, Notre Dame Dept of Sociology
                    StataNow Version: 19.5 MP (2 processor)

                    EMAIL: [email protected]
                    WWW: https://www3.nd.edu/~rwilliam

