Centering an Indicator Variable

Risha Gidwani-Marszowski

Join Date: Aug 2018

Posts: 11
#1

06 May 2019, 17:39

Hello, I am looking to center a covariate, per the reference below. This covariate is Hospital Referral Region (HRR), which has 306 values and enters my model as 305 indicator variables. I also use this variable in interaction terms in the model.

Per Kraemer and Blasey's (and Cronbach's) recommendations, the indicator variables should be recoded from 1 and 0 to 1-1/m and -1/m, with one (arbitrary) HRR eliminated. I'd welcome any suggestions on how to easily code this in Stata.

Kraemer HC & Blasey CM. Centring in regression analyses: a strategy to prevent errors in statistical inference. International Journal of Methods in Psychiatric Research, 13: 3.
Phil Bromiley

Join Date: Apr 2014

Posts: 4348
#2

07 May 2019, 11:47

You didn't get a quick answer. You'll increase your chances of a useful answer by following the FAQ on asking questions - provide Stata code in code delimiters, readable Stata output, and sample data using dataex. Also remember we are not from your area.

In my literature, indicators are almost always coded 0/1 (which Stata does automatically) so I don't understand why you want to change this. It also seems odd to me to have 305 dummies and then interact them with other variables - you'll have many hundreds of parameters. I'm also not sure what m is. I suppose you can set up a loop from 2 to 306:

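* i is taken to be the variable holding the HRR number (1 to 306)
forvalues j=2/306 {
    g ind`j'=1 if i==`j'
    * r(sum) is the number of observations in HRR `j'
    su ind`j'
    replace ind`j'=-1/r(sum) if i!=`j'
}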

Without your data, I can't be sure this works.
Nick Cox

Join Date: Mar 2014

Posts: 35754
#3

07 May 2019, 11:49

I have never seen this advice before. I am pretty clear that thousands of researchers ignore it and aren't bitten by ignoring it. There are lots of really good reasons why 0 and 1 codes are congenial and practical.
Rich Goldstein

Join Date: Mar 2014

Posts: 4485
#4

07 May 2019, 14:18

thank you for the citation; I don't always agree with Helena Kraemer but I always learn something from her

whether this is easy or hard depends on the data set up (e.g., are the 306 variables contiguous? if not, do they share a common piece of their name, e.g., hrr1-hrr306?); here I assume they are contiguous no matter how named (though I assume hrr1-hrr306):

Code:

foreach var of varlist hrr1-hrr306 {
    replace `var'=1-(1/306) if `var'==1
    replace `var'=-1/306 if `var'==0
}

I recommend saving this as a new dataset to make it easy to undo

note that shorter solutions are possible
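
One shorter route, as a minimal sketch: build the centered indicators directly from a single categorical variable instead of recoding existing dummies. It assumes the regions are stored in one variable, here called hrr, and that the new variables can be called chrr1, chrr2, and so on; both names are hypothetical and not taken from the original data.

```stata
* Assumption: hrr is a single categorical variable identifying the region.
* chrr is a hypothetical stub for the new indicator variables.
tabulate hrr, generate(chrr)          // 0/1 indicators chrr1, chrr2, ...
local m = r(r)                        // number of distinct regions (306 here)
foreach var of varlist chrr1-chrr`m' {
    recast double `var'               // make room for non-integer values
    replace `var' = `var' - 1/`m'     // 1 becomes 1-1/m, 0 becomes -1/m
}
drop chrr1                            // eliminate one (arbitrary) region
```

Subtracting 1/m from a 0/1 variable gives both recoded values in one step, and missing values of hrr stay missing in every indicator.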

I'm not sure I agree with the authors that this is in general a good idea; in particular, I doubt that having 305 indicator variables in one model is a good idea (but I don't know what your project is about either), especially given the interactions you refer to
Risha Gidwani-Marszowski

Join Date: Aug 2018

Posts: 11
#5

07 May 2019, 20:46

Thanks very much for the sample code. I've thought through the need for the indicator variables quite a bit (this is actually linked to another post I had on Statalist entitled "Three Level Models with FE", if folks would like the background). The need to recode is so that I can have an interpretation of a "mean" geographic region rather than the "reference" geographic region that I would get from 0/1 coded variables.
Maarten Buis

Join Date: Mar 2014

Posts: 3464
#6

08 May 2019, 01:27

That sounds to me like the "mean" you are comparing with assumes that each Hospital Referral Region is of equal size. That is probably not true. You can include your categorical variable as a regular factor variable and do the interaction with the regular factor variable, and afterwards you can use contrast with the gw. prefix for your categorical variable to get a more reasonable mean with which to compare.
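
A minimal sketch of that approach, assuming an outcome called cost, a continuous covariate called x, and the region variable called hrr (all three names are hypothetical, and regress stands in for whatever estimation command is actually used):

```stata
* Assumption: cost, x and hrr are placeholder variable names.
regress cost i.hrr##c.x

* g. compares each region with the grand mean, weighting all regions equally
contrast g.hrr

* gw. compares each region with the grand mean weighted by the number
* of observations in each region
contrast gw.hrr
```

With 306 regions the output tables will be long, so this is mainly useful for seeing how the two grand means differ.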

---------------------------------
Maarten L. Buis
University of Konstanz
Department of history and sociology
box 40
78457 Konstanz
Germany
http://www.maartenbuis.nl
---------------------------------
Risha Gidwani-Marszowski

Join Date: Aug 2018

Posts: 11
#7

08 May 2019, 16:03

Hi Maarten, I'm not looking to evaluate one HRR versus a mean HRR -- the HRR is really a nuisance parameter I have to adjust for and for which I need to produce predicted costs for a mean HRR. In that case I'm not sure the -contrast- command gets me what I need.

Not sure why you infer that the mean approach I denote above assumes HRRs of equal size -- each HRR is going to contribute a different number of observations to the model, so Stata will already be taking that into account. Can you let me know more about what you think is the problem?
Maarten Buis

Join Date: Mar 2014

Posts: 3464
#8

09 May 2019, 01:35

You compare to the mean assuming each HRR is of equal size, because that is how you created that variable.
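
A short derivation may make this concrete. It is only a sketch under simplifying assumptions: a model with the m-1 centered indicators and no other covariates or interactions, where the symbols are introduced here for illustration: mu_k is the expected outcome in region k, beta_0 the constant, and beta_j the coefficient on the indicator for region j, coded 1-1/m inside region j and -1/m elsewhere.

```latex
\begin{align*}
\mu_k &= \beta_0 + \beta_k - \frac{1}{m}\sum_{j=1}^{m-1}\beta_j,
  \qquad k = 1,\dots,m-1, \\
\mu_m &= \beta_0 - \frac{1}{m}\sum_{j=1}^{m-1}\beta_j, \\
\frac{1}{m}\sum_{k=1}^{m}\mu_k
  &= \beta_0 + \frac{1}{m}\sum_{j=1}^{m-1}\beta_j
     - \frac{1}{m}\sum_{j=1}^{m-1}\beta_j
   = \beta_0 .
\end{align*}
```

So the constant is the simple average of the m region means, with every region counting once regardless of how many observations it contributes. A mean that reflects region size would instead be $\sum_k n_k \mu_k / \sum_k n_k$, where $n_k$ is the number of observations in region k, and the two coincide only when all $n_k$ are equal.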
