Are factor variable values important in regression

Leo Davis

Join Date: Sep 2021

Posts: 10
#1

07 Oct 2021, 16:20

I am using data that has negative factor variable values and no value = 0. I am removing the negative values as they are not wanted, but my remaining factor values start at 1 instead of 0. Does this make a difference to the regression?

Here is an example

Code:

 g10 - Have |
        you |
 received a |
Coronavirus |
   vaccine? |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |        125        2.13        2.13
          2 |      5,732       97.87      100.00
------------+-----------------------------------
      Total |      5,857      100.00

1 and 2 are labelled "yes" and "no" respectively.

I am also removing the negative values for the regression using the following at the end of the regression command

Code:

if w5_nc_cvhadvac >=0 & w5_nc_cvhadvac < .

Is there a way to permanently remove the negative values and have yes and no take value labels 0 and 1?

Last edited by Leo Davis; 07 Oct 2021, 16:28.
Clyde Schechter

Join Date: Apr 2014

Posts: 30091
#2

07 Oct 2021, 16:35

In the context of a regression (and -margins-), the actual values of the factor variables make no difference at all. They could, in principle, be anything. In fact, it has never been clear to me why Stata's factor-variable notation does not permit negative values (or, for that matter, non-integer values, or even strings). Factor variable notation is a convenient shorthand that tells Stata to create separate indicator variables ("dummies") for each value of the actual variable. The results will be the same regardless of what those values are: all that matters is which observations have the same values of the variable.
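A minimal sketch of that point, using the auto dataset that ships with Stata rather than the poster's data. The variable foreign_alt is a hypothetical name introduced only for this illustration; it holds the same two groups as foreign, recoded from 0/1 to 5/6.

Code:

* Load the example dataset shipped with Stata
sysuse auto, clear
* Regression with the original 0/1 coding
regress price i.foreign mpg
* Same grouping, different numeric codes (5 and 6 instead of 0 and 1)
generate byte foreign_alt = foreign + 5
regress price i.foreign_alt mpg

The coefficient reported for 6.foreign_alt in the second regression should match the one for 1.foreign in the first, because both indicators pick out the same observations. Only the row labels in the output change.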

It is not clear what you mean by "removing the negative values." You can replace them with missing values:

Code:

replace w5_nc_cvhadvac = . if w5_nc_cvhadvac < 0

if you want to retain those observations' information on other variables for use in other analyses. Or, if you really won't be using those observations for anything, you can just drop them altogether:

Code:

drop if w5_nc_cvhadvac < 0

As for giving that yes/no variable (coded as 1/2) value labels 0 and 1, I think you are confusing language. At least I hope you are, because taken literally, it is a really terrible idea. What you should do to make this variable (and any others like it) most useful in Stata is actually change the values to 1 and 0 and, optionally, apply "yes" and "no" as value labels.

Code:

recode g10 (2 = 0)
label define yesno 0 "No" 1 "Yes"
label values g10 yesno
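If overwriting the original variable feels risky, one alternative is to have -recode- build a new variable and leave the source untouched. This is only a sketch: it assumes the variable is named w5_nc_cvhadvac (as in the first post) with 1 = yes and 2 = no, and the names vaccinated and vacc_lbl are hypothetical.

Code:

* Create a new 0/1 variable with value labels; negative codes become missing
recode w5_nc_cvhadvac (1 = 1 "Yes") (2 = 0 "No") (min/-1 = .), generate(vaccinated) label(vacc_lbl)
* Cross-check the new coding against the original
tabulate w5_nc_cvhadvac vaccinated, missing

The cross-tabulation makes it easy to confirm that every original code landed where intended before the new variable is used in a regression.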
Leo Davis

Join Date: Sep 2021

Posts: 10
#3

07 Oct 2021, 16:59

Yes, I believe I was confusing my language. I thought a value label was the value (1) assigned to the label (yes) haha. Thank you for your help.
Nick Cox

Join Date: Mar 2014

Posts: 35692
#4

08 Oct 2021, 02:48

This needs a cross-reference to your previous thread https://www.statalist.org/forums/for...rical-variable in which the strong advice was to map to a variable with positive values. An exception might be if any negative code was really some kind of missing value which you might want to ignore.
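If the negative codes do turn out to be missing-value codes, a sketch along these lines keeps the reason for missingness while still excluding those observations from estimation. The specific codes -9 and -8 and their meanings are assumptions for illustration only; the real codes should be read off the tabulation or the survey documentation.

Code:

* Show the raw numeric codes, including any missing values, without labels
tabulate w5_nc_cvhadvac, missing nolabel
* Hypothetical mapping: -9 (say, refused) to .a and -8 (say, don't know) to .b
mvdecode w5_nc_cvhadvac, mv(-9 = .a \ -8 = .b)

Extended missing values such as .a and .b are dropped from a regression just like system missing, so no if qualifier is needed afterwards.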