  • Are factor variable values important in regression

    I am using data in which a factor variable has negative values and no value equal to 0. I am excluding the negative values because they are not wanted, but the remaining values start at 1 instead of 0. Does this make a difference to the regression?

    Here is an example
    Code:
     g10 - Have |
            you |
     received a |
    Coronavirus |
       vaccine? |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              1 |        125        2.13        2.13
              2 |      5,732       97.87      100.00
    ------------+-----------------------------------
          Total |      5,857      100.00
    The values 1 and 2 are labelled "yes" and "no" respectively.

    I am also excluding the negative values from the regression by adding the following qualifier at the end of the regression command:
    Code:
     if w5_nc_cvhadvac >=0  & w5_nc_cvhadvac < .
    Is there a way to permanently remove the negative values and have yes and no carry the value labels 0 and 1?
    Last edited by Leo Davis; 07 Oct 2021, 16:28.

  • #2
    In the context of a regression (and -margins-), the actual values of the factor variables make no difference at all. They could, in principle, be anything. In fact, it has never been clear to me why Stata's factor-variable notation does not permit negative values (or, for that matter, non-integer values, or even strings). Factor-variable notation is a convenient shorthand that tells Stata to create a separate indicator variable ("dummy") for each value of the actual variable. The results will be the same regardless of what those values are; all that matters is which observations share the same value of the variable.
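
    As a rough illustration of that point, the sketch below fits the same model twice on Stata's shipped auto dataset, once with the grouping coded 0/1 and once with the same grouping coded 1/2. The dataset, the model, and the variable name foreign12 are assumptions chosen only for the example; they are not from the original question.

    ```stata
    * Sketch: the coefficient on a two-level factor does not depend on its codes
    sysuse auto, clear

    * Original coding: foreign is 0/1, so 0 is the base level
    regress price i.foreign mpg
    scalar b01 = _b[1.foreign]

    * Hypothetical recoding of the same grouping as 1/2 (base level is now 1)
    generate byte foreign12 = foreign + 1
    regress price i.foreign12 mpg
    scalar b12 = _b[2.foreign12]

    * The two estimates should agree up to rounding error
    display "difference in coefficients: " b01 - b12
    ```

    Only the labels on the output rows change (1.foreign versus 2.foreign12); the estimate for the non-base level should be the same in both fits.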

    It is not clear what you mean by "removing the negative values." You can replace them with missing values:
    Code:
    replace w5_nc_cvhadvac = . if w5_nc_cvhadvac < 0
    if you want to retain those observations' information on other variables for use in other analyses. Or, if you really won't be using those observations for anything, you can just drop them altogether:
    Code:
    drop if w5_nc_cvhadvac < 0
    As for giving that yes/no variable (coded as 1/2) value labels 0 and 1, I think you are confusing language. At least I hope you are, because taken literally, it is a really terrible idea. What you should do to make this variable (and any others like it) most useful in Stata is actually change the values to 1 and 0 and, optionally, apply "yes" and "no" as value labels.
    Code:
    recode g10 (2 = 0)
    label define yesno 0 "No" 1 "Yes"
    label values g10 yesno
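
    If overwriting g10 in place feels risky, one possible variant is to build the 0/1 indicator in a new variable and cross-check it against the original. This is only a sketch on made-up toy data: the code -9 and the variable name vaccinated are hypothetical and stand in for whatever the real dataset contains.

    ```stata
    * Toy data, invented for illustration: 1 = yes, 2 = no, -9 = unwanted code
    clear
    input byte g10
    1
    2
    2
    -9
    2
    end

    * Treat the negative code as missing, keeping the observation itself
    replace g10 = . if g10 < 0

    * Build a labelled 0/1 indicator in a new variable; g10 is left untouched
    recode g10 (1 = 1 "Yes") (2 = 0 "No"), generate(vaccinated)

    * Cross-check the mapping, including the missing values
    tabulate g10 vaccinated, missing
    ```

    Keeping the original variable alongside the new one makes it easy to confirm that every 1 became 1, every 2 became 0, and missing values stayed missing.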


    • #3
      Yes, I believe I was confusing my language. I thought a value label was the value (1) assigned to the label (yes) haha. Thank you for your help.


      • #4
        This needs a cross-reference to your previous thread https://www.statalist.org/forums/for...rical-variable in which the strong advice was to map to a variable with positive values. An exception might be if any negative code was really some kind of missing value which you might want to ignore.
