  • Negative values for categorical variable.

    Hello,

    I have been using R but recently started learning Stata. To give some context: I have a dataset that I received as a homework assignment in my statistics course last year, and I have been trying to rerun the course analyses in Stata. Because I already know what results I should get, it is easier for me to check whether I am heading in the right direction.


    (1) I am trying to regress one dependent variable on two categorical variables and their interaction, as in "reg DV IV1##IV2".

    The commands I am using are "reg Score Condition##Experience" and "anova Score Condition##Experience".

    Both independent variables are categorical and were originally coded 0 and 1.


    In my statistics class, however, I learned that centering categorical variables is always useful for interpretation when a regression includes interactions. I therefore recoded 0 and 1 as -1 and 1 and reran the commands (after recoding, the mean of each variable is 0, which implies that the variables are correctly centered). The new commands are "reg Score ConditionC##ExperienceC" and "anova Score ConditionC##ExperienceC"; the suffix C simply indicates that a variable is centered. When I reran them, I got the error "ConditionC: factor variables may not contain noninteger values." I also tried putting "i." in front of the centered variables, as in "reg Score i.ConditionC##i.ExperienceC", but that did not work either, and I got the same error message.
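A minimal sketch of the recoding described above, using simulated data in place of the homework dataset (the variable names follow the post; the seed, sample size, and coefficients are hypothetical). It maps 0/1 to -1/+1 with 2*x - 1, checks the result with summarize, and captures the error that factor-variable notation raises for the negative level. Note that a -1/+1 variable has a mean of exactly 0 only when its two groups are the same size; in general its mean is 2p - 1, where p is the share of 1s.

```stata
* Hypothetical simulated data standing in for the homework dataset
clear
set seed 12345
set obs 200
generate byte Condition  = runiform() < 0.5
generate byte Experience = runiform() < 0.5
generate double Score = 50 + 5*Condition + 3*Experience + 4*Condition*Experience + rnormal(0, 10)

* Recode 0/1 to -1/+1 (0 -> -1, 1 -> +1)
generate byte ConditionC  = 2*Condition - 1
generate byte ExperienceC = 2*Experience - 1

* Means are exactly 0 only if each variable splits the sample 50/50
summarize ConditionC ExperienceC

* i. (and ##, which implies i.) rejects the -1 level; capture keeps the do-file running
capture noisily regress Score i.ConditionC##i.ExperienceC
display "return code from regress: " _rc
```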

    Based on the above, and after searching the forum, I would like to confirm my understanding with my first question: am I not allowed to have negative values in a factor variable? My rationale is simple: to me, both 0/1 and -1/1 can encode YES or NO. However, Stata seems not to allow negative values in a categorical variable. Am I correct?


    (2) Again based on what I found in the forum, I changed my code by putting 'c.' in front of the variables, as in "reg Score c.ConditionC##c.ExperienceC". This worked, and I got the results I should have gotten (as mentioned, I have the homework key, so I compared my Stata output with it).

    Here is my second question. If I put 'c.' in front of my categorical variable, which is coded -1 and 1, how does Stata interpret it? Simply as a continuous variable? How can it treat a categorical variable as continuous?


    Thank you for your help in advance!
    Last edited by Seung Kyo Ahn; 28 Jul 2020, 01:44.

  • #2
    Welcome to Stata. You will occasionally have to deal with issues of this sort with fortitude, but it will be worth it. Stata is an excellent language, the best out there (which is why I use it), but it is also rather patronising in that standard things are disallowed because they "do not make sense" (that is, the Stata programmer felt he knew better than I do what is good for me).

    In econometrics we do not use weird recodings of categorical variables, so I did not know this myself, but yes, apparently negative values are not allowed:
    Code:
    x:  factor variables may not contain negative values
    r(452);
    Note that factor variables
    Code:
     help fvvarlist
    are a relatively modern addition to Stata (I think they were added in Stata 11), so for half of my life as a Stata user I lived without them. You can always forgo the factor-variable facility and generate whatever variables you want yourself, as we did before Stata 11.
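For illustration, a sketch of that pre-Stata 11 approach applied to this thread's variables: the -1/+1 versions and their product are generated by hand and entered as ordinary regressors. The data are simulated, and the name CondXExp is hypothetical. This should reproduce the coefficients that "regress Score c.ConditionC##c.ExperienceC" reports, since ## between two c. variables adds exactly this product term.

```stata
* Hypothetical simulated data (names follow the original post)
clear
set seed 12345
set obs 200
generate byte Condition  = runiform() < 0.5
generate byte Experience = runiform() < 0.5
generate double Score = 50 + 5*Condition + 3*Experience + 4*Condition*Experience + rnormal(0, 10)

* Effect-coded versions of the two 0/1 variables
generate byte ConditionC  = 2*Condition - 1
generate byte ExperienceC = 2*Experience - 1

* Interaction built by hand (CondXExp is a hypothetical name); its values are -1/+1
generate byte CondXExp = ConditionC*ExperienceC

* Ordinary regressors: no factor-variable notation involved
regress Score ConditionC ExperienceC CondXExp
```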

    Lying to Stata by declaring the variable continuous with c.var is something I have a bad feeling about; nothing good will come of it. (Stata treating a variable as continuous simply means that Stata is not aware of the special structure of the categorical variable.)
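To make the parenthetical concrete, a sketch on simulated data (seed, sample size, and coefficients are hypothetical) comparing the 0/1 factor-variable model with the -1/+1 c. model. With only two levels per variable, both specifications have four parameters that reproduce the four cell means, so fitted values and R-squared agree; only the parameterization differs. In the -1/+1 model the constant is the unweighted average of the four cell means, each main-effect coefficient is half the level difference averaged over the other variable, and the interaction coefficient is one quarter of the 0/1 interaction (the difference in differences). What is lost is Stata's knowledge of the levels: margins, for example, cannot be asked for margins over ConditionC without an explicit at() specification.

```stata
* Hypothetical simulated data (names follow the original post)
clear
set seed 12345
set obs 200
generate byte Condition  = runiform() < 0.5
generate byte Experience = runiform() < 0.5
generate double Score = 50 + 5*Condition + 3*Experience + 4*Condition*Experience + rnormal(0, 10)
generate byte ConditionC  = 2*Condition - 1
generate byte ExperienceC = 2*Experience - 1

* Model 1: original 0/1 coding as factor variables
regress Score i.Condition##i.Experience
predict double fit_fv, xb
scalar r2_fv  = e(r2)
scalar int_fv = _b[1.Condition#1.Experience]

* Model 2: -1/+1 coding declared continuous with c.
regress Score c.ConditionC##c.ExperienceC
predict double fit_c, xb
scalar r2_c  = e(r2)
scalar int_c = _b[c.ConditionC#c.ExperienceC]

* Same fit, different parameterization
display "R-squared: " scalar(r2_fv) "  vs  " scalar(r2_c)
display "interaction: " scalar(int_fv) "  vs  4 x " scalar(int_c) " = " 4*scalar(int_c)
```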



    • #3
      Seung:
      - as per the FAQ, please post what you typed and what Stata gave you back (via CODE delimiters). Thanks;
      - why bother with -anova- when -regress- can do better (especially as far as the postestimation suite of commands is concerned)?;
      - while Joro is obviously correct that you can create interactions by hand, please note that -fvvarlist- notation gives you a dedicated lane to two wonderful Stata commands, -margins- and -marginsplot-.
      Finally, I do share Joro's helpful advice not to try to cheat Stata with the wrong -fvvarlist- prefixes, as nonsensical results will be returned.
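As an illustration of that dedicated lane, a sketch that keeps the original 0/1 coding and lets -margins- do the interpretive work that centering was meant to provide. The data and coefficients are simulated and hypothetical. The asbalanced option weights the levels of the other factor equally, which should correspond to the averaging behind the -1/+1 main effects; treat that correspondence as something to check against your own output.

```stata
* Hypothetical simulated data (names follow the original post)
clear
set seed 12345
set obs 200
generate byte Condition  = runiform() < 0.5
generate byte Experience = runiform() < 0.5
generate double Score = 50 + 5*Condition + 3*Experience + 4*Condition*Experience + rnormal(0, 10)

* Keep the 0/1 coding and use factor-variable notation
regress Score i.Condition##i.Experience

* Predicted mean Score in each Condition x Experience cell, then plot them
margins Condition#Experience
marginsplot

* Effect of Condition (1 vs 0) at each level of Experience
margins Experience, dydx(Condition)

* Condition means, averaging over Experience levels with equal weights
margins Condition, asbalanced
```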
      Kind regards,
      Carlo
      (Stata 19.0)
