  • Does Stata take the first group of a categorical variable as the reference group?

    Hi, I happened to use a categorical variable in a logit regression without recoding it, and found that Stata treated group 1 as the reference group. The results were identical to those from a recoded variable in which group 1 is given the value 0. Am I the only one who never knew that?

  • #2
    Your question would be clearer if you showed your code and output. But if the command is something like

    logit y i.x

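    and x is binary, then it won’t matter whether x is coded 0/1, 1/2, or even 19/45. Because of the factor-variable notation (i.), Stata creates an indicator for the non-base value and by default uses the lowest value as the base, so x is treated as though it were coded 0/1.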

    If this doesn’t answer your question, then show us your code and output so we can see what you mean.
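A minimal sketch of the point above, assuming Stata's shipped auto data and using "regress" (as in #4 below) rather than "logit". The variables foreign12 and foreign1945 are hypothetical recodes made only for this illustration. Because "i." uses the lowest value as the base by default, all three codings should give the same Foreign-versus-Domestic coefficient.

```stata
* Sketch: the numeric codes of a binary factor variable do not matter.
sysuse auto, clear

* Hypothetical recodes of foreign, for illustration only
generate byte foreign12   = foreign + 1                 // 1 = Domestic, 2 = Foreign
generate byte foreign1945 = cond(foreign == 1, 45, 19)  // 19 = Domestic, 45 = Foreign

* In each model the lowest code (Domestic) is the default base
regress mpg i.foreign
scalar b_01 = _b[1.foreign]

regress mpg i.foreign12
scalar b_12 = _b[2.foreign12]

regress mpg i.foreign1945
scalar b_1945 = _b[45.foreign1945]

* All three should display the same Foreign-vs-Domestic difference
display b_01 "  " b_12 "  " b_1945
```

The indicator that enters each model is identical (0 for Domestic, 1 for Foreign), which is why the original codes drop out.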
    -------------------------------------------
    Richard Williams, Notre Dame Dept of Sociology
    Stata Version: 17.0 MP (2 processor)

    EMAIL: rwilliam@ND.Edu
    WWW: https://www3.nd.edu/~rwilliam



    • #3
      Thank you, Richard. That answered my question. I guess it applies to categorical variables with more than two groups as well.



      • #4
        If you wish to change the reference level, you don't need to recode the variable. Just use the "ib#." prefix in factor-variable notation, where # is the value of the category you want as the base.

        Please see the example below:

        Code:
        . sysuse auto
        (1978 Automobile Data)
        
        . codebook foreign
        
        ----------------------------------------------------------------------------------------------------------
        foreign                                                                                           Car type
        ----------------------------------------------------------------------------------------------------------
        
                          type:  numeric (byte)
                         label:  origin
        
                         range:  [0,1]                        units:  1
                 unique values:  2                        missing .:  0/74
        
                    tabulation:  Freq.   Numeric  Label
                                    52         0  Domestic
                                    22         1  Foreign
        
        . regress mpg i.foreign
        
              Source |       SS           df       MS      Number of obs   =        74
        -------------+----------------------------------   F(1, 72)        =     13.18
               Model |  378.153515         1  378.153515   Prob > F        =    0.0005
            Residual |  2065.30594        72  28.6848048   R-squared       =    0.1548
        -------------+----------------------------------   Adj R-squared   =    0.1430
               Total |  2443.45946        73  33.4720474   Root MSE        =    5.3558
        
        ------------------------------------------------------------------------------
                 mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
             foreign |
            Foreign  |   4.945804   1.362162     3.63   0.001     2.230384    7.661225
               _cons |   19.82692   .7427186    26.70   0.000     18.34634    21.30751
        ------------------------------------------------------------------------------
        
        . regress mpg ib1.foreign
        
              Source |       SS           df       MS      Number of obs   =        74
        -------------+----------------------------------   F(1, 72)        =     13.18
               Model |  378.153515         1  378.153515   Prob > F        =    0.0005
            Residual |  2065.30594        72  28.6848048   R-squared       =    0.1548
        -------------+----------------------------------   Adj R-squared   =    0.1430
               Total |  2443.45946        73  33.4720474   Root MSE        =    5.3558
        
        ------------------------------------------------------------------------------
                 mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
             foreign |
           Domestic  |  -4.945804   1.362162    -3.63   0.001    -7.661225   -2.230384
               _cons |   24.77273   1.141865    21.69   0.000     22.49646    27.04899
        ------------------------------------------------------------------------------
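Besides a numeric value, "ib" also accepts named bases such as "ib(first).", "ib(last)." and "ib(freq)." (the most frequent level). A minimal sketch on the same auto data: with only two levels, "ib(last).foreign" should reproduce the "ib1.foreign" model above, so the Domestic coefficient is the Foreign coefficient with its sign flipped (4.945804 vs -4.945804 in the output above).

```stata
sysuse auto, clear

* Default base: lowest value (0 = Domestic)
regress mpg i.foreign
scalar b_foreign = _b[1.foreign]

* ib(last). uses the largest value (1 = Foreign) as base, same as ib1. here
regress mpg ib(last).foreign
scalar b_domestic = _b[0.foreign]

* ib(freq). uses the most frequent level; Domestic (52 cars) is already the default
regress mpg ib(freq).foreign
scalar b_freq = _b[1.foreign]

display b_foreign "  " b_domestic "  " b_freq
```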
        Last edited by Marcos Almeida; 04 Sep 2019, 09:49.
        Best regards,

        Marcos



        • #5
          Thanks, Marcos. But the variable foreign is already coded 0/1. In the data I use, variables are coded 1, 2, ..., and I usually recode them to 0, 1, .... But I have just found that there is no need to.



          • #6
            The same principle shown in #4 with a binary variable applies to categorical variables with more than two levels. Just change the variable "foreign" to "rep78" and use the "ib#." notation. Again, there is no need to recode at all.
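A minimal sketch of this suggestion, assuming the auto data: rep78 is coded 1 to 5 and has 5 missing values, so both models below use the same 69 cars. Choosing 3 as the base only reparameterizes the model, so the level-5-versus-level-3 difference should be the same whether it is computed from the default model or read directly from the "ib3." model.

```stata
sysuse auto, clear
tabulate rep78            // levels 1-5, no recoding needed

* Default: the lowest level (1) is the base
regress mpg i.rep78
scalar d53_default = _b[5.rep78] - _b[3.rep78]

* Level 3 as the base, no recode required
regress mpg ib3.rep78
scalar d53_ib3 = _b[5.rep78]

display d53_default "  " d53_ib3
```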
            Best regards,

            Marcos



            • #7
              I tried the notation and figured out that "ib#." sets the reference category. That way we don't need to recode variables, because we can set any group as the reference.
              Thanks for teaching me about this new notation.
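If the same reference group is wanted across many models, "fvset base" can store it with the variable instead of repeating "ib#." in every command. A minimal sketch, again assuming the auto data and rep78: after "fvset base 3 rep78", plain "i.rep78" should give the same coefficients as "ib3.rep78", and "fvset clear rep78" removes the setting.

```stata
sysuse auto, clear

* Explicit base in the command
regress mpg ib3.rep78
scalar b5_ib3 = _b[5.rep78]

* Store level 3 as the default base for rep78
fvset base 3 rep78
regress mpg i.rep78
scalar b5_fvset = _b[5.rep78]

* Restore the usual default (lowest level as base)
fvset clear rep78

display b5_ib3 "  " b5_fvset
```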
