
  • Independent categorical variables in multiple linear regression

    Dear all,

    For my master's thesis I am building a linear regression model with a continuous dependent variable (satisfaction_score) and three independent variables: age (continuous), country (categorical) and device (categorical). You can find my code below. Country has four categories: Cambodia (1), Ethiopia (2), South Sudan (3) and Uganda (4). Device has five categories: Contec (1), Devon (2), Lifebox (3), Utech (4) and Masimo (5). My problem is that I cannot get rid of a baseline level for the categorical variables, so they are always compared with Cambodia and Contec. I have tried the ibn. prefix, but then the last categories, Uganda (4) and Masimo (5), are automatically omitted. Is there another way to do it, and why are the last categories omitted with ibn.?

    This is my code
    Code:
    regress Mean_Score ibn.country ibn.device hwage
    Thanks in advance,

    Theresa
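Why the last levels disappear: when the model has a constant, the indicators for all levels of a factor add up to 1 in every observation, exactly like the constant's column of ones, so the full set is perfectly collinear with the constant and regress has to omit one level (in Theresa's case, the last one). The minimal sketch below uses Stata's shipped auto data as a stand-in, with rep78 playing the role of country or device, since Theresa's data set is not available. It shows that ibn. alone does not help and that -noconstant- frees the base level for one factor only.

```stata
* Sketch on Stata's shipped auto data (assumption: stand-in for Theresa's data)
sysuse auto, clear

* ibn.rep78 requests all five rep78 indicators, but together with the constant
* they are perfectly collinear, so regress still flags one level as (omitted)
regress price ibn.rep78 mpg

* Without the constant there is room for every level of ONE factor: each rep78
* coefficient is then that level's intercept (expected price at mpg = 0)
regress price ibn.rep78 mpg, noconstant

* A second factor still needs a base level: its full set of indicators would
* be collinear with the rep78 indicators, so it keeps i. (base = lowest level)
regress price ibn.rep78 i.foreign mpg, noconstant
```

Note that after -noconstant- each level coefficient is an intercept rather than a comparison, and the reported R-squared and overall F test refer to a model without an intercept, so they are not comparable with those of the usual model.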

  • #2
    Regression coefficients are by their very nature comparisons*, so when you look at a categorical variable you have to decide what you want to compare with. A reference category is therefore inevitable. There are ways around that, but you should not use them unless you really know what you are doing, as they change the meaning of the coefficients. You could look at -help contrast- to get other comparisons.

    * As always, exceptions exist.
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------
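Maarten's pointer to -help contrast- can be sketched as follows, again on Stata's auto data with rep78 standing in for country or device (an assumption made only for illustration). The model keeps its reference category; -contrast- then re-expresses the rep78 effects as other comparisons, for example against the unweighted grand mean of the level means instead of a single base level. Contrast operators require Stata 12 or later.

```stata
* Sketch on Stata's auto data; rep78 stands in for country/device
sysuse auto, clear
regress price i.rep78 i.foreign mpg

* r.: each level vs the reference (base) level, i.e. the same comparisons as
* the regression coefficients, reported together with a joint test
contrast r.rep78

* g.: each level vs the balanced (unweighted) grand mean of the level means
contrast g.rep78

* a.: each level vs the next level
contrast a.rep78
```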



    • #3
      Resa,
      as an aside to Maarten's helpful advice, you can use -lincom- to get what you are presumably after:
      Code:
      . use "https://www.stata-press.com/data/r17/auto.dta"
      (1978 automobile data)
      
      . regress price i.foreign i.rep78 if rep78>=4
      
            Source |       SS           df       MS      Number of obs   =        29
      -------------+----------------------------------   F(2, 26)        =      0.51
             Model |  4454354.32         2  2227177.16   Prob > F        =    0.6071
          Residual |   113826247        26  4377932.56   R-squared       =    0.0377
      -------------+----------------------------------   Adj R-squared   =   -0.0364
             Total |   118280601        28  4224307.17   Root MSE        =    2092.4
      
      ------------------------------------------------------------------------------
             price | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
      -------------+----------------------------------------------------------------
           foreign |
          Foreign  |   835.4296   844.6543     0.99   0.332    -900.7821    2571.641
           5.rep78 |  -424.3185   844.6543    -0.50   0.620     -2160.53    1311.893
             _cons |   5653.785   649.2909     8.71   0.000     4319.149    6988.422
      ------------------------------------------------------------------------------
      
      . mat list e(b)
      
      e(b)[1,5]
                  0b.          1.         4b.          5.           
             foreign     foreign       rep78       rep78       _cons
      y1           0   835.42963           0  -424.31852   5653.7852
      
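      * via -lincom- you can compare -domestic- cars with rep78 == 4 (= _cons) vs. -foreign- cars with rep78 == 5; the minus sign applies to _b[_cons] only, so the estimate is (foreign, 5) minus (domestic, 4)*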
      . lincom _b[_cons] +  _b[4.rep78] + _b[0.foreign] -  _b[_cons] + _b[1.foreign] + _b[5.rep78]
      
       ( 1)  0b.foreign + 1.foreign + 4b.rep78 + 5.rep78 = 0
      
      ------------------------------------------------------------------------------
             price | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
      -------------+----------------------------------------------------------------
               (1) |   411.1111   986.3437     0.42   0.680    -1616.347     2438.57
      ------------------------------------------------------------------------------
      
      .
      Kind regards,
      Carlo
      (Stata 19.0)
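Read literally, Carlo's -lincom- expression applies the minus sign only to _b[_cons], so what it returns is the predicted price for (foreign, rep78 = 5) minus that for (domestic, rep78 = 4). A sketch that writes out both cell predictions with explicit signs, using Stata's shipped copy of the same 1978 auto data; both -lincom- calls should give the same estimate.

```stata
* Same 1978 automobile data as above, loaded from Stata's shipped copy
sysuse auto, clear
regress price i.foreign i.rep78 if rep78 >= 4

* The expected price in a cell is _cons plus that cell's coefficients, so
* (foreign, rep78 = 5) minus (domestic, rep78 = 4), with every sign written out:
lincom _b[_cons] + _b[1.foreign] + _b[5.rep78] - _b[_cons] - _b[0.foreign] - _b[4.rep78]

* Base-level coefficients are 0 and _cons cancels, so this is equivalent:
lincom _b[1.foreign] + _b[5.rep78]
```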



      • #4
        Thanks for the quick answers, that helps! I am now wondering: if I compare with one country/device in the regression, is there a command to set the base level to the country/device with the highest mean of the dependent variable?



        • #5
          Resa:
          not that I know of.
          If you find that approach fruitful and methodologically sound, you can first calculate the means yourself and then set the level of the categorical variable with the highest mean as the reference category via -fvvarlist- notation (the ib#. prefix).
          Last edited by Carlo Lazzaro; 05 May 2022, 03:18.
          Kind regards,
          Carlo
          (Stata 19.0)
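Carlo's suggestion can be sketched as: compute the mean of the dependent variable for each level, pick the level with the highest mean, and pass it to the ib#. prefix. The sketch uses Stata's auto data, with price as the dependent variable and rep78 as the factor; the variable name mean_price and the local macro top are hypothetical. With Theresa's coding, the same prefix written by hand, e.g. ib4.country ib5.device, would make Uganda and Masimo the reference categories.

```stata
sysuse auto, clear

* Find the rep78 level with the highest mean price, using only observations
* that the regression below will also use (hypothetical names: mean_price, top)
preserve
collapse (mean) mean_price = price if !missing(rep78, foreign, mpg), by(rep78)
gsort -mean_price
local top = rep78[1]
restore

* ib`top'. makes that level the base (reference) category
* (fvset base `top' rep78 would instead make it the default base for i.rep78)
display "base level of rep78: `top'"
regress price ib`top'.rep78 i.foreign mpg
```

These are raw (unadjusted) means; the level with the highest raw mean need not be the one with the highest prediction once the other covariates are held constant.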



          • #6
            Alright! Thanks a lot!
