Collinearity of dummies in OLS

Radoslav Velev

Join Date: Apr 2017

Posts: 12
#1

15 Apr 2017, 10:23

I have wage data for individuals in 9 groups defined by occupation (service, production, etc.). I have generated a dummy variable for each group to see the effect of being in that group on wages. However, when I run the regression, one of the groups is dropped because of collinearity. Dropping that variable from the regression works and does not change the coefficients of the other variables, but I am worried because I want to see the effect of the dropped variable. Is that possible?
Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#2

15 Apr 2017, 10:49

No, it's not possible. This is not some peculiarity of Stata: the same will happen in any statistics package, because it is a real statistical phenomenon. If you include all 9 indicators ("dummies") and a constant term, then the constant term (a column of 1s) is always equal to the sum of the 9 indicators, because every person belongs to exactly one group. So the effects are inherently unidentified. You could add any amount to all of the group effects and compensate by subtracting the same amount from the constant term, and the predicted wages would not change at all. So these effects are actually undefined.
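The argument can be written out explicitly. In the sketch below, $D_{gi}$ (notation introduced here, not from the thread) is the indicator that person $i$ is in group $g$, $\beta_0$ is the constant, and $c$ is any number. The first line is the collinearity and the second line shows why the coefficients cannot be pinned down:

$$\sum_{g=1}^{9} D_{gi} = 1 \quad \text{for every } i$$

$$\beta_0 + \sum_{g=1}^{9} \beta_g D_{gi} \;=\; (\beta_0 - c) + \sum_{g=1}^{9} (\beta_g + c)\, D_{gi} \quad \text{for any } c$$

Omitting one indicator, say group 1 (the lowest level, which is Stata's default base for `i.group`), amounts to imposing $\beta_1 = 0$, which fixes $c$. In a regression with only the group indicators, $\beta_0$ is then the mean wage in group 1 and each remaining $\beta_g$ is the difference between the mean of group $g$ and the mean of group 1.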

But you probably don't really want to see those effects anyway. What you can see is the expected wage in each group, and that is easier to do if you use factor-variable notation in your regression. So throw away those 9 indicator variables and rerun your regression:

Code:

regress wage i.group

where group is the variable that takes on the values 1, 2, 3, 4, 5, 6, 7, 8, 9 in the 9 different groups. The regression will create "virtual" indicator variables for the 9 groups and then omit one. Now run

Code:

margins group

and you will see the expected wage in all 9 groups.
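Here is a self-contained sketch on simulated data. The dataset, sizes and seed are made up for illustration and are not the poster's data. It runs the two commands above and then fits an equivalent parameterization that keeps all 9 indicators by dropping the constant instead. The `ibn.` prefix tells Stata not to omit a base level, and `noconstant` removes the intercept, so nothing is collinear and each coefficient is simply a group mean.

```stata
* Hypothetical simulated data: 9 groups of 1,000 people each
clear
set seed 2017
set obs 9000
generate int group = ceil(_n/1000)          // group = 1,...,9
generate double wage = 10 + group + rnormal()

* Constant plus i.group: Stata omits the base level (group 1)
regress wage i.group
margins group                               // expected wage in each of the 9 groups

* Equivalent model: all 9 indicators, no constant, nothing omitted
regress wage ibn.group, noconstant

* Each coefficient now equals the raw group mean; check group 9
summarize wage if group == 9, meanonly
display _b[9.group] - r(mean)               // zero up to rounding error
```

In the `noconstant` version the overall F test asks whether all nine group means are zero, which is rarely of interest. That is one reason the `i.group` plus `margins` route is usually more convenient. Here `margins` reproduces the raw group means only because group is the only regressor. With other covariates in the model it reports predictive margins averaged over those covariates.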

Radoslav Velev

Join Date: Apr 2017
Posts: 12

24 Apr 2017, 11:01

Thank you for your reply. I appreciate it. I would also like to do propensity score matching with the same dataset: matching individuals with similar education, with a treatment dummy variable, and looking at the effect on wages. There are more than 2 million observations, so I had to use the psmatch2 command. However, I do not fully understand what the results mean.

Code:

psmatch2 intHK graduate college some_college HSgrad, out(lnwage)

Code:

Probit regression                                 Number of obs   =    2930587
                                                  LR chi2(4)      =    1407.47
                                                  Prob > chi2     =     0.0000
Log likelihood = -164300.45                       Pseudo R2       =     0.0043

------------------------------------------------------------------------------
       intHK |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    graduate |   .2518609   .0102442    24.59   0.000     .2317826    .2719392
     college |   .2141908   .0094347    22.70   0.000     .1956993    .2326824
some_college |   .1876027   .0095926    19.56   0.000     .1688016    .2064039
      HSgrad |   .0462249   .0100449     4.60   0.000     .0265372    .0659126
       _cons |  -2.486882   .0086314  -288.12   0.000    -2.503799   -2.469964
------------------------------------------------------------------------------
There are observations with identical propensity score values.
The sort order of the data could affect your results.
Make sure that the sort order is random before calling psmatch2.

----------------------------------------------------------------------------------------
        Variable     Sample |    Treated     Controls   Difference         S.E.   T-stat
----------------------------+-----------------------------------------------------------
          lnwage  Unmatched | 10.7896217    10.712522   .077099664   .005203826    14.82
                        ATT | 10.7896217   10.8712186  -.081596886   .310888536    -0.26
----------------------------+-----------------------------------------------------------

           | psmatch2:
 psmatch2: |   Common
 Treatment |  support
assignment | On suppor |     Total
-----------+-----------+----------
 Untreated | 2,901,088 | 2,901,088
   Treated |    29,499 |    29,499
-----------+-----------+----------
     Total | 2,930,587 | 2,930,587
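The quantities in the psmatch2 table can be read with the standard matching definitions. The symbols below are introduced here and are not psmatch2 output names. The Unmatched row compares the raw mean of lnwage for treated and untreated people. The ATT row compares the 29,499 treated people ($N_T$) with their matched controls, where $y_{m(i)}$ is the (average) lnwage of the control(s) matched to treated person $i$. Each T-stat is the Difference divided by its S.E. With the reported numbers:

$$\text{Unmatched:}\quad \bar y_T - \bar y_C = 10.7896 - 10.7125 = 0.0771, \qquad t = \frac{0.0771}{0.005204} \approx 14.82$$

$$\widehat{\text{ATT}} = \frac{1}{N_T} \sum_{i:\,D_i = 1} \left( y_i - y_{m(i)} \right) = 10.7896 - 10.8712 = -0.0816, \qquad t = \frac{-0.0816}{0.3109} \approx -0.26$$

Because the outcome is a log wage, these are differences in log points. The raw gap is about 0.077 (roughly 8 percent) and is clearly significant. The matched gap is about -0.08, but its standard error is so large that it is not statistically distinguishable from zero.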

Last edited by Radoslav Velev; 24 Apr 2017, 11:05.
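The warning about identical propensity scores is expected with this specification. The probit uses only four education indicators, so the estimated propensity score can take only a handful of distinct values, and nearest-neighbour matching then has ties everywhere. Below is a minimal sketch of putting the data in a random but reproducible order before calling psmatch2, as the warning suggests. It assumes the user-written psmatch2 is installed (`ssc install psmatch2`). The simulated data, the helper variables `educ` and `sortkey`, and both seeds are hypothetical. The other variable names are copied from the post.

```stata
* Hypothetical simulated stand-in for the poster's data
clear
set seed 24042017
set obs 20000
generate byte educ = 1 + floor(5*runiform())   // 1 = less than HS, ..., 5 = graduate
generate byte HSgrad       = educ == 2
generate byte some_college = educ == 3
generate byte college      = educ == 4
generate byte graduate     = educ == 5
generate byte intHK = runiform() < 0.01 + 0.005*educ
generate double lnwage = 10.5 + 0.05*educ + rnormal(0, 0.5)

* Five education cells mean only five distinct propensity scores, so ties abound.
* Randomize the sort order reproducibly before matching.
set seed 12345
generate double sortkey = runiform()
sort sortkey
psmatch2 intHK graduate college some_college HSgrad, out(lnwage)
```

Randomizing the order only makes the tie-breaking arbitrary and reproducible. With ties this common, the matched estimate can still move somewhat with the seed, so rerunning with a few different seeds is a cheap sensitivity check.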


Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#4

24 Apr 2017, 17:47

First, this is unrelated to your original topic, so you should post it in a new thread. That way others who might be interested in this matching issue, but not in the original collinearity question, will see it, and in the future others will find it in searches.