
  • Collinearity of dummies in OLS

    I have wage data for individuals classified into 9 occupational groups (service, production, etc.). I generated a dummy variable for each group to see the effect of belonging to that group on wages. However, when I run the regression, one of the groups is dropped because of collinearity. Dropping the variable myself works and does not change the coefficients of the other variables, but I am worried because I want to see the effect of the dropped variable. Is that possible?

  • #2
    No, it's not possible. This is not some peculiarity of Stata: the same will happen in any statistics package, because it is a real statistical phenomenon. If you include all 9 indicators ("dummies") and a constant term, then there is the relationship that the constant term is always equal to the sum of the 9 indicators. So the effects are inherently unidentified. You could arbitrarily add any amount to any one of the effects and then compensate for that by appropriately adjusting the constant term and the other indicators. So these effects are actually undefined.
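
    As a rough illustration of that identity (a sketch only, run on Stata's bundled auto dataset, where rep78 and price stand in for the occupation variable and the wage; the names g and total are made up for this example):

    Code:
    * Hypothetical stand-in data: rep78 has 5 categories instead of 9
    sysuse auto, clear
    * Create one indicator per category: g1, g2, ..., g5
    tabulate rep78, generate(g)
    * Sum the indicators within each observation
    egen total = rowtotal(g1-g5)
    * For every observation with a non-missing group, the sum is exactly 1,
    * which is the same as the constant term's column of ones
    summarize total if !missing(rep78)
    * With all indicators plus a constant, Stata has to omit one of them
    regress price g1-g5
    Which indicator is omitted is arbitrary; the fitted values are the same whichever one is left out.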

    But probably you don't really want to see those effects anyway. You can see the expected wage in each group. It is easier to do this if you use factor-variable notation in your regression. So throw away those 9 indicator variables, and rerun your regression:

    Code:
    regress wage i.group
    where group is the variable that takes on the values 1, 2, 3, 4, 5, 6, 7, 8, 9 in the 9 different groups. The regression will create "virtual" indicator variables for the 9 groups, and then omit one. Now run

    Code:
    margins group
    and you will see the expected wage in all 9 groups.
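
    To see what those numbers are, here is a minimal sketch, again assuming the bundled auto dataset with rep78 and price as stand-ins for group and wage. With no other covariates in the model, the predictive margins should coincide with the plain group means:

    Code:
    * Hypothetical stand-in data, not the original wage data
    sysuse auto, clear
    * Factor-variable notation: one category is omitted as the base level
    regress price i.rep78
    * Expected price in every category, including the omitted base
    margins rep78
    * Raw group means, for comparison with the margins output
    tabstat price, by(rep78) statistics(mean)
    If the model also contains other covariates, the margins are adjusted predictions averaged over the sample rather than raw group means.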



    • #3
      Thank you for your reply. I appreciate it. I would also like to do propensity score matching with this dataset: matching individuals with similar education to estimate the effect of a treatment dummy variable on wages. There are more than 2 million observations, so I had to use the psmatch2 command. However, I do not fully understand what the results mean.


      Code:
      psmatch2 intHK graduate college some_college HSgrad, out(lnwage)
      Code:
      Probit regression                                 Number of obs   =    2930587
                                                        LR chi2(4)      =    1407.47
                                                        Prob > chi2     =     0.0000
      Log likelihood = -164300.45                       Pseudo R2       =     0.0043
      
      ------------------------------------------------------------------------------
             intHK |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
          graduate |   .2518609   .0102442    24.59   0.000     .2317826    .2719392
           college |   .2141908   .0094347    22.70   0.000     .1956993    .2326824
      some_college |   .1876027   .0095926    19.56   0.000     .1688016    .2064039
            HSgrad |   .0462249   .0100449     4.60   0.000     .0265372    .0659126
             _cons |  -2.486882   .0086314  -288.12   0.000    -2.503799   -2.469964
      ------------------------------------------------------------------------------
      There are observations with identical propensity score values.
      The sort order of the data could affect your results.
      Make sure that the sort order is random before calling psmatch2.
      
      ----------------------------------------------------------------------------------------
              Variable     Sample |    Treated     Controls   Difference         S.E.   T-stat
      ----------------------------+-----------------------------------------------------------
                lnwage  Unmatched | 10.7896217    10.712522   .077099664   .005203826    14.82
                              ATT | 10.7896217   10.8712186  -.081596886   .310888536    -0.26
      ----------------------------+-----------------------------------------------------------
      
                 | psmatch2:
       psmatch2: |   Common
       Treatment |  support
      assignment | On suppor |     Total
      -----------+-----------+----------
       Untreated | 2,901,088 | 2,901,088
         Treated |    29,499 |    29,499
      -----------+-----------+----------
           Total | 2,930,587 | 2,930,587
      Last edited by Radoslav Velev; 24 Apr 2017, 11:05.



      • #4
        First, this is unrelated to your original topic, so you should post it in a new thread. That way others who might be interested in this matching issue, but not in the original collinearity question, will see it, and in the future others will find it in searches.

