  • Omitted because of collinearity

    I am running a regression with 5 dummy variables and 6 non-dummy variables. Stata omits one of the dummy variables because of collinearity, but I see no reason for this dummy to be collinear with the other variables.
    Why is this happening, and is there a way to force the regression through?

    I read somewhere that I could attempt to regress the omitted dummy on the other variables to see which one is causing the problem, but that just caused them all to be omitted.

  • #2
    Hi Birk. Could you provide a reproducible snippet of your code so we can see exactly what's going on? As per FAQ12, please use Code delimiters for your example.


    • #3
      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input int DAYS float(PREMIUM A_CAR RELSIZE DEBTFIN FINCOMP CASH ACQSIZE POISON TOEHOLD TPUB TTERMF SAMEIND)
      122      .           .          .  . 0 0         . 0 . 0          . 0
      157      .           .          .  . 0 0         . 0 . 1          . 0
      118  92.31    -.766689          .  . 0 1         . 0 . 1          . 0
      365  42.47   1.1065979  .50111026  . 0 0  5.508538 0 1 1          . 0
      106  30.16           .          .  . 0 0         . 0 . 1          . 0
       95  47.04    .7577015 .030937005  . 0 0  8.503799 0 . 1  .02948364 0
      170   2.82  -.09509114          .  . 0 0         . 0 . 1          . 1
      227   7.74           .          .  . 0 1         . 0 . 1          . 0
       86  43.99     -.96329          .  . 0 0    9.4846 0 . 1          . 0
      132  40.68   -.1654657          .  . 0 1         . 0 . 1   .0511373 0
      278  20.44   .27428758   .3258687  . 0 0  5.679046 0 . 1  .05242629 0
      190      .           .          .  . 0 0         . 0 . 0          . 1
      106      .    .4087988          .  . 0 1  9.912284 0 . 0          . 0
       48      .           .          .  . 0 0         . 0 . 0          . 0
      175      .           .          .  . 0 0         . 0 . 1          . 0
      200  52.27   .17756623          .  . 0 0         . 0 . 1          . 0
      282  19.65  -.26675332          .  . 0 1    11.639 0 . 1          . 0
      191      .    -.417715          .  . 0 1         . 0 1 1          . 0
       34  26.32  .005595885  .05107186  . 0 0   4.30636 0 . 1          . 0
        0      .           .          .  . 0 0         . 0 . 1          . 0
      209  45.77           .          .  . 0 0         . 0 . 1          . 0
      308     45   .08646646          .  . 0 1         . 0 . 1          . 0
      153      .    .9253797 .016575342  . 0 0  5.389072 0 . 1          . 0
       69     .3   .10768275  1.5573744  . 0 1  2.884801 0 . 1          . 0
      130  19.27  -.09125938          .  . 0 0  9.570786 1 . 1          . 0
      322   3.79   .25286093          .  . 0 0         . 0 . 1          . 1
       74   6.17   -.8066383   .4916679  . 0 0  5.585149 0 . 1          . 0
       51 133.33           .          .  . 0 1         . 0 . 1          . 0
      218  80.19   -.6416278          .  . 0 0         . 0 . 1          . 0
        0      .    .9180825          .  . 0 0         . 0 . 1          . 1
        0      .           .          .  . 0 0         . 0 . 1          . 1
      189      .           .          .  . 0 0         . 0 . 1 .009709157 0
        0      .           .          .  . 0 1         . 0 . 1          . 0
      321  20.51   -.7454169          .  . 0 1         . 0 . 1          . 0
      429  39.02           .          .  . 0 0         . 0 1 1          . 1
      311  41.73   -.5195982          .  . 0 0         . 0 . 1          . 1
        0      .           .          .  . 0 0         . 0 . 1          . 0
      152  75.44           .          .  . 0 1         . 0 . 1          . 1
       76      .           .          .  . 0 0         . 0 . 0          . 1
      189  31.35  .004368234          .  . 0 0         . 0 . 1  .04941444 0
      228  53.85           .          .  . 0 0         . 0 . 1          . 0
        0  15.47   -.4448323          .  . 0 1         . 0 . 1          . 0
        0      .           .          .  . 0 0         . 0 . 1          . 0
       89  33.87   -.3437342  2.2696342  . 0 0   4.25618 0 . 1          . 0
      161  15.84   -.7123645          .  . 0 0         . 0 . 1          . 0
        .  70.11           .          .  . 0 1         . 0 0 1          . 0
       97    6.8  -.07244786          .  . 0 0         . 0 . 1          . 0
      238  53.85           .          .  . 0 1         . 0 . 1 .013856813 0
      429  35.64  -.05753836          .  . 0 1         . 0 . 1          . 0
      169  89.77   .56141067          .  . 0 0         . 0 . 1          . 1
      106      .     .686803  1.0161734  . 0 0  5.926099 0 . 0          . 1
      125      .           .          .  . 0 0         . 0 . 1          . 0
       84  34.48   -.3530126          .  . 0 1  7.027643 0 . 1          . 0
        .    2.2           .          .  . 0 1         . 0 0 0          . 0
       75     44   -.7838489  2.3918035  . 0 1  2.144761 0 . 1          . 0
      154 136.16  -.17591253   .3097904  . 0 1  4.368162 0 . 1  .04009492 0
      197  38.35    .4272529          .  . 0 0         . 0 . 1          . 1
       11    100   .04914184  1.4864864  . 0 1 4.3993754 0 . 1          . 0
      130  86.96    .7402366  .14585118  . 0 0  5.635075 0 1 1 .024479805 1
        0      .     .950442 .031062555  . 0 1  5.635075 0 . 1          . 1
      274  63.56           .          .  . 0 0         . 0 . 1          . 1
      683  93.79           .          .  . 0 1         . 0 1 1          . 0
        .      .           .          .  . 0 1         . 0 1 1          . 0
        .  11.54           .          .  . 0 1         . 0 1 1          . 0
      106      .           .          .  . 0 0         . 0 . 1          . 0
        0      .           .          .  . 0 0         . 0 . 0          . 0
      161   16.3           .          .  . 0 0  8.563246 0 . 1          . 0
      113      .   -.6746885          .  . 0 0         . 0 . 1          . 0
      161      .           .          .  . 0 0         . 0 . 1          . 1
      253  85.52           .          .  . 0 1         . 0 1 1          . 0
      125      .           .          .  . 0 0         . 0 . 0          . 0
      134  44.34   .11530685          .  . 0 0   6.96979 0 0 1          . 1
       35  97.79   -.9504567          .  . 0 1         . 0 . 1          . 0
        0      .           .          .  . 0 0         . 0 . 0          . 1
      122  36.54  -.29253453   .0372837  . 0 1  9.614017 0 . 1  .03582586 0
       12      .   -.5464296   .1887238  . 0 0  6.677272 0 . 1          . 0
       98      .   -.4690922  18.528427 .5 0 0  2.887033 0 . 1          . 0
      178   7.91   -.4898624          .  . 0 0         . 0 . 1          . 0
       42      .   -.0763758          .  . 0 0         . 0 . 1          . 0
       86      .           .          .  . 0 0         . 0 0 1          . 0
      167     50           .  .11144508  . 0 0  6.604716 0 . 1 .036447577 0
       50      . -.073416926          .  . 0 0         . 0 . 0          . 0
       46 204.23  -.20878157          .  . 0 1         . 0 . 1          . 0
      108      .   .19655837          .  . 0 0         . 0 . 0          . 0
        7      .           .          .  . 0 1         . 0 . 1          . 1
      115  18.93    .5102478          .  . 0 1         . 0 . 1          . 0
       61      .           .          .  . 0 0         . 0 . 0          . 0
       68 109.57           .          .  . 0 1         . 0 1 1          . 0
       69  58.97   -.1165596          .  . 0 1  6.609349 0 . 1          . 1
       57  21.21   1.1214056          .  . 0 0  9.885895 0 . 1          . 1
      181      .    .4661484  .04818514  . 0 1  7.008641 0 . 1          . 0
      184  56.39           .          .  . 0 0         . 0 . 1          . 1
        .    6.3           .          .  . 0 1         . 0 . 1          . 0
      102      .   .19121815          .  . 0 0         . 0 . 0          . 0
        0      .           .          .  . 0 0         . 0 . 0          . 1
       32  36.47           .          .  . 0 1         . 0 1 1          . 0
      236  69.01           .          .  . 0 0         . 0 . 1          . 0
       70     .9           .          .  . 0 1         . 0 . 1          . 0
       73     50           .          .  . 0 1         . 0 0 1          . 0
      198   8.77  -.46990025          .  . 0 0         . 0 . 1          . 0
      end
      I am regressing A_CAR (Acquiror Cumulative Abnormal Returns) on a series of acquisition deal characteristics
      Last edited by Birk Haugan; 26 Aug 2021, 05:01.


      • #4
        Off the top of my head, I have a few questions:
        1. Is this snippet representative of your data at large? If so, it looks like you have an awful lot of missing values!
        2. What kind of regression model are you using? Ordinary least squares (OLS), I guess?
        3. Could you show your commands rather than just your data, please? Otherwise it's hard to guess what you're doing.
        Independent of the questions above, here are some suggestions to check for multicollinearity (as it seems you're experiencing issues with that):
        • immediately after your regression command, run the command "estat vif" (the older spelling "vif" also works).
        • alternatively, use the command "collin" followed by your dependent variable and your independent variables, like so
        Code:
        collin A_CAR independent_1 independent_2 independent_3
        I hope this helps.
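
        As a minimal sketch of the first suggestion: the regression command was not posted, so the regressor list below is an assumption pieced together from the variable names in the -dataex- example. Note that -estat vif- only reports on the variables that were actually kept in the model, so an omitted dummy will not appear in its output.
        Code:
        * Assumed model: the exact regressor list was not shown in the thread
        regress A_CAR DAYS FINCOMP CASH ACQSIZE RELSIZE POISON TPUB TTERMF SAMEIND

        * Variance inflation factors for the regressors retained in the model
        estat vif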


        • #5
          I'd add plotting the predictors against each other in a scatter plot matrix.
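
          A sketch of what that could look like, assuming the four continuous predictors from the -dataex- example are the ones of interest (the choice of variables is a guess; dummies are left out because they rarely show much in a scatter plot):
          Code:
          * Scatter plot matrix of the continuous predictors;
          * -half- draws only the lower triangle
          graph matrix DAYS ACQSIZE RELSIZE TTERMF, half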


          • #6
            In your example data FINCOMP is always 0. With data like that, FINCOMP will always be omitted from the model: it carries no information and is collinear with the constant term. And if you regress FINCOMP on the other four variables, all of them will be omitted, because the correlation between a constant and another variable is undefined.
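
            One way to check this directly is sketched below. It uses only the variable name from the posted example data; a standard deviation of zero means the variable is constant in whatever sample is in memory.
            Code:
            * A constant variable has a standard deviation of zero
            summarize FINCOMP
            display "SD of FINCOMP = " r(sd)

            * Frequency table, including any missing values
            tabulate FINCOMP, missing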
            Last edited by William Lisowski; 26 Aug 2021, 07:37.


            • #7
              This is just to stress that this is not a problem, except in so far as it challenges your expectations. Stata is telling you that a predictor can and indeed must be omitted. That is good news and not a problem that requires or even allows force or any kind of work-around as a solution.


              • #8
                [Screenshot: Screen Shot 2021-08-28 at 15.24.26.png]


                There are a lot of missing values, yes. The dataset has 2,892 observations, but only 146 of them have enough non-missing values to be used in a regression.

                To explain the context a bit more, my observations are company acquisitions. The dependent variable is the cumulative abnormal stock return of the acquiring firm, and the independent variables are as follows:
                - DAYS: number of days from acquisition bid announcement to completion
                - FINCOMP: dummy for whether a financial acquirer was involved in the bidding round, or whether there were only corporate acquirers competing for the target
                - CASH: dummy for whether the transaction was all-cash
                - ACQSIZE: the log value of the acquirer's asset market value
                - RELSIZE: the relative size of the transaction value to the acquirer's market value of assets
                - POISON: dummy for whether there was a poison pill defence mechanism installed in the target firm
                - TPUB: dummy for whether the target was a publicly listed firm
                - TTERMF: the termination fee of the transaction
                - SAMEIND: dummy for whether the target and acquirer are in the same industry

                I don't see why there would be a multicollinearity issue with the poison dummy.
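
                A quick way to see what the regression actually sees is to restrict attention to the complete cases. The sketch below assumes the model uses the variables listed above (the actual command was not posted), and nmiss is a hypothetical helper variable introduced only for this check.
                Code:
                * Assumed list of model variables; adjust to the actual regression
                local modelvars A_CAR DAYS FINCOMP CASH ACQSIZE RELSIZE POISON TPUB TTERMF SAMEIND

                * Count missing values per observation across the model variables
                egen byte nmiss = rowmiss(`modelvars')

                * Observations that survive listwise deletion
                count if nmiss == 0

                * Does POISON vary at all among those observations?
                tabulate POISON if nmiss == 0
                If the last table shows a single value, POISON is a constant in the estimation sample and has to be omitted.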

                Here are the VIF-results:
                [Screenshot of VIF results: Screen Shot 2021-08-28 at 15.34.47.png]


                • #9
                  And indeed collin gives an error message that there are zero values on the diagonal of the correlation matrix, so clearly something is wrong.

                  Consider the following toy example.
                  Code:
                  . set obs 1000
                  Number of observations (_N) was 0, now 1,000.
                  
                  . generate x1 = rnormal()
                  
                  . generate x2 = 1
                  
                  . corr
                  (obs=1,000)
                  
                               |       x1       x2
                  -------------+------------------
                            x1 |   1.0000
                            x2 |        .        .
                  
                  
                  . // net install collin, from(https://stats.idre.ucla.edu/stat/stata/ado/analysis)
                  . collin x1 x2
                  (obs=1,000)
                  corr(): matrix has zero or negative values on diagonal
                  r(504);
                  This suggests there are two things wrong in what you have reported.
                  1. Your variable POISON is constant for the 146 observations you have, as I suggested could be the case in post #6; thus it is collinear with the constant
                  2. The collin program erroneously describes a missing value on the diagonal of the correlation matrix as zero or negative
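
                  The toy example can be extended to show the same thing in a regression. This is only a sketch: the outcome y and the seed are made up for illustration and are not part of the original data.
                  Code:
                  clear
                  set seed 12345
                  set obs 1000
                  generate x1 = rnormal()
                  generate x2 = 1
                  * Hypothetical outcome, invented for this illustration
                  generate y = 1 + 2*x1 + rnormal()

                  * x2 is collinear with the constant, so regress omits it
                  regress y x1 x2

                  * Confirm that x2 takes a single value in the estimation sample
                  tabulate x2 if e(sample)
                  The same -tabulate ... if e(sample)- check, run right after the real regression, would show whether POISON varies among the observations actually used.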


                  • #10
                    Birk:
                    as far as I can elaborate on your previous -dataex- excerpt:
                    1) -POISON- equals 1 in only 1 of the 100 observations (basically, it's almost a constant);
                    Code:
                    . tab POISON
                    
                         POISON |      Freq.     Percent        Cum.
                    ------------+-----------------------------------
                              0 |         99       99.00       99.00
                              1 |          1        1.00      100.00
                    ------------+-----------------------------------
                          Total |        100      100.00
                    2) on top of that, the one observation with -POISON-=1 is listwise deleted because of missing values in other variables (hence -POISON- becomes a constant in the estimation sample):
                    Code:
                    . list if POISON==1
                    
                         +-------------------------------------------------------------------------------------------------------------------------+
                         | DAYS   PREMIUM       A_CAR   RELSIZE   DEBTFIN   FINCOMP   CASH    ACQSIZE   POISON   TOEHOLD   TPUB   TTERMF   SAMEIND |
                         |-------------------------------------------------------------------------------------------------------------------------|
                     25. |  130     19.27   -.0912594         .         .         0      0   9.570786        1         .      1        .         0 |
                         +-------------------------------------------------------------------------------------------------------------------------+
                    
                    .
                    Hence, it does not seem that quasi-extreme multicollinearity (for what it's worth; see chapter 23 of https://www.hup.harvard.edu/catalog....40&content=toc) plays any relevant role here.
                    It is rather a matter of managing missing values (if feasible).
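                    As a starting point for that, here is a sketch using the variables from the -dataex- excerpt (which of them enter the actual model is an assumption):
                    Code:
                    * Number of missing values for each candidate model variable
                    misstable summarize A_CAR DAYS FINCOMP CASH ACQSIZE RELSIZE POISON TPUB TTERMF SAMEIND

                    * Which combinations of variables tend to be missing together
                    misstable patterns A_CAR DAYS FINCOMP CASH ACQSIZE RELSIZE POISON TPUB TTERMF SAMEIND, frequency
                    Variables that account for most of the lost observations are the first candidates to reconsider.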
                    Last edited by Carlo Lazzaro; 28 Aug 2021, 07:53.
                    Kind regards,
                    Carlo
                    (Stata 19.0)
