omitted for collinearity : No explicit collinearity or relationship between the variables entered

Chiara Tasselli

#1


10 Jan 2023, 07:16

Good morning everyone,
I am running regressions on enterprise data (microdata on 15,000 workers linked to 7 enterprises). When I add some dichotomous variables recording the presence or absence of enterprise policies, Stata omits them all in bulk (with the note: var... omitted because of collinearity). The variables are not related to each other, and Stata does not drop only the first one (as usually happens with complementary dummies, where one category is left out as the base).
I have checked, and the variables do vary within the dataset (SD > 0), so I cannot understand what is happening.
I wonder whether the problem is that the variables do not vary across workers in the same firm... Could somebody kindly help me?

Many thanks in advance for your time, wishing you all a great Tuesday ahead.

Carlo Lazzaro

#2

10 Jan 2023, 07:39

Chiara:
if your situation is similar to the following toy example, the omission is unavoidable:

Code:

. use "https://www.stata-press.com/data/r17/nlswork.dta"
(National Longitudinal Survey of Young Women, 14-24 years old in 1968)

. xtreg ln_wage i.race, fe
note: 2.race omitted because of collinearity.
note: 3.race omitted because of collinearity.

Fixed-effects (within) regression               Number of obs     =     28,534
Group variable: idcode                          Number of groups  =      4,711

R-squared:                                      Obs per group:
     Within  = 0.0000                                         min =          1
     Between = 0.0050                                         avg =        6.1
     Overall =      .                                         max =         15

                                                F(0,23823)        =       0.00
corr(u_i, Xb) =      .                          Prob > F          =          .

------------------------------------------------------------------------------
     ln_wage | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
        race |
      Black  |          0  (omitted)
      Other  |          0  (omitted)
             |
       _cons |   1.674907   .0018961   883.35   0.000     1.671191    1.678624
-------------+----------------------------------------------------------------
     sigma_u |  .42456905
     sigma_e |  .32028665
         rho |  .63731204   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(4710, 23823) = 8.44                 Prob > F = 0.0000

.

Kind regards,
Carlo
(Stata 19.0)


Chiara Tasselli

#3

10 Jan 2023, 07:45

Dear Carlo,
many thanks for your reply. Honestly, I think my situation is different: the dummy variables are not complementary or overlapping. Moreover, the firm fixed effects that are continuous rather than dichotomous are displayed. I really cannot understand...
Carlo Lazzaro

#4

10 Jan 2023, 08:04

Chiara:
1) you ran a -fe- panel data regression: as expected, time-invariant variables are crunched by the -fe- estimator, whereas time-varying ones survive -fe-'s hunger;
2) you're experiencing an omission of categorical variables that are time-invariant for workers working in the same firm (say: the firm has a gym available for workers Yes/No).
If 1) and 2) are correct and you -xtset- your dataset with workers as -panelid-, the omission makes sense.

Kind regards,
Carlo
(Stata 19.0)
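One way to check point 1) is to test whether a variable ever changes within a panel. The sketch below reuses the nlswork data from the toy example in #2, with race as the variable of interest; race_varies is a hypothetical helper variable. For firm-level policies in worker data, the same idea applies with the firm identifier in place of idcode.

```stata
* Sketch: does race ever change within an idcode panel?
use "https://www.stata-press.com/data/r17/nlswork.dta", clear
* the shipped file is already xtset; repeated here so -xtsum- is safe to run
xtset idcode year

* sort by race within each panel: if the first and last values agree,
* race is constant in that panel (race_varies is a hypothetical helper)
bysort idcode (race): generate byte race_varies = race[1] != race[_N]
tabulate race_varies

* -xtsum- splits the standard deviation into between and within parts;
* a within SD of 0 means -xtreg, fe- has no variation left for race
xtsum race
```

If race_varies is 0 in every panel, -xtreg, fe- has to omit every race indicator, which is what the notes in the log above report.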
Chiara Tasselli

#5

10 Jan 2023, 08:41

Many thanks, Carlo, for your suggestions; I fully understand the point about time-invariant variables in panel data.

The fact is that I do not have panel data. I simply have a cross-section: by "firm FE" I meant firm-related variables, which are therefore the same for all workers in the same firm. I surely did not express myself correctly.

What I notice is that as long as I enter only a few dummies, they work. But sometimes adding just one more is enough for Stata to omit three of the block, and the same happens when I put them all in. My guess is that, as I add variables, Stata creates certain groupings that it perceives as complementary even though in fact they are not; this is perhaps possible because I have few enterprises, so the dummies vary less than the worker-level microdata... I don't know; it is still a great mystery to me...
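The guess about having few enterprises can be checked directly. A variable measured at the firm level takes only one value per firm, so with 7 enterprises every such variable is built from just 7 distinct numbers. Together with the constant, firm-level columns can span at most 7 dimensions, however many workers the data contain: at most 6 firm-level regressors (dummies and continuous firm variables alike) can be estimated, and firm indicators would use up all of that room on their own. The sketch below runs on simulated data; firm_id, pol1-pol8, x1, y, and profile are hypothetical names, and 2,000 workers per firm is an arbitrary choice.

```stata
* Sketch on SIMULATED data: 7 hypothetical firms, 2,000 workers each,
* and 8 hypothetical firm-level policy dummies pol1-pol8
clear
set seed 2023
set obs 7
generate firm_id = _n
forvalues j = 1/8 {
    * each policy is present (1) or absent (0) at the firm level
    generate byte pol`j' = runiform() < 0.5
}
* turn each firm into 2,000 workers who share the firm's policy values
expand 2000
generate x1 = rnormal()                 // worker-level regressor
generate y  = 1 + 0.5*x1 + rnormal()    // worker-level outcome

* at most 7 distinct policy profiles can exist, one per firm
egen profile = group(pol1-pol8)
tabulate profile

* constant + pol1-pol8 span at most 7 dimensions: at least 2 pol* are omitted
regress y x1 pol1-pol8

* pol1-pol8 and the 6 non-base firm dummies share the same 7 dimensions,
* so only 6 of these 14 firm-level columns can survive
regress y x1 pol1-pol8 i.firm_id
```

Which of the collinear variables Stata reports as omitted depends on its internal ordering of the regressors, so the omitted ones need not look related to each other or to the variable that was added last.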
George Ford

#6

10 Jan 2023, 08:47

Did you try something like this?
1. run the model that gives you the error (say, y x1 x2 x3).
2. correl y x1 x2 x3 if e(sample)
Chiara Tasselli

#7

10 Jan 2023, 09:08

I tried -pwcorr y x1 x2 x3-. What is the difference from -correl-, and what does "if e(sample)" stand for? I have now also tried your suggestions, but there is still no evidence of collinearity...
George Ford

#8

10 Jan 2023, 09:25

pwcorr and correl are similar, the former having more options.

if e(sample) restricts the data to the estimation sample from the regression so you know you're dealing with the same data that gives you the error.
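A minimal illustration of -if e(sample)- with Stata's shipped auto data, chosen here only because rep78 has a few missing values: -regress- drops observations with a missing value in any model variable, and e(sample) flags the rows that were actually used. price, mpg, and rep78 stand in for y, x1, and x2.

```stata
sysuse auto, clear
* rep78 is missing for some cars, so -regress- uses only complete cases
regress price mpg rep78
* e(sample) is 1 for the observations used in the last estimation
count if e(sample)
* correlations computed on exactly the estimation sample
correlate price mpg rep78 if e(sample)
```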

try

mdesc y x1 x2 x3

to see if you have missing data causing the problem.
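-mdesc- is a user-written command (-ssc install mdesc-). With official commands only, -misstable summarize- reports missing values per variable, and -egen, rowmiss()- counts how many model variables are missing in each row, which is what decides whether an observation drops out of the regression. Shown on the auto data; nmiss is a hypothetical variable name.

```stata
sysuse auto, clear
* per-variable summary of missing values
misstable summarize price mpg rep78
* number of missing values across the model variables, row by row
egen nmiss = rowmiss(price mpg rep78)
* rows that a regression on these variables would drop
count if nmiss > 0
```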

If the variable is a linear combination of other variables in the model you'll get the same error and it may not show up in the correlations. Is that a possibility?

regress the problem variable on the other X and see what happens.
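A linear combination can hide from pairwise correlations. In this sketch on simulated data (x1, x2, x3, and y are hypothetical), x3 is built as x1 - 2*x2: no pairwise correlation is close to 1, yet regressing x3 on the other regressors gives an R-squared of 1, and the full model has to omit one of the three.

```stata
* Simulated data: x3 is an exact linear combination of x1 and x2
clear
set seed 101
set obs 100
generate double x1 = rnormal()
generate double x2 = rnormal()
generate double x3 = x1 - 2*x2
generate double y  = x1 + x2 + rnormal()

* no pairwise correlation is +/-1 ...
correlate x1 x2 x3
* ... yet x3 is fully explained by the other regressors (R-squared = 1)
regress x3 x1 x2
* so the full model omits one of the three
regress y x1 x2 x3
```

With real data, put the omitted variable on the left-hand side and the other regressors on the right; an R-squared of 1 (up to rounding) in that auxiliary regression confirms the exact combination, and its coefficients show which variables are involved.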
Chiara Tasselli

#9

10 Jan 2023, 10:20

Many thanks for all your useful suggestions. I have no missing data (I already checked the number of observations reported by -summarize-). I will check for linear combinations. Many thanks for your time and suggestions!
Carlo Lazzaro

#10

10 Jan 2023, 10:32

Chiara:
could you please provide an excerpt/example of your dataset (changing the names of the variables if they are confidential), so that interested listers can challenge themselves with data instead of relying on guesswork? Thanks.

Kind regards,
Carlo
(Stata 19.0)
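For posting an excerpt, -dataex- is the usual Statalist tool (part of official Stata in recent versions; otherwise -ssc install dataex-). A sketch on the auto data, where var_a and var_b are made-up names standing in for renamed confidential variables:

```stata
sysuse auto, clear
preserve
* keep only the variables needed to reproduce the problem
keep make price mpg rep78
* optionally disguise confidential names before posting
rename (price mpg) (var_a var_b)
* -dataex- prints code that other listers can paste to recreate the excerpt
dataex in 1/10
* bring back the full, unrenamed data
restore
```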
Bruce Weaver

#11

10 Jan 2023, 14:21

Having never used -xtreg-, I was a bit puzzled by Carlo Lazzaro's example in #2. This UCLA page helped me. It also prompted me to try the following:

Code:

clear
use "https://www.stata-press.com/data/r17/nlswork.dta"
xtreg ln_wage i.race, fe
xtreg ln_wage i.race, be
xtreg ln_wage i.race, re
xtreg ln_wage i.race
mixed ln_wage i.race || idcode:
estat icc

I'm sure there is nothing new here for -xtreg- veterans, but maybe other -xtreg- newbies will find it helpful.

--
Bruce Weaver
Version: Stata/MP 18.5 (Windows)