  • Dummy variables regression: avoiding multicollinearity

    I am currently running a completely flexible regression, i.e., a regression of y only on dummy variables. It is of very high dimension (more than 200 dummy variables). A snapshot of my data looks like the following:

    Code:
    clear
    input float(Y D1 D2 D3 D4 D5 D6)
     1 1 0 0 0 0 0
     2 0 1 0 0 0 0
     3 1 0 0 0 0 0
     4 0 0 0 0 1 0
    56 1 0 0 0 0 0
     1 0 0 0 0 0 1
    21 0 0 1 0 0 0
    end

    where Y is the regressand and D1 through D200 form a full set of dummy variables. I have thousands of observations. To avoid the dummy variable trap, I have dropped the constant. Something strange happens: when I force the constant to be dropped, the coefficients are identified, since (X'X) has full rank. However, when I estimate the model with FE and no constant, Stata still drops one variable! The thing is, because there are so many observations, the determinant of (X'X) is still close to 0 even after I drop a variable. But a variable should only be dropped when the determinant is exactly 0. Any ideas? Thanks!
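    A minimal sketch of the two estimation approaches being contrasted above (the exact commands were not posted, so these are assumptions; the variable list and the choice of -xtreg, fe- are guesses):

    Code:
    * OLS with the constant suppressed: all 200 dummies can be identified,
    * since without a constant the full dummy set is not collinear
    regress Y D1-D200, noconstant

    * A fixed-effects estimator absorbs its own constant (the mean of the
    * fixed effects), so a full set of dummies is again collinear with it
    * and one dummy is expected to be dropped
    xtset panelvar
    xtreg Y D1-D200, fe

    If this matches what was run, the dropped dummy under -xtreg, fe- would not be a numerical-precision issue but exact collinearity between the dummies and the absorbed constant.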

  • #2
    I can't quite follow from your explanation what is happening. Why don't you show us the exact regression commands you are using and the exact output you are getting from Stata? Please do this by copying directly from Stata's Results window or your log file and pasting between code delimiters. Please do not edit any of it: the details are usually crucial.

    Comment


    • #3
      Thanks for your response. Unfortunately, the output is quite large. I will try to figure out the core of the problem, and then post it. Thanks!

      Comment
