Omitted because of collinearity

Laurent Magon

Join Date: May 2015

Posts: 21
#1

Omitted because of collinearity

05 Jun 2015, 02:13

Good morning everyone,

I'm having some difficulties with a regression. I have already posted several threads about my work (on international trade) and what I'm trying to do.

I want to regress the logarithm of trade on fixed effects using the OLS estimator.

My regression is presented as

Code:

regress lnv i.ij i.ik i.jk

Here, ij, ik and jk are the variables whose dummies represent the country-pair (exporter-importer) fixed effect, the exporter-sector fixed effect and the importer-sector fixed effect respectively.

When I run the regression, many coefficients are omitted because of collinearity, but I don't see what I've done wrong.

For your information, I'm following the same regression model as Elsa Leromain and Gianluca Orefice, which you'll find HERE

Attached you'll also find the dta file I'm using and the log file of the regression.

Thank you for your help, it's an urgent matter.
Attached Files

5_06.smcl (352.5 KB, 1 view)

baci92_1995_999CC.dta (223.5 KB, 2 views)
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#2

05 Jun 2015, 02:36

Laurent:
I can replicate your problem.
Without being at all familiar with your research field, my absolute beginner's opinion is that you have too many fixed effects.
Even if collinearity were not an issue and you obtained all the coefficients, I wonder how you could easily disseminate the results of your research.

Kind regards,
Carlo
(Stata 19.0)
Comment
Laurent Magon

Join Date: May 2015

Posts: 21
#3

05 Jun 2015, 02:46

Carlo:
How is it possible that you can't replicate it? What do you mean by that?

The results of the regression will just help me to compute the revealed comparative advantage. I will use the estimated values of ik to find the fundamental productivity (Zik) of country i in sector k, and then find the RCA as described in the paper.

Unfortunately, I need that many fixed effects. The results are strange because the paper uses more countries than I do and thus has more fixed effects. I might be wrong, I'm not particularly good at econometrics.
Comment
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#4

05 Jun 2015, 02:53

Laurent:
I wrote "I can replicate your problem", hence I can see your point.
As an aside, wouldn't it be feasible to e-mail the Authors of the paper you refer to and ask them the same questions?

Kind regards,
Carlo
(Stata 19.0)
Comment
Laurent Magon

Join Date: May 2015

Posts: 21
#5

05 Jun 2015, 03:05

Oh excuse me, I'm a bit tired this morning.

I already e-mailed the author but, as I said, it's an urgent matter. This step is a short one and the work after it is even easier, but being blocked at this step is frustrating.
This is why I came here, hoping that someone might find a solution.

Do you think that it could be the result of my aggregation? As I explained in another post, I decided to reduce the dataset by aggregating some countries into a new "rest-of-the-world" one.
Comment
Joao Santos Silva

Join Date: Apr 2014

Posts: 3011
#6

05 Jun 2015, 03:20

Dear Laurent,

I guess I know what is going on. You are logging your dependent variable, which means that observations where the dependent variable is equal to zero are dropped. Therefore, dummy variables that are equal to 1 only when the dependent variable is zero will be identically zero in the sample used in the estimation. These will, of course, be dropped because of collinearity. I have discussed a similar problem in this paper. As an aside, it is a bad idea to estimate the model taking logs of a dependent variable that can be zero; have a look here and here.
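A minimal sketch of how one might check for this, assuming the raw trade value is stored in v and lnv is its log; n_used is a hypothetical helper variable, not something from the attached file:

```stata
* Sketch only: v is assumed to hold the raw trade value behind lnv
count if v == 0
count if missing(lnv)

* For each level of ik, count the observations that can enter the regression
bysort ik: egen n_used = total(!missing(lnv))

* Levels listed here would have an all-zero dummy in the estimation sample
tabulate ik if n_used == 0
```

If both counts are zero and the tabulate reports no observations, dropped zeros cannot be the source of the omitted dummies. The same check can be repeated for ij and jk.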

All the best,

Joao
Comment
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#7

05 Jun 2015, 03:29

Joao:
no obs have v=0 in Laurent's dataset.
I second your advice that taking logs of a depvar that, in its original metric, can be zero is, in general, risky, but this does not seem to be the main issue here.

Kind regards,
Carlo
(Stata 19.0)
Comment
Laurent Magon

Join Date: May 2015

Posts: 21
#8

05 Jun 2015, 03:41

Thank you for your advice, but I don't think that is the issue. Indeed, the paper I'm following did much the same and I didn't see any remark that they ran into this collinearity problem.
The issue seems to occur with the last observed sector pair. I first thought that sector 97 was not important for what I need, so I dropped it, but then the issue appeared with sector 96. I also removed the jk fixed effect to see if it changes anything but

Basically what I'm saying is that the omitted variables are situated at the last sector observation per exporter/importer. I don't know if it can help you identify the problem.

Last edited by Laurent Magon; 05 Jun 2015, 03:45.
Comment
Martin Bresslein

Join Date: Apr 2014

Posts: 51
#9

05 Jun 2015, 04:09

Laurent,

Originally posted by Laurent Magon

Basically what I'm saying is that the omitted variables are situated at the last sector observation per exporter/importer. I don't know if it can help you identify the problem.

If I understand your output correctly, that is pretty easy to explain: for all dummy variables, you need a base category. If you do not explicitly exclude one yourself - like the last sector-country dummy - then Stata will drop one arbitrarily, else they would be perfectly collinear. Thus, three dummy variables, one ik, one jk and one ij, must be dropped by Stata to estimate the other dummy coefficients. These "dropped" categories would then be the base against which the RCA for all other country-sector pairs is built.
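As a sketch of choosing the base category explicitly rather than leaving it to Stata, the factor-variable prefix ib(last). sets the last level of each variable as the base. The variable names are those from the first post; picking the last level is only an illustration, not a recommendation:

```stata
* Sketch: make the last level of each factor variable the omitted base category
regress lnv ib(last).ij ib(last).ik ib(last).jk
```

Any dummies still reported as omitted after the base is fixed this way would point to collinearities beyond the three base categories.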

I am not sure why, in your output, there are more omitted ones, but it is possible that there are other collinearities.

Best,
Martin
Comment
Joao Santos Silva

Join Date: Apr 2014

Posts: 3011
#10

05 Jun 2015, 04:41

Dear Carlo,

Thanks for pointing out my mistake; trade data without zeros is a rarity! Then, I guess this is just a case of having redundant dummies as you and Martin suggested and it may be related to the aggregation into the rest-of-the-world.

Cheers,

Joao
Comment
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#11

05 Jun 2015, 05:47

Joao:
I second your point.
So far I can't offer Laurent any solution other than reducing the number of fixed effects.

Kind regards,
Carlo
(Stata 19.0)
Comment
Laurent Magon

Join Date: May 2015

Posts: 21
#12

05 Jun 2015, 05:52

Thank you for your answer.

I tried to remove the jk importer-sector fixed effect but the result is nearly the same.

Regressing:

Code:

regress lnv i.ij i.ik

This regression gives me 6 omitted variables this time: the last ik pair, meaning sector 97 for each exporter country.

I can't see how I can reduce the number of fixed effects.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30095
#13

05 Jun 2015, 08:12

The problem appears to be that your ij, ik and jk variables result in a sparse data set.

Your data set has 86 distinct values of ij, 669 of ik, and 1248 of jk. That means there are, in principle, 71,802,432 triplets of these variables. But there are only 7,550 observations in the data set, so only a tiny fraction of these triplets are actually instantiated in the data. This means that, inevitably, there are massive collinearities among the indicators for these variables. That is to say, if you know any two of them, the probability is extremely high that there is only one possible value for the third one. That means that the corresponding indicators have a collinearity relationship.
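A rough sketch of how these counts could be reproduced on the attached data set; the tag_* variables are hypothetical helper names, and the figures 86, 669 and 1248 are simply the ones quoted above:

```stata
* Sketch: count the distinct levels of each fixed-effect variable
foreach x in ij ik jk {
    egen byte tag_`x' = tag(`x')
    quietly count if tag_`x'
    display "`x': " r(N) " distinct values"
}

* Count the (ij, ik, jk) triplets that actually occur in the data
egen byte tag_triplet = tag(ij ik jk)
quietly count if tag_triplet
display "observed triplets: " r(N) " of " %12.0fc 86*669*1248 " possible"
```

The number of observed triplets cannot exceed the number of observations, so comparing it with the product of the three counts gives a quick sense of how sparse the design is.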

I do not see any way out of this dilemma. Either Leromain and Orefice had a much, much larger data set, or you have misinterpreted how they modeled their data. (I have not checked the article at the link.)
Comment
Laurent Magon

Join Date: May 2015

Posts: 21
#14

07 Jun 2015, 10:50

Good evening,
I don't think that my model is wrong. Thank you for your answers and sorry for my late reply.
Do you think that the values for ik can be used to compute the RCA, or is that completely wrong? I'm not at ease with this, unfortunately.

I tried to e-mail Elsa Leromain but I think she's too busy to reply quickly enough.
Comment
Joshua D Merfeld

Join Date: Jun 2015

Posts: 86
#15

07 Jun 2015, 21:32

Hi Laurent,

Did you code the dummy variables yourself? If so, could you post your do-file? It seems to me that this is most likely a small coding error. I am assuming that the "exporter" and "importer" categories are mutually exclusive. All of the dropped variables are from the exporter and importer categories, with the majority being from the latter. Are you confident that there are "plenty" of importers in the data?

Josh
Comment
