Help with collinearity in STATA

Samuel Malkun

Join Date: Nov 2022

Posts: 12
#1

Help with collinearity in STATA

13 Aug 2023, 12:28

Hi,

I'm working on my thesis and I'm estimating a gravity model of trade. I'm using time-varying importer and exporter fixed-effects with a PPMLHDFE estimation. I have 180k observations spanning from 1948 to 2019. This is my code:

use "Temporales/falsofinal2.dta", clear

keep if indicator==0 & dupindicator2==0

keep if region_o=="south_america" | region_o=="central_america" | countryname=="Mexico"

gen l_dist = ln(distance)

ppmlhdfe TXGFOB gdp_2 l_dist contiguity landlocked island common_language common_colonizer common_legal_origin member_wto_joint agree_fta agree_cu agree_eia agree_psa, absorb(countrycode1#year countrycode2#year) d

I'm interested in the variable gdp_2, which is the product of both countries' GDPs. As you can see, the coefficient is close to zero because the values are so big. The literature suggests this coefficient should be close to unity when GDP enters in logarithms. I'm not sure if this coefficient can be interpreted as such since e^(-1.21e-26) is approximately 1. Instead, I want to use the logarithm of this variable, so I do the following:

gen l_gdp = ln(gdp_2)

ppmlhdfe TXGFOB l_gdp l_dist contiguity landlocked island common_language common_colonizer common_legal_origin member_wto_joint agree_fta agree_cu agree_eia agree_psa, absorb(countrycode1#year countrycode2#year) d

However, Stata omits the new variable because of perfect collinearity with the fixed effects.

I don't understand how applying logarithms can make the same variable collinear with fixed-effects when it wasn't before.

Am I doing something wrong here?

I would appreciate your help.
Samuel Malkun

Join Date: Nov 2022

Posts: 12
#2

13 Aug 2023, 12:31

Clyde Schechter I would appreciate your help with this.
Samuel Malkun

Join Date: Nov 2022

Posts: 12
#3

13 Aug 2023, 12:32

Joao Santos Silva I would appreciate your help with this.
Clyde Schechter

Join Date: Apr 2014

Posts: 30164
#4

13 Aug 2023, 13:00

A variable is colinear with the fixed effects if its value can be written as a linear combination of the fixed effects. So if we call the fixed effects for country1#year and country2#year u1t and u2t in year t, respectively, a variable x will be colinear with them if we can write x = a1*u1t + a2*u2t + a0 in all observations, where a1 may differ from one country1-year to another and a2 from one country2-year to another, but neither varies among observations that share the same country-year, and a0 is a constant. If we had a variable, let me call it gdp1, that was just the GDP of country 1, then it would be colinear because we would have gdp1 = gdp1*u1t + 0*u2t + 0 in all observations.

Now, the variable gdp_2, which you defined as the product of the GDPs of countries 1 and 2, cannot be written in this way, and it is not colinear with u1t and u2t. But, when you take log(gdp_2), you have log(gdp_2) = log(gdp country1 * gdp country2) = log(gdp country1) + log(gdp country2). So this variable can be decomposed as log(gdp_2) = log(gdp country1)*u1t + log(gdp country2)*u2t + 0. So it is colinear. It is the product rule of logarithms that has created the colinearity.
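The decomposition above can be checked with a small sketch on made-up data (not the thesis dataset). The country codes, years, GDP values, and the names cty_o, cty_d, gdp_o, gdp_d, r_log and r_lvl are all hypothetical. The sketch assumes reghdfe is installed, which ppmlhdfe already requires. If the log of the product is fully absorbed by the two sets of fixed effects, regressing it on the fixed effects alone should leave residuals that are zero up to rounding, while the product in levels should not.

```stata
* Toy panel: 5 hypothetical countries, all directed pairs, 4 years
clear
set obs 5
gen byte cty_o = _n
expand 5
bysort cty_o: gen byte cty_d = _n
drop if cty_o == cty_d
expand 4
bysort cty_o cty_d: gen int year = 1999 + _n

* Made-up GDP that depends only on (country, year), same rule for both sides
gen double gdp_o = 1e11*(1 + mod(7*cty_o + 3*year, 11))
gen double gdp_d = 1e11*(1 + mod(7*cty_d + 3*year, 11))
gen double gdp_2 = gdp_o*gdp_d
gen double l_gdp = ln(gdp_2)

* Regress each version on the two sets of fixed effects only
reghdfe l_gdp, absorb(cty_o#year cty_d#year) residuals(r_log)
reghdfe gdp_2, absorb(cty_o#year cty_d#year) residuals(r_lvl)

* r_log should be zero up to rounding error; r_lvl should not
summarize r_log r_lvl
```

The same two reghdfe calls could be run on the real data with countrycode1#year and countrycode2#year to confirm that l_gdp is absorbed there as well.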

I'm not sure why you sought to log transform this variable in the first place. It appears you were concerned about the very small magnitude of the coefficient you got for gdp_2. But given that national GDPs are going to be in the many billions or even trillions of dollars and you are multiplying two such numbers together, your variable has a scale on the order of 10²⁰ or larger. So its coefficient is going to be very small. If you would like the coefficient to be a "nicer" number, the best way to do that is to rescale gdp_2. Instead of using GDP measured in dollars (or Euros, or whatever it was), use it scaled in terms of trillions of dollars. This will rescale the variable gdp_2 to something on the order of 10² or smaller, and your coefficient will be nicer.
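A minimal sketch of that rescaling, assuming the data from #1 are in memory and that GDP is recorded in single currency units, so that gdp_2 is in squared units. The name gdp_2_tr is hypothetical, and the `///` line continuations only work in a do-file. Dividing the product by 1e24 is the same as measuring each GDP in trillions before multiplying.

```stata
* Product of the two GDPs, each expressed in trillions (1e12 * 1e12 = 1e24)
gen double gdp_2_tr = gdp_2/1e24

* Same specification as in #1, with the rescaled regressor
ppmlhdfe TXGFOB gdp_2_tr l_dist contiguity landlocked island common_language ///
    common_colonizer common_legal_origin member_wto_joint agree_fta agree_cu ///
    agree_eia agree_psa, absorb(countrycode1#year countrycode2#year) d

* The fit does not change; the coefficient is simply multiplied by 1e24.
* Using the value quoted in #1:
display -1.21e-26 * 1e24
```

The z-statistic and p-value for the rescaled variable should match those for gdp_2, since rescaling changes the coefficient and its standard error by the same factor.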

If you were log transforming for other reasons (repair non-linearity in the relationship to outcome, or to decrease variation in the GDP measure), well, all I can say about that is you will have to choose another way of doing that. I am not an economist/econometrician, so I will not pretend to advise you on which ways of dealing with these problems would be most appropriate for this situation.
Samuel Malkun

Join Date: Nov 2022

Posts: 12
#5

13 Aug 2023, 13:28

Thank you very much for your response. I now understand why the log of the product of GDPs is perfectly collinear with my fixed effects. I wanted to use logs because the coefficient can then be interpreted directly as an elasticity, which is more straightforward than a semi-elasticity, and because I could compare my results with previous work by other authors.

Understanding that I can't do the transformation, I'm still concerned with my coefficient for gdp_2 because this coefficient should be positive and close to 1 when taking the log of GDPs as the regressor. However, I get a negative value which is also really close to zero and I can't find any explanation in related literature. I read in another forum Professor Joao posted for regressors not in logs, the semi-elasticity is given by 100*(exp(beta) - 1)%. However this would suggest the semi-elasticity of trade to GDP is approximately 0%, which is really weird. How come exports are not in any way dependent on GDP? Also, when I change the sample from Latin American exports to European exports (using the product of GDPs as the regressor) the coefficient estimated is 1.35e-26, also really close to zero but this time it's positive. Does this mean the semi-elasticity for both regions is basically the same, in spite that one is positive and the other is negative? Doesn't the symbol mean anything in this context?

This is the regression result for European exporters.
Clyde Schechter

Join Date: Apr 2014

Posts: 30164
#6

13 Aug 2023, 14:18

You have to stop thinking about closeness to zero in this context: it is meaningless. The gdp_2 variable is scaled, and as long as its coefficient is not exactly zero, you can make that coefficient as large or small as you care to by changing the scale. Closeness to zero is only meaningful for coefficients of variables that have no dimensions, no scale, but GDP, or GDP*GDP, does not meet that criterion because it is scaled in some currency unit or, for the product, in the square of one. So just forget about that.

I read in another forum Professor Joao posted for regressors not in logs, the semi-elasticity is given by 100*(exp(beta) - 1)%

I do not know what source you are quoting this from, but you are misinterpreting that formula. That formula is the percentage outcome difference associated with a 1 unit difference in the variable that beta is the coefficient of. What is a 1 unit change in the product of two countries' GDPs? If one of the countries is, say, Germany, with a GDP on the order of magnitude of 4 trillion USD, a difference of 1 in the product of that with some other country's GDP in that year would correspond roughly to a difference in the other country's GDP of approximately 1 ten-billionth of a cent. So, yes, I would expect the relative change in any outcome associated with a 1 ten-billionth of a cent change in some country's GDP to be, for all practical purposes, zero. Again, your problem is arising because you are working with a scaled variable and you are failing to take that scale into account in reading the results.
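The arithmetic in the paragraph above can be sketched with a few display commands. The 4 trillion USD figure is the round number used above and 1.35e-26 is the coefficient reported in #5 for European exporters. The scalar names b and delta are hypothetical, and treating GDP as measured in single dollars is an assumption. For a change of delta units in the regressor, the formula generalizes to 100*(exp(beta*delta) - 1).

```stata
* A one-unit change in gdp_2 when one partner's GDP is about 4 trillion USD,
* expressed as the implied change in the other partner's GDP, in cents
display 100 * 1/4e12

* Percentage difference in expected trade for a change of delta in gdp_2
scalar b     = 1.35e-26    // coefficient reported for European exporters in #5
scalar delta = 1e24        // one (trillion USD)^2, assuming GDP is in dollars
display 100*(exp(scalar(b)*scalar(delta)) - 1)

* The same formula with delta = 1 is effectively zero, as discussed above
display 100*(exp(scalar(b)) - 1)
```

Picking a delta that corresponds to a meaningful change in the product of the two GDPs is what makes the percentage interpretable.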

Does this mean the semi-elasticity for both regions is basically the same, in spite that one is positive and the other is negative?

Based on what I said in the last paragraph, yes, the semi-elasticity of any change that is an incomprehensibly small fraction of a cent is going to be effectively zero for all regions.

Doesn't the symbol mean anything in this context?

I don't know what symbol you are referring to.

You have to focus on the scale of your gdp_2 variable. I think if you rescale it to units of trillions of dollars (Euros, yen, Swiss francs, whatever it is) you will get results that are easier to understand, or at least can be calculated with numbers that are easy to work with.
Samuel Malkun

Join Date: Nov 2022

Posts: 12
#7

13 Aug 2023, 14:58

Thank you so much for clarifying that point for me, Mr. Schechter. I now understand that closeness to zero is not relevant in this context because of the scale of my GDP variable. It is a bit concerning that the coefficient takes a negative value for South American countries, though. One would expect a positive value, as seen for European exporters and throughout the literature, but I guess that's a whole other story. I would appreciate it if Professor Joao could give me his input on why I'm getting a negative GDP coefficient for South American exporters.