Handling Highly Skewed Independent Variable: Oil Rents (% of GDP) in Cross-Country Analysis

Said Karaca

Join Date: Feb 2026

Posts: 1
#1

16 Feb 2026, 10:14

Hello Stata community,

I am working on a project analyzing the effect of oil rents (% of GDP) on trade. Oil rents are my independent variable, and trade is measured using a scale ranging from 0 to 100.

I am facing a problem with the distribution of oil rents. Since I am conducting a cross-sectional analysis, I have 121 observations in total. However, when I include control variables and restrict the analysis to observations with no missing values, the number of observations drops to 98.

The distribution of oil rents is highly left-skewed, with many countries in the 0–10% range. Because this is a global analysis, I do not want to exclude these countries, as doing so would reduce my sample size significantly. At the same time, I am concerned whether it is academically correct to continue the analysis with such skewed data.

I tried a log transformation, but it caused two issues:
1. The significance of oil rents in my analysis disappeared.
2. For countries with oil rents below 1%, the log transformation produces negative values.

I would greatly appreciate your advice on how to deal with this issue and possible Stata solutions to handle the skewed distribution while keeping all observations in the analysis.

Thank you in advance!
Felix Bittmann

Join Date: Aug 2018

Posts: 838
#2

16 Feb 2026, 10:43

Welcome to the forum! This is a very common question. I suggest you use the search function. Here are a few threads to get you started. You can consider the solutions suggested there and come back if you are still unsure.

https://www.statalist.org/forums/for...fits-solutions
https://www.statalist.org/forums/for...sform-with-log
https://www.statalist.org/forums/for...rm-a-regressor

Best wishes

Stata 18.0 MP | ORCID | Google Scholar
Nick Cox

Join Date: Mar 2014

Posts: 36058
#3

16 Feb 2026, 11:03

My guess is that your independent variable (I usually say predictor or explanatory variable) is right-skewed, not left-skewed. Whether you need to transform it depends mostly on whether that benefits modelling of the relationship between independent and dependent variable (I usually say outcome or response).

The fact that one or both variables is skew is in itself not crucial.

It's no kind of problem that any logarithms computed are negative, which means only that the original value is less than 1. After all, if you chose to work with proportions rather than percents, then all the logarithms would be negative, except for any values that were exactly 1 (that is, 100%), whose logarithm is zero.
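To illustrate the point about negative logarithms, here is a minimal sketch on simulated data. The variable names oilrents and trade and all the values are made up for illustration and are not the original poster's data. Logging percents or logging proportions differs only by the constant ln(100), so the slope in a regression is the same and only the intercept shifts.

```stata
* Simulated toy data; names and values are hypothetical
clear
set seed 2026
set obs 98
generate double oilrents = exp(rnormal(0, 1))              // right-skewed, in percent
generate double trade    = 50 + 3*ln(oilrents) + rnormal(0, 5)

generate double ln_pct  = ln(oilrents)        // negative whenever oilrents < 1
generate double ln_prop = ln(oilrents/100)    // = ln_pct - ln(100); negative for any value below 100%

regress trade ln_pct
scalar b_pct = _b[ln_pct]
regress trade ln_prop
scalar b_prop = _b[ln_prop]

* The two slopes agree apart from rounding error; only the intercept differs
display b_pct - b_prop
```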

I would be more concerned if there were any zeros in the predictor.
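A quick way to check for zeros, sketched on a handful of made-up values (the variable name oilrents is an assumption). In Stata, ln(0) is missing, so observations with zero oil rents would silently drop out of any regression on the logged variable. The last two lines show two possible alternatives that are defined at zero; they are offered as options to weigh, not as a recommendation from this thread.

```stata
* Toy values in percent; hypothetical, not the poster's data
clear
input oilrents
0
0
0.4
2.5
18
45
end

count if oilrents == 0                  // how many exact zeros?
generate double ln_oil = ln(oilrents)   // ln(0) evaluates to missing
count if missing(ln_oil)                // rows that a regression on ln_oil would drop

* Two possible alternatives that are defined at zero
generate double asinh_oil = asinh(oilrents)                  // inverse hyperbolic sine
generate byte any_oil = oilrents > 0 if !missing(oilrents)   // indicator: any oil rents at all
```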

For your project (which sounds rather like a class exercise) a good first step would be to plot your data, which might well indicate which kinds of relationship might make sense.
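As a sketch of what such plots could look like, the code below uses simulated data with hypothetical variable names (oilrents, trade). With real data, the simulation lines would be replaced by loading the dataset.

```stata
* Simulated toy data; names and values are hypothetical
clear
set seed 2026
set obs 98
generate double oilrents = exp(rnormal(0, 1))
generate double trade    = 50 + 3*ln(oilrents) + rnormal(0, 5)

* Distribution of the predictor
histogram oilrents, name(h_oil, replace)

* Outcome against predictor on the raw scale, with a lowess smooth
twoway (scatter trade oilrents) (lowess trade oilrents), name(raw, replace)

* Same plot with a logarithmic x axis (positive values only);
* the smooth is still computed on the raw scale, only the axis changes
twoway (scatter trade oilrents) (lowess trade oilrents) if oilrents > 0, ///
    xscale(log) name(logx, replace)
```

If the cloud of points looks closer to a straight line on the logarithmic axis, that is some evidence that a logged predictor suits the relationship better.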
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17854
#4

17 Feb 2026, 00:51

Said:
Welcome to this forum.
Restricting the analysis to observations without missing values could cause you bigger problems than the skewness of oil rents!
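A minimal sketch for inspecting how much is lost to missing values before deciding anything, using a tiny made-up dataset. The variable names, including the control gdp_pc, are hypothetical.

```stata
* Toy data; names and values are hypothetical
clear
input trade oilrents gdp_pc
55 0.2 1200
61 3.1 .
48 . 5400
70 12.5 30000
. 0 800
66 40.2 21000
end

misstable summarize trade oilrents gdp_pc           // missing counts per variable
egen byte n_miss = rowmiss(trade oilrents gdp_pc)   // missing values per observation
tabulate n_miss
count if n_miss == 0      // complete cases that a regression on all three would use
```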

Kind regards,
Carlo
(Stata 19.0)
Nick Cox

Join Date: Mar 2014

Posts: 36058
#5

17 Feb 2026, 02:33

You don't say anything about functional form, but a response bounded by 0 and 100% would incline me to consider a logit model first.
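One possible reading of "a logit model" for a 0 to 100 outcome is a fractional logit, sketched below on simulated data. The variable names and the data-generating process are assumptions made for illustration. fracreg needs the outcome on a 0 to 1 scale, so the 0 to 100 measure is divided by 100 first.

```stata
* Simulated toy data; names and values are hypothetical
clear
set seed 2026
set obs 98
generate double oilrents = exp(rnormal(0, 1))
generate double trade    = 100*invlogit(0.3*ln(oilrents) + rnormal(0, 0.5))

generate double trade_prop = trade/100     // rescale the outcome to [0, 1]
generate double ln_oil     = ln(oilrents)

* Fractional logit; robust standard errors are the default for fracreg
fracreg logit trade_prop ln_oil
predict double phat                        // predicted conditional mean, always inside (0, 1)
margins, dydx(ln_oil)                      // average marginal effect on the proportion scale

* Same point estimates via glm:
* glm trade_prop ln_oil, family(binomial) link(logit) vce(robust)
```

Unlike a linear regression on the raw 0 to 100 scale, the predictions here cannot fall outside the bounds of the outcome.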