Hello Stata community,
I am working on a project analyzing the effect of oil rents (% of GDP) on trade. Oil rents are my independent variable, and trade is measured using a scale ranging from 0 to 100.
I am facing a problem with the distribution of oil rents. Since I am conducting a cross-sectional analysis, I have 121 observations in total. However, when I include control variables and restrict the analysis to observations without missing values, the sample drops to 98.
The distribution of oil rents is highly right-skewed, with many countries in the 0–10% range. Because this is a global analysis, I do not want to exclude these countries, as doing so would reduce my sample size significantly. At the same time, I am unsure whether it is academically sound to continue the analysis with such skewed data.
I tried a log transformation, but it caused two issues:
- The coefficient on oil rents is no longer statistically significant.
- For countries with oil rents below 1%, the log transformation produces negative values.
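
To illustrate the second issue, here is a minimal sketch in Python. The values are made up for illustration and are not from my data, and the names `oil_rents` and `log_rents` are placeholders. It only shows the arithmetic: the natural log is negative for any share below 1%, zero at exactly 1%, and positive above it.

```python
import math

# Hypothetical oil-rent shares in % of GDP (illustrative values, not real data)
oil_rents = [0.2, 0.5, 1.0, 5.0, 40.0]

# Natural log of each share
log_rents = [math.log(x) for x in oil_rents]

# Print each share next to its log so the sign change at 1% is visible
for x, lx in zip(oil_rents, log_rents):
    print(f"{x:>5.1f}% -> ln = {lx:.3f}")
```

A share of exactly 0 has no logarithm at all, so any such observation would be lost under this transformation.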
Thank you in advance!
