
  • Handling Highly Skewed Independent Variable: Oil Rents (% of GDP) in Cross-Country Analysis

    Hello Stata community,

    I am working on a project analyzing the effect of oil rents (% of GDP) on trade. Oil rents are my independent variable, and trade is measured using a scale ranging from 0 to 100.

    I am facing a problem with the distribution of oil rents. Since I am conducting a cross-sectional analysis, I have 121 observations in total. However, when I include control variables and restrict the analysis to variables without missing values, the number of observations drops to 98.

    The distribution of oil rents is highly left-skewed, with many countries in the 0–10% range. Because this is a global analysis, I do not want to exclude these countries, as doing so would reduce my sample size significantly. At the same time, I am concerned whether it is academically correct to continue the analysis with such skewed data.

    I tried a log transformation, but it caused two issues:
    1. The significance of oil rents in my analysis disappeared.
    2. For countries with oil rents below 1%, the log transformation produces negative values.
    I would greatly appreciate your advice on how to deal with this issue and possible Stata solutions to handle the skewed distribution while keeping all observations in the analysis.

    Thank you in advance!

  • #2
    Welcome to the forum! This is a very common question, so I suggest you use the search function. Here are a few threads to get you started. Consider the solutions suggested there and come back if you are still unsure.

    https://www.statalist.org/forums/for...fits-solutions
    https://www.statalist.org/forums/for...sform-with-log
    https://www.statalist.org/forums/for...rm-a-regressor
    Best wishes

    Stata 18.0 MP | ORCID | Google Scholar



    • #3
      My guess is that your independent variable (I usually say predictor or explanatory variable) is right-skewed, not left-skewed. Whether you need to transform it depends mostly on whether that benefits modelling of the relationship between independent and dependent variable (I usually say outcome or response).

      The fact that one or both variables are skewed is not in itself crucial.

      It's no kind of problem that some of the logarithms are negative; that means only that the original value is less than 1. After all, if you chose to work with proportions rather than percents, then all the logarithms would be negative, except for any values that were exactly 1 (100%), whose logarithm is 0.
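
      To make the arithmetic concrete, here is a small Python sketch. The oil-rent values are made up for illustration and are not the poster's data. It shows that values below 1% give negative logarithms, and that switching from percents to proportions shifts every logarithm by the same constant, ln(100), so the slope in a regression on the logged predictor would not change.

```python
import math

# Hypothetical oil rents in percent of GDP (illustrative values only).
oil_rents_pct = [0.2, 0.8, 1.0, 5.0, 40.0]

# Natural log of the percent values: negative whenever the value is below 1.
ln_pct = [math.log(x) for x in oil_rents_pct]

# The same values as proportions (0-1 scale): every log here is negative.
ln_prop = [math.log(x / 100) for x in oil_rents_pct]

# The two versions differ by the constant ln(100) for every observation.
shifts = [a - b for a, b in zip(ln_pct, ln_prop)]

for x, a, b in zip(oil_rents_pct, ln_pct, ln_prop):
    print(f"{x:6.1f}%  ln(percent) = {a:7.3f}  ln(proportion) = {b:7.3f}")
```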

      I would be more concerned if there were any zeros in the predictor.
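
      Zeros matter because the logarithm of zero is undefined. In Stata, ln(0) returns missing, so any country with exactly zero oil rents would drop out of a regression on the logged predictor. A quick check along these lines, sketched in Python with made-up values and a hypothetical helper safe_ln(), shows how many observations would be lost:

```python
import math

# Hypothetical oil rents in percent of GDP; two countries have exactly zero.
oil_rents_pct = [0.0, 0.0, 0.3, 2.5, 18.0]

def safe_ln(x):
    """Return ln(x) for positive x, and None where the log is undefined."""
    return math.log(x) if x > 0 else None

ln_oil = [safe_ln(x) for x in oil_rents_pct]

# Count the exact zeros and the observations that survive logging.
n_zero = sum(1 for x in oil_rents_pct if x == 0)
n_usable = sum(1 for v in ln_oil if v is not None)

print(f"zeros: {n_zero}, usable after logging: {n_usable} of {len(oil_rents_pct)}")
```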

      For your project (which sounds rather like a class exercise) a good first step would be to plot your data, which might well indicate which kinds of relationship might make sense.



      • #4
        Said:
        Welcome to this forum.
        Restricting the analysis to observations without missing values would cause you bigger problems than the skewness of oil rents!
        Kind regards,
        Carlo
        (Stata 19.0)



        • #5
          You don't say anything about functional form, but a response bounded by 0 and 100% would incline me to consider a logit model first.
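
          As a rough sketch of why a logit form suits a bounded response: the logit maps a proportion in (0, 1) to the whole real line, and its inverse maps any linear prediction back inside the bounds, so fitted values cannot fall below 0 or above 100. The Python below uses made-up numbers and hypothetical function names. In Stata, fracreg logit or glm with family(binomial) link(logit) would be commands to look at for fitting such a model.

```python
import math

def logit(p):
    """Log odds of a proportion strictly between 0 and 1."""
    return math.log(p / (1 - p))

def inv_logit(z):
    """Map any real number back to a proportion between 0 and 1."""
    return 1 / (1 + math.exp(-z))

# Hypothetical trade scores on the 0-100 scale, rescaled to proportions.
trade_scores = [5.0, 50.0, 95.0]
log_odds = [logit(s / 100) for s in trade_scores]

# Extreme linear predictions still map back inside the 0-100 bounds.
for z in (-6.0, 0.0, 6.0):
    print(f"linear prediction {z:5.1f} -> predicted score {100 * inv_logit(z):6.2f}")
```

          Note that the logit itself is undefined at scores of exactly 0 or 100, so it is worth checking whether the response has any values on the boundary.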
