  • Independent variable becomes significant when I include its squared value in regression

    Hi guys,

    I am looking into the effect of inequality on economic growth using panel estimation.

    When I include gini^2 in my regression alongside gini, the coefficient on gini becomes significant; without the squared term it is far from significant.

    Regression: gdpgrowth = gini + gini^2 + educ variable + dummy variables... etc

    The coefficient on gini is positive but the coefficient on gini^2 is negative. I understand that this implies an inverted U-shaped relationship (i.e. inequality has a positive effect on gdpgrowth at low levels of inequality but a negative effect at higher levels).

    BUT I don't understand why including gini^2 changes the significance of the results so drastically?

    Any help would be greatly appreciated!

  • #2
    First of all, statistical significance is a very brittle construct and the tiniest things can produce apparently wild swings in statistical significance. That's one of the many reasons the American Statistical Association now recommends discontinuing the use of statistical significance. https://www.tandfonline.com/doi/full...5.2019.1583913 (For a shorter pep talk, see https://www.nature.com/articles/d41586-019-00857-9.)

    That said, it is also important to understand that in a model with linear and quadratic terms, even if you take statistical significance seriously as a concept, the significance of either the linear or the quadratic term by itself is meaningless. Only their joint significance would matter.
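
    As a rough illustration of what a joint test looks at, here is a minimal sketch of the classical F statistic that compares the model with both gini and gini^2 against the model with neither. The function name and all numbers are made up for illustration and are not from the original poster's data. The formula assumes homoskedastic errors; with panel data and clustered standard errors a Wald test would normally be used instead.

```python
def joint_f_statistic(rss_restricted, rss_unrestricted, n_restrictions, n_obs, n_params):
    """Classical F statistic for dropping several regressors at once.

    rss_restricted   : residual sum of squares WITHOUT gini and gini^2
    rss_unrestricted : residual sum of squares WITH both terms
    n_restrictions   : number of coefficients set to zero (2 here)
    n_obs            : number of observations
    n_params         : number of estimated coefficients in the full model
    """
    df_resid = n_obs - n_params
    if n_restrictions <= 0 or df_resid <= 0:
        raise ValueError("need positive restrictions and residual degrees of freedom")
    numerator = (rss_restricted - rss_unrestricted) / n_restrictions
    denominator = rss_unrestricted / df_resid
    return numerator / denominator


# Hypothetical values, chosen only to make the arithmetic easy to follow.
f_stat = joint_f_statistic(
    rss_restricted=520.0,
    rss_unrestricted=500.0,
    n_restrictions=2,
    n_obs=210,
    n_params=10,
)
print(f_stat)  # (20 / 2) / (500 / 200) = 4.0
```

    The statistic is then compared with an F distribution with 2 and (n_obs - n_params) degrees of freedom; the two individual t-tests play no role in it.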

    And whether we believe in statistical significance or not, the coefficient of the linear term in a quadratic model is not a measure of the association between that predictor variable and the outcome variable. Instead, it serves primarily to locate the turning point in the (inverted-) U-shaped relationship.
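
    To make this concrete, here is a small sketch using invented coefficients (0.8 on gini and -0.01 on gini^2, with gini on a 0 to 100 scale; these are assumptions for illustration, not estimates from the thread). In a quadratic model the slope with respect to gini is b1 + 2*b2*gini, so the linear coefficient b1 is only the slope at gini = 0, and the turning point sits at -b1 / (2*b2).

```python
def marginal_effect(b1, b2, x):
    """Slope of b1*x + b2*x**2 with respect to x, evaluated at x."""
    return b1 + 2.0 * b2 * x


def turning_point(b1, b2):
    """Value of x where the slope is zero (the peak of an inverted U when b2 < 0)."""
    if b2 == 0:
        raise ValueError("no turning point without a quadratic term")
    return -b1 / (2.0 * b2)


def linear_coef_after_centering(b1, b2, c):
    """Linear coefficient if x is replaced by (x - c); the quadratic coefficient is unchanged."""
    return b1 + 2.0 * b2 * c


# Hypothetical coefficients, for illustration only.
b1, b2 = 0.8, -0.01

print(turning_point(b1, b2))            # about 40
print(marginal_effect(b1, b2, 0.0))     # equals b1: the slope at gini = 0
print(marginal_effect(b1, b2, 60.0))    # negative: past the turning point
print(linear_coef_after_centering(b1, b2, 40.0))  # about 0 when centered at the peak
```

    The last line shows why the linear coefficient alone says little: the same fitted curve yields a "linear effect" near zero if gini is centered at the turning point, and a large one if gini is measured from zero.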



    • #3
      Most of us have little feel for what is really happening even in a three-dimensional space (response, two (continuous?) predictors, and dummies (indicators) moving us around in their space) unless we can see the data. So why ask for an interpretation when we can't see the data?

      Useful plots are plots of the individual distributions, a scatter plot matrix, and added variable plots after regression.

      What is going on here could be anything from mostly noise (but a large enough dataset that some weak signals pass conventional significance levels) through something warped by pathologies such as outliers to something systematic and interesting.

      For a start, I wouldn't trust a regression of GDP growth without seeing the data. Even non-economists know that this means a bundle of economies growing rather slowly, a few growing rapidly and a few collapsing utterly. As this is panel data, how long is the record and when does it stop? Does 2008 fall in there? 1929? etc.

