
  • Significance level of quadratic term

    Hi all. I'm analyzing a squared/quadratic term and finding plenty of help in textbooks on how to interpret the coefficients, but nothing on how to interpret the significance levels. My variable of interest (percent of a city that is Hispanic) is non-significant in the baseline model. Once I add the squared version of percent Hispanic, the linear version becomes significant, but the squared version is not. Does this mean that I can't interpret the squared term, or can I still interpret it because the linear variable is significant? Thanks for any help!

  • #2
    The significance level of a quadratic term on its own is as meaningless a statistic as you're ever likely to come across. The interpretation of quadratic models requires remembering some algebra and some facts about parabolas.

    If you are interested in testing whether there is an effect of percent Hispanic, you would first do a joint test of the linear and quadratic coefficients. You don't show your code or results, but assuming you did something like
    Code:
    regress outcome  c.hispanic##c.hispanic ...
    The test is
    Code:
    test hispanic c.hispanic#c.hispanic
    If that's not significant, and if you believe in selecting variables for model inclusion based on significance tests (I don't), then you would drop both hispanic and its square. If that test gives a significant result, then you keep both the linear and quadratic terms, even if one or both is separately non-significant.

    Next you have to figure out what it means. The key fact is that a parabola whose equation is y = ax² + bx + c has its vertex at x = -b/(2a), and that value of x is one that produces the minimum outcome if a > 0, and the maximum outcome if a > 0.

    So you have to calculate the location of the vertex:

    Code:
    nlcom _b[hispanic]/(2*_b[c.hispanic#c.hispanic])
    If this number lies comfortably within the range of values of hispanic in your data, then you have a U-shaped (if the quadratic coefficient is positive) or inverted-U-shaped (if the quadratic coefficient is negative) relationship between hispanic and your outcome. If, however, the vertex is located outside the range of the observed values of hispanic, or near the edges, then what you have is a somewhat curvilinear relationship, but not one that actually makes a U-turn within the observed range.

    It can be helpful to visualize the relationship graphically. Pick a range of values of hispanic that runs from the lowest to the highest in your data. Let's say, for the sake of a concrete example, that in your data the observed range is from 5 to 80 percent. Then I would run
    Code:
    margins, at(hispanic = (5(5)80))
    marginsplot



    • #3
      Thanks so much, Clyde. This is enormously helpful. When I run the test after one of my models, Stata is dropping one constraint.
      Code:
      . test pcthisp2 pcthisp2
      
       ( 1)  pcthisp2 = 0
       ( 2)  pcthisp2 = 0
             Constraint 2 dropped
      
             F(  1,  1141) =   21.13
                  Prob > F =    0.0000
      The User Guide says it drops a constraint when it is "implied" by another. Does "implied" mean that it is perfectly collinear with the linear variable so Stata is not including it when it calculates the test? Thanks again.



      • #4
        Never mind, I see it now, I put the same variable in twice. My bad. Now it's working great. Thanks so much!
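
        The command I meant to run, with the linear term and its square (assuming the linear variable is named pcthisp, with pcthisp2 its square), was:
        Code:
        test pcthisp pcthisp2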



        • #5
          Kind of an interesting scenario.
          First, Clyde's advice is great, and I feel like he's sharpened my thinking on this.

          It was an interesting exercise to try to simulate data that matches this scenario. Ultimately I find I can generate this scenario with any of: a quadratic data-generating model, a linear data-generating model, or a null data-generating model! Regardless of which true model I use, the data and models that fit the scenario look much like this:
          [Image: quadcoef.png]

          In the linear model, the coefficient is not significant.
          In the quadratic model, the first-order coefficient is significant but the second-order coefficient is not. And the F test for the joint effect is significant.
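
          A minimal sketch of one way to produce this (here with a linear data-generating model, 24 points, and heavy noise; the seed and coefficient values are illustrative assumptions, not my actual code, and it may take a few seeds to reproduce the exact pattern of significance):
          Code:
          clear
          set seed 2017
          set obs 24
          gen x = runiform()*10
          gen y = 0.5*x + rnormal(0,3)   // linear DGP with lots of noise
          regress y x                    // linear model: x may come out non-significant
          regress y c.x##c.x             // quadratic model
          test x c.x#c.x                 // joint test of linear and quadratic terms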
          Doug Hemken
          SSCC, Univ. of Wisc.-Madison



          • #6
            Well, with 24 points and a lot of noise in the data, there is probably not enough information to distinguish a linear from a quadratic model, or either from a constant-only model. In that case, I would probably choose among the models based on what prior theory suggests is most sensible; I doubt any kind of data-based analysis would be able to do better.

            In situations with more data points or less noise, where theory provides no guidance, it may be possible to use the data to judge which model fits better. For example, one could fit both models and compare their AIC and BIC values.
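
            A minimal sketch of that comparison (variable names are hypothetical):
            Code:
            quietly regress y x
            estat ic                   // AIC and BIC for the linear model
            quietly regress y c.x##c.x
            estat ic                   // AIC and BIC for the quadratic model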

            Since I'm in a philosophical mode at the moment, I'll go off tangentially here.

            It's probably also worth pointing out that in practice, quadratic models are typically used as a proxy for "sort of U-shaped" or "somewhat curvilinear" relationships. In truth, there are very few laws of nature that involve quadratic functions. Yes, the displacement of a particle under constant acceleration comes rapidly to mind, but it is hard to come up with other examples. So in practice, when people fit quadratic models they're typically just saying that they have a curvilinear relationship, maybe even one that peaks and then declines (or the other direction), and does not thereafter change direction again in the observed range of data. They're punting on the real underlying data-generating mechanism and hoping that the quadratic approximation will be good enough.

            And bear in mind that any regression model you express in code can be fed into a computer, and the computer can estimate the parameter values that best fit that model to the data according to some fit criterion (least squares, max likelihood, whatever). But that doesn't mean that the model is any good. The sock whose size and style best fits your hand is still not a glove.

            Even when you have a good-fitting model, unless the sole purpose is to efficiently summarize the data for future regurgitation, you still have to consider whether the model itself actually makes sense in terms of plausible real-world data-generating processes, and whether the model offers you any insight into what's going on. If you take the age-adjusted breast cancer mortality rates in the United States between 1975 and 2000, it turns out that a simple quadratic model provides a very good fit to those data. But as a model of the actual data-generating process it is obviously grossly deficient, and, unsurprisingly perhaps, its predictions extrapolated to the first decade of the 21st century turn out to be quite inaccurate. Moreover, it sheds no light at all on the complex interplay of the growing use of mammographic screening and new breast cancer treatments over that period.



            • #7
              Extending Clyde's comments a little:

              Isaac Newton aside, the rationale for quadratics is usually empirical. That may be no more than providing needed curvature, even if the implied turning point is outside the range of the data. Income-age relationships often appear to be a case in point.

              A downside is that their limiting behaviour is often implausible. A quadratic in x with a minimum will explode towards either extreme of x. What bites more often is that a quadratic with a positive maximum must change sign somewhere, which may not match the physical (medical, economic, whatever) principles that apply, typically that the response should always be positive (or non-negative). Often people decide not to worry about such a problem if it doesn't bite them. After all, no-one worries about the range of a Gaussian being infinite whenever it looks a good enough match to finite data.

              An alternative that seems rarely even tried is inverse polynomials, such as 1/y = polynomial in x. For example,

              1/y = a/x + b + cx

              with positive coefficients has the interesting limits that y tends to 0 both as x tends to 0 and as x grows arbitrarily large, with a turning point in between.
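
              A minimal sketch of fitting this particular form by least squares on the reciprocal scale (variable names are hypothetical, and this ignores the error structure a careful analysis would consider):
              Code:
              gen double inv_y = 1/y
              gen double inv_x = 1/x
              regress inv_y inv_x x        // fits 1/y = a/x + b + c*x
              predict double xb_inv, xb
              gen double yhat = 1/xb_inv   // back-transform to the y scale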

              More at https://www.statalist.org/forums/for...gression-model including the key reference.

              P.S. I've seen cubics fitting the data very nicely but going haywire just outside. At least with quadratics a graph gives a clear picture of what would happen outside. Obvious but crucial: always draw a graph to see what you did.
              Last edited by Nick Cox; 31 Aug 2017, 02:20.



              • #8
                Originally posted by Clyde Schechter
                So you have to calculate the location of the vertex:

                Code:
                nlcom _b[hispanic]/(2*_b[c.hispanic#c.hispanic])
                It is a good idea to look at the vertex this way (except for a missing minus sign).

                However, I would be careful with the standard error and confidence interval. Especially if the coefficient of the quadratic term is small, the sampling distribution of the vertex can involve division by values at or near 0. The delta method used by nlcom assumes that the sampling distribution is normal. Below is an example that illustrates the problem.

                Code:
                . clear all
                
                . set seed 123456
                
                .
                . set obs 1000
                number of observations (_N) was 0, now 1,000
                
                . gen x = rnormal()
                
                . gen y = -1 + 1.5*x  +  0.1*x*x + rnormal(0,3)
                
                . reg y c.x##c.x
                
                      Source |       SS           df       MS      Number of obs   =     1,000
                -------------+----------------------------------   F(2, 997)       =    136.54
                       Model |  2523.96856         2  1261.98428   Prob > F        =    0.0000
                    Residual |  9215.04828       997  9.24277661   R-squared       =    0.2150
                -------------+----------------------------------   Adj R-squared   =    0.2134
                       Total |  11739.0168       999  11.7507676   Root MSE        =    3.0402
                
                ------------------------------------------------------------------------------
                           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                -------------+----------------------------------------------------------------
                           x |   1.506609   .0940726    16.02   0.000     1.322005    1.691212
                             |
                     c.x#c.x |   .1579869   .0607387     2.60   0.009     .0387965    .2771772
                             |
                       _cons |  -.9567918    .115488    -8.28   0.000    -1.183419   -.7301643
                ------------------------------------------------------------------------------
                
                . nlcom - _b[c.x] / (2*_b[c.x#c.x])
                
                       _nl_1:  - _b[c.x] / (2*_b[c.x#c.x])
                
                ------------------------------------------------------------------------------
                           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                -------------+----------------------------------------------------------------
                       _nl_1 |  -4.768145   1.883648    -2.53   0.011    -8.460027   -1.076263
                ------------------------------------------------------------------------------
                
                .
                . program define toboot, rclass
                  1.         syntax [if]
                  2.         marksample touse
                  3.         reg y c.x##c.x if `touse'
                  4.         return scalar vertex = -_b[c.x]/(2*_b[c.x#c.x])
                  5. end
                
                .         
                . tempfile res    
                
                . bootstrap vertex=r(vertex) , reps(2000) bca saving(`res') nodots : toboot       
                
                Bootstrap results                               Number of obs     =      1,000
                                                                Replications      =      2,000
                
                      command:  toboot
                       vertex:  r(vertex)
                
                ------------------------------------------------------------------------------
                             |   Observed   Bootstrap                         Normal-based
                             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                -------------+----------------------------------------------------------------
                      vertex |  -4.768145   158.7635    -0.03   0.976     -315.939    306.4027
                ------------------------------------------------------------------------------
                
                . estat bootstrap, perc bca
                
                Bootstrap results                               Number of obs     =      1,000
                                                                Replications      =       2000
                
                      command:  toboot
                       vertex:  r(vertex)
                
                ------------------------------------------------------------------------------
                             |    Observed               Bootstrap
                             |       Coef.       Bias    Std. Err.  [95% Conf. Interval]
                -------------+----------------------------------------------------------------
                      vertex |  -4.7681454   1.825333   158.76354   -17.84846   -2.57641   (P)
                             |                                      -19.05277  -2.598835 (BCa)
                ------------------------------------------------------------------------------
                (P)    percentile confidence interval
                (BCa)  bias-corrected and accelerated confidence interval
                
                . use `res', clear
                (bootstrap: toboot)
                
                . qnorm vertex
                [Image: Graph.png (normal quantile plot of the bootstrapped vertex estimates)]
                ---------------------------------
                Maarten L. Buis
                University of Konstanz
                Department of history and sociology
                box 40
                78457 Konstanz
                Germany
                http://www.maartenbuis.nl
                ---------------------------------



                • #9
                  Quite an enjoyable thread. Just spotted a typo in #2 by Clyde Schechter and felt the need to correct it for anyone reading the thread in future: "The key fact is that a parabola whose equation is y = ax² + bx + c has its vertex at x = -b/(2a), and that value of x is one that produces the minimum outcome if a > 0, and the maximum outcome if a > 0." The maximum outcome is if a < 0.
                  Roman



                  • #10
                    Indeed, Roman Mostazir is right. Thank you for correcting my error.

