
  • square of partly negative independent variable

    Hello everyone,

    I have a rather simple question about how to interpret the squared version of an independent variable.
    I'm running a panel regression with fixed effects, trying to measure the effect of a climate beta (cbeta) on market risk over the course of 12 years.
    In addition to cbeta, I generate a squared version of cbeta called cbeta2.
    Now my question is how to interpret cbeta2 correctly, because cbeta holds both negative and positive values, so cbeta2 has only positive values.
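    For reference, a minimal sketch of my setup (the panel identifiers and the outcome name are placeholders):

        * firm = panel id, year = time, mrisk = market risk (placeholder names)
        xtset firm year
        generate cbeta2 = cbeta^2
        xtreg mrisk cbeta cbeta2, fe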

    Thank you in advance!

  • #2
    This seems backwards to me. Why square a predictor (you say independent variable) if you don't know how to interpret the results?

    Sometimes squaring has intrinsic meaning. In elementary physics there are many examples. In these cases, there always seems to be a zero defined by the problem (correct me if I am wrong), such as time since a ball was dropped, or whatever.

    Sometimes squaring a predictor has a quite different rationale. In microeconomics and perhaps other social sciences people often seem to find age and age squared to be useful predictors for capturing curvature in some relationship (which need not imply that a turning point occurs within the range of the data). In that kind of situation, it's the joint effects of both predictors together that are crucial.

    A scatter plot smooth of your outcome versus climate risk might be useful here, as sketched below. On the other hand, it would not be surprising if several other predictors make any pattern hard to see.
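    For example (a minimal sketch; mrisk is a placeholder for your outcome, whose name you haven't told us):

        * exploratory smooth of the outcome against the climate beta
        lowess mrisk cbeta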



    • #3
      Thank you for your answer!

      It's my first time really working with Stata. I squared cbeta because I remembered doing so with age (as you mentioned) and other independent variables in a course a few years back.
      In my regression the squared cbeta has a much lower p-value than the plain cbeta; that's why I want to include it.
      I only have a problem interpreting it correctly, because the original variable has both positive and negative values.
      You don't think that's a good idea?



      • #4
        Your question isn't really about using Stata. The same issue arises with any software that supports regression. If you've been using statistics for some years, that's good.

        The key point is that it's the joint effect of the two -- the original and the square -- that's important. It makes no sense to look at the P-value of either as if the other were just another predictor. There isn't, even notionally, any sense in which you can think of holding one constant while the other is free to vary.

        Stata gives you separate P-values -- but that doesn't imply that they have meaning.
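        In Stata, factor-variable notation keeps the pair together, and a joint test is then direct. A minimal sketch, with mrisk again a placeholder for your outcome:

            xtreg mrisk c.cbeta##c.cbeta, fe
            testparm c.cbeta c.cbeta#c.cbeta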

        I am still hopeful that you will show the graph I asked for.



        • #5
          I'm not really sure if these are the ones you were asking for.
          [Attachment: 2.png (two graphs)]



          • #6
            Both graphs (especially the second) are helpful, but would be clarified by ms(Oh) mcolor(blue%20) scheme(s1color).
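            That is, something along these lines (variable names assumed from your description):

                scatter mrisk cbeta, ms(Oh) mcolor(blue%20) scheme(s1color)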

            If you're using some version of Stata that isn't the latest (18 as I write) please note our longstanding request to tell us what version you're using.

            This might be a problem in which (unusually in my experience) the square alone is a candidate. But note the leverage of moderate outliers.



            • #7
              It is easier to understand if you subtract the mean of x from the value used in the regression; call the result xbar. Then the regression includes xbar and xbar squared. This won't change your predicted values of y or the error terms, and the coefficient on the squared term stays the same; only the linear coefficient and the estimated constant change, and they exactly compensate for the change from x to xbar. Now you can see that the coefficient on xbar is the effect of changes in x on the predicted value of y at the mean of x. If the coefficient on xbar squared is positive, the effect of x increases as x rises above its mean; if negative, the effect of x becomes smaller or more negative as x rises above its mean.
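              A minimal sketch in Stata (variable names are placeholders; summarize leaves the mean in r(mean)):

                  summarize cbeta, meanonly
                  generate cbeta_c = cbeta - r(mean)    // centered climate beta
                  generate cbeta_c2 = cbeta_c^2
                  xtreg mrisk cbeta_c cbeta_c2, fe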

              Adding a squared term is usually not the best way to model a non-linear effect. Have you tried the log or cube root?



              • #8
                Which variable is to be logged here, Daniel?



                • #9
                  y = a*x + b*x^2

                  dy/dx = a + 2*b*x

                  As Nick says, the two coefficients do not stand alone, so you can't just look at one or the other's p-value (though, if the t-stat on b is small, then you might exclude it, as it is a test of non-linearity).

                  Do not concern yourself with negative values becoming positive when squared. The sign is restored in the derivative (b is multiplied by x).
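                  With factor-variable notation, margins will evaluate dy/dx = a + 2*b*x at values of cbeta you choose. A sketch, again assuming the outcome is called mrisk and with the evaluation points purely illustrative:

                      xtreg mrisk c.cbeta##c.cbeta, fe
                      margins, dydx(cbeta) at(cbeta = (-1 0 1))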



                  • #10
                    I was thinking that the log of x might be a better way to allow for a non-linearity. I realize it is nothing like the square, but unless the OP has a reason to prefer squaring (and isn't just looking for a general way to allow for some non-linearity) the log or cube root may be fine, and don't impose an extreme effect on extreme values.
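                    The cube root is defined for negative values, though in Stata a negative base with a fractional exponent returns missing, so it needs the sign/abs device:

                        * signed cube root: preserves the sign of cbeta
                        generate cbeta_cbrt = sign(cbeta) * abs(cbeta)^(1/3)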



                    • #11
                      Daniel Feenberg Thanks for spelling out your thinking. Unfortunately the x variable is often negative, which rules out logarithms beyond some device such as sign(x) log1p(abs(x)) .
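                      In Stata that device could be written as follows, spelling log1p(u) out as ln(1 + u):

                          * signed log transform: sign(x) * log(1 + |x|)
                          generate cbeta_slog = sign(cbeta) * ln(1 + abs(cbeta))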

