Squaring Variables with Negative Values

Jeff Thompson

Join Date: Feb 2018

Posts: 30
#1

23 Apr 2018, 08:31

Hi All,

Stata 14.2 - Longitudinal Data

I want to estimate the effect of a variable that has a nonlinear relationship with the dependent variable and that takes both positive and negative values. Unfortunately, using:

Code:

reg depvar var var#var

will square negative values and create positive values. Yes, it would be possible to create a new variable "var^2" by multiplying "var" by its absolute value, but I want to use -margins- after the regression, which wouldn't be able to combine the effects of "var" and "var^2".

Are there any ideas on how to make sure a squared variable stays negative and is able to be used in -margins-?

-Jeff
Tags: margins, negative value, nonlinear, squared
Nick Cox

Join Date: Mar 2014

Posts: 35698
#2

23 Apr 2018, 08:41

Wrong way round. Using the square of a variable as a predictor doesn't remove scope for its coefficient to be returned as negative if the data so imply.

I am puzzled by your syntax, but this may be instructive:

Code:

clear
set obs 100
gen x = _n
gen y = 10000 - x^2
regress y c.x#c.x

gen x2 = _n - 50
gen y2 = 10000 - x2^2
regress y2 c.x2#c.x2

If this doesn't answer your question please give a data and CODE example making your point.

Last edited by Nick Cox; 23 Apr 2018, 08:48.

daniel klein

Join Date: Mar 2014
Posts: 3850

23 Apr 2018, 08:45

Originally posted by Jeff Thompson

Unfortunately using:

Code:

reg depvar var var#var

will square negative values and create positive values.
[...]
Are there any ideas on how to make sure a squared variable stays negative and is able to be used in -margins-?

There is nothing unfortunate about the square of negative values being positive and it is nothing to worry about at all. Running the following

Code:

// toy data
clear
set obs 1000
set seed 42

// create truth
generate x = rnormal()
generate y = x^2 + rnormal()

// get estimates
regress y c.x##c.x

// now make all values positive
generate xpos = x + 4

// re-estimate
regress y c.xpos##c.xpos

// graph the relationships
scatter y x || scatter y xpos

gives

Code:

. regress y c.x##c.x

      Source |       SS           df       MS      Number of obs   =     1,000
-------------+----------------------------------   F(2, 997)       =    877.58
       Model |  1748.53184         2  874.265922   Prob > F        =    0.0000
    Residual |  993.231157       997  .996219817   R-squared       =    0.6377
-------------+----------------------------------   Adj R-squared   =    0.6370
       Total |    2741.763       999  2.74450751   Root MSE        =    .99811

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .0409841   .0323319     1.27   0.205    -.0224622    .1044304
             |
     c.x#c.x |    1.02479   .0245153    41.80   0.000     .9766829    1.072898
             |
       _cons |  -.0488826   .0393773    -1.24   0.215    -.1261546    .0283894
------------------------------------------------------------------------------

. regress y c.xpos##c.xpos

      Source |       SS           df       MS      Number of obs   =     1,000
-------------+----------------------------------   F(2, 997)       =    877.58
       Model |  1748.53181         2  874.265904   Prob > F        =    0.0000
    Residual |  993.231193       997  .996219852   R-squared       =    0.6377
-------------+----------------------------------   Adj R-squared   =    0.6370
       Total |    2741.763       999  2.74450751   Root MSE        =    .99811

-------------------------------------------------------------------------------
            y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
         xpos |  -8.157339   .1956634   -41.69   0.000    -8.541298   -7.773379
              |
c.xpos#c.xpos |    1.02479   .0245153    41.80   0.000     .9766829    1.072898
              |
        _cons |   16.18383   .3798933    42.60   0.000     15.43834    16.92931
-------------------------------------------------------------------------------

Notice that the respective point estimates for the squared term are identical (as are the sums of squares, R-squared, etc.), meaning the model captures the exact same relationship between variables. This can also be seen in the (omitted) graph.
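
The match is what the algebra predicts. Since xpos = x + 4, substituting x = xpos - 4 into b0 + b1*x + b2*x^2 gives (b0 - 4*b1 + 16*b2) + (b1 - 8*b2)*xpos + b2*xpos^2, so only the linear coefficient and the constant change. With the estimates shown above, b1 - 8*b2 is about .041 - 8.198 = -8.157 and b0 - 4*b1 + 16*b2 is about -.049 - .164 + 16.397 = 16.184, which agrees with the second table up to rounding. As a rough check, and assuming the same toy data, -lincom- after the first regression should reproduce those two values:

Code:

// toy data, same setup as above
clear
set obs 1000
set seed 42
generate x = rnormal()
generate y = x^2 + rnormal()

regress y c.x##c.x

// coefficient on xpos implied by the shift: b1 - 8*b2
lincom _b[x] - 8*_b[c.x#c.x]

// constant implied by the shift: b0 - 4*b1 + 16*b2
lincom _b[_cons] - 4*_b[x] + 16*_b[c.x#c.x]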

Multiplying by the absolute value to get "pseudo-squared" terms will mess things up completely.
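
To see why, note that x*abs(x) keeps the sign of x, so it is an odd function of x and cannot reproduce a U-shaped relationship such as the one in the toy data. A minimal sketch, reusing the toy setup from above (the variable name xabs is made up for illustration), compares the two specifications and shows -margins- working with the factor-variable version even though x takes negative values:

Code:

clear
set obs 1000
set seed 42
generate x = rnormal()
generate y = x^2 + rnormal()

// proper quadratic: margins knows the squared term is a function of x
regress y c.x##c.x
margins, dydx(x) at(x=(-2(1)2))

// "pseudo-squared" term that keeps the sign of x
generate xabs = x*abs(x)
regress y x xabs

In the second regression -margins- has no way of knowing that xabs depends on x, and the fit should be much worse because the true relationship is symmetric around zero.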

Best
Daniel

Jeff Thompson

Join Date: Feb 2018

Posts: 30
#4

23 Apr 2018, 10:08

Thanks Nick and daniel for the helpful examples.

The issue is that I should have included the c. prefix before var.
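
For background: in Stata's factor-variable notation, a variable inside an interaction is treated as categorical unless it carries the c. prefix, and categorical (factor) variables must take nonnegative integer values. A small sketch with made-up data, using -fvexpand- to show how each expression is read:

Code:

clear
input x
-1
0
1
2
end

// continuous-by-continuous: a single squared term
fvexpand c.x#c.x
return list

// without c., x is read as categorical; this is expected to fail
// here because x contains a negative value
capture noisily fvexpand x#x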

What is the difference between

Code:

reg depvar var c.var#c.var

and

Code:

reg depvar c.var##c.var

I want to account for the interaction effects between a squared var and a categorical variable size. Which would be more appropriate?

Code:

reg depvar var c.var#c.var##i.size

or

Code:

reg depvar c.var##c.var##i.size

I ran both of these and the results are similar but not the same.
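
On the first pair: since a##b is shorthand for a b a#b, the two commands should fit the same model. A quick check on the auto dataset shipped with Stata, where price and mpg are only stand-ins for depvar and var, and m1 and m2 are arbitrary names:

Code:

sysuse auto, clear

// main effect written out, squared term via #
regress price mpg c.mpg#c.mpg
estimates store m1

// same terms via the ## shorthand
regress price c.mpg##c.mpg
estimates store m2

// the two columns should be identical
estimates table m1 m2, b se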

For anyone who is interested:
https://www3.nd.edu/~rwilliam/stats2/l51.pdf
answers a lot but not (directly) this.

-Jeff
Jeff Thompson

Join Date: Feb 2018

Posts: 30
#5

24 Apr 2018, 04:08

Ok, according to:
https://www.stata.com/manuals13/u25.pdf (from 25.2.9 onward)
The reason I was getting different estimates is that:

Code:

reg depvar var c.var#c.var##i.size

is the same as:

Code:

reg depvar var c.var#c.var i.size c.var#c.var#i.size

while

Code:

reg depvar c.var##c.var##i.size

is the same as:

Code:

reg depvar c.var c.var#c.var i.size c.var#i.size c.var#c.var#i.size

The difference is that the first one does not include the c.var#i.size interaction.
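
One way to check such expansions is -fvexpand-, which lists the terms a factor-variable expression stands for, including any main effects that ## adds. A sketch on the auto dataset, with mpg and foreign standing in for var and size:

Code:

sysuse auto, clear

// squared term crossed with the categorical variable only
fvexpand c.mpg#c.mpg##i.foreign
display "`r(varlist)'"

// full factorial of mpg, mpg squared, and the categorical variable
fvexpand c.mpg##c.mpg##i.foreign
display "`r(varlist)'"

// with the full specification, margins can report the slope of mpg by group
regress price c.mpg##c.mpg##i.foreign
margins foreign, dydx(mpg)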