  • Squaring Variables with Negative Values

    Hi All,

    Stata 14.2 - Longitudinal Data

    I want to find the effect of a variable that has a nonlinear relationship with the dependent variable and that takes both positive and negative values. Unfortunately using:

    Code:
    reg depvar var var#var
    will square negative values and create positive values. Yes, it would be possible to create a new variable "var^2" by multiplying "var" by its absolute value, but I want to use -margins- after the regression, and -margins- wouldn't be able to combine the effects of "var" and "var^2".

    Are there any ideas on how to make sure a squared variable stays negative and is able to be used in -margins-?

    -Jeff


  • #2
    Wrong way round. Using the square of a variable as a predictor doesn't remove scope for its coefficient to be returned as negative if the data so imply.

    I am puzzled by your syntax, but this may be instructive:

    Code:
    clear
    set obs 100
    gen x = _n
    gen y = 10000 - x^2
    regress y c.x#c.x
    
    gen x2 = _n - 50
    gen y2 = 10000 - x2^2
    regress y2 c.x2#c.x2
    If this doesn't answer your question please give a data and CODE example making your point.
    Last edited by Nick Cox; 23 Apr 2018, 08:48.



    • #3
      Originally posted by Jeff Thompson
      Unfortunately using:

      Code:
      reg depvar var var#var
      will square negative values and create positive values.
      [...]
      Are there any ideas on how to make sure a squared variable stays negative and is able to be used in -margins-?
      There is nothing unfortunate about the square of negative values being positive and it is nothing to worry about at all. Running the following

      Code:
      // toy data
      clear
      set obs 1000
      set seed 42
      
      // create truth
      generate x = rnormal()
      generate y = x^2 + rnormal()
      
      // get estimates
      regress y c.x##c.x
      
      // now make all values positive
      generate xpos = x + 4
      
      // re-estimate
      regress y c.xpos##c.xpos
      
      // graph the relationships
      scatter y x || scatter y xpos
      gives

      Code:
      . regress y c.x##c.x
      
            Source |       SS           df       MS      Number of obs   =     1,000
      -------------+----------------------------------   F(2, 997)       =    877.58
             Model |  1748.53184         2  874.265922   Prob > F        =    0.0000
          Residual |  993.231157       997  .996219817   R-squared       =    0.6377
      -------------+----------------------------------   Adj R-squared   =    0.6370
             Total |    2741.763       999  2.74450751   Root MSE        =    .99811
      
      ------------------------------------------------------------------------------
                 y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
                 x |   .0409841   .0323319     1.27   0.205    -.0224622    .1044304
                   |
           c.x#c.x |    1.02479   .0245153    41.80   0.000     .9766829    1.072898
                   |
             _cons |  -.0488826   .0393773    -1.24   0.215    -.1261546    .0283894
      ------------------------------------------------------------------------------
      
      . regress y c.xpos##c.xpos
      
            Source |       SS           df       MS      Number of obs   =     1,000
      -------------+----------------------------------   F(2, 997)       =    877.58
             Model |  1748.53181         2  874.265904   Prob > F        =    0.0000
          Residual |  993.231193       997  .996219852   R-squared       =    0.6377
      -------------+----------------------------------   Adj R-squared   =    0.6370
             Total |    2741.763       999  2.74450751   Root MSE        =    .99811
      
      -------------------------------------------------------------------------------
                  y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      --------------+----------------------------------------------------------------
               xpos |  -8.157339   .1956634   -41.69   0.000    -8.541298   -7.773379
                    |
      c.xpos#c.xpos |    1.02479   .0245153    41.80   0.000     .9766829    1.072898
                    |
              _cons |   16.18383   .3798933    42.60   0.000     15.43834    16.92931
      -------------------------------------------------------------------------------
      Notice that the respective point estimates for the squared term are identical (as are the sums of squares, R-squared, etc.), meaning the model captures the exact same relationship between variables. This can also be seen in the (omitted) graph.

      Multiplying by the absolute value to get "pseudo-squared" terms will mess things up completely.

      Best
      Daniel
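
A short derivation of why the shift leaves the squared-term coefficient untouched. It uses only the quadratic model and the shift xpos = x + 4 from the example above; the symbols b_0, b_1, b_2 are notation introduced here for the coefficients of the first regression, not something the original post defines.

```latex
\begin{align*}
y &= b_0 + b_1 x + b_2 x^2, \qquad x = x_{\mathrm{pos}} - 4 \\
  &= b_0 + b_1 (x_{\mathrm{pos}} - 4) + b_2 (x_{\mathrm{pos}} - 4)^2 \\
  &= (b_0 - 4 b_1 + 16 b_2) + (b_1 - 8 b_2)\, x_{\mathrm{pos}} + b_2\, x_{\mathrm{pos}}^2
\end{align*}
```

Plugging in the estimates from the first table gives 0.0409841 - 8(1.02479) ≈ -8.1573 for the linear term and -0.0488826 - 4(0.0409841) + 16(1.02479) ≈ 16.1838 for the constant, which agrees with the second table up to rounding. Only the linear term and the constant absorb the shift; the coefficient on the square stays b_2.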



      • #4
        Thanks, Nick and Daniel, for the helpful examples.

        The issue is that I should have included the c. prefix before var.

        What are the differences between the following two commands?
        Code:
        reg depvar var c.var#c.var

        and
        Code:
        reg depvar c.var##c.var


        I also want to account for the interaction between the squared var and a categorical variable, size. Which would be more appropriate?

        Code:
        reg depvar var c.var#c.var##i.size
        or
        Code:
        reg depvar c.var##c.var##i.size
        I ran both of these and the results are similar but not the same.


        For anyone who is interested:
        https://www3.nd.edu/~rwilliam/stats2/l51.pdf
        answers a lot but not (directly) this.

        -Jeff
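
On the first question, a minimal sketch on simulated data may help. It assumes toy variables named x and y (hypothetical, not from the thread) and relies on Stata treating a bare variable in the varlist as continuous, so both specifications should describe the same quadratic model. The -margins- calls then show predictions and slopes at negative and positive values of x.

```stata
* Toy data: x takes both negative and positive values (names are hypothetical)
clear
set obs 1000
set seed 42
generate x = rnormal()
generate y = 1 + 0.5*x - 0.8*x^2 + rnormal()

* Two ways of writing the same quadratic model
regress y x c.x#c.x
regress y c.x##c.x

* Predicted y and the slope dy/dx at chosen values of x;
* margins knows that c.x#c.x is the square of x and combines both terms
margins, at(x=(-2(1)2))
margins, dydx(x) at(x=(-2(1)2))
```

Because the square is declared through factor-variable notation rather than generated by hand, -margins- can vary x and its square together, which is what a hand-made "var^2" variable would prevent.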





        • #5
          OK, according to
          https://www.stata.com/manuals13/u25.pdf (from 25.2.9 onward),
          the reason I was getting different estimates is that

          Code:
          reg depvar var c.var#c.var##i.size
          is the same as:
          Code:
          reg depvar var c.var#c.var i.size c.var#c.var#i.size
          while
          Code:
          reg depvar c.var##c.var##i.size
          is the same as:
          Code:
          reg depvar var c.var#c.var i.size c.var#i.size c.var#c.var#i.size
          The difference is that the first one does not include the c.var#i.size interaction.
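
A minimal sketch for checking the full factorial expansion on simulated data. The variables x, y, and size are hypothetical stand-ins for var, depvar, and size, and the data-generating process is made up purely for illustration. The two -regress- commands are expected to fit the same model, and the -margins- call shows one way to get the slope of x within each level of size.

```stata
* Toy data with a three-level factor (all names are hypothetical)
clear
set obs 900
set seed 2018
generate x = rnormal()
generate size = 1 + mod(_n, 3)
generate y = 1 + 0.5*x - 0.8*x^2 + 0.4*size*x + rnormal()

* Shorthand and explicit term list for the full factorial model
regress y c.x##c.x##i.size
regress y c.x c.x#c.x i.size c.x#i.size c.x#c.x#i.size

* Slope of x within each level of size, at chosen values of x
margins size, dydx(x) at(x=(-1 0 1))
```

Comparing R-squared and the residual sum of squares across the two fits is a quick way to confirm that a shorthand and a hand-written term list describe the same model.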
