  • First difference of a squared term

    Dear all,

    I am working with a panel dataset and am trying to run a non-linear (quadratic) regression using the first difference operator (D.). Specifically, my regressors should be the first differences of the linear and quadratic terms of the independent variable. My issue is independent of the specific data used, so I will simply call my dependent variable Y and my independent variable X. The code would be:

    gen X_sqr = X^2

    reg d.Y d.X d.X_sqr

    However, I need to use the commands margins and marginsplot to plot how the marginal effect of X on Y varies at different levels of X. Therefore, I thought of the following code:

    reg d.Y d.(c.X##c.X)

    Unfortunately, the output is not the first difference of the square but the square of the first difference, the same as if I had typed the following:

    reg d.Y d.c.X##d.c.X

    I have tried to find a solution but the only relevant page I have found is the following:

    https://www.statalist.org/forums/for...ferent-results

    There, Clyde Schechter pointed out that this relationship is "not expressible with factor-variable notation". However, since that post is from 2014, I am asking again in case something has changed in the last 8 years. I know the marginal effect can be computed manually, but it is worth checking whether margins could be applicable in this context.

    I have read the FAQ and tried to be as precise as possible. Apologies for any mistakes.

    Thank you in advance to whoever will read this post.
    Last edited by Romano Tarsia; 28 Jul 2022, 13:48.

  • #2
    Well, it isn't directly expressible in factor variable notation. And I have not revisited that older post to see if what you are asking is materially different from what is there. But as I look at this today, we can reason as follows:

    Code:
    D.(X#X) = X[n]^2 - X[n-1]^2 = (X[n] + X[n-1]) * (X[n] - X[n-1]) = X#D.X + L1.X#D.X
    So, if your regression command accepts constraints, you can include c.X#D.c.X and L1.c.X#D.c.X in the model and constrain their coefficients to be equal, and that will cover the first difference of X^2. The problem is that the panel data regression commands do not, as far as I know, accept constraints.
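
    For what it's worth, with plain OLS on the differenced outcome, -cnsreg- does accept constraints, so a rough, untested sketch (Y and X are the placeholder names from #1; DX and LX are helper variables invented here for readability) might be:

    Code:
    * untested sketch; Y, X are placeholders, DX and LX are helper variables
    gen DX = D.X
    gen LX = L.X
    constraint 1 c.X#c.DX = c.LX#c.DX
    cnsreg D.Y DX c.X#c.DX c.LX#c.DX, constraints(1)

    With the constraint imposed, the common coefficient on the two interactions plays the role of the coefficient on D.(X^2), since b*(X*DX + LX*DX) = b*D.(X^2).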

    Otherwise, I don't think you can do this with factor variables and margins.

    • #3
      the following is from the manual: "The D. and S. time-series operators may not be combined with factor variables because such
      combinations could have two meanings. iD.a could be the level indicators for the difference of the
      variable a from its prior period, or it could be the level indicators differenced between the two periods.
      These are generally not the same values, nor even the same number of indicators."

      • #4
        Dear both, thanks a lot for your availability and time, I appreciate it. I understand why using margins and marginsplot is not possible in this setting, and that it is therefore necessary to create the plot manually.
        In this regard, I know how to plot the prediction (or the marginal effect, taking the derivative with respect to x):

        twoway function y = (_b[d.X]*x)+(_b[d.X_sqr]*x^2), range(0 20)

        And I also found how to plot the confidence interval when the model is linear:

        twoway (function y = _b[d.year_avg_t] - invttail(e(df_r),0.025)*_se[d.year_avg_t], range(0 20)) || (function y = _b[d.year_avg_t] + invttail(e(df_r),0.025)*_se[d.year_avg_t], range(0 20))


        Source:
        https://journals.sagepub.com/doi/pdf...867X0800700408

        However, I could not find anything online on the syntax to construct the CI for a quadratic model. Would you be able to help me?
        Last edited by Romano Tarsia; 29 Jul 2022, 07:55.

        • #5
          I'm afraid I don't know how to do that. Hopefully, somebody else following the thread does, and will respond.

          • #6
            Dear Clyde Schechter, in any case thanks a lot for your availability. I appreciate it!

            • #7
              Am I correct in saying these are coding issues and not inherent statistical problems? Or is there some reason that you can’t or shouldn’t do this?

              Whenever Stata won’t easily do something I think it should be able to do, I’m never sure whether it is because (a) it is a statistically bad idea or (b) nobody has gotten around to programming it yet.

              For example, if it is just a problem of the notation being ambiguous (as in post #3), I would think you could invent some unambiguous notation if it were worth doing.
              -------------------------------------------
              Richard Williams, Notre Dame Dept of Sociology
              StataNow Version: 19.5 MP (2 processor)

              EMAIL: [email protected]
              WWW: https://www3.nd.edu/~rwilliam

              • #8
                It's a coding issue. The quote from the manual in #3 is not actually correct. If you use D1.X#D1.X, Stata accepts that and interprets it as the square of the difference (not what the OP wants here). But no existing notation produces the difference of the squares.

                I can't think of a reason that it would be statistically impermissible to use the difference of the squares, although I have to admit I have never seen a situation where that was useful and have difficulty imagining one where it would be. So I think this has only a limited range of use cases, and the current situation is only infrequently a problem.

                As for inventing an unambiguous notation, that could be hard. The "obvious" candidate would be D1.(X#X). But Stata has already established a convention that the time series operators (and other operators like c. and i.) distribute over #. So that leads back to the square of the difference. Repealing that convention for D (and S) would be a bad idea because it would 1) break a lot of existing code, 2) probably go unnoticed by some people, leading them to inadvertently misspecify their models, and 3) generally create confusion. So the notation would have to be something qualitatively different from the usual "operator algebra" style.
                Last edited by Clyde Schechter; 29 Jul 2022, 15:39.

                • #9
                  Dear Richard and Clyde, thanks a lot for your messages. It is indeed a coding issue, and I agree that it is only relevant for a limited number of cases. Specifically, I am estimating the impact of temperatures (with both the linear and squared terms) on GDP, where GDP is non-stationary, hence I need to use first differences for both my Xs and Y (Newell et al. 2021). In addition, I also understand that the issue arises only when plotting the coefficients, because the estimation itself can easily be carried out by generating the squared variable and applying D1.

                  Therefore, if someone on Statalist knows how to define the confidence intervals of a quadratic variable, it would be very useful; otherwise I can just show my estimates in a table.

                  In any case, thanks a lot to all of you for replying and your time.

                  • #10
                    I just did a search (DuckDuckGo, not Google) on "confidence interval for quadratic term in regression" and found a number of entries, so you might want to try that.
                    Last edited by Rich Goldstein; 30 Jul 2022, 05:59.

                    • #11
                      I may be missing something in the preceding discussion, but if you derive the expression for the derivative, why not use margins to calculate the confidence intervals? The following example shows that margins does the computation correctly, provided you input the correct expression.


                      Model:

                      \(\text{inv} = \beta_{0} + \beta_{1}\text{mv} + \beta_{2}\text{ks} + \beta_{3}\text{mv}\times\text{ks} + \beta_{4}\text{L.x1} + \beta_{5}\text{D.x2} + \beta_{6}\text{D.x2}\times\text{L.x1} + \beta_{7}\text{mv}\times\text{D.x2} + \beta_{8}\text{mv}\times\text{L.x1}\times\text{D.x2} + u\)


                      Expression:

                      $$\frac{\partial\,\text{inv}}{\partial\,\text{mv}} = \beta_{1} + \beta_{3}\text{ks} + \beta_{7}\text{D.x2} + \beta_{8}\text{L.x1}\times\text{D.x2}$$

                      Code:
                      webuse grunfeld, clear
                      keep if company==1
                      tsset time
                      set seed 07302022
                      forval i=1/2{
                         gen x`i'= rnormal()
                      }
                      
                      *MARGINS
                      regress invest c.mvalue##c.kstock L.x1 D.x2 c.D.x2#c.L.x1 c.mvalue#c.D.x2 c.mvalue#c.L.x1#c.D.x2
                      margins, dydx(mvalue)
                       
                      *BY HAND USING -EXPRESSION()-
                      g mvks = c.mvalue#c.kstock
                      g mvDx2= c.mvalue#c.D.x2
                      g Lx1= L.x1
                      g Dx2= D.x2
                      g mvLx1Dx2= c.mvalue#c.L.x1#c.D.x2
                      regress invest c.mvalue c.kstock c.mvks Lx1 Dx2 c.Dx2#c.Lx1 c.mvDx2 c.mvLx1Dx2
                      local df= e(df_r)
                      margins, expression(_b[c.mvalue]+_b[c.mvks]*ks +_b[c.mvDx2]*Dx2 + _b[c.mvLx1Dx2]*Lx1*Dx2) df(`df')
                      Result:

                      Code:
                      *MARGINS
                      
                      . margins, dydx(mvalue)
                      
                      Average marginal effects                        Number of obs     =         19
                      Model VCE    : OLS
                      
                      Expression   : Linear prediction, predict()
                      dy/dx w.r.t. : mvalue
                      
                      ------------------------------------------------------------------------------
                                   |            Delta-method
                                   |      dy/dx   Std. Err.      t    P>|t|     [95% Conf. Interval]
                      -------------+----------------------------------------------------------------
                            mvalue |   .1062574   .0271807     3.91   0.003      .045695    .1668197
                      ------------------------------------------------------------------------------
                      
                      *BY HAND USING -EXPRESSION()-
                      . local df= e(df_r)
                      
                      .
                      . margins, expression(_b[c.mvalue]+_b[c.mvks]*ks +_b[c.mvDx2]*Dx2 + _b[c.mvLx1Dx2]*Lx1*Dx2) df(`df')
                      Warning: expression() does not contain predict() or xb().
                      
                      Predictive margins                              Number of obs     =         19
                      Model VCE    : OLS
                      
                      Expression   : _b[c.mvalue]+_b[c.mvks]*ks +_b[c.mvDx2]*Dx2 + _b[c.mvLx1Dx2]*Lx1*Dx2
                      
                      ------------------------------------------------------------------------------
                                   |            Delta-method
                                   |     Margin   Std. Err.      t    P>|t|     [95% Conf. Interval]
                      -------------+----------------------------------------------------------------
                             _cons |   .1062574   .0271807     3.91   0.003      .045695    .1668198
                      ------------------------------------------------------------------------------
                      Last edited by Andrew Musau; 30 Jul 2022, 11:27.

                      • #12
                        Dear Andrew Musau, yes that's a great solution, I was not aware of the expression option. Thanks a lot!

                        Just one issue: I have tried to run both specifications without first differences and I obtain the same results. However, when I move to first differences I have the following issue:


                        If I code the following (using the D. operator), margins yields a constant effect across different levels:
                        Code:
                        reg d.Y d.X d.X_sqr
                        
                        local df= e(df_r)
                        
                        margins, expression(_b[d.X] + 2*_b[d.X_sqr]*d.X) df(`df') at(d.X = (-5(5)20)) level(95)
                        Whereas if I first generate the variables in first differences and then run the regression, margins yields an effect that changes with X:

                        Code:
                        gen d_Y = d.Y
                        gen d_X = d.X
                        gen d_X_sqr = d.X_sqr
                        
                        reg d_Y d_X d_X_sqr
                        
                        margins, expression(_b[d_X] + 2*_b[d_X_sqr]*d_X) df(`df') at(d_X = (-5(5)20)) level(95)
                        Do you have any idea why this is the case? Obviously I can use the second specification, but it would be interesting to understand.
                        Thank you again!
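
                        For completeness: once the margins call over the at() grid is settled, the plot I am ultimately after should follow directly with a bare -marginsplot-. A sketch (the options are just illustrative):

                        Code:
                        * sketch: draw the at() results from the preceding margins call
                        marginsplot, yline(0) recast(line) recastci(rline)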

                        • #13
                          Your expression

                          $$\frac{d\text{D.(x}^{2})}{d\text{D.x}}$$

                          is incorrect. When computing the derivative, look at Clyde's expression in #2 that expresses \(D.(x^2)\) in terms of \(D.x\).
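
                          Spelled out, using the placeholder names Y and X from #1: since \(D.(x^{2}) = (x + L.x)\times D.x\) and \(x = L.x + D.x\), differentiating the fitted model \(D.y = \beta_{1}D.x + \beta_{2}D.(x^{2})\) with respect to the current level of \(x\) gives

                          $$\frac{\partial\, y}{\partial\, x} = \beta_{1} + 2\beta_{2}x$$

                          so the effect should be evaluated at levels of \(x\), not of \(D.x\). A sketch of the corresponding margins call (untested):

                          Code:
                          * untested sketch: evaluate the effect at levels of X, not D.X
                          margins, expression(_b[d.X] + 2*_b[d.X_sqr]*X) df(`df') at(X = (-5(5)20))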

                          • #14
                            Dear Andrew Musau thank you, it is clear!
