  • Intuition for including a quadratic variable in a fixed effects regression model

    Hi everyone,

    I have a question about the intuition for including a quadratic term in a fixed effects regression model. If the relationship between two variables X and Y is quadratic, we normally include the square of X to capture it. In a fixed effects regression, however, the variables are demeaned: each unit's mean is subtracted from the original values before the regression is run. Is there still any need to include the quadratic term under fixed effects, or does the quadratic relationship no longer exist once the variables are demeaned? I would be thankful for a response!


  • #2
    Himani:
    as usual, you can plug both the linear and the quadratic term of the regressor you're interested in into the model and see what Stata gives you back, as in the following badly misspecified toy example:
    Code:
    . use "https://www.stata-press.com/data/r16/nlswork.dta"
    (National Longitudinal Survey.  Young Women 14-26 years of age in 1968)
    
    . xtreg ln_wage c.age##c.age, fe
    
    Fixed-effects (within) regression               Number of obs     =     28,510
    Group variable: idcode                          Number of groups  =      4,710
    
    R-sq:                                           Obs per group:
         within  = 0.1087                                         min =          1
         between = 0.1006                                         avg =        6.1
         overall = 0.0865                                         max =         15
    
                                                    F(2,23798)        =    1451.88
    corr(u_i, Xb)  = 0.0440                         Prob > F          =     0.0000
    
    ------------------------------------------------------------------------------
         ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             age |   .0539076   .0028078    19.20   0.000     .0484041    .0594112
                 |
     c.age#c.age |  -.0005973   .0000465   -12.84   0.000    -.0006885   -.0005061
                 |
           _cons |    .639913   .0408906    15.65   0.000     .5597649    .7200611
    -------------+----------------------------------------------------------------
         sigma_u |   .4039153
         sigma_e |  .30245467
             rho |  .64073314   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
    F test that all u_i=0: F(4709, 23798) = 8.74                 Prob > F = 0.0000
    Kind regards,
    Carlo
    (Stata 19.0)


    • #3
      You should not confuse the model you wish to estimate with the estimation method.
      If you think that your model should include a quadratic term, insert it into your model.
      Fixed Effects is an estimation method to "assign" values to the coefficients of your model.
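      A minimal sketch of this point, assuming the same -nlswork- data used in #2: the within transformation demeans age and its square separately, and the demeaned square is not the square of demeaned age, so the curvature is still there to be estimated. The variable names age2, m_* and d_* are made up for this illustration.

      ```stata
      use "https://www.stata-press.com/data/r16/nlswork.dta", clear
      xtset idcode year
      keep if !missing(ln_wage, age)
      generate double age2 = age^2

      * within transformation: subtract each woman's own mean from every variable
      foreach v in ln_wage age age2 {
          bysort idcode: egen double m_`v' = mean(`v')
          generate double d_`v' = `v' - m_`v'
      }

      * fixed effects estimates of the linear and quadratic terms
      xtreg ln_wage c.age##c.age, fe

      * OLS on the demeaned data: same slopes, no constant
      regress d_ln_wage d_age d_age2, noconstant
      ```

      The two sets of slope coefficients should coincide; the standard errors from -regress- will be somewhat smaller because it does not account for the degrees of freedom used up by the group means.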


      • #4
        Originally posted by Carlo Lazzaro
        Hi Carlo,

        Thank you for your response. I tried what you suggested, and I do get different signs on the linear and quadratic terms. My problem is that the sign of the regressor's effect is not obvious a priori; it is what I am trying to explore. Is there any way to handle this? In addition, I am studying 40 cross-sectional units over 20 years, so if the relationship between the two variables is quadratic for one country, it need not be quadratic for another. How do we decide on the functional form then? Could including country-specific time trends solve this problem? For instance, if I include linear or quadratic country-specific time trends, can they compensate for a quadratic relationship if I decide to use only the linear regressor? Or could taking the log of the variable solve this? Thank you for taking the time to clear this up!


        • #5
          Originally posted by Eric de Souza
          Hi Eric,

          Thanks for your response. I tried comparing the R-squared of the two models (the within R-squared, since we are using fixed effects). The values are almost the same. Is there any other way to compare the fits of the two models? Thanks!


          • #6
            Since I hardly ever compare the fits of two models, I wouldn't know. But in your case, since the estimation method you use is the within estimator, the corresponding R-squared would indeed be the within R-squared.


            • #7
              Himani:
              the issue for a quadratic relationship between regressand and a given regressor is not the sign of the two terms, but their statistical significance (if the squared term does not reach statistical significance, there's no evidence of a quadratic relationship).
              That said, the usual formula for the turning point is -b/(2a), where b is the coefficient on the linear term and a is the coefficient on the squared term.
              In the previous example:
              Code:
              . di -( .0539076)/(2*(-.0005973))
              45.126067
              
              . sum age
              
                  Variable |        Obs        Mean    Std. Dev.       Min        Max
              -------------+---------------------------------------------------------
                       age |     28,510    29.04511    6.700584         14         46
              In this misspecified model the turning point is at -age-=45.13 (as this value falls in the range of -age- values it can be considered acceptable as a turning point).
              Unless you want to include a third term (-country-) in the interaction, the presence/absence of significance refers to all the panels included in your regression taken together.
              I fail to get how quadratic country-specific time trends or logging could handle the issue.

              As an aside, I do share Eric's advice to consider within R-sq for -fe- models; even though -ereturn list- will give you back an adjusted R-sq, e(r2_a), after -xtreg, fe-, it is difficult to envisage what it would be useful for.
              Finally, please note that if your T dimension>N dimension (or they are similar) you could consider long-panel commands, such as -xtgls- and -xtregar-.
              Kind regards,
              Carlo
              (Stata 19.0)
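
              A sketch of how the turning point could be obtained directly from the stored estimates rather than by retyping the coefficients, again assuming the -nlswork- toy model from #2. -nlcom- applies the delta method, so it also reports a standard error and a confidence interval for the turning point; the label turning_point is an arbitrary name chosen here.

              ```stata
              use "https://www.stata-press.com/data/r16/nlswork.dta", clear
              xtset idcode year
              xtreg ln_wage c.age##c.age, fe

              * test of the squared term (equivalent to its t test in this model)
              test c.age#c.age

              * turning point -b/(2a): b = coefficient on age, a = coefficient on age squared
              nlcom (turning_point: -_b[age] / (2 * _b[c.age#c.age]))
              ```

              The point estimate should match the value computed by hand with -di- above; whether the confidence interval lies inside the observed range of -age- is then easy to check against -sum age-.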


              • #8
                Originally posted by Carlo Lazzaro
                Hi Carlo,

                Thanks for your response. Your answer makes sense to me. I have a few more questions about panel data commands in general and would be extremely thankful for a response.

                (1) I am working with a panel data set in which the main regressor is a lagged variable. Does squaring a lagged variable cause any problems? I plan to run regressions first with one lag, then with two lags, and finally with three lags. Can the presence of three quadratic terms cause a problem in the model?

                (2) In your earlier response, you created the quadratic term using c.age##c.age. What is the significance of the "c." in this command?

                (3) Similarly, when I need to specify country fixed effects and time dummies, I write i.Year i.Country in my regression model. What is the significance of "i." here, and why can we not use "c."?

                I would be thankful for a response.


                • #9
                  Himani:
                  (1) whether linear and quadratic terms for lagged independent variables make sense in your model is a matter of what is considered methodologically acceptable in your research field. Did previous researchers adopt the approach you have in mind?
                  (2) and (3): -c.- (-i.-) prefix tells Stata that the variable should be treated as continuous (categorical); see -fvvarlist- for more details.
                  Kind regards,
                  Carlo
                  (Stata 19.0)
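
                  A small illustration of the two prefixes, assuming the -nlswork- data from #2 and using pooled -regress- only to keep the example short (race is time-invariant, so it would drop out under -fe-). c.age##c.age enters age and its square, while i.race creates one indicator per category, leaving one out as the base.

                  ```stata
                  use "https://www.stata-press.com/data/r16/nlswork.dta", clear

                  * c. marks age as continuous; ## enters age and its square
                  * i. turns race into one indicator per category (minus a base)
                  regress ln_wage c.age##c.age i.race

                  * because the square is declared through c.age#c.age, -margins- knows
                  * that the marginal effect of age changes with age
                  margins, dydx(age) at(age=(20 30 40))

                  * c.race would instead force a single slope on the numeric race codes
                  regress ln_wage c.age##c.age c.race
                  ```

                  The last model is there only for contrast: treating a categorical code as continuous imposes equal steps between categories, which is rarely what is wanted for country or year identifiers.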


                  • #10
                    Originally posted by Carlo Lazzaro
                    Thanks a lot, Carlo! That was really helpful. I am a beginner and wanted to ask whether there are any books I could read to supplement my knowledge in this area. Thank you!


                    • #11
                      Himani:
                      Stata users dealing with panel data regression like https://www.stata.com/bookstore/micr...metrics-stata/.
                      Obviously, the Stata .pdf manuals and the help files of Stata commands are valuable sources of knowledge.
                      Kind regards,
                      Carlo
                      (Stata 19.0)


                      • #12
                        Originally posted by Carlo Lazzaro
                        Hi Carlo,

                        Thanks a lot for this information!
