Intuition for including a quadratic variable in a fixed effects regression model

Himani Srihan

Join Date: Apr 2020

Posts: 51
#1

14 Nov 2020, 06:03

Hi everyone,

I have a question about the intuition for including a quadratic term in a fixed effects regression model. If the underlying relationship between two variables X and Y is quadratic, we need to include the squared term of the variable to take that into account. However, in a fixed effects regression model the variables are demeaned: the group mean is subtracted from the original values before the regression is run. So, in the fixed effects case, is there still any need to enter the quadratic term, given that (as I understand it) the quadratic relationship no longer exists after the variable is demeaned? I would be thankful for a response!

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17726

14 Nov 2020, 06:46

Himani:
as usual, you can plug in both the linear and the quadratic term of the regressor you're interested in and see what Stata gives you back, as in the following badly misspecified toy example:

Code:

. use "https://www.stata-press.com/data/r16/nlswork.dta"
(National Longitudinal Survey.  Young Women 14-26 years of age in 1968)

. xtreg ln_wage c.age##c.age, fe

Fixed-effects (within) regression               Number of obs     =     28,510
Group variable: idcode                          Number of groups  =      4,710

R-sq:                                           Obs per group:
     within  = 0.1087                                         min =          1
     between = 0.1006                                         avg =        6.1
     overall = 0.0865                                         max =         15

                                                F(2,23798)        =    1451.88
corr(u_i, Xb)  = 0.0440                         Prob > F          =     0.0000

------------------------------------------------------------------------------
     ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0539076   .0028078    19.20   0.000     .0484041    .0594112
             |
 c.age#c.age |  -.0005973   .0000465   -12.84   0.000    -.0006885   -.0005061
             |
       _cons |    .639913   .0408906    15.65   0.000     .5597649    .7200611
-------------+----------------------------------------------------------------
     sigma_u |   .4039153
     sigma_e |  .30245467
         rho |  .64073314   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(4709, 23798) = 8.74                 Prob > F = 0.0000
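
A possible follow-up to the output above, sketched under the assumption that the same -nlswork- data are in memory (here loaded with -webuse-, which is an assumption about the reader's setup, not something from the original post). Because the squared term was entered with factor-variable notation, -margins- can report the slope of ln_wage with respect to age at several ages, which makes the curvature easier to read than the two raw coefficients. The grid of ages is arbitrary.

```stata
* Sketch only: same toy model as above, then the marginal effect of age
* evaluated on an arbitrary grid of ages (20, 25, ..., 45).
webuse nlswork, clear
xtreg ln_wage c.age##c.age, fe

* dydx(age) accounts for both the linear and the squared term,
* i.e. it reports _b[age] + 2*_b[c.age#c.age]*age at each listed age.
margins, dydx(age) at(age = (20(5)45))
```

A slope that shrinks as age rises is what a negative coefficient on the squared term implies.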

Kind regards,
Carlo
(Stata 19.0)


Eric de Souza

Join Date: Mar 2014

Posts: 587
#3

14 Nov 2020, 07:11

You should not confuse the model you wish to estimate with the estimation method.
If you think that your model should include a quadratic term, insert it into your model.
Fixed Effects is an estimation method to "assign" values to the coefficients of your model.
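
A minimal sketch of this point, not from the original posts, using the -nlswork- data from the earlier example. It performs the within transformation by hand; the names age2, m_*, d_* and b_ols_age2 are made up for the illustration. The squared term is demeaned as a regressor in its own right: d_age2 is age^2 minus its person-specific mean, which is not the square of demeaned age, so the curvature is not removed by demeaning.

```stata
* Sketch only: reproduce the fixed-effects slopes by demeaning manually.
webuse nlswork, clear
generate double age2 = age^2

* Use the same observations that -xtreg- would use.
keep if !missing(ln_wage, age)

* Subtract each person's mean from the outcome, age, and age squared.
foreach v in ln_wage age age2 {
    bysort idcode: egen double m_`v' = mean(`v')
    generate double d_`v' = `v' - m_`v'
}

* OLS on the demeaned variables (no constant).
regress d_ln_wage d_age d_age2, noconstant
scalar b_ols_age2 = _b[d_age2]

* The within estimator gives the same coefficients; the standard
* errors differ because -regress- does not adjust the degrees of
* freedom for the absorbed person means.
xtreg ln_wage c.age##c.age, fe
display "manual: " b_ols_age2 "   xtreg, fe: " _b[c.age#c.age]
```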
Himani Srihan

Join Date: Apr 2020

Posts: 51
#4

14 Nov 2020, 08:18

Hi Carlo,

Thank you for your response. I tried what you suggested and I do get different signs on the linear and quadratic terms. The problem I am facing is that the sign of the regressor is not obvious a priori and is something I am trying to explore. Is there any way to solve this problem? In addition, I am studying 40 cross-sectional units over 20 years. If the relationship between two variables is quadratic for one country, it will not necessarily be quadratic for another country. So how do we decide on the functional form? Could including country-specific time trends solve this problem? For instance, if I include linear or quadratic country-specific time trends, can they compensate for a quadratic relationship in case I decide to use a linear regressor? Or can taking the log of the variable solve this? Thank you for taking the time to clear this up!
Himani Srihan

Join Date: Apr 2020

Posts: 51
#5

14 Nov 2020, 08:21

Hi Eric,

Thanks for your response. I tried comparing the R-squared (the within R-squared, since we are using fixed effects). The R-squared values for the two models are almost the same. Is there any other way to compare the fits of the two models? Thanks!
Eric de Souza

Join Date: Mar 2014

Posts: 587
#6

14 Nov 2020, 09:20

Since I hardly ever compare the fits of two models, I wouldn't know. But in your case, since the estimation method you use is the within estimator, the corresponding R-squared would indeed be the within R-squared.
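
For readers who do want a comparison: the linear and quadratic specifications are nested, so one option is simply to test the squared term. The sketch below is an illustration on the -nlswork- toy example from earlier in the thread, not on the poster's data; it prints the within R-squared stored in e(r2_w) for both models and then tests the squared term with -testparm-.

```stata
* Sketch only: compare a linear and a quadratic specification.
webuse nlswork, clear

xtreg ln_wage c.age, fe
display "within R-sq, linear:    " e(r2_w)

xtreg ln_wage c.age##c.age, fe
display "within R-sq, quadratic: " e(r2_w)

* With a single squared term this test is equivalent to the t test
* reported for c.age#c.age in the coefficient table.
testparm c.age#c.age
```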
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17726
#7

14 Nov 2020, 10:56

Himani:
the issue for a quadratic relationship between regressand and a given regressor is not the sign of the two terms, but their statistical significance (if the squared term does not reach statistical significance, there's no evidence of a quadratic relationship).
That said, the usual formula for calculating a turning point is -b/(2a), where b is the coefficient on the linear term and a is the coefficient on the squared term.
In the previous example:

Code:

. di -( .0539076)/(2*(-.0005973))
45.126067

. sum age

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
         age |     28,510    29.04511    6.700584         14         46

In this misspecified model the turning point is at -age-=45.13 (as this value falls in the range of -age- values it can be considered acceptable as a turning point).
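
The hand calculation above can also be left to Stata. A minimal sketch, assuming the same -nlswork- toy model: -nlcom- computes -b/(2a) from the stored coefficients and adds a delta-method standard error and confidence interval, which the -display- calculation does not give. The label turning_point is just a name chosen for the example.

```stata
* Sketch only: turning point of the age profile with a standard error.
webuse nlswork, clear
xtreg ln_wage c.age##c.age, fe

* -b/(2a): b is the linear coefficient, a the squared-term coefficient.
nlcom (turning_point: -_b[age] / (2 * _b[c.age#c.age]))

* Check whether the turning point lies inside the observed age range.
summarize age
```

The point estimate should match the 45.13 computed by hand, since it uses the same two coefficients.
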
Unless you want to include a third term (-country-) in the interaction code, the presence/absence of significance is actually referred to all the panels included in your regression.
I fail to get how quadratic country-specific time or logging could handle the issue.
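
To illustrate the "third term in the interaction" idea, here is a hedged sketch on the -nlswork- toy data, where the time-invariant variable race stands in for the poster's country identifier (that substitution is an assumption made only so the example runs). In the poster's data the analogue would be something like c.x##c.x##i.country, with x and country being hypothetical variable names.

```stata
* Sketch only: let the linear and squared terms differ across groups.
webuse nlswork, clear

* The main effect of race is time-invariant and is absorbed by the
* fixed effects (Stata reports it as omitted); the interactions with
* age and age squared are still estimated.
xtreg ln_wage c.age##c.age##i.race, fe

* Joint test that the age profile is the same in every group.
testparm i.race#c.age i.race#c.age#c.age
```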

As an aside, I do share Eric's advice to consider the within R-sq for -fe- models; even though -ereturn list- will give you back an adjusted R-sq, e(r2_a), after -xtreg, fe-, it is difficult to envisage what it would be useful for.
Finally, please note that if your T dimension > N dimension (or they are similar), you could consider commands for long panels, such as -xtgls- and -xtregar-.

Kind regards,
Carlo
(Stata 19.0)
Himani Srihan

Join Date: Apr 2020

Posts: 51
#8

14 Nov 2020, 16:19

Hi Carlo,

Thanks for your response. Your answer makes sense to me. I am facing a few questions regarding panel data commands in general and would be extremely thankful if I could get a response from you.

(1) I am working with a panel data set in which the main regressor is a lagged variable. Does squaring a lagged variable cause any problems? I plan to run regressions first with one lag, then with two lags, and finally with three lags. Can the presence of three quadratic terms cause a problem in the model?

(2) In your earlier responses, you mentioned creating a quadratic term using c.age##c.age. What is the significance of the "c." in this command?

(3) Similarly, when I need to specify the country fixed effects and time dummies, I write i.Year i.Country in my regression model. What is the significance of "i." here, and why can we not use "c."?

I would be thankful for a response.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17726
#9

15 Nov 2020, 05:08

Himani:
(1) whether linear and quadratic terms for lagged independent variables make sense in your model is a matter of what is considered methodologically acceptable in your research field. Did previous researchers adopt the approach you have in mind?
(2) and (3): the -c.- (-i.-) prefix tells Stata that the variable should be treated as continuous (categorical); see -fvvarlist- for more details.
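
A short sketch of what the two prefixes do in practice, again on the -nlswork- toy data rather than the poster's country panel (the choice of tenure and year as regressors is only for illustration):

```stata
* Sketch only: the same variable entered as categorical vs. continuous.
webuse nlswork, clear

* i.year: one indicator per survey year, with a base year omitted.
xtreg ln_wage c.tenure##c.tenure i.year, fe

* c.year: a single coefficient, i.e. a linear time trend.
xtreg ln_wage c.tenure##c.tenure c.year, fe

* The c. prefix matters inside # and ##: without it, Stata treats the
* variables in an interaction as categorical, which is not what is
* wanted for a squared continuous regressor.
```

Time dummies are usually entered with -i.- because each year then gets its own intercept shift; entering year with -c.- imposes a straight-line trend instead.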

Kind regards,
Carlo
(Stata 19.0)
Himani Srihan

Join Date: Apr 2020

Posts: 51
#10

15 Nov 2020, 16:33

Thanks a lot Carlo!!! It was really helpful. I am a beginner right now and wanted to ask if there are any books I could read to supplement my knowledge in this area. Thank you!
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17726
#11

16 Nov 2020, 01:01

Himani:
Stata users dealing with panel data regression tend to like https://www.stata.com/bookstore/micr...metrics-stata/.
Obviously, the Stata .pdf manuals and the help files for Stata commands are valuable sources of knowledge.

Kind regards,
Carlo
(Stata 19.0)
Himani Srihan

Join Date: Apr 2020

Posts: 51
#12

17 Nov 2020, 15:36

Hi Carlo,

Thanks a lot for this information!