Time fixed effect in panel data analysis- how to deal if including time fixed effect in FE model makes coefficient insignificant?

Van Pham

Join Date: Sep 2015

Posts: 10
#1

15 Sep 2015, 08:41

Hello everyone,

I have a question about time fixed effects in panel data analysis. I am running an FE model on panel data covering 34 countries from 1995 to 2013, with gaps (1995, 2000, 2005, 2010, 2013, since the data for 2015 are not available). My problem is that if I control only for country effects (cross-sectional fixed effects), the results are fine, but if I control for both country and year effects, the coefficients of most of the independent variables become insignificant, some even with unexpected signs. What could be causing this? And do we always need to control for both year and country effects when running an FE model, or does that depend on specific conditions?

I am quite new to panel data analysis and to Stata. I have looked for ideas on websites and in some textbooks but couldn't figure it out. I hope that asking here will bring some useful suggestions from experienced Stata users and statisticians.

Thanks and kind regards,

Van
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#2

15 Sep 2015, 08:59

Hello Van,

Welcome to the Forum.

Please don't forget to take a look at the manual (http://www.stata.com/manuals13/xtxtreg.pdf). This is always the best place to start.

That said, I suggest you type this command in order to test if you need to include a time variable in your FE model:

Code:

. testparm i.year
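
Note that -testparm- is a postestimation command, so it has to follow a model that already contains the year indicators. A minimal sketch of that sequence, assuming Stata's example Grunfeld panel (loaded with -webuse-, so an internet connection is needed) as a stand-in for the poster's data. The names invest, mvalue, kstock, company and year belong to that example dataset, not to this thread:

```stata
* Stand-in data: Grunfeld panel, 10 companies observed 1935-1954
webuse grunfeld, clear
xtset company year

* Fit the fixed-effects model with the year indicators included
xtreg invest mvalue kstock i.year, fe

* Joint F test that all year indicators are zero
testparm i.year
display "p-value of joint test on year effects: " r(p)
```

A small p-value suggests the year effects are jointly different from zero, which argues for keeping them in the model.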

Best,

Marcos

Carlo Lazzaro

Join Date: Apr 2014

Posts: 17714
#3

15 Sep 2015, 09:29

Van,
welcome to the list.
Have you investigated multicollinearity?

Kind regards,
Carlo
(Stata 19.0)
Van Pham

Join Date: Sep 2015

Posts: 10
#4

15 Sep 2015, 18:15

Originally posted by Marcos Almeida:

Hello Van,

Welcome to the Forum.

Please don't forget to take a look at the manual (http://www.stata.com/manuals13/xtxtreg.pdf). This is always the best place to start.

That said, I suggest you type this command in order to test if you need to include a time variable in your FE model:

Code:

. testparm i.year

Best,

Marcos

Hi Marcos,

Thank you for your advice. I already used "testparm", and the result showed that there are time effects in my data (Prob > F = 0.0115), so that means I need to include time fixed effects in my model, right? But when I controlled for them, the coefficients became insignificant and the signs were unexpected. I don't know how to deal with it!

Thanks and kind regards,

Van
Van Pham

Join Date: Sep 2015

Posts: 10
#5

15 Sep 2015, 18:20

Originally posted by Carlo Lazzaro:

Van,
welcome to the list.
Have you investigated multicollinearity?

Hi Carlo,

As you suggested, I checked for multicollinearity. Among the 5 independent variables, one pair (X1 and X2) is highly correlated (r = 0.9171). Does that mean there is a multicollinearity problem? Do you have any further suggestions for my case?

Thanks and kind regards,

Van
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17714
#6

15 Sep 2015, 22:36

Van:
yes, it would seem so.
The basic approach would be to drop either X1 or X2 from your set of predictors.
Another useful postestimation tool for investigating multicollinearity is -estat vif-.
You may also want to center the variables suspected of multicollinearity around a meaningful value (e.g., their respective means) and see whether the multicollinearity disappears.
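
One caveat worth noting: -estat vif- is available after -regress-, not after -xtreg-, so a common workaround is to refit the model with -regress- and explicit panel indicators. A minimal sketch, assuming Stata's example Grunfeld panel (loaded with -webuse-) as a stand-in; the variable names invest, mvalue, kstock and company come from that dataset, not from this thread:

```stata
* Stand-in data: Grunfeld panel, 10 companies observed 1935-1954
webuse grunfeld, clear

* Pairwise correlation between the two regressors of interest
correlate mvalue kstock

* Least-squares dummy-variable version of the FE model, then the VIFs
regress invest mvalue kstock i.company
estat vif
```

The VIFs reported for the company indicators are usually of little interest; the rows for the substantive regressors are the ones to read.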

Last edited by Carlo Lazzaro; 15 Sep 2015, 23:36.

Kind regards,
Carlo
(Stata 19.0)
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#7

16 Sep 2015, 07:34

Hello Van,

Unfortunately, you didn't present your commands + output. We also don't know anything about the rationale for the model selection, including the variables. I believe you may get further help from the Forum members if you clarify these issues.

That said, maybe it is a question of power. Tentatively, I'd consider the possibility that your model, when adjusted for (significant) variations due to time, cannot "explain" the outcome variable anymore.
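
To see how much the slopes move once year effects are added, the two specifications can be stored and tabulated side by side. A minimal sketch, assuming Stata's example Grunfeld panel (loaded with -webuse-) in place of the poster's data; the estimate names oneway and twoway are arbitrary labels chosen for illustration, and the thread's vce(cluster country1) option could be added to both models in the same way:

```stata
* Stand-in data: Grunfeld panel, 10 companies observed 1935-1954
webuse grunfeld, clear
xtset company year

* Country-only (here: company-only) fixed effects
quietly xtreg invest mvalue kstock, fe
estimates store oneway

* Fixed effects plus year indicators
quietly xtreg invest mvalue kstock i.year, fe
estimates store twoway

* Coefficients and standard errors of the substantive regressors
estimates table oneway twoway, keep(mvalue kstock) b(%9.4f) se(%9.4f)
```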

Best,

Marcos

Van Pham

Join Date: Sep 2015

Posts: 10
#8

16 Sep 2015, 18:22

Originally posted by Carlo Lazzaro:

Van:
yes, it would seem so.
The basic approach would be to drop either X1 or X2 from your set of predictors.
Another useful postestimation tool for investigating multicollinearity is -estat vif-.
You may also want to center the variables suspected of multicollinearity around a meaningful value (e.g., their respective means) and see whether the multicollinearity disappears.

Hi Carlo,

Thank you for your advice. Actually, I don't think I can drop X1 or X2, since both seem to affect the outcome. I also investigated multicollinearity with -estat vif- and got vif#6 <10, so does that mean I can keep both variables? I don't know how to

center the variables suspected of multicollinearity around a meaningful value

as you suggested. Could you explain it further?

Thanks and kind regards,

Van

Van Pham

Join Date: Sep 2015

Posts: 10
#9

16 Sep 2015, 18:37

Originally posted by Marcos Almeida:

Hello Van,

Unfortunately, you didn't present your commands + output. We also don't know anything about the rationale for the model selection, including the variables. I believe you may get further help from the Forum members if you clarify these issues.

That said, maybe it is a question of power. Tentatively, I'd consider the possibility that your model, when adjusted for (significant) variations due to time, cannot "explain" the outcome variable anymore.

Best,

Marcos

Hi Marcos,

Thank you for your feedback, and sorry that I didn't describe my model clearly. I want to investigate whether GDP per capita, public health expenditure per capita, remittances per capita, an institution index and the religion share of the population (all in logs, except the religion variable) affect the health outcome (mortality rate) in 34 developing countries over 1995-2013, with gaps (1995-2000-2005-2010-2013). I decided to use an FE model.

If I don't control for time effects, the results are:

Code:

xtreg y X1 X2 X3 X4 X5, fe vce(cluster country1)

Fixed-effects (within) regression               Number of obs      =       170
Group variable: country1                        Number of groups   =        34

R-sq:  within  = 0.7628                         Obs per group: min =         5
       between = 0.6116                                        avg =       5.0
       overall = 0.6078                                        max =         5

                                                F(5,33)            =     43.95
corr(u_i, Xb)  = -0.7623                        Prob > F           =    0.0000

                              (Std. Err. adjusted for 34 clusters in country1)
------------------------------------------------------------------------------
             |               Robust
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          X1 |  -.5159524    .160661    -3.21   0.003    -.8428196   -.1890852
          X2 |  -.3479108   .0965839    -3.60   0.001    -.5444122   -.1514093
          X3 |   -.057263   .0207047    -2.77   0.009    -.0993871   -.0151389
          X4 |   -.647028   .2368905    -2.73   0.010    -1.128985   -.1650707
          X5 |  -.0662242   .0600367    -1.10   0.278    -.1883698    .0559215
       _cons |   4.336472   .3752322    11.56   0.000     3.573056    5.099888
-------------+----------------------------------------------------------------
     sigma_u |  .33813748
     sigma_e |  .07142181
         rho |  .95729103   (fraction of variance due to u_i)
------------------------------------------------------------------------------

but when time fixed effects are included, the results change unexpectedly:

Code:

xtreg y X1 X2 X3 X4 X5 i.YEAR, fe vce(cluster country1)

Fixed-effects (within) regression               Number of obs      =       170
Group variable: country1                        Number of groups   =        34

R-sq:  within  = 0.8915                         Obs per group: min =         5
       between = 0.5603                                        avg =       5.0
       overall = 0.5341                                        max =         5

                                                F(9,33)            =     52.46
corr(u_i, Xb)  = 0.3567                         Prob > F           =    0.0000

                              (Std. Err. adjusted for 34 clusters in country1)
------------------------------------------------------------------------------
             |               Robust
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          X1 |  -.1373168   .1265182    -1.09   0.286    -.3947201    .1200865
          X2 |  -.0504063   .0820072    -0.61   0.543    -.2172512    .1164387
          X3 |  -.0026189   .0170725    -0.15   0.879    -.0373532    .0321154
          X4 |  -.1891726    .242665    -0.78   0.441    -.6828782    .3045329
          X5 |  -.0400426    .039199    -1.02   0.314    -.1197936    .0397084
             |
        YEAR |
       2000  |   -.070729   .0112445    -6.29   0.000    -.0936061    -.047852
       2005  |  -.1546256   .0167661    -9.22   0.000    -.1887365   -.1205146
       2010  |  -.2320385   .0245594    -9.45   0.000     -.282005    -.182072
       2015  |  -.2743113   .0308065    -8.90   0.000    -.3369876   -.2116349
             |
       _cons |   2.607344   .3701531     7.04   0.000     1.854262    3.360426
-------------+----------------------------------------------------------------
     sigma_u |  .26186703
     sigma_e |  .04905963
         rho |  .96609176   (fraction of variance due to u_i)
------------------------------------------------------------------------------

I don't know how to deal with this problem. Hope I can get some suggestions from you and other members.

Thanks and kind regards,

Van


Carlo Lazzaro

Join Date: Apr 2014

Posts: 17714
#10

16 Sep 2015, 22:45

Van:
probably you are not experiencing a multicollinearity issue.
Anyway, you can center (say) X1 around its (say) mean following this approach:

quietly sum X1
gen centered_X1=X1-r(mean)
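
Centering in this way only shifts the constant; the slope on the centered variable is the same as the slope on the original one. A quick check of that point, assuming Stata's example Grunfeld panel (loaded with -webuse-) as stand-in data; centered_mvalue, b_raw and b_centered are names made up for this illustration:

```stata
* Stand-in data: Grunfeld panel, 10 companies observed 1935-1954
webuse grunfeld, clear
xtset company year

* Center mvalue around its overall mean (double precision avoids rounding noise)
quietly summarize mvalue
generate double centered_mvalue = mvalue - r(mean)

* Slope with the original variable
quietly xtreg invest mvalue kstock, fe
scalar b_raw = _b[mvalue]

* Slope with the centered variable
quietly xtreg invest centered_mvalue kstock, fe
scalar b_centered = _b[centered_mvalue]

display "raw slope: " b_raw "   centered slope: " b_centered
```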

Last edited by Carlo Lazzaro; 16 Sep 2015, 23:39.

Kind regards,
Carlo
(Stata 19.0)
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17714
#11

16 Sep 2015, 23:50

Van.
a tentative answer is that, when adjusted for time, the -indepvars- X* do not change that much among clusters, so they do not reach statistical significance.
However, their effect on the -depvar- increases with time.
I would also consider the limited number of observations per cluster as another possible explanation.
In general, I would say that your results are logically consistent: an overall reduction in mortality rate across about 20 years is documented in national statistics of developed countries.
In this kind of analysis, the main risk is endogeneity: a hypothetical variable such as the number of traffic lights per square mile can affect both health care expenditure and mortality rate.
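
One rough way to gauge how much variation in a regressor is left once time is adjusted for is to regress that regressor on the year indicators within panels and read the within R-squared. This is only a sketch of the idea, assuming Stata's example Grunfeld panel (loaded with -webuse-) as stand-in data, with mvalue playing the role of one of the X* variables:

```stata
* Stand-in data: Grunfeld panel, 10 companies observed 1935-1954
webuse grunfeld, clear
xtset company year

* Regress a covariate on year indicators, with company fixed effects
quietly xtreg mvalue i.year, fe

* Within R-squared: share of the covariate's within-company variation
* that is common to all companies in a given year
display "within R-squared of mvalue on year indicators: " e(r2_w)
```

A value close to 1 would mean that little independent variation remains to identify the coefficient once year effects are in the model.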

Kind regards,
Carlo
(Stata 19.0)
Van Pham

Join Date: Sep 2015

Posts: 10
#12

17 Sep 2015, 00:55

Hi Carlo,

Thank you for your explanation about multicollinearity, which helped broaden my knowledge. As you said, it does not seem to be an issue for my data, but do you think I should replace X1 with centered_X1 in the equation?

Regarding the time fixed-effect issue, you suggested:

I would also consider the limited numbers of observations per cluster as another possible explanation.

Does that mean I need to increase the sample size to deal with this issue?

Or do I need to find another model to circumvent the risk of endogeneity?

In this kind of analysis, the main risk is endogeneity: a hypothetical variable such as the number of traffic lights per square mile can affect both health care expenditure and mortality rate

I thought that a fixed-effects model could deal with endogeneity, if it exists, since it helps to eliminate bias due to unobserved variables. Maybe I am wrong? I know that endogeneity is commonly dealt with using an instrumental variable approach, but I haven't found any instruments for my panel data so far. Is there a test or command in Stata that can help test for endogeneity? And if endogeneity exists, do you have any further suggestions for my case?

I really need the consultation from you and other members.

Thanks and kind regards,

Van
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17714
#13

17 Sep 2015, 01:31

Van:
- increasing your sample size, if feasible, would be a good thing to do;
- you're right about -fe- cancelling out (time-invariant) unobserved heterogeneity, but not necessarily endogeneity (please see the -xtivreg- entry, Example 1, pages 203-206 in the Stata .pdf manual).
By the way, have you ruled out via -hausman- that -re- specification might be the way to go with your data?
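
The usual -hausman- sequence can be sketched as follows, assuming Stata's example Grunfeld panel (loaded with -webuse-) as stand-in data. The sigmamore option, which bases both covariance matrices on the same variance estimate, is an addition here and not something discussed in the thread; it can help when the difference of the covariance matrices is otherwise not positive definite:

```stata
* Stand-in data: Grunfeld panel, 10 companies observed 1935-1954
webuse grunfeld, clear
xtset company year

* Fixed-effects estimates (consistent under both hypotheses)
quietly xtreg invest mvalue kstock, fe
estimates store fe

* Random-effects estimates (efficient under the null)
quietly xtreg invest mvalue kstock, re
estimates store re

* Hausman test; a small p-value favours the fixed-effects specification
hausman fe re, sigmamore
```

Both models are fitted with the default standard errors here, since -hausman- is not meant to be used with clustered or robust ones.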

Kind regards,
Carlo
(Stata 19.0)
Van Pham

Join Date: Sep 2015

Posts: 10
#14

17 Sep 2015, 21:18

Hi Carlo,

Thank you so much for your advice. I proceeded as you recommended and have some questions.

-First, as you suggested, I increased the sample size; I now have panel data covering 70 countries from 1995 to 2010, with gaps. The results seem somewhat improved with the fe model. I also used the Hausman test to check whether the re specification may be appropriate for my data. I have 2 dependent variables (Y1 = life expectancy and Y2 = mortality rate) and a set of explanatory variables (X1 = GDP per capita, X2 = public health expenditure per capita, X3 = remittances per capita, X4 = Muslim share, X5 = Christian share and X6 = institution index; I want to focus on the impact of X3, X4 and X5 on Y and take X1, X2 and X6 as controls). With the Hausman test, for Y1 the results suggest the re model is appropriate, while for Y2 the fe model is preferred (I attached the output for this test below). I don't know why there is a difference.

Code:

. quietly xtreg Y1 X1 X2 X3 X4 X5 X6, fe
. estimates store fe
. quietly xtreg Y1 X1 X2 X3 X4 X5 X6, re
. estimates store re
. hausman fe re

                 ---- Coefficients ----
             |      (b)          (B)            (b-B)     sqrt(diag(V_b-V_B))
             |       fe           re         Difference          S.E.
-------------+----------------------------------------------------------------
          X1 |    .0144752     .0323499       -.0178746         .010758
          X2 |    .0450077     .0375865        .0074213        .0046679
          X3 |     .009468     .0099054       -.0004374        .0007683
          X4 |      .03281       .00072          .03209        .0289184
          X5 |    .0390245     .0085832        .0304412        .0559187
          X6 |    .0121852     .0106251        .0015601        .0011823
------------------------------------------------------------------------------
                           b = consistent under Ho and Ha; obtained from xtreg
            B = inconsistent under Ha, efficient under Ho; obtained from xtreg

    Test:  Ho:  difference in coefficients not systematic

                  chi2(6) = (b-B)'[(V_b-V_B)^(-1)](b-B)
                          =        8.68
                Prob>chi2 =      0.1925

Code:

. quietly xtreg Y2 X1 X2 X3 X4 X5 X6, fe
. estimates store fe
. quietly xtreg Y2 X1 X2 X3 X4 X5 X6, re
. estimates store re
. hausman fe re

                 ---- Coefficients ----
             |      (b)          (B)            (b-B)     sqrt(diag(V_b-V_B))
             |       fe           re         Difference          S.E.
-------------+----------------------------------------------------------------
          X1 |   -.4937589    -.3733752       -.1203837        .0371363
          X2 |   -.3286132    -.3264546       -.0021586        .0085396
          X3 |   -.0699457    -.0748083        .0048626               .
          X4 |    .1808935     .2190789       -.0381854        .1152278
          X5 |   -.6969811     .0986462       -.7956273        .2288567
          X6 |   -.0623908    -.0727297        .0103389               .
------------------------------------------------------------------------------
                           b = consistent under Ho and Ha; obtained from xtreg
            B = inconsistent under Ha, efficient under Ho; obtained from xtreg

    Test:  Ho:  difference in coefficients not systematic

                  chi2(6) = (b-B)'[(V_b-V_B)^(-1)](b-B)
                          =       39.45
                Prob>chi2 =      0.0000
                (V_b-V_B is not positive definite)

And if, following the Hausman test, I choose the re model for Y1: is it acceptable to use an re model for panel data when the sample is not randomly selected (it is based only on the data available at the time), and do we need to take time effects into account in the re model as in the fe model?

-Second, as you suggested, I looked at -xtivreg- at http://www.stata.com/manuals13/xtxtivreg.pdf, though it is not easy for me to understand. I think that to use this command I still need to find an instrument, right? Or is there another approach that works without instruments?

-Third, regarding the -vce- option: some material I read suggests using vce(cluster ) to deal with heteroskedasticity and autocorrelation. Is this always the case, or does it depend? I also checked my data, and the result showed heteroskedasticity (I used xttest3), but for autocorrelation I used xtserial and it reported r(2000) (no observations). I don't know why.

- Another question that came to mind: is it possible to run a binary choice model if I turn the dependent variable into a binary one (e.g., Y1 = 1 if life expectancy achieves the goal, and Y1 = 0 otherwise)? I have read some theory about this model, but it seems complicated. I saw that Stata can run it, but I wonder what assumptions guide the choice among the options (I saw probit re, logistic fe, re...). Do you have any experience with this model, or can you recommend a not-too-complex guide, especially for Stata, that would help me see whether I can run it with my data?

Again, thank you for your consultation. It helps me a lot. I hope I can continue receiving advice and suggestions from you and other members.

Kind regards,

Van
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17714
#15

17 Sep 2015, 22:45

Van:
- the difference in -hausman- may be due to the fact that Y1 and Y2 are expressed in different units. Anyway, the guidance about which -depvar- to include should come from the literature of your research field;
- random effect specification has nothing to do with a randomly selected sample. Yes, you can try to include time effects as you did with the -fe- specification;
- you're right: the main issue with IV regression is finding the right instruments. Often, the literature in your research field can give you some clues about them;
- serial correlation is often present in panel data analysis; the same doesn't necessarily hold for heteroskedasticity. You can use -vce(cluster)- to manage both of them;
- about the weird output of -xtserial-, it's difficult to say. It may be due to missing values: check for yourself whether this is the case;
- binary choice models for panel data may be fitted via -xtlogit-. Their results are often more difficult to grasp than those obtained from -xtreg- if you have limited experience with this kind of stuff.
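
On the r(2000) message, one possible cause, offered as a guess rather than a diagnosis, is the multi-year spacing of the waves: procedures built on lags or first differences find no adjacent years when the panel is -xtset- on the calendar year, so every difference is missing. The sketch below mimics that situation on Stata's example Grunfeld panel (loaded with -webuse-) and re-indexes the waves with a hypothetical variable named period; n_before and n_after are likewise names made up for this illustration:

```stata
* Stand-in data: keep 1935, 1940, 1945, 1950 to mimic five-year gaps
webuse grunfeld, clear
keep if mod(year, 5) == 0

* Panel declared on the calendar year: no adjacent years exist
xtset company year
count if !missing(D.invest)
scalar n_before = r(N)

* Re-index the waves as 1, 2, 3, 4 so consecutive waves are adjacent
egen period = group(year)
xtset company period
count if !missing(D.invest)
scalar n_after = r(N)

display "usable first differences before: " n_before "   after: " n_after
```

With unequally spaced waves such as 1995, 2000, 2005, 2010, 2013, a consecutive wave index like this treats each wave as one step, which is itself an assumption worth stating when reporting results.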

Kind regards,
Carlo
(Stata 19.0)