Removing insignificant variables

Julia Raciniewska

Join Date: Feb 2019
Posts: 38

24 Mar 2019, 13:56

Hello,
As an example, I am estimating the following regression and comparing the results when my dependent variable is the share of employees doing non-routine tasks versus the share doing routine tasks. In reality, I also have 2 other dependent variables whose results I want to compare.
My question is the following: if some regressors come up as significant in the first regression but insignificant in the second (P>0.1), should I remove them from the second regression to make it more efficient, or just leave them in?

Code:

. xtreg nonroutine using_computer lngva  price_computer total_internet_access sharedegre
> e sharehigher shareother, fe vce(robust)

Fixed-effects (within) regression               Number of obs     =        120
Group variable: industry1                       Number of groups  =         10

R-sq:                                           Obs per group:
     within  = 0.3276                                         min =         12
     between = 0.4580                                         avg =       12.0
     overall = 0.4408                                         max =         12

                                                F(7,9)            =      16.27
corr(u_i, Xb)  = 0.5375                         Prob > F          =     0.0002

                                   (Std. Err. adjusted for 10 clusters in industry1)
------------------------------------------------------------------------------------
                   |               Robust
           nonrout |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------------+----------------------------------------------------------------
    using_computer |   .0014271   .0004206     3.39   0.008     .0004755    .0023786
             lngva |  -.0193869   .0317928    -0.61   0.557    -.0913072    .0525334
    price_computer |   .0014037   .0009901     1.42   0.190     -.000836    .0036434
total_internet_a~s |   .0041153   .0022304     1.85   0.098    -.0009303    .0091609
       sharedegree |   .0926562   .1112741     0.83   0.427    -.1590632    .3443756
       sharehigher |  -.2771514   .1359427    -2.04   0.072    -.5846752    .0303723
        shareother |   .1583427   .0836769     1.89   0.091    -.0309475    .3476329
             _cons |    .200577   .5024723     0.40   0.699    -.9360942    1.337248
-------------------+----------------------------------------------------------------
           sigma_u |   .1681399
           sigma_e |  .01373558
               rho |  .99337076   (fraction of variance due to u_i)
---------------------------------------------------------------------------------

VS

. xtreg routsem using_computer lngva  price_computer total_internet_access sharedegre
> e sharehigher shareother, fe vce(robust)

Fixed-effects (within) regression               Number of obs     =        120
Group variable: industry1                       Number of groups  =         10

R-sq:                                           Obs per group:
     within  = 0.0458                                         min =         12
     between = 0.7000                                         avg =       12.0
     overall = 0.6918                                         max =         12

                                                F(7,9)            =       9.83
corr(u_i, Xb)  = 0.7868                         Prob > F          =     0.0014

                                   (Std. Err. adjusted for 10 clusters in industry1)
------------------------------------------------------------------------------------
                   |               Robust
           routsem |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------------+----------------------------------------------------------------
    using_computer |  -.0003031   .0002802    -1.08   0.308    -.0009371    .0003308
             lngva |  -.0181579   .0289361    -0.63   0.546     -.083616    .0473002
    price_computer |   -.000896   .0002683    -3.34   0.009     -.001503    -.000289
total_internet_a~s |  -.0018722   .0009078    -2.06   0.069    -.0039258    .0001813
       sharedegree |  -.0601111    .083996    -0.72   0.492    -.2501232     .129901
       sharehigher |  -.0041789   .1033247    -0.04   0.969    -.2379157    .2295579
        shareother |  -.0227866   .1205998    -0.19   0.854    -.2956024    .2500292
             _cons |   .7114436   .3188556     2.23   0.053     -.009858    1.432745
-------------------+----------------------------------------------------------------
           sigma_u |   .1552167
           sigma_e |  .01099854
               rho |  .99500405   (fraction of variance due to u_i)
------------------------------------------------------------------------------------

. 
end of do-file

Also - I have performed the Hausman test to ensure that I should use a fixed effects model. Are there any other tests I should consider to check for endogeneity and think about instruments? I am just learning about panel data models now.

Thank you very much.

Tags: endogeneity, fixed effects, panel data, regression, significance

Carlo Lazzaro

Join Date: Apr 2014

Posts: 17711
#2

24 Mar 2019, 15:49

Julia:
a highly significant F-test combined with most of your coefficients lacking statistical significance is a sign of quasi-extreme multicollinearity. I would reconsider your model specification.

Kind regards,
Carlo
(Stata 19.0)
Julia Raciniewska

Join Date: Feb 2019

Posts: 38
#3

24 Mar 2019, 16:17

Carlo Lazzaro thank you for your response. In that case, would it be wise to run the regression using -reg-, then -vif- to check for multicollinearity, and once that is done, run -xtreg, fe-?
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17711
#4

24 Mar 2019, 16:32

Julia:
see:
https://www.statalist.org/forums/for...for-panel-data

Kind regards,
Carlo
(Stata 19.0)
Julia Raciniewska

Join Date: Feb 2019

Posts: 38
#5

25 Mar 2019, 01:06

Carlo Lazzaro Thank you very much, sir. Just one last question - should I use -estat vce, corr- after -xtreg, fe- or after -xtreg, fe vce(robust)-?
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17711
#6

25 Mar 2019, 03:40

Julia:
after -xtreg, fe vce(robust)- (please note that the brackets are redundant: -xtreg, fe robust- will do the same job, saving you some keystrokes).
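
For illustration, a minimal sketch of that sequence. It assumes the variable names from the first post, that the data are already -xtset-, and that the lines are run from a do-file (the `///` line continuation does not work when typed interactively).

```stata
* Sketch only: variable names are taken from the first post in this thread.
* Fixed-effects model with cluster-robust standard errors
xtreg nonroutine using_computer lngva price_computer total_internet_access ///
    sharedegree sharehigher shareother, fe vce(robust)

* Correlation matrix of the coefficient estimates; entries close to +1 or -1
* suggest that the corresponding regressors are hard to tell apart
estat vce, correlation
```
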
As an aside, being sir, OBE and the like quite far from my current salutation status, please call me Carlo, like all on (and many more off) this list do. Thanks.

Kind regards,
Carlo
(Stata 19.0)
Julia Raciniewska

Join Date: Feb 2019

Posts: 38
#7

25 Mar 2019, 05:25

Of course, Carlo. Thank you for your feedback. I have one more question following your advice on reconsidering model specification.
I have started to consider a GMM model, as I read in another Statalist post that it may be a good idea when T is small in panel data (in my case t=12, n=11) and you want to account for fixed effects. However, I do not have a good understanding of that model and wanted to ask your advice on whether it is sensible to consider it in my case.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17711
#8

25 Mar 2019, 05:34

Julia:
-gmm- requires a full awareness of its (demanding) theoretical building blocks.
If you clustered your SEs due to serial correlation, why not consider -xtregar, fe- instead?
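
As a rough sketch of what that could look like with the variables from the first post: the panel variable `industry1` appears in the output above, while the time variable name `year` is an assumption, since the thread never shows it. The lines are meant for a do-file.

```stata
* Assumption: the time variable is called year (not shown in this thread)
xtset industry1 year

* Fixed-effects model with AR(1) disturbances; the lbi option additionally
* reports the Baltagi-Wu LBI and modified Durbin-Watson statistics
xtregar nonroutine using_computer lngva price_computer total_internet_access ///
    sharedegree sharehigher shareother, fe lbi
```
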

Kind regards,
Carlo
(Stata 19.0)
Julia Raciniewska

Join Date: Feb 2019

Posts: 38
#9

25 Mar 2019, 06:01

Carlo:
That was the issue I had faced: how can I test for serial correlation with panel data? I tried:

Code:

. estat bgodfrey
subcommand estat bgodfrey is unrecognized
r(321);

or

Code:

. corrgram x1
sample may not include multiple panels
r(459);

I would be very thankful for advice. If I do not find serial correlation, does that mean I should just go with fixed effects (-xtreg, fe-), whereas if I do find it, I should use -xtregar, fe-?
Very much appreciated.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17711
#10

25 Mar 2019, 08:37

Julia:
see: https://www.statalist.org/forums/for...for-panel-data.
Switching from -xtreg, fe- to -xtregar, fe- is advisable because of N=T in your dataset.
If you find serial correlation with N>T, just use -xtreg- with cluster-robust standard errors.
If you find serial correlation with N=T or T>N, just go -xtregar-.
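
A hedged sketch of the test-then-choose workflow, assuming the test in question is the community-contributed -xtserial- command (the one mentioned later in this thread) and that the time variable is called `year`, which the thread does not show. Run it from a do-file so the local macro is visible to every command.

```stata
* xtserial is community-contributed; locate and install it first
search xtserial, all

* Assumption: the time variable is called year (not shown in this thread)
xtset industry1 year

* Regressors from the first post, stored once in a local macro
local xvars using_computer lngva price_computer total_internet_access ///
    sharedegree sharehigher shareother

* Wooldridge test; H0: no first-order autocorrelation
xtserial nonroutine `xvars'

* The two estimators named above, to be chosen according to the test
* result and the panel dimensions
xtreg nonroutine `xvars', fe vce(cluster industry1)
xtregar nonroutine `xvars', fe
```
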

Kind regards,
Carlo
(Stata 19.0)
Julia Raciniewska

Join Date: Feb 2019

Posts: 38
#11

25 Mar 2019, 15:19

Carlo:
Thank you very much. I have the following question: I want to compare the coefficients on using_computer between 2 equations; however, according to -xtserial- I have serial correlation in the 2nd. Is it still possible to interpret and compare coefficients between the two if one was estimated with -xtreg, fe- and the other with -xtregar-?

Code:

xtreg share_routine using_computer lngva computer_networks sharedegree sharehigher shareother, fe vce(robust)
xtregar sharelower using_computer lngva computer_networks sharedegree sharehigher shareother, fe
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17711
#12

26 Mar 2019, 00:55

No, I do not think so.

Kind regards,
Carlo
(Stata 19.0)