  • Removing insignificant variables

    Hello,
    As an example, I am trying to estimate the following regression and compare the results when my dependent variable is the non-routine vs the routine share (that is, the share of employees doing non-routine vs routine tasks). In reality, I also have 2 other dependent variables between which I want to compare results.
    My question is the following: if some regressors come up as significant in the first regression but insignificant in the second (P>0.1), should I remove them from the second regression to make it more efficient, or just leave them in?

    Code:
    . xtreg nonroutine using_computer lngva  price_computer total_internet_access sharedegre
    > e sharehigher shareother, fe vce(robust)
    
    Fixed-effects (within) regression               Number of obs     =        120
    Group variable: industry1                       Number of groups  =         10
    
    R-sq:                                           Obs per group:
         within  = 0.3276                                         min =         12
         between = 0.4580                                         avg =       12.0
         overall = 0.4408                                         max =         12
    
                                                    F(7,9)            =      16.27
    corr(u_i, Xb)  = 0.5375                         Prob > F          =     0.0002
    
                                       (Std. Err. adjusted for 10 clusters in industry1)
    ------------------------------------------------------------------------------------
                       |               Robust
               nonrout |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------------+----------------------------------------------------------------
        using_computer |   .0014271   .0004206     3.39   0.008     .0004755    .0023786
                 lngva |  -.0193869   .0317928    -0.61   0.557    -.0913072    .0525334
        price_computer |   .0014037   .0009901     1.42   0.190     -.000836    .0036434
    total_internet_a~s |   .0041153   .0022304     1.85   0.098    -.0009303    .0091609
           sharedegree |   .0926562   .1112741     0.83   0.427    -.1590632    .3443756
           sharehigher |  -.2771514   .1359427    -2.04   0.072    -.5846752    .0303723
            shareother |   .1583427   .0836769     1.89   0.091    -.0309475    .3476329
                 _cons |    .200577   .5024723     0.40   0.699    -.9360942    1.337248
    -------------------+----------------------------------------------------------------
               sigma_u |   .1681399
               sigma_e |  .01373558
                   rho |  .99337076   (fraction of variance due to u_i)
    ---------------------------------------------------------------------------------
    
    VS
    
    . xtreg routsem using_computer lngva  price_computer total_internet_access sharedegre
    > e sharehigher shareother, fe vce(robust)
    
    Fixed-effects (within) regression               Number of obs     =        120
    Group variable: industry1                       Number of groups  =         10
    
    R-sq:                                           Obs per group:
         within  = 0.0458                                         min =         12
         between = 0.7000                                         avg =       12.0
         overall = 0.6918                                         max =         12
    
                                                    F(7,9)            =       9.83
    corr(u_i, Xb)  = 0.7868                         Prob > F          =     0.0014
    
                                       (Std. Err. adjusted for 10 clusters in industry1)
    ------------------------------------------------------------------------------------
                       |               Robust
               routsem |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------------+----------------------------------------------------------------
        using_computer |  -.0003031   .0002802    -1.08   0.308    -.0009371    .0003308
                 lngva |  -.0181579   .0289361    -0.63   0.546     -.083616    .0473002
        price_computer |   -.000896   .0002683    -3.34   0.009     -.001503    -.000289
    total_internet_a~s |  -.0018722   .0009078    -2.06   0.069    -.0039258    .0001813
           sharedegree |  -.0601111    .083996    -0.72   0.492    -.2501232     .129901
           sharehigher |  -.0041789   .1033247    -0.04   0.969    -.2379157    .2295579
            shareother |  -.0227866   .1205998    -0.19   0.854    -.2956024    .2500292
                 _cons |   .7114436   .3188556     2.23   0.053     -.009858    1.432745
    -------------------+----------------------------------------------------------------
               sigma_u |   .1552167
               sigma_e |  .01099854
                   rho |  .99500405   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------------
    
    . 
    end of do-file
    Also - I have performed the Hausman test to ensure that I should use a fixed effects model. Are there any other tests I should consider to check for endogeneity and think about instruments? I am just learning about panel data models now.

    Thank you very much.

  • #2
    Julia:
    a highly significant F-test combined with most of your coefficients lacking statistical significance is a sign of quasi-extreme multicollinearity. I would reconsider your model specification.
    Kind regards,
    Carlo
    (Stata 19.0)
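
    A minimal sketch of one way to look into this, not a definitive diagnostic: inspect the pairwise correlations among the regressors, then test the three education shares as a block. It reuses the variable names from post #1 and assumes the data are already -xtset- on industry1.

    Code:
    * pairwise correlations among the regressors (raw, not within-industry)
    correlate using_computer lngva price_computer total_internet_access sharedegree sharehigher shareother
    
    * refit the first model, then test the education shares jointly
    xtreg nonroutine using_computer lngva price_computer total_internet_access sharedegree sharehigher shareother, fe vce(robust)
    testparm sharedegree sharehigher shareother

    If the block turns out jointly significant while its members are individually insignificant, that would be in line with collinearity among them.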


    • #3
      Carlo Lazzaro thank you for your response. In that case, would it be wise to run the regression using -reg-, then -vif- to make sure no multicollinearity exists, and once that is done, run -xtreg, fe-?
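
      A rough sketch of that workflow, under the assumption that industry1 takes nonnegative integer values so that it can be entered as a factor variable. The industry dummies are added by hand so that -regress- mimics the fixed-effects model; the VIFs depend only on the regressors, so no robust option is needed at this step.

      Code:
      * pooled OLS with industry dummies, used only to obtain the VIFs
      regress nonroutine using_computer lngva price_computer total_internet_access sharedegree sharehigher shareother i.industry1
      estat vif
      
      * the model of interest
      xtreg nonroutine using_computer lngva price_computer total_internet_access sharedegree sharehigher shareother, fe vce(robust)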


      • #4
        Julia:
        see:
        https://www.statalist.org/forums/for...for-panel-data
        Kind regards,
        Carlo
        (Stata 19.0)


        • #5
          Carlo Lazzaro Thank you very much, sir. Just one last question - should I use estat vce, corr after the xtreg, fe or xtreg, fe vce(robust) ?


          • #6
            Julia:
            after -xtreg, fe vce(robust)- (please note that brackets are redundant: -xtreg, fe robust- will do the same job, saving you some keystrokes).
            As an aside, being sir, OBE and the like quite far from my current salutation status, please call me Carlo, like all on (and many more off) this list do. Thanks.
            Kind regards,
            Carlo
            (Stata 19.0)
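
            A short sketch of the sequence discussed here, using the variable names from post #1 and assuming the data are already -xtset-. -estat vce, correlation- displays the correlation matrix of the coefficient estimates; entries close to 1 or -1 flag coefficients that the data struggle to separate.

            Code:
            xtreg nonroutine using_computer lngva price_computer total_internet_access sharedegree sharehigher shareother, fe vce(robust)
            
            * correlation matrix of the estimated coefficients
            estat vce, correlation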


            • #7
              Of course, Carlo. Thank you for your feedback. I have one more question following your advice on reconsidering model specification.
              I started to consider the GMM model, as I read in another Statalist post that it may be a good idea when your t is small in panel data (in my case t=12, n=11) and you want to account for fixed effects. However, I do not have a good understanding of that model and wanted to ask for your advice on whether it is sensible to consider it in my case.


              • #8
                Julia:
                -gmm- requires a full awareness of its (demanding) theoretical building blocks.
                If you clustered your SEs due to serial correlation, why not consider -xtregar, fe- instead?
                Kind regards,
                Carlo
                (Stata 19.0)
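
                For illustration only, a sketch of what that could look like with the variable names from post #1. -xtregar- needs a time variable in -xtset-; the name year below is a hypothetical placeholder, since the thread does not show it. The lbi option additionally reports the Baltagi-Wu LBI statistic and a modified Durbin-Watson statistic.

                Code:
                * year is an assumed name for the time variable
                xtset industry1 year
                
                * fixed-effects model with an AR(1) disturbance
                xtregar nonroutine using_computer lngva price_computer total_internet_access sharedegree sharehigher shareother, fe lbi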


                • #9
                  Carlo:
                  That was the issue I had faced: how can I test for serial correlation with panel data? I tried:
                  Code:
                  estat bgodfrey 
                  subcommand estat bgodfrey is unrecognized
                  r(321);
                  or

                  Code:
                  . corrgram x1
                  sample may not include multiple panels
                  r(459);
                  I would be very thankful for advice. If I do not find serial correlation, does it mean I should just go with fixed effects (-xtreg, fe-), whereas if I do find it, I should use -xtregar, fe-?
                  Very much appreciated.


                  • #10
                    Julia:
                    see: https://www.statalist.org/forums/for...for-panel-data.
                    Switching from -xtreg, fe- to -xtregar, fe- is advisable because N is roughly equal to T in your dataset.
                    If you find serial correlation with N>T, just use -xtreg- with cluster-robust standard errors.
                    If you find serial correlation with N=T or T>N, just go with -xtregar-.
                    Kind regards,
                    Carlo
                    (Stata 19.0)
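
                    As background to the two errors in #9: -estat bgodfrey- is a postestimation command for -regress-, and -corrgram- works on a single time series, so neither applies after -xtreg- on a multi-panel dataset. One option for panel data is the community-contributed -xtserial-, which implements a Wooldridge-type test whose null hypothesis is no first-order serial correlation. A minimal sketch, assuming the variable names from post #1 and a hypothetical time variable called year:

                    Code:
                    * locate and install the community-contributed command if needed
                    search xtserial
                    
                    * year is an assumed name for the time variable
                    xtset industry1 year
                    
                    * H0: no first-order autocorrelation in the panel
                    xtserial nonroutine using_computer lngva price_computer total_internet_access sharedegree sharehigher shareother

                    A small p-value rejects the null of no first-order serial correlation; the choice of estimator then follows the N versus T rule above.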


                    • #11
                      Carlo:
                      Thank you very much. I have the following question: I want to compare coefficients on computer_use between 2 equations, however according to -xtserial- I have serial correlation in the 2nd. Is it still possible to interpret and compare coefficients between the 2 if one was specified using -xtreg, fe- while the other with -xtregar-?
                      Code:
                      xtreg share_routine using_computer lngva computer_networks  sharedegree sharehigher shareother, fe vce(robust)
                      
                      xtregar sharelower using_computer lngva computer_networks  sharedegree sharehigher shareother, fe


                      • #12
                        No, I do not think so.
                        Kind regards,
                        Carlo
                        (Stata 19.0)
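
                        One possible way to keep the comparison like-for-like, offered only as a sketch and not as Carlo's recommendation: fit both equations with the same estimator and show the results side by side. Variable names are those from #11; the stored names m_routine and m_lower are hypothetical, and the data are assumed to be -xtset- with a time variable.

                        Code:
                        xtregar share_routine using_computer lngva computer_networks sharedegree sharehigher shareother, fe
                        estimates store m_routine
                        
                        xtregar sharelower using_computer lngva computer_networks sharedegree sharehigher shareother, fe
                        estimates store m_lower
                        
                        * coefficients and standard errors of both models in one table
                        estimates table m_routine m_lower, b(%9.4f) se(%9.4f) stats(N)

                        This only places the two sets of estimates next to each other; it is not a formal test of equality of the coefficients.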
