  • Omitted dummy variable

    Dear experts,

    Using Stata, I have estimated a fixed-effects model for my panel data (7 years, 1000+ obs), since my Hausman test gives Prob>chi2 = 0.0000. When I add a dummy variable to capture the country effect, it always gets omitted. The country effect is very important to my study: if it is significant, it will allow me to consider the target variables at the country level later.

    Please note that I have only two countries, and one other dummy variable in this analysis.
    Could you suggest a way to get this dummy variable included in the FE regression?

    Thanks,
    Hamza
    Last edited by Hamza Almustafa; 28 Oct 2016, 04:01.

  • #2
    Stata only omits variables when there is a good reason to do so, and always tells you the reason. Re-read your output carefully and you will find an explanation for it. Your task then will be to first think about whether this is expected, or whether it arises because of errors in your data. If you do not understand what Stata tells you about it, please post the full output of the regression and probably somebody will help you interpret it.


    • #3
      I second Clyde's advice about posting the exact Stata output.

      However, this seems (and only seems, because we have no further details) to be a multicollinearity issue.
      You cannot add a country dummy to a fixed-effects model when one of the dimensions of the fixed effects is the country (which I guess is the case).
      The fixed effects already capture that.
      I'm pretty sure Stata tells you that your dummy variable is omitted because of collinearity.

      If you want to capture the country effect, remove the fixed effects and include the country dummies (and the year dummies, assuming the FE are for a country-year panel) by hand.
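
      A minimal sketch of the "dummies by hand" approach, using the variable names from the regression posted later in this thread. Clustering the standard errors on the panel identifier CODE is an added assumption, not something suggested in the thread:

      ```stata
      * Pooled OLS with country and year indicators entered explicitly.
      * Assumes the data are already xtset (e.g. xtset CODE YEAR) so that the lag operator l. works.
      * vce(cluster CODE) is an illustrative choice of standard errors, not part of the original advice.
      regress lnQ l.lnQ LNBSIZE BOIND DUALITY INST1 FAM_OWN OWNTOTAL LNMARKT MBASSVALUE LNFAGE WFL i.YEAR i.COCODE, vce(cluster CODE)
      ```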

      I hope this helps,
      Charlie


      • #4
        Dear Hamza Almustafa
        In this case, it might be because the country dummy is time-invariant, so it is omitted due to perfect multicollinearity with the fixed effects.
        I think you can switch to a time fixed-effects model or a random-effects model (REM) to estimate the model with the country dummy.
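
        As a rough sketch of the random-effects option, assuming the variable names from the regression posted later in this thread and data already declared with xtset CODE YEAR. The Hausman test in the original post favoured fixed effects, so this estimate should be read with caution:

        ```stata
        * Random-effects GLS: a regressor that is constant within panels, such as the
        * country indicator, is not absorbed by the panel effect and can be estimated.
        * Assumes: xtset CODE YEAR has already been run.
        xtreg lnQ l.lnQ LNBSIZE BOIND DUALITY INST1 FAM_OWN OWNTOTAL LNMARKT MBASSVALUE LNFAGE WFL i.YEAR i.COCODE, re
        ```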
        I hope this helps.


        • #5
          Charlie Joyez, Tuan Anh: Hamza doesn't tell us what kind of regression he did. If it is a logistic or probit regression, the variable might also have been omitted because of perfect prediction. That's why we need to see the full output. Collinearity is probably more commonly encountered in practice, but it is not the only possibility here.


          • #6
            Clyde Schechter, I totally agree with your point. I was just indicating the most probable issue here, and took the precaution of saying that it seems to be collinearity.

            But we'd need the full output to know exactly what is happening and to give a precise and certain answer.


            • #7
              Hi again, thanks for your explanations.

              I know that collinearity was the reason Stata dropped the country dummy variable from this model, but how can I observe the country effect in this case?

              Below is the table obtained. Sorry, this might look a little messy.

              Thanks,


              Code:
              . xtreg lnQ l.lnQ LNBSIZE BOIND DUALITY INST1 FAM_OWN OWNTOTAL LNMARKT MBASSVALUE LNFAGE WFL i.YEAR i.COCODE, fe
              note: 1.COCODE omitted because of collinearity

              Fixed-effects (within) regression               Number of obs     =        894
              Group variable: CODE                            Number of groups  =        149

              R-sq:                                           Obs per group:
                   within  = 0.6232                                         min =          6
                   between = 0.0149                                         avg =        6.0
                   overall = 0.0302                                         max =          6

                                                              F(16,729)         =      75.36
              corr(u_i, Xb)  = -0.8723                        Prob > F          =     0.0000

              ------------------------------------------------------------------------------
                       lnQ |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
              -------------+----------------------------------------------------------------
                       lnQ |
                       L1. |   .2517365   .0228843    11.00   0.000     .2068094    .2966636
                           |
                   LNBSIZE |  -.2939403   .0730466    -4.02   0.000    -.4373472   -.1505335
                     BOIND |   .0034148   .0888605     0.04   0.969    -.1710382    .1778679
                   DUALITY |  -.0432432   .0485194    -0.89   0.373    -.1384977    .0520112
                     INST1 |   7.202858   3.952101     1.82   0.069    -.5559999    14.96172
                   FAM_OWN |   7.274031   3.951048     1.84   0.066    -.4827581    15.03082
                  OWNTOTAL |  -7.604984   3.950248    -1.93   0.055     -15.3602    .1502354
                   LNMARKT |   .3600498   .0139439    25.82   0.000     .3326748    .3874249
                MBASSVALUE |   .4902562   .0583597     8.40   0.000      .375683    .6048294
                    LNFAGE |   .0279163   .0439576     0.64   0.526    -.0583823    .1142149
                       WFL |   .0002797   .0009416     0.30   0.767    -.0015688    .0021282
                           |
                      YEAR |
                     2010  |   .0111264   .0174317     0.64   0.523     -.023096    .0453488
                     2011  |  -.0109887   .0193162    -0.57   0.570    -.0489106    .0269333
                     2012  |  -.0207852   .0216284    -0.96   0.337    -.0632465    .0216761
                     2013  |  -.0132134   .0236131    -0.56   0.576    -.0595712    .0331443
                     2014  |  -.0117242    .025081    -0.47   0.640    -.0609637    .0375154
                           |
                  1.COCODE |          0  (omitted)
                     _cons |  -5.949549   .3490755   -17.04   0.000    -6.634862   -5.264235
              -------------+----------------------------------------------------------------
                   sigma_u |  .90423326
                   sigma_e |  .14100968
                       rho |  .97625884   (fraction of variance due to u_i)
              ------------------------------------------------------------------------------
              F test that all u_i=0: F(148, 729) = 9.98                    Prob > F = 0.0000


              • #8
                So look at:
                . xtreg lnQ l.lnQ LNBSIZE BOIND DUALITY INST1 FAM_OWN OWNTOTAL LNMARKT MBASSVALUE LNFAGE WFL i.YEAR i.COCODE, fe
                note: 1.COCODE omitted because of collinearity
                So your variable COCODE is collinear with something. It might be collinear with some of the other predictor variables in your model. But another, more likely, possibility is that it is collinear with whatever variable you specified as the panel variable when you -xtset- your data. This would be the case if your COCODE variable always takes on the same value for all observations in any given panel. If that is the case, you cannot estimate any effect of COCODE using a fixed-effects model. There is no "fix" for this problem: you are trying to do the impossible. To get COCODE effects you would have to go to either a random-effects or between-effects model.

                If COCODE does in fact vary within panels, then you need to find out which other variable(s) it is collinear with. You can do that with a simple regression run immediately after the one quoted above:
                Code:
                regress COCODE l.lnQ LNBSIZE BOIND DUALITY INST1 FAM_OWN OWNTOTAL LNMARKT MBASSVALUE LNFAGE WFL i.YEAR if e(sample)
                You will get an R-squared of 1 (or nearly so if there is some rounding error) and the coefficient table will show you the exact collinearity relationship among the variables. Then in order to estimate COCODE effects you will have to eliminate one or more of those other variables from the model.
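
                Before running that regression, it may be worth checking directly whether COCODE varies within panels at all. A minimal sketch, assuming the data are in memory and declared with xtset CODE YEAR; the variable name cocode_varies is hypothetical:

                ```stata
                * Decompose the variation in COCODE: a "within" standard deviation of 0
                * means COCODE is constant inside every panel.
                xtsum COCODE

                * Flag panels in which COCODE takes more than one value.
                * Sorting by COCODE within CODE puts the smallest value first and the largest last.
                bysort CODE (COCODE): generate byte cocode_varies = COCODE[1] != COCODE[_N]
                tabulate cocode_varies
                ```

                If cocode_varies is 0 for every observation, COCODE is constant within each panel and is absorbed by the fixed effects.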

                Added: In the future, to get the output to show up in a more readable form, use code delimiters, as I have done here. See FAQ #12 for instructions.


                • #9
                  Dear Mr Schechter,

                  Thanks a lot for your explanations. I will definitely come back with questions soon.

                  Thanks again,
                  Hamza


                  • #10
                    Hello again,

                    I just have a few concerns about time dummies in a dynamic system GMM regression. When I run the regression with the time effect included as YEAR*, it works fine; see the results below:


                    Code:
                    . xtabond2 lnQ l.lnQ LNBSIZE BOIND DUALITY FAM_OWN INST1 OWNTOTAL LNMARKT MBASSVALUE LNFAGE WFL YEAR*, gmm(lnQ LNBSIZE BOIND DUALITY FAM_OWN INST1 OWNTOTAL LNMARKT MBASSVALUE WF
                    > L, lag(2 .) collapse) iv(LNFAGE YEAR*) twostep robust small
                    Favoring space over speed. To switch, type or click on mata: mata set matafavor speed, perm.
                    Warning: Two-step estimated covariance matrix of moments is singular.
                      Using a generalized inverse to calculate optimal weighting matrix for two-step estimation.
                      Difference-in-Sargan/Hansen statistics may be negative.
                    
                    Dynamic panel-data estimation, two-step system GMM
                    ------------------------------------------------------------------------------
                    Group variable: CODE                            Number of obs      =       894
                    Time variable : YEAR                            Number of groups   =       149
                    Number of instruments = 63                      Obs per group: min =         6
                    F(12, 148)    =     17.37                                      avg =      6.00
                    Prob > F      =     0.000                                      max =         6
                    ------------------------------------------------------------------------------
                                 |              Corrected
                             lnQ |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                    -------------+----------------------------------------------------------------
                             lnQ |
                             L1. |   .6138356   .1030117     5.96   0.000     .4102718    .8173993
                                 |
                         LNBSIZE |  -.0071889   .3617493    -0.02   0.984    -.7220498    .7076721
                           BOIND |  -.2442175   .4999466    -0.49   0.626    -1.232173    .7437382
                         DUALITY |   .3434033   .2278228     1.51   0.134    -.1068025    .7936091
                         FAM_OWN |   1.068202   1.345958     0.79   0.429    -1.591576     3.72798
                           INST1 |   .8048258   1.393649     0.58   0.564    -1.949196    3.558848
                        OWNTOTAL |  -1.460602   1.289311    -1.13   0.259    -4.008438    1.087233
                         LNMARKT |   .0822979    .056369     1.46   0.146    -.0290942    .1936899
                      MBASSVALUE |    .280334   .2415289     1.16   0.248    -.1969567    .7576246
                          LNFAGE |  -.0110551   .0389611    -0.28   0.777     -.088047    .0659368
                             WFL |  -.0008407   .0030974    -0.27   0.786    -.0069614    .0052801
                            YEAR |   .0167419   .0090129     1.86   0.065    -.0010686    .0345525
                           _cons |  -34.97389   17.63462    -1.98   0.049    -69.82205   -.1257294
                    ------------------------------------------------------------------------------
                    Instruments for first differences equation
                      Standard
                        D.(LNFAGE YEAR)
                      GMM-type (missing=0, separate instruments for each period unless collapsed)
                        L(2/6).(lnQ LNBSIZE BOIND DUALITY FAM_OWN INST1 OWNTOTAL LNMARKT
                        MBASSVALUE WFL) collapsed
                    Instruments for levels equation
                      Standard
                        LNFAGE YEAR
                        _cons
                      GMM-type (missing=0, separate instruments for each period unless collapsed)
                        DL.(lnQ LNBSIZE BOIND DUALITY FAM_OWN INST1 OWNTOTAL LNMARKT MBASSVALUE
                        WFL) collapsed
                    ------------------------------------------------------------------------------
                    Arellano-Bond test for AR(1) in first differences: z =  -4.27  Pr > z =  0.000
                    Arellano-Bond test for AR(2) in first differences: z =  -1.68  Pr > z =  0.093
                    ------------------------------------------------------------------------------
                    Sargan test of overid. restrictions: chi2(50)   =  85.08  Prob > chi2 =  0.001
                      (Not robust, but not weakened by many instruments.)
                    Hansen test of overid. restrictions: chi2(50)   =  54.08  Prob > chi2 =  0.321
                      (Robust, but weakened by many instruments.)
                    
                    Difference-in-Hansen tests of exogeneity of instrument subsets:
                      GMM instruments for levels
                        Hansen test excluding group:     chi2(40)   =  27.83  Prob > chi2 =  0.927
                        Difference (null H = exogenous): chi2(10)   =  26.25  Prob > chi2 =  0.003
                      iv(LNFAGE YEAR)
                        Hansen test excluding group:     chi2(48)   =  54.01  Prob > chi2 =  0.256
                        Difference (null H = exogenous): chi2(2)    =   0.07  Prob > chi2 =  0.964

                    However, when I include the time effect as i.YEAR, the results fail badly: the constant term is omitted. See the results below:

                    Code:
                    . xtabond2 lnQ l.lnQ LNBSIZE BOIND DUALITY FAM_OWN INST1 OWNTOTAL LNMARKT MBASSVALUE LNFAGE WFL i.YEAR, gmm(lnQ LNBSIZE BOIND DUALITY FAM_OWN INST1 OWNTOTAL LNMARKT MBASSVALUE W
                    > FL, lag(2 .) collapse) iv(LNFAGE i.YEAR) twostep robust small
                    Favoring space over speed. To switch, type or click on mata: mata set matafavor speed, perm.
                    Warning: Two-step estimated covariance matrix of moments is singular.
                      Using a generalized inverse to calculate optimal weighting matrix for two-step estimation.
                      Difference-in-Sargan/Hansen statistics may be negative.
                    
                    Dynamic panel-data estimation, two-step system GMM
                    ------------------------------------------------------------------------------
                    Group variable: CODE                            Number of obs      =       894
                    Time variable : YEAR                            Number of groups   =       149
                    Number of instruments = 67                      Obs per group: min =         6
                    F(18, 148)    =     11.26                                      avg =      6.00
                    Prob > F      =     0.000                                      max =         6
                    ------------------------------------------------------------------------------
                                 |              Corrected
                             lnQ |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                    -------------+----------------------------------------------------------------
                             lnQ |
                             L1. |   .4766565   .1110716     4.29   0.000     .2571653    .6961477
                                 |
                         LNBSIZE |  -.2497412   .3646654    -0.68   0.495    -.9703647    .4708823
                           BOIND |   .0580443   .4158405     0.14   0.889    -.7637075     .879796
                         DUALITY |   .2444564   .2368049     1.03   0.304    -.2234992     .712412
                         FAM_OWN |   2.063371   1.743092     1.18   0.238    -1.381193    5.507935
                           INST1 |   1.766785   1.850969     0.95   0.341    -1.890957    5.424527
                        OWNTOTAL |  -2.372023    1.78973    -1.33   0.187    -5.908748    1.164702
                         LNMARKT |   .1233874   .0532114     2.32   0.022     .0182352    .2285395
                      MBASSVALUE |   .3895896    .237376     1.64   0.103    -.0794944    .8586737
                          LNFAGE |   .0154905   .0449326     0.34   0.731    -.0733018    .1042828
                             WFL |  -.0011445   .0033792    -0.34   0.735    -.0078221    .0055332
                                 |
                            YEAR |
                           2008  |          0  (empty)
                           2009  |  -1.909702   1.172541    -1.63   0.106    -4.226787    .4073828
                           2010  |  -1.901027   1.176295    -1.62   0.108    -4.225531    .4234767
                           2011  |  -1.937882   1.176811    -1.65   0.102    -4.263404    .3876399
                           2012  |  -1.935647   1.179444    -1.64   0.103    -4.266372    .3950785
                           2013  |  -1.906047   1.182309    -1.61   0.109    -4.242434    .4303403
                           2014  |  -1.893354   1.184472    -1.60   0.112    -4.234015    .4473067
                                 |
                           _cons |          0  (omitted)
                    ------------------------------------------------------------------------------
                    Instruments for first differences equation
                      Standard
                        D.(LNFAGE 2008b.YEAR 2009.YEAR 2010.YEAR 2011.YEAR 2012.YEAR 2013.YEAR
                        2014.YEAR)
                      GMM-type (missing=0, separate instruments for each period unless collapsed)
                        L(2/6).(lnQ LNBSIZE BOIND DUALITY FAM_OWN INST1 OWNTOTAL LNMARKT
                        MBASSVALUE WFL) collapsed
                    Instruments for levels equation
                      Standard
                        LNFAGE 2008b.YEAR 2009.YEAR 2010.YEAR 2011.YEAR 2012.YEAR 2013.YEAR
                        2014.YEAR
                        _cons
                      GMM-type (missing=0, separate instruments for each period unless collapsed)
                        DL.(lnQ LNBSIZE BOIND DUALITY FAM_OWN INST1 OWNTOTAL LNMARKT MBASSVALUE
                        WFL) collapsed
                    ------------------------------------------------------------------------------
                    Arellano-Bond test for AR(1) in first differences: z =  -3.57  Pr > z =  0.000
                    Arellano-Bond test for AR(2) in first differences: z =  -1.45  Pr > z =  0.146
                    ------------------------------------------------------------------------------
                    Sargan test of overid. restrictions: chi2(48)   = 102.36  Prob > chi2 =  0.000
                      (Not robust, but not weakened by many instruments.)
                    Hansen test of overid. restrictions: chi2(48)   =  61.32  Prob > chi2 =  0.094
                      (Robust, but weakened by many instruments.)
                    
                    Difference-in-Hansen tests of exogeneity of instrument subsets:
                      GMM instruments for levels
                        Hansen test excluding group:     chi2(38)   =  36.78  Prob > chi2 =  0.526
                        Difference (null H = exogenous): chi2(10)   =  24.54  Prob > chi2 =  0.006
                      iv(LNFAGE 2008b.YEAR 2009.YEAR 2010.YEAR 2011.YEAR 2012.YEAR 2013.YEAR 2014.YEAR)
                        Hansen test excluding group:     chi2(42)   =  48.13  Prob > chi2 =  0.239
                        Difference (null H = exogenous): chi2(6)    =  13.19  Prob > chi2 =  0.040

                    I am not sure about the difference between YEAR* and i.YEAR, but it appears that YEAR* gives an average or aggregate effect rather than each year's effect.
                    I would appreciate your help.

                    Thanks,
                    Hamza


                    • #11
                      The wildcard YEAR* means find all variables in the data set that begin with the characters Y E A R and include them in the model. If you had separate indicator variables YEAR2008, YEAR2009, etc. they would all have been included. But you apparently have only a single variable, YEAR, which takes on the values 2008 through 2014. So it enters that one variable YEAR. Since you haven't specified i. in front of it (in the first model), Stata treats it as a continuous variable. So you get an estimate of the linear trend across all years.

                      In the second model you use i.YEAR. That is factor-variable notation and it tells Stata to create "virtual" indicator variables for each year (except 2008, the base year) and include those in the model. I cannot give you a full explanation of why the constant term ends up omitted. I am unfamiliar with GMM and with the -xtabond2- command, so it may have something to do with the details of that. Suffice it to say that when Stata omits variables it is usually due to collinearity. So I am inclined to think that there is some other variable you have included that is constant across all years. I wouldn't be able to tell you which variable that is, but the name FAM_OWN, if it stands for family-owned, sounds like a variable that doesn't change over time. So that may be the source of the problem.
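
                      The difference between the two notations can be inspected directly. A small sketch, assuming the poster's data are in memory; the prefix yr_ is a hypothetical name:

                      ```stata
                      * Which variables does the wildcard match? With a single variable YEAR,
                      * only YEAR itself is listed.
                      describe YEAR*

                      * Which terms does factor-variable notation generate?
                      fvexpand i.YEAR
                      display "`r(varlist)'"

                      * Optional: create explicit indicator variables, one per year
                      * (yr_1, yr_2, ... in order of the sorted values of YEAR).
                      tabulate YEAR, generate(yr_)
                      ```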


                      • #12
                        Thanks, Mr Clyde, for your usual help and support.


                        • #13
                          Hello again,

                          I got the results below for the omitted country dummy variable, as suggested:
                          Code:
                           regress dum1 l.lnQ LNBSIZE BOIND DUALITY INST1 FAM_OWN OWNTOTAL LNMARKT MBASSVALUE LNFAGE WFL i.YEAR if e(sample)
                          
                                Source |       SS           df       MS      Number of obs   =       894
                          -------------+----------------------------------   F(16, 877)      =    108.06
                                 Model |  114.613971        16  7.16337317   Prob > F        =    0.0000
                              Residual |  58.1377071       877   .06629157   R-squared       =    0.6635
                          -------------+----------------------------------   Adj R-squared   =    0.6573
                                 Total |  172.751678       893  .193450927   Root MSE        =    .25747
                          
                          ------------------------------------------------------------------------------
                                  dum1 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                          -------------+----------------------------------------------------------------
                                   lnQ |
                                   L1. |   .1135758   .0202097     5.62   0.000     .0739107    .1532408
                                       |
                               LNBSIZE |   .3511402   .0351983     9.98   0.000     .2820575    .4202228
                                 BOIND |  -.2436526   .0559398    -4.36   0.000    -.3534441    -.133861
                               DUALITY |   .1114259   .0254227     4.38   0.000     .0615295    .1613223
                                 INST1 |   .1865776   .0890052     2.10   0.036     .0118895    .3612657
                               FAM_OWN |   .1036181    .091132     1.14   0.256    -.0752441    .2824803
                              OWNTOTAL |  -.0040236   .0904395    -0.04   0.965    -.1815268    .1734795
                               LNMARKT |   -.103496   .0059488   -17.40   0.000    -.1151715   -.0918205
                            MBASSVALUE |   .2716773   .0399339     6.80   0.000     .1933001    .3500544
                                LNFAGE |   -.066425   .0119528    -5.56   0.000    -.0898843   -.0429656
                                   WFL |  -.0126988   .0007206   -17.62   0.000    -.0141132   -.0112844
                                       |
                                  YEAR |
                                 2010  |    .019432   .0300208     0.65   0.518    -.0394889     .078353
                                 2011  |     .02315   .0302352     0.77   0.444    -.0361919    .0824918
                                 2012  |   .0559194   .0306014     1.83   0.068     -.004141    .1159799
                                 2013  |   .0713971   .0308245     2.32   0.021     .0108988    .1318955
                                 2014  |   .0750194   .0308111     2.43   0.015     .0145473    .1354916
                                       |
                                 _cons |   1.737582   .1342479    12.94   0.000     1.474097    2.001066
                          ------------------------------------------------------------------------------
                          Is it the OWNTOTAL variable that is highly correlated with this dummy? If so, I can't simply delete it from the model, as it's a key element here!


                          • #14
                            Hello @Hamza Almustafa, I think we have a similar result with the time dummies in GMM. One of the time dummies is reported as "omitted", just like the constant, but when I removed that time dummy, the constant was no longer reported as omitted. My question is: is there some kind of collinearity between the time dummies and the constant, so that removing one dummy makes the problem go away? Or is it wrong to remove one of the time dummies?

                            I ask because when I run this GMM estimation on a specific subset rather than on the full sample as before, the years reported as "omitted" are now two different ones.


                            • #15
                              . xtreg (AB9 Dne9* Dnr9* Dpe9* Dnr9*)
                              note: Dnr9 omitted because of collinearity

                              Random-effects GLS regression Number of obs = 342
                              Group variable: ID Number of groups = 2

                              R-sq: Obs per group:
                              within = 0.0000 min = 171
                              between = 0.0000 avg = 171.0
                              overall = 0.0177 max = 171

                              Wald chi2(3) = 6.09
                              corr(u_i, X) = 0 (assumed) Prob > chi2 = 0.1072

                              ------------------------------------------------------------------------------
                              AB9 | Coef. Std. Err. z P>|z| [95% Conf. Interval]
                              -------------+----------------------------------------------------------------
                              Dne9 | -1.556464 1.428468 -1.09 0.276 -4.35621 1.243281
                              Dnr9 | -.5548192 1.156174 -0.48 0.631 -2.820879 1.711241
                              Dpe9 | -1.080229 .5846352 -1.85 0.065 -2.226093 .065635
                              Dnr9 | 0 (omitted)
                              _cons | .006164 .0184596 0.33 0.738 -.0300162 .0423442
                              -------------+----------------------------------------------------------------
                              sigma_u | 0
                              sigma_e | .25039348
                              rho | 0 (fraction of variance due to u_i)
                              ------------------------------------------------------------------------------

                              How can I remove the constant here to avoid the dummy variable trap?
