Between estimator for time-invariant dummy variables

Passant Aboubakr

Join Date: Jun 2018
Posts: 7


11 Jul 2018, 12:28

I am currently working with panel data covering 36 industrial sectors over 5 years (n = 180). I have 5 independent variables that serve as proxies for technological progress, such as R&D expenditures and the number of patent applications. The dependent variable is the annual number of employees. I have already run a fixed-effects regression to estimate the effect of technological upgrading on employment growth. I am now interested in how the effect differs across industrial sectors, so I categorized the 36 sectors into high-technology, medium-high, medium-low, and low-technology sectors. I know that I cannot include time-invariant dummies in a fixed-effects model, so I ran a between-estimator regression instead; however, I am not sure whether that is correct or how to interpret the beta coefficients of the dummies. In this case, the baseline is high-technology industries. Below are the code and the output.

Thanks so much in advance.

Code:

label define technology_level 1 "high_technology" 2 "medium_high_technology" 3 "medium_low_technology" 4 "low_technology"

.
.
.
. gen byte technology_level:technology_level=1 if inlist(id, 20, 29, 31, 32)
(160 missing values generated)

.
. replace technology_level=2 if inlist(id, 19, 21, 27, 28, 30)
(25 real changes made)

.
. replace technology_level=3 if inlist(id, 1, 2, 3, 4, 5, 18, 22, 23, 24, 25, 26, 33, 34, 35, 36)
(75 real changes made)

.
. replace technology_level=4 if inlist(id, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17)
(60 real changes made)

.
. xi: xtreg $ylist $xlist i.technology_level, be
i.technology_~l   _Itechnolog_1-4     (naturally coded; _Itechnolog_1 omitted)

Between regression (regression on group means)  Number of obs     =        180
Group variable: id                              Number of groups  =         36

R-sq:                                           Obs per group:
     within  = 0.0083                                         min =          5
     between = 0.7639                                         avg =        5.0
     overall = 0.6563                                         max =          5

                                                F(8,27)           =      10.92
sd(u_i + avg(e_i.))=   1195228                  Prob > F          =     0.0000

---------------------------------------------------------------------------------------------
           Employees_number |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
----------------------------+----------------------------------------------------------------
            RD_expenditures |   .7916889   .5450573     1.45   0.158    -.3266763    1.910054
                RD_projects |   104.9565   73.54601     1.43   0.165    -45.94746    255.8604
  New_products_expenditures |  -.5771447   .6376137    -0.91   0.373     -1.88542    .7311306
NumberofPatentApplicationsp |   22.05046   36.91679     0.60   0.555    -53.69654    97.79745
 NumberofInventionsinForcep |   25.06571   29.79686     0.84   0.408    -36.07239    86.20382
              _Itechnolog_2 |   384517.3     921242     0.42   0.680     -1505715     2274750
              _Itechnolog_3 |    2380854   985924.9     2.41   0.023     357902.9     4403804
              _Itechnolog_4 |    3031953    1032893     2.94   0.007     912631.1     5151274
                      _cons |   -1441437    1038321    -1.39   0.176     -3571895    689021.1
---------------------------------------------------------------------------------------------


Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#2

11 Jul 2018, 15:46

These results are interpreted exactly as you would for any other regression. So, for example, given two sectors with the same values of the other predictors, one of which is high technology (technology 1) and the other medium-high technology (technology 2), the expected Employees_number is 384517.3 higher in the latter than in the former.
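
To make this concrete: the between estimator is ordinary least squares on the group means, which is why the dummy coefficients read like those of any cross-sectional regression. The following is a minimal sketch, not the poster's analysis. It assumes Stata's Grunfeld example data (`webuse grunfeld`) in place of the sector panel, and `size_group` is a made-up time-invariant category used only for illustration.

```stata
* Sketch only: Grunfeld example data stand in for the sector panel.
webuse grunfeld, clear
xtset company year

* Hypothetical time-invariant category (1 = companies 1-5, 2 = companies 6-10).
gen byte size_group = cond(company <= 5, 1, 2)

* Between estimator with the category in factor-variable notation.
xtreg invest mvalue kstock i.size_group, be
scalar b_be = _b[2.size_group]

* Same regression by hand: OLS on one row of time-averaged values per company.
foreach v of varlist invest mvalue kstock {
    bysort company: egen double m_`v' = mean(`v')
}
egen byte one_per_company = tag(company)
regress m_invest m_mvalue m_kstock i.size_group if one_per_company
scalar b_ols = _b[2.size_group]

* The two coefficients on size_group should agree up to rounding error.
display "between: " b_be "   OLS on group means: " b_ols
```

In both regressions the coefficient on `2.size_group` is the expected difference in the time-averaged outcome between the two categories, at equal time-averaged values of the other predictors.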

By the way, unless you are using a pretty old version of Stata, the -xi:- is now obsolete. Use factor-variable notation (-help fvvarlist-) instead for automatic creation of indicator variables for categorical predictors in regressions. In this case, the translation to factor-variable notation of your -xtreg- command is simple: just leave off the -xi:- and you are done. There are several advantages to using factor-variable notation over -xi:-. The most important is that it gives you access to the -margins- command, which can be very helpful in understanding and interpreting your results. It also reduces memory burden. And the outputs in the regression tables are much better labeled. Yes, there are a few dusty corners of Stata where -xi:- must still be used because they don't support factor variable notation. But they are mostly older, one might say archaic, commands whose functions have been taken over by newer commands that do support factor-variable notation. There remain a few circumstances where only -xi:- will do, but they are fairly exotic. So I would suggest that you almost forget you ever knew about -xi-.
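
As a rough illustration of factor-variable notation together with -margins-, here is a sketch that assumes the `nlswork` example data shipped with Stata rather than the poster's data. `race` plays the role of the time-invariant category, and `ib1.` sets category 1 as the base level, mirroring the high-technology baseline above.

```stata
* Sketch only: nlswork example data stand in for the sector panel.
webuse nlswork, clear
xtset idcode year

* No xi: prefix needed; ib1. makes category 1 the reference level.
xtreg ln_wage ttl_exp tenure ib1.race, be

* Adjusted predictions for each category.
margins race

* Contrasts of each category against the base level.
margins, dydx(race)
```

In a linear model like this one, the `dydx()` results simply reproduce the dummy coefficients, but the `margins race` table puts each category on the scale of the outcome, which is often easier to report.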

Passant Aboubakr

Join Date: Jun 2018
Posts: 7

12 Jul 2018, 04:18

Thank you so much, sir, for taking the time to answer my questions. I am still confused about the validity of using the between estimator to include the time-invariant dummies. For example, when I tried the random-effects model, the results changed dramatically: the dummy coefficients became negative, compared with the between-estimator results for the same reference group. I guess that is because the between estimator disregards the time-series aspect. The Hausman test result suggests that the fixed-effects model is more appropriate for my analysis, so I am not sure whether I can use a random-effects regression for the categorical side analysis with the dummies. Would you please advise?

Code:

 xtreg $ylist $xlist i.technology_level

Random-effects GLS regression                   Number of obs     =        180
Group variable: id                              Number of groups  =         36

R-sq:                                           Obs per group:
     within  = 0.1010                                         min =          5
     between = 0.4620                                         avg =        5.0
     overall = 0.4593                                         max =          5

                                                Wald chi2(8)      =      49.51
corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000

---------------------------------------------------------------------------------------------
           Employees_number |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
----------------------------+----------------------------------------------------------------
            RD_expenditures |   .1895305   .0517726     3.66   0.000     .0880581     .291003
                RD_projects |   11.78499   13.44162     0.88   0.381    -14.56009    38.13007
  New_products_expenditures |   .0192373    .015619     1.23   0.218    -.0113755    .0498501
NumberofPatentApplicationsp |  -1.625637    8.05524    -0.20   0.840    -17.41362    14.16234
 NumberofInventionsinForcep |  -10.19379    3.05022    -3.34   0.001    -16.17211    -4.21547
                            |
           technology_level |
    medium_high_technology  |  -430647.4   890758.8    -0.48   0.629     -2176503     1315208
     medium_low_technology  |   -1554273   777894.2    -2.00   0.046     -3078917   -29628.15
            low_technology  |   -1237050   800976.9    -1.54   0.122     -2806936    332835.8
                            |
                      _cons |    3340668   717294.9     4.66   0.000      1934796     4746540
----------------------------+----------------------------------------------------------------
                    sigma_u |  1192718.4
                    sigma_e |  173108.12
                        rho |  .97936969   (fraction of variance due to u_i)
---------------------------------------------------------------------------------------------

Code:

 xtreg $ylist $xlist, fe

Fixed-effects (within) regression               Number of obs     =        180
Group variable: id                              Number of groups  =         36

R-sq:                                           Obs per group:
     within  = 0.1088                                         min =          5
     between = 0.5501                                         avg =        5.0
     overall = 0.5406                                         max =          5

                                                F(5,139)          =       3.39
corr(u_i, Xb)  = 0.6425                         Prob > F          =     0.0064

---------------------------------------------------------------------------------------------
           Employees_number |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
----------------------------+----------------------------------------------------------------
            RD_expenditures |   .1646092   .0486791     3.38   0.001      .068362    .2608564
                RD_projects |    2.87754   12.37096     0.23   0.816    -21.58206    27.33714
  New_products_expenditures |   .0135808   .0141508     0.96   0.339    -.0143979    .0415595
NumberofPatentApplicationsp |  -6.392016   7.569016    -0.84   0.400    -21.35731    8.573273
 NumberofInventionsinForcep |  -7.358692   2.827979    -2.60   0.010    -12.95011   -1.767275
                      _cons |    2421832   97441.51    24.85   0.000      2229172     2614491
----------------------------+----------------------------------------------------------------
                    sigma_u |  1905992.8
                    sigma_e |  173108.12
                        rho |  .99181866   (fraction of variance due to u_i)
---------------------------------------------------------------------------------------------
F test that all u_i=0: F(35, 139) = 268.14                   Prob > F = 0.0000

Code:

hausman fe re

                 ---- Coefficients ----
             |      (b)          (B)            (b-B)     sqrt(diag(V_b-V_B))
             |       fe           re         Difference          S.E.
-------------+----------------------------------------------------------------
RD_expendi~s |    .1646092     .1907254       -.0261162               .
 RD_projects |     2.87754     15.99828       -13.12074               .
New_produc~s |    .0135808     .0197953       -.0062146               .
NumberofPa~p |   -6.392016    -1.623678       -4.768338               .
NumberofIn~p |   -7.358692    -10.32092        2.962231               .
------------------------------------------------------------------------------
                           b = consistent under Ho and Ha; obtained from xtreg
            B = inconsistent under Ha, efficient under Ho; obtained from xtreg

    Test:  Ho:  difference in coefficients not systematic

                  chi2(5) = (b-B)'[(V_b-V_B)^(-1)](b-B)
                          =       73.54
                Prob>chi2 =      0.0000
                (V_b-V_B is not positive definite)


Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#4

12 Jul 2018, 11:58

There is no reason to expect the random-effects and between-effects models to produce similar results. The random-effects output shows coefficients that represent a weighted average of the within- and between-group effects of the variables. The between-effects model produces strictly between-group effect estimates. If the within- and between- group effects of the variables are very different, then the -re- and -be- outputs will also be very different. Your data are evidently an instance of this happening.
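
One way to see this is to fit all three estimators on the same specification and put the coefficients side by side. The sketch below assumes Stata's Grunfeld example data rather than the poster's panel. The `theta` option asks -xtreg, re- to report the weight it uses: values near 1 pull the random-effects estimates toward the within (fixed-effects) estimates, and values near 0 pull them toward pooled OLS.

```stata
* Sketch only: Grunfeld example data stand in for the sector panel.
webuse grunfeld, clear
xtset company year

* Within (fixed-effects) estimates.
xtreg invest mvalue kstock, fe
estimates store fe

* Between estimates (regression on group means).
xtreg invest mvalue kstock, be
estimates store be

* Random-effects estimates; theta reports the quasi-demeaning weight.
xtreg invest mvalue kstock, re theta
estimates store re

* Side-by-side coefficients and standard errors.
estimates table fe be re, b(%9.4f) se(%9.4f) stats(N N_g)
```

When the `fe` and `be` columns differ a lot, the `re` column can land far from either of them, which is the pattern described above.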

It is also worth noting that the Hausman test is not useful in your situation. You have attached importance to variables that vary between but not within id's, so that their effects cannot be estimated in a fixed effects model. Your Hausman test for -fe- vs -re- must be carried out only using -xtreg- commands that exclude those variables: Hausman can only assess models that contain the same variables, and these technology level variables cannot be part of the -fe- model. Regardless of what the Hausman test tells you about the other variables, it has nothing at all to say about these technology variables. Those can only be estimated with -re- or -be-.
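
For reference, a Hausman comparison on a common set of time-varying regressors can be sketched as follows, assuming Stata's Grunfeld example data in place of the poster's panel. The second call adds the `sigmamore` option of -hausman-, which bases both covariance matrices on the disturbance variance from the efficient estimator. It is sometimes tried when the output carries a "V_b-V_B is not positive definite" note, but whether it is appropriate here is a judgment call and not something established in this thread.

```stata
* Sketch only: Grunfeld example data stand in for the sector panel.
webuse grunfeld, clear
xtset company year

* Both models must contain exactly the same (time-varying) regressors.
xtreg invest mvalue kstock, fe
estimates store fe

xtreg invest mvalue kstock, re
estimates store re

* Default Hausman test: consistent estimator first, efficient one second.
hausman fe re

* Variant using a common disturbance-variance estimate.
hausman fe re, sigmamore
```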

In comparing the -re- and -be- outputs, the impression I take away is that these coefficients are both extremely large and extremely imprecisely estimated (i.e., they have very wide confidence intervals). Also, despite the apparently large differences between the -re- and -be- coefficients, they are within each other's confidence limits. Another way of saying this is that your data actually provide very little information about these effects at all, and I would say that neither model is telling you much about them. I think your attempt to estimate these effects using this data is simply not viable.

I also note that in both the -fe- and -re- models, you end up with rho very near 1. That is, nearly all the outcome variation in your data is at the id level, with almost none going on within id's. That is quite unusual to see, especially in financial data, and it makes me wonder whether your data may be incorrect. Perhaps the very high between-id variation is being caused by some wild outliers? Anyway, I don't work in finance/economics, so my intuitions here are probably not a useful guide. But just from what I have seen here on Statalist, rho this high in financial data seems anomalous and raises my suspicions of bad data.
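
The rho values in the two outputs follow directly from the reported variance components, since rho = sigma_u^2 / (sigma_u^2 + sigma_e^2). A small check using only the numbers printed in the -fe- and -re- tables above:

```stata
* rho = sigma_u^2 / (sigma_u^2 + sigma_e^2), using the values reported above.
scalar rho_fe = 1905992.8^2 / (1905992.8^2 + 173108.12^2)
scalar rho_re = 1192718.4^2 / (1192718.4^2 + 173108.12^2)

* These match the reported .99181866 (fe) and .97936969 (re) up to rounding.
display %9.6f rho_fe
display %9.6f rho_re
```

With sigma_u roughly 7 to 11 times sigma_e, the id-level component dominates the variance, which is what a rho this close to 1 expresses.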

Passant Aboubakr

Join Date: Jun 2018
Posts: 7

12 Jul 2018, 13:25

Thank you so much for your reply and for pointing out the issue in the data. Regarding the Hausman test, I apologize if I did not make myself clear: I ran the Hausman test for -fe- vs -re- without the dummy variables in either model. I appreciate your advice regarding the non-viability of estimating these effects with this data, and I will consider qualitative alternatives instead.

With regard to the between-id variation, your intuition is totally right. I have a very high between-id variation; however, I have some explanations for this. One of the reasons is that many sectors incur losses and others do not, so both the dependent variable and the independent variables vary more between ids than within them, for similar reasons.

Code:

xtsum $ylist $xlist

Variable         |      Mean   Std. Dev.       Min        Max |    Observations
-----------------+--------------------------------------------+----------------
Employ~r overall |   2696014    2142379     208900    9092600 |     N =     180
         between |              2160554   213586.7    8897066 |     n =      36
         within  |             161584.7    1937892    3301097 |     T =       5
                 |                                            |
RD_exp~s overall |   2532705    3471377    20007.9   1.81e+07 |     N =     180
         between |              3440152   50372.38   1.43e+07 |     n =      36
         within  |             693015.8   -1085074    6377738 |     T =       5
                 |                                            |
RD_pro~s overall |  8987.428   10969.55         85      41883 |     N =     180
         between |             11013.16      150.8    37705.8 |     n =      36
         within  |             1323.558   3156.628   13164.63 |     T =       5
                 |                                            |
New_pr~s overall |   2723652    4131107    16521.6   2.35e+07 |     N =     180
         between |              4009852   33549.42   1.81e+07 |     n =      36
         within  |              1160363   -4489867   1.02e+07 |     T =       5
                 |                                            |
Numbe~sp overall |  16815.76   24457.34         84     118725 |     N =     180
         between |             24395.68      169.2      98876 |     n =      36
         within  |             4038.771   345.7556   39570.96 |     T =       5
                 |                                            |
Numbe~ep overall |  13329.59   27536.02         49     227365 |     N =     180
         between |             25708.74        131   141164.6 |     n =      36
         within  |             10585.96  -44246.01   99529.99 |     T =       5

I do not have a good econometric background. But since you pointed out this issue to me, I am now wondering if it is correct to use the fixed-effect model for the main regression analysis (without including time-invariant dummies). As far as I understand, the fixed effect estimator uses the within variation for each id.


Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#6

12 Jul 2018, 13:34

"As far as I understand, the fixed effect estimator uses the within variation for each id."

That is correct. It does not invalidate the use of the fixed effects model, but it does mean that you are modeling a tiny piece of the total variation and ignoring the much larger variation going on between groups. Now, depending on your specific research questions, that may be just fine. You need to consider your research goals and whether an evaluation of the tiny within-groups variation advances you toward those goals.
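
What "uses the within variation" means can be shown by demeaning each variable within id and running OLS on the deviations. This is a sketch that assumes Stata's Grunfeld example data rather than the poster's panel; the `mean_*` and `w_*` variable names are made up for the illustration.

```stata
* Sketch only: Grunfeld example data stand in for the sector panel.
webuse grunfeld, clear
xtset company year

* Subtract each company's own time average from every variable.
foreach v of varlist invest mvalue kstock {
    bysort company: egen double mean_`v' = mean(`v')
    gen double w_`v' = `v' - mean_`v'
}

* Fixed-effects (within) estimator.
xtreg invest mvalue kstock, fe
scalar b_fe = _b[mvalue]

* OLS on the within-company deviations: same slope coefficients.
* (Its standard errors are too small because the degrees of freedom
*  ignore the estimated company means.)
regress w_invest w_mvalue w_kstock, noconstant
scalar b_within = _b[w_mvalue]

display "xtreg, fe: " b_fe "   demeaned OLS: " b_within

* Size of the within variation relative to the between variation.
xtsum invest mvalue kstock
```

Anything that is constant within id, such as a technology-level dummy, is identically zero after demeaning, which is why it cannot be estimated this way.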
Passant Aboubakr

Join Date: Jun 2018

Posts: 7
#7

12 Jul 2018, 14:53

Thank you so much, your answers have been really helpful. I will also consider fixed effects by using LSDV to control for the unobserved heterogeneity across the industrial sectors.
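
For what it is worth, LSDV and -xtreg, fe- give the same slope coefficients, so LSDV does not by itself make time-invariant dummies estimable: they are collinear with the sector indicators. A minimal sketch, assuming Stata's Grunfeld example data in place of the sector panel and a made-up time-invariant category `size_group`:

```stata
* Sketch only: Grunfeld example data stand in for the sector panel.
webuse grunfeld, clear
xtset company year

* Fixed-effects (within) estimator.
xtreg invest mvalue kstock, fe
scalar b_fe = _b[mvalue]

* LSDV: OLS with one indicator per company.
regress invest mvalue kstock i.company
scalar b_lsdv = _b[mvalue]

display "xtreg, fe: " b_fe "   LSDV: " b_lsdv

* Hypothetical time-invariant category; adding it on top of the company
* indicators creates perfect collinearity, so Stata omits a term.
gen byte size_group = cond(company <= 5, 1, 2)
regress invest mvalue kstock i.company i.size_group
```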
Sebastian Kripfganz

Join Date: May 2014

Posts: 2593
#8

13 Jul 2018, 03:53

In addition to Clyde's excellent explanations, you might also want to look at the following Stata Journal article:
Schunck, R. (2013). Within and between estimates in random-effects models: Advantages and drawbacks of correlated random effects and hybrid models. Stata Journal 13(1): 65-76.

https://www.kripfganz.de/stata/
Passant Aboubakr

Join Date: Jun 2018

Posts: 7
#9

15 Jul 2018, 16:23

Thank you so much Sebastian for the recommendation.