Is there a reason why Stata doesn't allow robust standard errors or clustering for between-effects models?

Ive Muller

Join Date: Apr 2023

Posts: 12
#1

29 Apr 2023, 10:30

I am trying to estimate a between-effects model with the following command:

Code:

xtreg Y X, be

When I try to add the option for robust standard errors, Stata tells me that this option is not allowed. Is there a reason for this?

Judging by a residuals-versus-fitted plot of my regression, I seem to have heteroskedasticity, but if I cannot use robust standard errors, what can I do to take this into account?
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#2

29 Apr 2023, 10:33

Ive:
why not consider -xtreg, re-?

Kind regards,
Carlo
(Stata 19.0)

Andrew Musau

Join Date: Oct 2014

Posts: 10214
#3

29 Apr 2023, 10:52

Originally posted by Ive Muller

I am trying to estimate a between-effects model with the following command:

Code:

xtreg Y X, be

When I try to add the option for robust standard errors, Stata tells me that this option is not allowed.

What is there to cluster on? The between model is a cross-sectional equation. Consider the general panel data model:

$$y_{it}= \beta^{\prime}x_{it}+ \gamma^{\prime}z_{i}+\eta_{i}+u_{it}\;\;\;(i=1,\ldots,N;\; t=1,\ldots,T)$$

where the $x$ variables are time-varying, the $z$ variables are time-invariant, and $\eta_{i}$ is the time-invariant individual effect. The between model is

$$\bar{y}_{i}= \beta^{\prime}\bar{x}_{i}+ \gamma^{\prime}z_{i}+\eta_{i}+\bar{u}_{i}$$

where $$\bar{y}_{i}=\frac{1}{T}\sum_{t=1}^{T}y_{it}, \;\;\bar{x}_{i}=\frac{1}{T}\sum_{t=1}^{T}x_{it}, \;\;\bar{u}_{i}=\frac{1}{T}\sum_{t=1}^{T}u_{it}.$$

In other words, you are just collapsing the data and running OLS. There are no clusters to talk about. But you may specify White standard errors using regress.

Code:

webuse grunfeld, clear
xtset company year
xtreg invest mvalue kstock, be

*COLLAPSE + OLS
collapse invest mvalue kstock, by(company)
regress invest mvalue kstock

Results:

Code:

. xtreg invest mvalue kstock, be

Between regression (regression on group means)  Number of obs     =        200
Group variable: company                         Number of groups  =         10

R-sq:                                           Obs per group:
     within  = 0.4778                                         min =         20
     between = 0.8578                                         avg =       20.0
     overall = 0.7551                                         max =         20

                                                F(2,7)            =      21.11
sd(u_i + avg(e_i.))=  85.02366                  Prob > F          =     0.0011

------------------------------------------------------------------------------
      invest |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      mvalue |   .1346461   .0287455     4.68   0.002     .0666739    .2026183
      kstock |   .0320315   .1909378     0.17   0.872    -.4194647    .4835276
       _cons |  -8.527114   47.51531    -0.18   0.863     -120.883    103.8287
------------------------------------------------------------------------------

.
.
.
. *COLLAPSE + OLS

.
. collapse invest mvalue kstock, by(company)

.
. regress invest mvalue kstock

      Source |       SS           df       MS      Number of obs   =        10
-------------+----------------------------------   F(2, 7)         =     21.11
       Model |  305176.442         2  152588.221   Prob > F        =    0.0011
    Residual |  50603.1625         7  7229.02322   R-squared       =    0.8578
-------------+----------------------------------   Adj R-squared   =    0.8171
       Total |  355779.604         9  39531.0671   Root MSE        =    85.024

------------------------------------------------------------------------------
      invest |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      mvalue |   .1346461   .0287455     4.68   0.002     .0666739    .2026183
      kstock |   .0320315   .1909378     0.17   0.872    -.4194647    .4835276
       _cons |  -8.527115   47.51531    -0.18   0.863     -120.883    103.8287
------------------------------------------------------------------------------

.
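A minimal sketch of the White standard errors mentioned above, assuming the same Grunfeld example data. The point estimates match -xtreg, be-; only the standard errors change. With just 10 group means, robust standard errors rest on very few observations and should be read with caution; `vce(hc3)` is an alternative small-sample variant that -regress- also accepts.

```stata
* Sketch: between estimator by hand (collapse + OLS) with robust standard errors
webuse grunfeld, clear

* One row per company, holding the time averages of each variable
collapse (mean) invest mvalue kstock, by(company)

* vce(robust) requests heteroskedasticity-robust (White) standard errors
regress invest mvalue kstock, vce(robust)
```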

On a broader point, the between estimator is useful in considering the random effects model rather than an estimator in its own right. Note, however, that the random effects estimator may not be consistent: you need to verify its assumption that the firm effects are uncorrelated with your covariates.
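One common way to probe that assumption is a Hausman test comparing the fixed and random effects estimates. The sketch below assumes the Grunfeld example data; the stored-estimate names `fixed` and `random` are arbitrary labels chosen for illustration. A small p-value would suggest that the firm effects are correlated with the covariates, in which case the random effects estimates would be suspect.

```stata
* Sketch: Hausman test of fixed versus random effects
webuse grunfeld, clear
xtset company year

* Fixed effects: consistent whether or not the effects are correlated with x
quietly xtreg invest mvalue kstock, fe
estimates store fixed

* Random effects: efficient only if the effects are uncorrelated with x
quietly xtreg invest mvalue kstock, re
estimates store random

* Consistent estimator first, efficient estimator second
hausman fixed random
```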

Last edited by Andrew Musau; 29 Apr 2023, 11:01.


Ive Muller

Join Date: Apr 2023

Posts: 12
#4

30 Apr 2023, 04:14

Carlo:

Originally posted by Carlo Lazzaro

Ive:
why not consider -xtreg, re-?

To answer this, I need to talk about my model more precisely.

I am trying to find out whether World University Rankings (such as ARWU or QS) take academic freedom into account in some capacity. Thus, my dependent variable is the ranking obtained by a university, and my independent variable of interest is the Academic Freedom Index (AFI) by V-Dem.

The AFI is very stable over the years, changing only in response to distinct events. For this reason, there is little interest in exploiting the longitudinal aspect of my data. I even tried a fixed-effects model with this data, but the estimated coefficients were very unstable, losing significance and changing sign when I introduced new control variables, which led me to think that a between estimator might be more appropriate here. I thought it would be more interesting to use a between-effects model and see whether the universities in countries with the highest academic freedom also have the best rankings.
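For a setup like this one, the between regression can be written out by hand, which also makes it possible to cluster on country if universities in the same country are thought to share unobserved shocks. The sketch below uses simulated data only; the variable names (`univ`, `country`, `year`, `afi`, `ranking`) and all numbers are hypothetical and are not taken from the actual dataset.

```stata
* Sketch on SIMULATED data; every name and number here is hypothetical
clear
set seed 12345
set obs 30                              // 30 hypothetical countries
generate country = _n
generate afi = runiform()               // country-level index, fixed over time
generate c_shock = rnormal(0, 30)       // shock shared within a country
expand 5                                // 5 universities per country
generate univ = _n
expand 8                                // 8 yearly observations per university
bysort univ: generate year = 2014 + _n
generate ranking = 500 - 200*afi + c_shock + rnormal(0, 50)

* Between regression by hand: one row of time averages per university
collapse (mean) ranking afi (first) country, by(univ)

* Cluster on country because afi varies only across countries
regress ranking afi, vce(cluster country)
```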

Last edited by Ive Muller; 30 Apr 2023, 04:21.
Ive Muller

Join Date: Apr 2023

Posts: 12
#5

30 Apr 2023, 04:20

Andrew:

Thank you for the example; I see how I can use robust standard errors now!

Also:

Originally posted by Andrew Musau

On a broader point, the between estimator is useful in considering the random effects model rather than an estimator in its own right.

What do you mean by this?
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#6

30 Apr 2023, 05:25

Ive:
a more relevant issue here seems to be that you have only one predictor of interest.
Therefore, I'd follow Andrew's helpful guidance and check whether your regression is correctly specified via -linktest-.
As an aside, please note that the -be- estimator is rarely used in its own right, and most research on short panel datasets relies on -xtreg, fe- and -xtreg, re-. As Andrew wisely highlighted, the main assumption of the -re- estimator (that is, that the panel-level component of the error, named u, is uncorrelated with the vector of regressors) may not hold.
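A minimal sketch of the -linktest- check on the between regression, assuming the Grunfeld example data from #3 and the collapse-then-regress form of the between estimator. -linktest- refits the model on the fitted values (`_hat`) and their square (`_hatsq`); a significant coefficient on `_hatsq` would hint at misspecification.

```stata
* Sketch: specification check on the between regression
webuse grunfeld, clear

* Between regression written as OLS on company means
collapse (mean) invest mvalue kstock, by(company)
regress invest mvalue kstock

* Regress invest on _hat and _hatsq; inspect the t test on _hatsq
linktest
```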

Last edited by Carlo Lazzaro; 30 Apr 2023, 05:27.

Kind regards,
Carlo
(Stata 19.0)
Andrew Musau

Join Date: Oct 2014

Posts: 10214
#7

01 May 2023, 13:06

Originally posted by Ive Muller

What do you mean by this?

"On a broader point, the between estimator is useful in considering the random effects model rather than an estimator in its own right"

You will need to do a bit of reading on panel data models. In summary, a major motivation for using panel data is the ability to control for time-invariant heterogeneity that is possibly correlated with the RHS variables. The between estimator only averages the heterogeneity ($\eta_{i}$ does not drop out of the between equation in #3), so it cannot achieve that goal. It is, however, useful for understanding the random effects estimator, because that estimator is a weighted combination of the within and between estimators.
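The weighted-combination point can be illustrated numerically. The sketch below assumes the Grunfeld example data and, to keep the weight a single number, a single regressor in a balanced panel, where $\hat{\beta}_{RE} = w\,\hat{\beta}_{B} + (1-w)\,\hat{\beta}_{W}$ with $0 \le w \le 1$. The scalar names are arbitrary; with several regressors the weights become matrices and this simple ratio no longer applies.

```stata
* Sketch: RE slope as a weighted average of the between and within slopes
webuse grunfeld, clear
xtset company year

quietly xtreg invest mvalue, be
scalar b_be = _b[mvalue]                // between estimate

quietly xtreg invest mvalue, fe
scalar b_fe = _b[mvalue]                // within (fixed effects) estimate

quietly xtreg invest mvalue, re
scalar b_re = _b[mvalue]                // random effects estimate

* Solve b_re = w*b_be + (1-w)*b_fe for the implied weight on the between slope
scalar w = (scalar(b_re) - scalar(b_fe)) / (scalar(b_be) - scalar(b_fe))
display "between = " scalar(b_be) "  within = " scalar(b_fe)
display "RE = " scalar(b_re) "  implied weight on between = " scalar(w)
```

A weight near 0 means the random effects estimate is close to the within estimate; a weight near 1 means it leans on the between variation.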

Last edited by Andrew Musau; 01 May 2023, 13:11.