Omitted variables when running between effects on Panel data

Axel Grapengiesser

Join Date: May 2015

Posts: 5
#1

12 May 2015, 09:16

Hi,

This is my first post on the Statalist forum. Or actually my second, as I quickly realized that the sandbox was maybe not the right place to post this...

I'm currently researching the performance of SRI mutual funds in Sweden, and I was planning to regress my cross-sectional time-series data as panel data using xtreg in Stata. I have the problem described below and I truly hope that someone more experienced can help me or point me in the right direction. I've been searching for answers and trying everything I can think of for more than 10 hours now and my head is spinning :/.

I have a dataset containing about 140 conventional funds and 75 SRI funds that I want to analyze over a period of 17 monthly return observations per fund. I would like to regress the returns on the Fama-French factors (MktRF, SMB, HML and MOM).

I'm running the commands:
xtreg return MktRF SMB HML MOM, re
xtreg return MktRF SMB HML MOM, fe
xtreg return MktRF SMB HML MOM, be

To decide between fixed and random effects I'm performing a Hausman test. For the conventional funds (140) everything seems to work, but for the SRI funds (75) it doesn't, because the fixed and random effects models give exactly the same coefficients. After some investigation, and after running the between estimator, I've realized that Stata omits the MktRF, SMB, HML and MOM variables as collinear when I run the code for the SRI funds. I've tried using different indices as MktRF but always end up with the same result, and I've also tried to analyze the correlation using xtcorr and xtserial but can't find the issue. Of course, MktRF, SMB, HML and MOM take the same values for every fund at a given point in time; however, this doesn't seem to be a problem when running the code on the conventional funds?
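Why the Hausman test has nothing to work with here: the statistic is built from the difference between the FE and RE coefficient vectors. Below is a minimal NumPy sketch of the textbook formula H = d'(V_FE - V_RE)^-1 d with d = b_FE - b_RE, using a pseudo-inverse (an assumption; one common way to handle a difference matrix that is not invertible). In practice you would use Stata's `hausman` command after storing the two sets of estimates; the helper `hausman` here is illustrative only, and the numbers are the FE/RE results from the rate/rate2 example later in the thread with covariances set to zero for simplicity.

```python
import numpy as np

def hausman(b_fe, b_re, v_fe, v_re):
    """Textbook Hausman statistic H = d' (V_fe - V_re)^+ d, with d = b_fe - b_re.

    A pseudo-inverse is used because V_fe - V_re need not be invertible
    (or even positive definite) in finite samples.
    """
    d = np.asarray(b_fe, dtype=float) - np.asarray(b_re, dtype=float)
    v_diff = np.asarray(v_fe, dtype=float) - np.asarray(v_re, dtype=float)
    return float(d @ np.linalg.pinv(v_diff) @ d)

# Coefficients and standard errors as reported for FE and RE in the
# rate/rate2 example later in this thread; off-diagonal covariances are
# set to zero here purely for illustration.
b_fe = np.array([-2.657173, -0.8738955])
b_re = np.array([-2.657173, -0.8738955])
v_fe = np.diag([0.4481602, 0.1670575]) ** 2
v_re = np.diag([0.428437, 0.1597054]) ** 2

# Identical coefficient vectors give d = 0, hence H = 0 whatever the variances.
print(hausman(b_fe, b_re, v_fe, v_re))
```

When FE and RE coincide exactly, the test statistic is zero by construction, so the Hausman test cannot discriminate between the two models.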

I'm truly grateful for any help you might have for me on this!
Best Regards
Axel Grapengiesser
Stockholm School of Economics, Sweden
Andrew Musau

Join Date: Oct 2014

Posts: 10213
#2

12 May 2015, 09:25

Consider that the between estimator only uses variation between cross-sectional units: it regresses each unit's mean of the outcome on each unit's means of the regressors. If your variables take the same value for every unit at a particular point in time (e.g. the price of crude oil, interest rates), then in a balanced panel their unit means are all identical, so they are collinear with the constant and it is natural that they are omitted.
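To see the collinearity concretely, here is a minimal NumPy sketch (Python, not Stata) with a hypothetical balanced panel of 3 funds and 4 months and a single factor that takes the same value for all funds in each month. It computes the fund means the between estimator would use and checks the rank of the between design matrix (a constant plus the fund mean of the factor).

```python
import numpy as np

# Hypothetical balanced panel: 3 funds x 4 months. The factor takes the
# same value for every fund in a given month (like MktRF, SMB, HML, MOM).
factor_by_month = np.array([0.02, -0.01, 0.03, 0.00])
n_funds = 3
factor = np.tile(factor_by_month, n_funds)              # stacked fund by fund
fund_id = np.repeat(np.arange(n_funds), len(factor_by_month))

# The between estimator works with fund-level means of each variable.
fund_means = np.array([factor[fund_id == i].mean() for i in range(n_funds)])

# Between design matrix: a constant and the fund mean of the factor.
X_between = np.column_stack([np.ones(n_funds), fund_means])
rank = np.linalg.matrix_rank(X_between)

print(fund_means)   # the same value for every fund
print(rank)         # 1 < 2 columns: the factor is collinear with the constant
```

Because every row of the between design matrix is identical, only one of its two columns can be estimated, which is why Stata drops the factor and keeps the constant.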
Axel Grapengiesser

Join Date: May 2015

Posts: 5
#3

12 May 2015, 09:42

Does that mean that running a random-effects panel regression is useless for this kind of data? The reason I wanted to run a panel regression was to account for fund-specific factors that affect the funds' results but that I'm not able to control for... Is my thinking wrong here? Should I just run an OLS regression on the whole fund set?

I can't understand why the same result is not obtained for the conventional funds (the larger panel dataset containing 140 funds), as MktRF, SMB, HML etc are constant for all funds for each point in time in that set too...

Thanks!
Andrew Musau

Join Date: Oct 2014

Posts: 10213
#4

12 May 2015, 10:43

How is your model specified? Is it that only your outcome variable varies over time and across funds, whereas your regressors only vary over time? Or do you also have regressors that vary both over time and across funds? Check whether this is consistent across both panels, i.e., whether you run the same model for both.

In any case, since you have variation over time, you will be able to run a panel regression. I can illustrate how once I have confirmation as to the nature of your variables.

Last edited by Andrew Musau; 12 May 2015, 10:58.
Axel Grapengiesser

Join Date: May 2015

Posts: 5
#5

12 May 2015, 11:08

Only the outcome variable (return) varies both over time and across funds; all the other (independent) variables are constant across funds but vary over time. This is the same for both panels!

I then deleted all funds that started or ended during the sample period, i.e. all funds without data for the entire time series (all the SRI funds already had complete data). After doing so, Stata omitted the variables for the conventional fund set as well, so at least the problem is now consistent across both sets...
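A plausible explanation for why the conventional set only started dropping the factors after the incomplete funds were removed: when funds enter or leave during the sample, each fund's average of a time-only factor is taken over a different window, so the fund means differ and the between estimator has variation to use. Once the panel is balanced, every fund averages the factor over the same months and that variation disappears. The NumPy sketch below illustrates this with a hypothetical factor series and hypothetical fund coverage; `factor_means_by_fund` is an illustrative helper.

```python
import numpy as np

# Hypothetical monthly factor series (e.g. MktRF) over a 6-month sample.
months = np.arange(6)
mkt = np.array([0.03, -0.02, 0.01, 0.04, -0.01, 0.02])

# Hypothetical coverage: funds A and B exist all 6 months, fund C only the last 3.
coverage = {"A": months, "B": months, "C": months[3:]}

def factor_means_by_fund(cov):
    """Mean of the factor over the months each fund is observed."""
    return {fund: float(mkt[obs].mean()) for fund, obs in cov.items()}

# Unbalanced panel: C's mean differs from A's and B's, so the between
# estimator has cross-fund variation in the factor to work with.
print(factor_means_by_fund(coverage))

# Keep only funds observed in every month (a balanced panel), as in post #5.
balanced = {fund: obs for fund, obs in coverage.items() if len(obs) == len(months)}

# Balanced panel: every remaining fund has the same factor mean, so the factor
# is collinear with the constant in the between regression.
print(factor_means_by_fund(balanced))
```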

Thank you for your time!

Last edited by Axel Grapengiesser; 12 May 2015, 11:28.

Andrew Musau

Join Date: Oct 2014

Posts: 10213
#6

12 May 2015, 12:37

Axel: In such a case, OLS and random effects give the same coefficient estimates:

Code:

reg return MktRF SMB HML MOM
xtreg return MktRF SMB HML MOM,re

and FE will also give you the same coefficient estimates: in a balanced panel every fund has the same mean of each regressor, so subtracting the fund means is the same as subtracting the overall mean. You will, however, have time effects. To the broader question: you want to look at the effect of a set of regressors on returns at the fund level, but the regressors do not vary across funds. So to what extent can we attribute a difference in returns between fund A and fund B to regressors that take the same values for both funds? Surely, you need something else to tell you why these returns differ. I think that you need to re-examine your research question.
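A short derivation of why the point estimates coincide, assuming a balanced panel with N funds, T periods and a single regressor x_t that is the same for every fund. Random effects (GLS) quasi-demeans the data by θ times the fund means; θ = 0 is pooled OLS and θ = 1 is the within (FE) estimator. With several regressors the same cancellation happens for each column of the demeaned regressor matrix.

```latex
% Balanced panel, funds i = 1..N, periods t = 1..T, regressor x_{it} = x_t for all i
\bar{x}_i = \frac{1}{T}\sum_{t=1}^{T} x_t = \bar{x} \quad \text{for every } i

% RE quasi-demeaning
\theta = 1 - \sqrt{\frac{\sigma_e^2}{T\sigma_u^2 + \sigma_e^2}}, \qquad
y_{it} - \theta\bar{y}_i = (1-\theta)\beta_0 + \beta_1 (x_t - \theta\bar{x}) + \text{error}

% Slope from the transformed regression
\hat{\beta}_1(\theta)
  = \frac{\sum_{i}\sum_{t}(x_t - \bar{x})(y_{it} - \theta\bar{y}_i)}{N\sum_{t}(x_t - \bar{x})^2}
  = \frac{\sum_{i}\sum_{t}(x_t - \bar{x})\,y_{it}
          - \theta\sum_{i}\bar{y}_i\sum_{t}(x_t - \bar{x})}{N\sum_{t}(x_t - \bar{x})^2}

% Because \sum_t (x_t - \bar{x}) = 0, the second term vanishes:
\hat{\beta}_1(\theta) = \frac{\sum_{i}\sum_{t}(x_t - \bar{x})\,y_{it}}{N\sum_{t}(x_t - \bar{x})^2}
  \quad \text{for every } \theta \in [0, 1]
```

The slope does not depend on θ, so OLS, RE and FE agree; for θ < 1 the intercept is likewise ȳ - β̂₁x̄ in each case. Only the standard errors can differ.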

For the mechanics of estimation, the following example will do. My outcome variable, return, varies across funds and over time, but my regressors only vary over time.

Code:

input year fund_id return rate rate2
1 1 .99 .25 -.3
2 1 .12 .35 -.2
3 1 .18 .45 .05
4 1 -.61 .55 .5
5 1 .36 .1 .8
1 2 .92 .25 -.3
2 2 .35 .35 -.2
3 2 .3 .45 .05
4 2 -.71 .55 .5
5 2 -.5 .1 .8
1 3 .88 .25 -.3
2 3 .7 .35 -.2
3 3 -.4 .45 .05
4 3 -.57 .55 .5
5 3 .8 .1 .8
1 4 .79 .25 -.3
2 4 .3 .35 -.2
3 4 -.2 .45 .05
4 4 -.72 .55 .5
5 4 .3 .1 .8
end
    

. xtset fund_id year
       panel variable:  fund_id (strongly balanced)
        time variable:  year, 1 to 5
                delta:  1 unit

. reg  return rate rate2

      Source |       SS       df       MS              Number of obs =      20
-------------+------------------------------           F(  2,    17) =   29.22
       Model |  5.07806304     2  2.53903152           Prob > F      =  0.0000
    Residual |  1.47741707    17  .086906887           R-squared     =  0.7746
-------------+------------------------------           Adj R-squared =  0.7481
       Total |  6.55548011    19  .345025269           Root MSE      =   .2948

------------------------------------------------------------------------------
      return |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        rate |  -2.657173    .428437    -6.20   0.000    -3.561096    -1.75325
       rate2 |  -.8738955   .1597054    -5.47   0.000    -1.210844   -.5369465
       _cons |   1.216001   .1663348     7.31   0.000     .8650652    1.566937
------------------------------------------------------------------------------

. xtreg  return rate rate2, re

Random-effects GLS regression                   Number of obs      =        20
Group variable: fund_id                         Number of groups   =         4

R-sq:  within  = 0.0000                         Obs per group: min =         5
       between = 0.0000                                        avg =       5.0
       overall = 0.7746                                        max =         5

                                                Wald chi2(2)       =     58.43
corr(u_i, X)   = 0 (assumed)                    Prob > chi2        =    0.0000

------------------------------------------------------------------------------
      return |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        rate |  -2.657173    .428437    -6.20   0.000    -3.496894   -1.817452
       rate2 |  -.8738955   .1597054    -5.47   0.000    -1.186912   -.5608787
       _cons |   1.216001   .1663348     7.31   0.000     .8899907    1.542011
-------------+----------------------------------------------------------------
     sigma_u |          0
     sigma_e |  .30837096
         rho |          0   (fraction of variance due to u_i)
------------------------------------------------------------------------------

. xtreg  return rate rate2, fe

Fixed-effects (within) regression               Number of obs      =        20
Group variable: fund_id                         Number of groups   =         4

R-sq:  within  = 0.7923                         Obs per group: min =         5
       between =      .                                        avg =       5.0
       overall = 0.7746                                        max =         5

                                                F(2,14)            =     26.70
corr(u_i, Xb)  = 0.0000                         Prob > F           =    0.0000

------------------------------------------------------------------------------
      return |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        rate |  -2.657173   .4481602    -5.93   0.000    -3.618381   -1.695965
       rate2 |  -.8738955   .1670575    -5.23   0.000    -1.232198   -.5155928
       _cons |   1.216001   .1739921     6.99   0.000      .842825    1.589177
-------------+----------------------------------------------------------------
     sigma_u |  .09869819
     sigma_e |  .30837096
         rho |  .09292152   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(3, 14) =     0.51               Prob > F = 0.6804

.
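The equality across the three outputs above can be checked outside Stata. The NumPy sketch below (Python, not Stata) re-enters the example data and estimates the slopes after quasi-demeaning by theta times the fund means: theta = 0 is pooled OLS, theta = 1 is the within (FE) estimator, and intermediate values mimic the RE transformation for an arbitrarily chosen theta (Stata estimates theta from the variance components; the values here are illustrative). Every choice returns the slopes Stata reports, -2.657173 for rate and -.8738955 for rate2. The helper `slopes_quasi_demeaned` is hypothetical.

```python
import numpy as np

# Data from the Stata example above: 4 funds x 5 years.
rate_by_year = np.array([0.25, 0.35, 0.45, 0.55, 0.10])
rate2_by_year = np.array([-0.3, -0.2, 0.05, 0.5, 0.8])
returns = np.array([
    [0.99, 0.12, 0.18, -0.61, 0.36],   # fund 1
    [0.92, 0.35, 0.30, -0.71, -0.50],  # fund 2
    [0.88, 0.70, -0.40, -0.57, 0.80],  # fund 3
    [0.79, 0.30, -0.20, -0.72, 0.30],  # fund 4
])
n_funds, n_years = returns.shape

y = returns.ravel()                                   # stacked fund by fund
X = np.column_stack([np.tile(rate_by_year, n_funds),
                     np.tile(rate2_by_year, n_funds)])
fund = np.repeat(np.arange(n_funds), n_years)

def slopes_quasi_demeaned(theta):
    """Slopes from OLS after subtracting theta times the fund means.

    theta = 0 is pooled OLS, theta = 1 is the within (FE) estimator, and
    0 < theta < 1 mimics the RE transformation for that theta.
    """
    y_bar = np.array([y[fund == i].mean() for i in range(n_funds)])[fund]
    X_bar = np.vstack([X[fund == i].mean(axis=0) for i in range(n_funds)])[fund]
    y_t = y - theta * y_bar
    X_t = X - theta * X_bar
    # The transformed constant is (1 - theta); it vanishes in the FE case.
    const = np.full(len(y), 1.0 - theta)
    design = np.column_stack([const, X_t]) if theta < 1 else X_t
    coef, *_ = np.linalg.lstsq(design, y_t, rcond=None)
    return coef[-2:]   # the two slope coefficients (rate, rate2)

for theta in (0.0, 0.3, 0.7, 1.0):
    print(theta, slopes_quasi_demeaned(theta))
```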

Last edited by Andrew Musau; 12 May 2015, 13:27.


Axel Grapengiesser

Join Date: May 2015

Posts: 5
#7

13 May 2015, 03:47

Hello Andrew,

Thank you very much for your answer. Our methodology is not exhaustive regarding the causes of the differences; for that, I realize we would need fund-specific regressors. The values we're most interested in from the regressions are the constant (Jensen's alpha), which indicates over- or underperformance relative to the market for the different fund categories, and the R-squared, which tells us how well the market factors explain the fund returns during the period. The SMB coefficient is also of interest, as it tells us something about the relevance of small-cap vs. large-cap investment for returns.

I'm very grateful for all your help. If you would like a copy of the thesis once it's done, please send me an email at [email protected] and I'll send you one.

Thank you!
Best Regards
Axel
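Since the intercept (Jensen's alpha) and the R-squared are the quantities of interest, here is a minimal NumPy sketch of how both come out of a time-series regression of one return series on the four factors. Assumptions: returns are already in excess of the risk-free rate, and the data are simulated with a known alpha and known betas (hypothetical, not Axel's fund data); `alpha_and_r2` is an illustrative helper, not a library function.

```python
import numpy as np

def alpha_and_r2(excess_ret, factors):
    """OLS of excess returns on factors; returns (alpha, betas, R-squared)."""
    X = np.column_stack([np.ones(len(excess_ret)), factors])
    coef, *_ = np.linalg.lstsq(X, excess_ret, rcond=None)
    fitted = X @ coef
    ss_res = np.sum((excess_ret - fitted) ** 2)
    ss_tot = np.sum((excess_ret - excess_ret.mean()) ** 2)
    return coef[0], coef[1:], 1.0 - ss_res / ss_tot

# Hypothetical data: 17 months, four factors (MktRF, SMB, HML, MOM).
rng = np.random.default_rng(0)
factors = rng.normal(0.0, 0.03, size=(17, 4))
true_alpha = 0.002
true_betas = np.array([1.0, 0.3, -0.1, 0.05])
noise = rng.normal(0.0, 0.005, size=17)
excess_ret = true_alpha + factors @ true_betas + noise

alpha, betas, r2 = alpha_and_r2(excess_ret, factors)
print(alpha, betas, r2)
```

With only 17 monthly observations per series, the estimated alpha will be noisy, so its standard error matters as much as its point value.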
Andrew Musau

Join Date: Oct 2014

Posts: 10213
#8

13 May 2015, 04:24

Thanks for the clarification. All the best!
Amanda Lira

Join Date: May 2015

Posts: 1
#9

16 May 2015, 08:14

Hello!
I hope you don't mind me using this topic to ask for help with a somewhat similar problem I'm having while working with a simultaneous GMM estimation.

I have data from 1999 to 2013 on 135 US commercial banks, and I am trying to estimate a model similar to that of Distinguin and Roulet (2013) on the relationship between liquidity creation and regulatory capital.

They work with the following system of equations:

lc = reg + endvarlc + exvarlc
reg = lc + endvarreg + exvarreg

lc = liquidity creation
reg = regulatory capital
endvar (lc or reg) = a number of bank-level explanatory variables that are replaced by their 1-year lagged value to avoid endogeneity issues
exvar (lc or reg) = a number of exogenous explanatory macroeconomic variables

I've tried using xtivreg to get FE/RE estimates and to run an overidentification test with the following commands:
xtivreg lc reg lagmkpwr (reg lagmkpwr =intrate libinrate crisis gdpgwt), re
xtivreg lc reg lagmkpwr (reg lagmkpwr =intrate libinrate crisis gdpgwt), fe
xtoverid

However, I get the o. operator error because, as Andrew said, some of my variables take the same value at a particular point in time, so they are naturally omitted.

That said, I have a few questions:

1. How do I know which variables are being omitted? If I cannot drop them from the model, what alternatives do I have?
2. The one-year lagged values of the bank-level variables are supposed to avoid endogeneity issues. If that's right, should I still include those variables in the instrumented/endogenous list? (See the example of lagmkpwr above.)
Andrew Musau

Join Date: Oct 2014

Posts: 10213
#10

17 May 2015, 08:36

Amanda: Prior to running your model, you can identify variables that do not vary over banks by running the following regression (the regression itself need not make sense)

Code:

xtreg lc reg lagmkpwr intrate libinrate crisis gdpgwt, be

Variables that do not vary across banks will be omitted because of collinearity. Similarly, to identify variables that do not vary over time, run the usual fixed-effects regression:

Code:

xtreg lc reg lagmkpwr intrate libinrate crisis gdpgwt, fe
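A more direct check is to ask, for each variable, whether it varies across banks within a year and whether it varies over time within a bank. In Stata, `xtsum` reports the between and within standard deviations, which answers the same question. The Python sketch below makes the logic explicit; `variation_profile` is an illustrative helper, and the panel (3 banks, 2 years) and its values are hypothetical placeholders, with `charter` a made-up bank-level variable that never changes over time.

```python
import numpy as np

def variation_profile(values, unit, time):
    """Return (varies_across_units_within_a_period, varies_over_time_within_a_unit)."""
    values, unit, time = map(np.asarray, (values, unit, time))
    varies_across_units = any(
        np.unique(values[time == t]).size > 1 for t in np.unique(time))
    varies_over_time = any(
        np.unique(values[unit == u]).size > 1 for u in np.unique(unit))
    return varies_across_units, varies_over_time

# Hypothetical bank-year panel (placeholder values, not real data).
unit = [1, 1, 2, 2, 3, 3]
time = [1999, 2000, 1999, 2000, 1999, 2000]
gdpgwt = [2.0, 3.0, 2.0, 3.0, 2.0, 3.0]     # same for all banks in a year
lagmkpwr = [0.1, 0.2, 0.3, 0.1, 0.5, 0.4]   # varies in both dimensions
charter = [1, 1, 0, 0, 1, 1]                # constant within each bank

for name, v in [("gdpgwt", gdpgwt), ("lagmkpwr", lagmkpwr), ("charter", charter)]:
    print(name, variation_profile(v, unit, time))
```

A variable flagged as not varying across units will be dropped by the between estimator in a balanced panel, and one flagged as not varying over time will be dropped by the fixed-effects estimator.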

Distinguin and Roulet (2013) conduct a cross-country analysis (European and American banks), whereas you only consider American banks. For example, in the regulatory capital equation they use the annual growth rate of real GDP (GDP_GWT) as a proxy for the economic environment. This will differ between banks in Italy and France, or between banks in the US and the UK, so there is variation across banks. Similarly, in the liquidity equation they use each country's central bank policy rate (CB) as a proxy for monetary policy, which varies across countries (and hence across banks). So, to answer your question, there are two options:

If I cannot drop them from the model, what alternatives do I have?

1) Look at the availability of state-level data. For example, you could use the annual growth rate of real GDP in Maryland as the proxy for banks in Maryland, and the rate in Michigan for Michigan banks.

2) If you are unable to obtain state-level data for these variables, you can expand your sample to include banks outside the US so that you have cross-country variation. Distinguin and Roulet (2013) obtain most of their data from Bloomberg, so this data is commercially available.
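Option 1 amounts to attaching a state-year value to each bank-year observation via the bank's state. In Stata this would typically be a `merge m:1` on state and year against a state-level file. The Python sketch below shows the same lookup with hypothetical banks, states and made-up placeholder growth values (not real GDP figures), and confirms that the resulting variable now varies across banks within a year.

```python
# Hypothetical banks and their states.
banks = [("bank_a", "MD"), ("bank_b", "MI"), ("bank_c", "MD")]

# Made-up placeholder values for state real GDP growth by year (not real data).
state_gdp_growth = {
    ("MD", 2012): 1.0, ("MD", 2013): 2.0,
    ("MI", 2012): 3.0, ("MI", 2013): 4.0,
}
years = [2012, 2013]

# Build the bank-year panel, giving each bank its own state's growth rate.
panel = [
    {"bank": b, "year": y, "gdpgwt": state_gdp_growth[(s, y)]}
    for b, s in banks for y in years
]

for y in years:
    vals = {row["gdpgwt"] for row in panel if row["year"] == y}
    print(y, sorted(vals))   # more than one value per year: cross-bank variation
```

Banks in the same state still share a value, so the variation comes only from the number of distinct states in the sample.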

Secondly,

The one-year lagged values of the bank-level variables are supposed to avoid endogeneity issues. If that's right, should I still include those variables in the instrumented/endogenous list? (See the example of lagmkpwr above.)

The one-year lags are the instruments for the endogenous bank-level explanatory variables (i.e., the levels are instrumented by their one-year lags). As they do, you should check whether you can use more lags of the variables (in their case, higher-order lags are not used because they are weak instruments). Finally, it appears that you are using instrumental-variables regression, whereas they used GMM. Standard IV (2SLS) is efficient only if the errors are homoskedastic; with heteroskedastic errors, GMM is the more efficient choice.
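Constructing those one-year lags correctly matters in a panel: the lag must come from the same bank, and a missing year should give a missing lag rather than the value from two years earlier. In Stata, `xtset` followed by the `L.` operator handles this; the Python sketch below only makes the logic explicit. The helper `lag_within_panel`, the variable `mkpwr` (standing in for the unlagged version of lagmkpwr) and the records are hypothetical.

```python
def lag_within_panel(rows, var, unit="bank", time="year"):
    """Return copies of `rows` with a one-period lag of `var` added.

    The lag is looked up within the same unit for time - 1, so it is None in
    a unit's first year or after a gap, similar to Stata's L. operator after
    xtset.
    """
    lookup = {(r[unit], r[time]): r[var] for r in rows}
    return [dict(r, **{f"lag_{var}": lookup.get((r[unit], r[time] - 1))})
            for r in rows]

# Hypothetical bank-year records (placeholder values).
rows = [
    {"bank": "A", "year": 1999, "mkpwr": 0.10},
    {"bank": "A", "year": 2000, "mkpwr": 0.12},
    {"bank": "A", "year": 2002, "mkpwr": 0.15},   # 2001 missing: a gap
    {"bank": "B", "year": 1999, "mkpwr": 0.30},
    {"bank": "B", "year": 2000, "mkpwr": 0.25},
]
for r in lag_within_panel(rows, "mkpwr"):
    print(r["bank"], r["year"], r["lag_mkpwr"])
```

Keying the lookup on (bank, year) rather than on row order keeps one bank's last value from leaking into the next bank's first year.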
