Fixed/Random effects regression: Omitted explanatory variable (Dummy) and setting of control variables

Hans Koslowski

Join Date: Feb 2020
Posts: 12

24 Sep 2020, 14:37

Dear Statalist community,

I have a problem regarding my bond panel. I’d like to run fixed and random effects regressions on the bond panel, with “YieldatIssuance” as the dependent variable and the dummy “Green” as the explanatory variable. I would also like to control for fixed effects of the following variables: Ticker (stands for the Issuer), Currency, PaymentRank, Maturity, AmountIssue, YearMonth (Issue Date), Rating. I converted the string variables into numeric variables using the encode command and dropped all variables I don’t need as controls from the panel.

However, whenever I run the panel regression the variable “Green” is omitted due to collinearity. Therefore, I regressed Green on each control variable of my panel to check for high R-squared values, as suggested in a thread on a similar problem. However, the R-squared of each regression was below 1%.

This is where I am puzzled. I could not find any high correlation among my explanatory and control variables. Note that the variable “Green” has relatively few “true” observations compared with the observations that do not fall under that category (828 observations with Green==1 vs. 50,106 observations with Green==0). Do you have any other ideas about what might cause this collinearity? And do you have any suggestions for solving this problem, so I can use "Green" as an explanatory variable? Any help will be highly appreciated!

Thank you very much,
Hans

This is my code and output:

Code:

. duplicates drop ISIN_num, force

Duplicates in terms of ISIN_num

(36 observations deleted)

. xtset ISIN_num IssueDate
       panel variable:  ISIN_num (weakly balanced)
        time variable:  IssueDate, 1/11/2007 to 12/6/2020
                delta:  1 day

. xtreg YieldatIssuance Green, fe
note: Green omitted because of collinearity

Fixed-effects (within) regression               Number of obs     =     50,934
Group variable: ISIN_num                        Number of groups  =     50,934

R-sq:                                           Obs per group:
     within  =      .                                         min =          1
     between =      .                                         avg =        1.0
     overall =      .                                         max =          1

                                                F(0,0)            =       0.00
corr(u_i, Xb)  =      .                         Prob > F          =          .

------------------------------------------------------------------------------
YieldatIss~e |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       Green |          0  (omitted)
       _cons |   3.118114          .        .       .            .           .
-------------+----------------------------------------------------------------
     sigma_u |  2.5257234
     sigma_e |          .
         rho |          .   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(50933, 0) = .                       Prob > F =      .

. sum YieldatIssuance if Green==1

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
YieldatIss~e |        828    3.291529    2.469753       .001     12.875

. sum YieldatIssuance if Green==0

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
YieldatIss~e |     50,106    3.115248    2.526562       .001         28

Joro Kolev

Join Date: Aug 2018

Posts: 3050
#2

24 Sep 2020, 14:47

I think your Green variable does not change across time.

You can check this with

Code:

xtsum Green
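A slightly fuller version of that check might look like the sketch below. It assumes the data are still xtset as in #1 (xtset ISIN_num IssueDate); the variable green_varies is a hypothetical helper name, not something from the original post.

```stata
* Assumes the panel is declared as in #1: xtset ISIN_num IssueDate
xtsum Green        // a within Std. Dev. of 0 means Green never changes inside a bond
xtdescribe         // shows how many observations each bond contributes

* Hypothetical helper: flag bonds in which Green takes more than one value
bysort ISIN_num (Green): generate byte green_varies = Green[1] != Green[_N]
tabulate green_varies
```

If every bond shows green_varies == 0, or if each bond contributes only one observation (as the "Obs per group" lines in #1 suggest), the fixed-effects estimator has no within-bond variation to work with.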
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#3

25 Sep 2020, 00:56

Hans:
as an aside to Joro's helpful comment, a panel data regression with one predictor only hardly gives a fair and true view of the data generating process you're interested in.

Kind regards,
Carlo
(Stata 19.0)
Phil Bromiley

Join Date: Apr 2014

Posts: 4348
#4

25 Sep 2020, 11:35

To add to Joro's and Carlo's helpful comments, when you use fixed effects it controls for all variables that don't vary over time within panels, but this means you can't estimate parameters on such variables in the model. Since you are estimating a model on bonds, it is quite likely that Green is a characteristic of the bond that doesn't change over time.
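The point can be written out in a short sketch. The notation here is an assumption for illustration (it does not come from the thread): $y_{it}$ is the yield of bond $i$ at time $t$, $u_i$ is the bond-specific effect, and $\varepsilon_{it}$ is the idiosyncratic error.

```latex
% Model with a bond-level (time-invariant) dummy
y_{it} = \alpha + \beta\,\mathrm{Green}_i + u_i + \varepsilon_{it}

% The fixed-effects (within) estimator subtracts bond means:
y_{it} - \bar{y}_i
  = \beta\,(\mathrm{Green}_i - \overline{\mathrm{Green}}_i)
    + (\varepsilon_{it} - \bar{\varepsilon}_i)
  = \varepsilon_{it} - \bar{\varepsilon}_i

% since \overline{\mathrm{Green}}_i = \mathrm{Green}_i whenever Green is constant within bond i.
```

The demeaned regressor is identically zero, so $\beta$ cannot be estimated and Stata reports Green as omitted. With a single observation per bond, as in the output in #1 (50,934 observations in 50,934 groups), $y_{it} - \bar{y}_i$ is zero as well, so nothing at all is left to estimate.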

Note that you need to think theoretically about whether what matters to your theory and model is factors that vary over time, stable features, or both. Fixed effects for bonds throw away any variance associated with stable bond characteristics. If those are important, random effects may let you estimate them, but the random-effects estimate is a blend of the within and between parameters, which is problematic. The usual prescription, to run both and use a Hausman test, rests on the assumption that the true within and between parameters are equal. If that assumption is not correct, testing whether they are equal and then using the test to choose between the fixed- and random-effects estimators makes little sense. In addition, with 50,000 observations, the supposed gain in efficiency from random effects is likely not needed (see Wooldridge's comments on this list).

If bond characteristics that don't vary over time within bonds are important to your study, you might consider xthybrid or a Mundlak estimator that lets you estimate both within and between parameters.
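A minimal sketch of the Mundlak approach follows, under stated assumptions: x is a hypothetical regressor that varies within bond (nothing like it appears in the posted data), x_mean is a hypothetical helper variable, and each bond has more than one observation, which is not the case in the panel shown in #1.

```stata
* Mundlak-style sketch. Hypothetical names: x (varies within bond), x_mean (its bond mean).
* Requires repeated observations per bond.
bysort ISIN_num: egen double x_mean = mean(x)

* x      : within-bond effect
* x_mean : difference between the between-bond and within-bond effects
* Green  : time-invariant, identified from between-bond variation only
xtreg YieldatIssuance x x_mean Green, re vce(cluster ISIN_num)

* Test of x_mean = 0, a regression-based analogue of the Hausman test
test x_mean
```

Because the bond means of the time-varying regressors are included explicitly, the time-invariant Green stays in the model instead of being absorbed by the bond effects.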
Hans Koslowski

Join Date: Feb 2020

Posts: 12
#5

28 Sep 2020, 07:36

Thank you very much for your responses, everyone!

@Carlo Lazzaro:
Right. However, I ran the regression using several predictor variables at the same time but all of them were omitted too. I guess this is because most of the variables don’t change over time as indicated in Phil’s comment. Nevertheless, this seems strange to me as I am trying to replicate a fixed effects regression of a recent publication based on the same “constant” bond characteristics.

@Phil Bromiley:
Thank you very much for your informative comment! I wanted to run both fixed and random effects models; however, the random effects model always stopped with an error message about insufficient observations. So, I have not been able to run a Hausman test or the xthybrid command so far. Is a potential mixing/confusion of the between and within parameters usually the most likely explanation for this error?

Fortunately, the Mundlak mixed effects model worked. However, I have little knowledge about the model and the quality of the regression output. To my understanding, the coefficients of this regression seem to be significant (P>|z| below 0.05, hence significant at the 5% level), but what about an equivalent R-squared/goodness-of-fit measure? I suppose it is a chi-squared measure in this case, but how do I interpret it? Above or below what value does it indicate an adequate regression?

Thank you guys again,
Hans

This is the output of the mixed effects GLM:

Code:

. meglm YieldatIssuance Green

Iteration 0:   log likelihood = -119463.44
Iteration 1:   log likelihood = -119463.44

Mixed-effects GLM                               Number of obs     =     50,935
Family:                Gaussian
Link:                  identity
                                                Wald chi2(1)      =       3.97
Log likelihood = -119463.44                     Prob > chi2       =     0.0463
-------------------------------------------------------------------------------
YieldatIssu~e |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
        Green |   .1763042   .0884922     1.99   0.046     .0028626    .3497457
        _cons |   3.115225   .0112827   276.11   0.000     3.093111    3.137338
--------------+----------------------------------------------------------------
var(e.Yield~e)|   6.378559   .0399696                      6.300699    6.457381
-------------------------------------------------------------------------------
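A hedged note on reading this output: the command as typed has no random-effects equation (no "|| ISIN_num:" part), so it appears to fit a plain Gaussian model with only Green and a constant. If so, the Green coefficient is essentially the difference in mean yield between the two groups (3.2915 vs. 3.1152 in the summaries in #1), and Wald chi2(1) is the squared z statistic for Green. It tests only whether that coefficient is zero and is not a goodness-of-fit measure. The sketch below, which uses only the variables from #1, is one way to check this reading.

```stata
* Pooled OLS with the same mean structure; the Green coefficient should match closely
regress YieldatIssuance Green

* Two-group comparison of means, for reference
* (ttest reports mean(Green==0) - mean(Green==1), so the sign is reversed)
ttest YieldatIssuance, by(Green)
```

If the coefficients agree, the meglm result above is a pooled comparison of means rather than a Mundlak or mixed-effects estimate.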
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#6

28 Sep 2020, 07:47

Hans:
it is always difficult (and often unfeasible) to replicate what others did in published articles.
Some reasons: technical journals set a word count that can hardly be exceeded (and authors have to reduce the length of the Methods section); authors do not lie but do not tell the whole truth either about Methods and Results; reviewers are not familiar with statistics and cannot spot authors' mistakes.

Kind regards,
Carlo
(Stata 19.0)
