  • Fixed/Random effects regression: Omitted explanatory variable (Dummy) and setting of control variables

    Dear Statalist community,

    I have a problem with my bond panel. I’d like to run fixed and random effects regressions on it, with “YieldatIssuance” as the dependent variable and the dummy “Green” as the explanatory variable. I would also like to control for fixed effects of the following variables: Ticker (the issuer), Currency, PaymentRank, Maturity, AmountIssue, YearMonth (issue date), Rating. I converted the string variables to numeric variables using the encode command and dropped all variables I don’t need as controls from the panel.

    However, whenever I run the panel regression, the variable “Green” is omitted due to collinearity. I therefore regressed Green on each control variable in my panel to check for high R-squared values, as suggested in a thread on a similar problem. However, the R-squared of each regression was below 1%.

    This is where I am confused. I could not find any high correlation among my explanatory and control variables. Note that “Green” is true for relatively few observations (828 observations with Green==1 vs. 50,106 with Green==0). Do you have any other ideas about what might cause this collinearity? And do you have any suggestions for solving this problem, so that I can use “Green” as an explanatory variable? Any help will be highly appreciated!

    Thank you very much,
    Hans

    This is my code and output:
    Code:
    . duplicates drop ISIN_num, force
    
    Duplicates in terms of ISIN_num
    
    (36 observations deleted)
    
    . xtset ISIN_num IssueDate
           panel variable:  ISIN_num (weakly balanced)
            time variable:  IssueDate, 1/11/2007 to 12/6/2020
                    delta:  1 day
    
    . xtreg YieldatIssuance Green, fe
    note: Green omitted because of collinearity
    
    Fixed-effects (within) regression               Number of obs     =     50,934
    Group variable: ISIN_num                        Number of groups  =     50,934
    
    R-sq:                                           Obs per group:
         within  =      .                                         min =          1
         between =      .                                         avg =        1.0
         overall =      .                                         max =          1
    
                                                    F(0,0)            =       0.00
    corr(u_i, Xb)  =      .                         Prob > F          =          .
    
    ------------------------------------------------------------------------------
    YieldatIss~e |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           Green |          0  (omitted)
           _cons |   3.118114          .        .       .            .           .
    -------------+----------------------------------------------------------------
         sigma_u |  2.5257234
         sigma_e |          .
             rho |          .   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
    F test that all u_i=0: F(50933, 0) = .                       Prob > F =      .
    
    . sum YieldatIssuance if Green==1
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
    YieldatIss~e |        828    3.291529    2.469753       .001     12.875
    
    . sum YieldatIssuance if Green==0
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
    YieldatIss~e |     50,106    3.115248    2.526562       .001         28

  • #2
    I think your Green variable does not change across time.

    You can check this with

    Code:
    xtsum Green
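
    A minimal sketch of this check on simulated data, since the original panel is not available. The names bond_id and green are hypothetical stand-ins for ISIN_num and Green. With one row per bond, xtsum should report a within standard deviation of zero and T-bar = 1, which is the pattern that leads xtreg, fe to omit the variable:

    ```stata
    * Simulated stand-in for the bond data: exactly one row per bond.
    clear
    set seed 1
    set obs 500
    gen bond_id = _n                  // hypothetical panel id (plays the role of ISIN_num)
    gen green   = runiform() < 0.1    // 0/1 dummy, roughly 10% ones

    xtset bond_id
    xtsum green                       // compare the "within" Std. Dev. and T-bar

    * Count the observations per panel directly.
    bysort bond_id: gen n_obs = _N
    tabulate n_obs
    ```

    If n_obs is 1 for every panel, as the "Obs per group" block in the output in #1 suggests (min = 1, max = 1), then no variable has any within-panel variation, whether or not it is a fixed bond characteristic.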


    • #3
      Hans:
      as an aside to Joro's helpful comment, a panel data regression with one predictor only hardly gives a fair and true view of the data generating process you're interested in.
      Kind regards,
      Carlo
      (Stata 19.0)


      • #4
        To add to Joro's and Carlo's helpful comments, when you use fixed effects it controls for all variables that don't vary over time within panels, but this means you can't estimate parameters on such variables in the model. Since you are estimating a model on bonds, it is quite likely that Green is a characteristic of the bond that doesn't change over time.
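
        A sketch of why this happens, on simulated data with hypothetical names (bond_id for ISIN_num, issuer for an encoded Ticker). It assumes the aim is to compare bonds of the same issuer, which the thread does not confirm. Fixed effects at the bond level leave nothing to estimate for a bond-level dummy, whereas fixed effects at the issuer level still use the variation in green across bonds of one issuer:

        ```stata
        * Simulated data: 1,000 bonds, 20 bonds per issuer, one row per bond.
        clear
        set seed 2020
        set obs 1000
        gen bond_id = _n                   // hypothetical bond id
        gen issuer  = ceil(_n/20)          // hypothetical issuer id (50 issuers)
        gen green   = runiform() < 0.1     // fixed characteristic of each bond
        gen yield   = 3 + 0.2*green + rnormal()

        * Bond fixed effects: green is constant within every bond,
        * so it is expected to be omitted, as in #1.
        xtset bond_id
        capture noisily xtreg yield green, fe

        * Issuer fixed effects: green varies across bonds of the same issuer.
        xtset issuer
        xtreg yield green, fe

        * Same idea using areg, absorbing the issuer dummies.
        areg yield green, absorb(issuer)
        ```

        Which level the fixed effects belong at is a modelling decision; the sketch only shows that the choice of panel variable determines whether green can be estimated.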

        Note that you need to think theoretically about whether what matters to your theory and model are factors that vary over time, stable features, or both. Fixed effects for bonds throw away any variance associated with stable bond characteristics. If those are important, random effects may let you estimate them, but the random effects estimator mixes the within and between parameters, which is problematic. The usual prescription, to run both and use a Hausman test, rests on the assumption that the true within and between parameters are equal. If that assumption is not correct, testing whether they are equal and then using the test to choose between the fixed and random effects estimators makes little sense. In addition, with 50,000 observations, the supposed increase in efficiency from random effects is likely not needed (see Wooldridge's comments on this list).

        If bond characteristics that don't vary over time within bonds are important to your study, you might consider xthybrid or a Mundlak estimator, which lets you estimate both within and between parameters.
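
        A minimal sketch of the Mundlak device on simulated data, assuming a panel with repeated observations per unit (which the bond data in #1 do not appear to have). All variable names are hypothetical. The group mean of each time-varying regressor is added to a random effects model; in a balanced panel the coefficient on x should then match the within (fixed effects) estimate, while the time-invariant dummy stays estimable:

        ```stata
        * Simulated balanced panel: 300 units, 4 periods each.
        clear
        set seed 42
        set obs 300
        gen id    = _n
        gen green = runiform() < 0.2       // time-invariant characteristic
        gen u     = rnormal()              // unobserved unit effect
        expand 4
        bysort id: gen t = _n
        gen x = 0.5*u + rnormal()          // time-varying, correlated with u
        gen y = 1 + 0.4*green + 0.3*x + u + rnormal()
        xtset id t

        * Mundlak term: the unit-level mean of the time-varying regressor.
        bysort id: egen x_bar = mean(x)

        * Random effects with the Mundlak term; green remains in the model.
        xtreg y green x x_bar, re vce(cluster id)

        * Fixed effects for comparison; compare the coefficient on x.
        xtreg y x, fe
        ```

        The coefficient on x_bar captures the difference between the between and within effects of x, so a test of x_bar = 0 plays a role similar to the Hausman test. xthybrid is a community-contributed command rather than part of official Stata, so it is not used here.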


        • #5
          Thank you very much for your responses, everyone!


          @Carlo Lazzaro:
          Right. However, I ran the regression using several predictor variables at the same time but all of them were omitted too. I guess this is because most of the variables don’t change over time as indicated in Phil’s comment. Nevertheless, this seems strange to me as I am trying to replicate a fixed effects regression of a recent publication based on the same “constant” bond characteristics.

          @Phil Bromiley:
          Thank you very much for your informative comment! I wanted to run both fixed and random effects models; however, the random effects model always returned an error message about insufficient observations. So, there has been no way to run a Hausman test or the xthybrid command so far. Is a potential mixing/confusion of the between and within parameters usually the closest explanation for this error?

          Fortunately, the Mundlak mixed effects model worked. However, I have little knowledge about the model and the quality of the regression output. To my understanding, the coefficients of this regression seem to be significant (P>|z| below 0.05, hence significant at the 5% level), but what about an equivalent R-squared/goodness-of-fit measure? I suppose it is a chi-squared measure in this case, but how do I interpret it? Above or below what value does it indicate a proper regression?

          Thank you guys again,
          Hans

          This is the output of the mixed effects GLM:
          Code:
          . meglm YieldatIssuance Green
          
          Iteration 0:   log likelihood = -119463.44  
          Iteration 1:   log likelihood = -119463.44  
          
          Mixed-effects GLM                               Number of obs     =     50,935
          Family:                Gaussian
          Link:                  identity
          
                                                          Wald chi2(1)      =       3.97
          Log likelihood = -119463.44                     Prob > chi2       =     0.0463
          -------------------------------------------------------------------------------
          YieldatIssu~e |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
          --------------+----------------------------------------------------------------
                  Green |   .1763042   .0884922     1.99   0.046     .0028626    .3497457
                  _cons |   3.115225   .0112827   276.11   0.000     3.093111    3.137338
          --------------+----------------------------------------------------------------
          var(e.Yield~e)|   6.378559   .0399696                      6.300699    6.457381
          -------------------------------------------------------------------------------
          
          .
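
          For comparison, a sketch on simulated data with hypothetical names (yield, green, issuer). The meglm call shown above has no random-effects equation (no "|| group:" part), so it appears to fit a single-level Gaussian model, which should give the same point estimates as regress. The reported Wald chi2 is a test that the slope coefficients are jointly zero rather than an R-squared analogue; with a single regressor it is the squared z statistic:

          ```stata
          * Simulated data: 100 issuers with 20 bonds each.
          clear
          set seed 7
          set obs 100
          gen issuer = _n                    // hypothetical grouping variable
          gen u      = rnormal()             // issuer-level effect
          expand 20
          gen green  = runiform() < 0.1
          gen yield  = 3 + 0.2*green + u + 2*rnormal()

          * Without a random-effects equation, meglm is a plain Gaussian model:
          * compare its coefficients with those from OLS.
          regress yield green
          meglm yield green

          * A random intercept has to be requested explicitly.
          meglm yield green || issuer:

          * Relation between the z statistic and the Wald chi2 in the output above.
          display 1.99^2                     // squared z statistic for Green
          display chi2tail(1, 3.97)          // p-value for Wald chi2(1) = 3.97
          ```

          A large Wald chi2 only says that the regressors are jointly distinguishable from zero; it does not measure how much of the variation in the outcome the model explains.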


          • #6
            Hans:
            it is always difficult (and often unfeasible) to replicate what others did in published articles.
            Some reasons: technical journals set a word count that can hardly be exceeded (and authors have to reduce the length of the Methods section); authors do not lie but do not tell the whole truth either about Methods and Results; reviewers are not familiar with statistics and cannot spot authors' mistakes.
            Kind regards,
            Carlo
            (Stata 19.0)
