  • Interpreting test results on panel data and how to correctly perform fixed effect regression on panel data

    Hi members,

    I have a large set of panel data with information about 166 bonds, containing some of their characteristics (such as currency, issue date, etc.) followed by daily yield data over a five year period for each bond (although most of these values are missing). I have about 54 000 observations of bond yields.

    My goal is to run a regression that shows what variables have an effect on the bond's yield. More specifically, I am trying to run a regression of the yield on a measure of the bond's liquidity to find the unobserved effect that isn't explained by the variable liquidity.

    So far I have run various tests to check whether I should use a fixed or random effects model, tests for autocorrelation and heteroskedasticity, and an F-test. I am not sure whether I am interpreting the results of these tests correctly, or which model I should choose going forward for my regressions in Stata.

    The regression I am trying to perform is: (Y is yield, P_i is the fixed-effect estimator, Liquidity is the variable for Liquidity). Yield has variable name YIELDDIFF and Liquidity is BIDASKSP

    Y_i,t = P_i + b*Liquidity_i,t + e_i,t, with b being the slope coefficient on liquidity and e being the error term.

    I have first run an F-test, with the following result:
    Code:
    xtreg YIELDDIFF BIDASKSP, fe
    
    Fixed-effects (within) regression               Number of obs     =     44,751
    Group variable: RIC_2                           Number of groups  =        166
    
    R-sq:                                           Obs per group:
         within  = 0.0485                                         min =         13
         between = 0.0059                                         avg =      269.6
         overall = 0.0504                                         max =      1,178
    
                                                    F(1,44584)        =    2272.59
    corr(u_i, Xb)  = 0.0180                         Prob > F          =     0.0000
    
    ------------------------------------------------------------------------------
       YIELDDIFF |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
        BIDASKSP |  -.2839549   .0059565   -47.67   0.000    -.2956296   -.2722801
           _cons |   .0158317    .000306    51.74   0.000      .015232    .0164315
    -------------+----------------------------------------------------------------
         sigma_u |  .15915535
         sigma_e |  .06431106
             rho |   .8596394   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
    F test that all u_i=0: F(165, 44584) = 1130.58               Prob > F = 0.0000
    1. Am I interpreting this test correctly as saying that my fixed-effect estimator has an explanatory value for the yield of the bonds, as given by u_i=0: F(165, 44584) = 1130.58 - and that I should in fact use a fixed effect model? What does the large F value mean?

    2. Is it correct to run xtreg with YIELDDIFF (i.e. yield) as the dependent variable and only BIDASKSP (Liquidity) as the independent variable to isolate the fixed-effect estimator Pi (as outlined in the equation above)?

    I then ran the Hausman test which I understand as indicating that I should be using a fixed effect rather than random effect model:
    Code:
    Test:  Ho:  difference in coefficients not systematic
    
                      chi2(1) = (b-B)'[(V_b-V_B)^(-1)](b-B)
                              =        3.95
                    Prob>chi2 =      0.0468
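    The commands that produced this output are not shown. A typical sequence for a Hausman test with default (non-robust) standard errors would look like the sketch below; fe_model and re_model are arbitrary names for the stored estimates, and the data are assumed to be -xtset- already.

    ```stata
    * Fit and store the fixed-effects (consistent) estimates
    xtreg YIELDDIFF BIDASKSP, fe
    estimates store fe_model

    * Fit and store the random-effects (efficient) estimates
    xtreg YIELDDIFF BIDASKSP, re
    estimates store re_model

    * Consistent estimator first, efficient estimator second
    hausman fe_model re_model
    ```

    A small p-value rejects the null that the difference in coefficients is not systematic, which favours -fe-.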
    Following that, I tested for Autocorrelation using a Wooldridge test, with the following result:
    Code:
    xtserial YIELDDIFF BIDASKSP
    
    Wooldridge test for autocorrelation in panel data
    H0: no first-order autocorrelation
        F(  1,     165) =     19.676
               Prob > F =      0.0000
    I also performed a Modified Wald test for heteroskedasticity
    Code:
    xttest3
    
    Modified Wald test for groupwise heteroskedasticity
    in fixed effect regression model
    
    H0: sigma(i)^2 = sigma^2 for all i
    
    chi2 (166)  =   1.1e+09
    Prob>chi2 =      0.0000
    3. Am I correct in interpreting these results as having both autocorrelation as well as heteroskedasticity in my data?

    As for going forward, my understanding is that I should be doing a regression with robust standard errors (After skimming through previous research, it seems as if many regressions are performed with White standard errors, but I am unsure what this entails and how to do that in Stata). Would I then run the same xtreg as I did before but also adding robust standard errors?

    Many thanks for your help. It has been a while since I took statistics at my university and I am unfortunately not entirely up to speed on my statistical knowledge.

    /N

  • #2
    Nils:
    1) the result of the F-test appearing as a footnote of the outcome table tells you that your dataset shows evidence of panelwise effect; hence a pooled OLS would not be appropriate given your data.
    2) As you detected both heteroskedasticity and autocorrelation, you're right in invoking cluster-robust standard errors. However, your next step should be to test whether the -re- specification fits your data via the community-contributed command -xtoverid- (just type -search xtoverid- from within Stata to spot and install it), as -hausman- does not support non-default standard errors.
    If the outcome of -xtoverid- reaches statistical significance, you should go -fe-; otherwise, stick with -re-.

    To wrap up, you should do something along the following lines, once you have installed -xtoverid-:
    Code:
    xtreg YIELDDIFF BIDASKSP, re robust
    xtoverid
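    For reference, -xtoverid- is also distributed through SSC, so it can be installed directly. In -xtreg-, the -robust- option is treated as standard errors clustered on the panel variable, so the two spellings below should give the same results. A minimal sketch, assuming the data are already -xtset- with RIC_2 as the panel variable:

    ```stata
    * One-time installation of the community-contributed command
    ssc install xtoverid

    * Same model as above, with the clustering written out explicitly
    xtreg YIELDDIFF BIDASKSP, re vce(cluster RIC_2)

    * Sargan-Hansen test of fixed vs random effects
    * small p-value: reject -re-, go -fe-; otherwise -re- is not rejected
    xtoverid
    ```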
    That said, it seems strange that your model is correctly specified with one predictor only. This is something that you can test, as you can see in the following toy example:
    Code:
    . use "http://www.stata-press.com/data/r15/union.dta"
    (NLS Women 14-24 in 1968)
    
    . xtset idcode year
           panel variable:  idcode (unbalanced)
            time variable:  year, 70 to 88, but with gaps
                    delta:  1 unit
    
    . xtreg age i.grade, robust
    
    Random-effects GLS regression                   Number of obs     =     26,200
    Group variable: idcode                          Number of groups  =      4,434
    
    R-sq:                                           Obs per group:
         within  = 0.1045                                         min =          1
         between = 0.0188                                         avg =        5.9
         overall = 0.0298                                         max =         12
    
                                                    Wald chi2(17)     =          .
    corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =          .
    
                                 (Std. Err. adjusted for 4,434 clusters in idcode)
    ------------------------------------------------------------------------------
                 |               Robust
             age |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           grade |
              1  |  -6.513248   1.867801    -3.49   0.000    -10.17407   -2.852425
              2  |   1.156889   3.683763     0.31   0.753    -6.063154    8.376932
              3  |   5.386752   1.867801     2.88   0.004     1.725929    9.047575
              4  |  -1.744937   2.538105    -0.69   0.492    -6.719532    3.229658
              5  |  -1.670922   2.542305    -0.66   0.511    -6.653749    3.311905
              6  |  -1.813056   2.122946    -0.85   0.393    -5.973953    2.347841
              7  |  -2.909495   2.064685    -1.41   0.159    -6.956202    1.137213
              8  |  -4.115661   1.958789    -2.10   0.036    -7.954816   -.2765063
              9  |  -3.057316   1.945918    -1.57   0.116    -6.871245     .756613
             10  |   -3.55541   1.907172    -1.86   0.062    -7.293397    .1825779
             11  |  -3.813377   1.899413    -2.01   0.045    -7.536158   -.0905966
             12  |  -3.357136   1.871466    -1.79   0.073    -7.025141    .3108696
             13  |  -.6027537   1.893576    -0.32   0.750    -4.314094    3.108587
             14  |   .5878884   1.894662     0.31   0.756    -3.125581    4.301358
             15  |  -.5306016   1.917914    -0.28   0.782    -4.289644     3.22844
             16  |  -1.235125    1.88085    -0.66   0.511    -4.921523    2.451272
             17  |   2.430688   1.891372     1.29   0.199    -1.276333    6.137708
             18  |   5.683285   1.884807     3.02   0.003     1.989132    9.377438
                 |
           _cons |   32.11325   1.867801    17.19   0.000     28.45243    35.77407
    -------------+----------------------------------------------------------------
         sigma_u |  3.8130736
         sigma_e |  5.2673733
             rho |  .34384807   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
    
    . predict u, xb
    
    . g sq_u=u^2
    
    . xtreg age u sq_u, robust
    
    Random-effects GLS regression                   Number of obs     =     26,200
    Group variable: idcode                          Number of groups  =      4,434
    
    R-sq:                                           Obs per group:
         within  = 0.1042                                         min =          1
         between = 0.0189                                         avg =        5.9
         overall = 0.0299                                         max =         12
    
                                                    Wald chi2(2)      =    1305.46
    corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000
    
                                 (Std. Err. adjusted for 4,434 clusters in idcode)
    ------------------------------------------------------------------------------
                 |               Robust
             age |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
               u |   .8801686   .5719794     1.54   0.124    -.2408904    2.001228
            sq_u |    .001519   .0087168     0.17   0.862    -.0155656    .0186036
           _cons |   2.231944   9.296946     0.24   0.810    -15.98974    20.45362
    -------------+----------------------------------------------------------------
         sigma_u |  3.7731301
         sigma_e |  5.3901961
             rho |  .32885822   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
    
    . test sq_u
    
     ( 1)  sq_u = 0
    
               chi2(  1) =    0.03
             Prob > chi2 =    0.8617
    *as the -test- outcome does not reach statistical significance, there's no evidence of model misspecification*
    .
    Kind regards,
    Carlo
    (Stata 18.0 SE)



    • #3
      Carlo Lazzaro Thank you for your reply.

      I have run -xtoverid- giving the following result:
      Code:
      . xtreg YIELDDIFF BIDASKSP, re robust
      
      Random-effects GLS regression                   Number of obs     =     44,751
      Group variable: RIC_2                           Number of groups  =        166
      
      R-sq:                                           Obs per group:
           within  = 0.0485                                         min =         13
           between = 0.0059                                         avg =      269.6
           overall = 0.0504                                         max =      1,178
      
                                                      Wald chi2(1)      =      11.27
      corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0008
      
                                      (Std. Err. adjusted for 166 clusters in RIC_2)
      ------------------------------------------------------------------------------
                   |               Robust
         YIELDDIFF |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
          BIDASKSP |  -.2832264   .0843749    -3.36   0.001    -.4485982   -.1178546
             _cons |    .001954   .0123481     0.16   0.874    -.0222477    .0261557
      -------------+----------------------------------------------------------------
           sigma_u |  .15764626
           sigma_e |  .06431106
               rho |  .85732453   (fraction of variance due to u_i)
      ------------------------------------------------------------------------------
      
      . 
      . xtoverid
      
      Test of overidentifying restrictions: fixed vs random effects
      Cross-section time-series model: xtreg re  robust cluster(RIC_2)
      Sargan-Hansen statistic   1.888  Chi-sq(1)    P-value = 0.1694
      As we get a P-value of 0.1694, I interpret this as meaning that I should use -fe- rather than -re- since the test does not reach statistical significance. To clarify, the reason that we perform this test rather than the Hausman test is that since we have both heteroscedasticity and autocorrelation in our data, meaning that we should use robust standard errors, we need to run another type of test as the Hausman test does not work for non default standard errors?

      2. On your last point, I think I should clarify what I am trying to achieve. The main purpose of my bachelor thesis is to isolate and show the value of Pi, i.e. the unobserved effect in the first regression, as well as identifying what variables affect Pi itself. As such, I will be doing two regressions. The first one is the one specified in the original post, where I am trying to isolate the effect of Pi on the YIELDDIFF. I have used a matching method to retrieve matching pairs of bonds which should eliminate most variables that explain any difference in yield, which is why I only have one predictor (BIDASKSP) and an unobserved fixed effect estimator (Pi). So to clarify, in the first regression I am simply trying to "isolate" the unobserved effect Pi.

      In the second step, I will run a regression on Pi, using more predictors. These predictors will be some of the characteristics of the bond, i.e. rating, issue date, issue amount, etc. At this stage I am trying to identify what variables explain and affect the unobserved effect in the first regression, Pi. The second regression is what will give me my main result, and so the first regression simply aims to isolate Pi so that I can run a regression with Pi as the dependent variable.

      Does this clarify your point about model misspecification? I ran the tests you suggested although I'm not quite sure how to interpret the results. The results were as follows:
      Code:
       xtreg YIELDDIFF BIDASKSP, fe robust
      
      Fixed-effects (within) regression               Number of obs     =     44,751
      Group variable: RIC_2                           Number of groups  =        166
      
      R-sq:                                           Obs per group:
           within  = 0.0485                                         min =         13
           between = 0.0059                                         avg =      269.6
           overall = 0.0504                                         max =      1,178
      
                                                      F(1,165)          =      11.22
      corr(u_i, Xb)  = 0.0180                         Prob > F          =     0.0010
      
                                      (Std. Err. adjusted for 166 clusters in RIC_2)
      ------------------------------------------------------------------------------
                   |               Robust
         YIELDDIFF |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
          BIDASKSP |  -.2839549   .0847902    -3.35   0.001    -.4513685   -.1165413
             _cons |   .0158317   .0004935    32.08   0.000     .0148573    .0168062
      -------------+----------------------------------------------------------------
           sigma_u |  .15915535
           sigma_e |  .06431106
               rho |   .8596394   (fraction of variance due to u_i)
      ------------------------------------------------------------------------------
      
      . predict u, xb
      (248,862 missing values generated)
      
      . g sq_u=u^2
      (248,862 missing values generated)
      
      . xtreg YIELDDIFF u sq_u, fe robust
      
      Fixed-effects (within) regression               Number of obs     =     44,751
      Group variable: RIC_2                           Number of groups  =        166
      
      R-sq:                                           Obs per group:
           within  = 0.0540                                         min =         13
           between = 0.0010                                         avg =      269.6
           overall = 0.0295                                         max =      1,178
      
                                                      F(2,165)          =      16.92
      corr(u_i, Xb)  = -0.0394                        Prob > F          =     0.0000
      
                                      (Std. Err. adjusted for 166 clusters in RIC_2)
      ------------------------------------------------------------------------------
                   |               Robust
         YIELDDIFF |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
                 u |   1.081878   .2274152     4.76   0.000     .6328589    1.530897
              sq_u |  -2.409346   1.230557    -1.96   0.052    -4.839013    .0203221
             _cons |   .0016674   .0046555     0.36   0.721    -.0075246    .0108594
      -------------+----------------------------------------------------------------
           sigma_u |  .16169048
           sigma_e |  .06412671
               rho |  .86408553   (fraction of variance due to u_i)
      ------------------------------------------------------------------------------
      
      . test sq_u
      
       ( 1)  sq_u = 0
      
             F(  1,   165) =    3.83
                  Prob > F =    0.0519
      Does the fact that this test "almost" reaches statistical significance (with an alpha of 5%) mean that I have model misspecification? Since this is only the step 1 regression and the step 2 regression with Pi as the dependent variable will have more predictors, is this really an issue?

      3. My final question would be how to actually retrieve descriptive statistics for the dependent variable in regression 1, Pi? To my understanding the regression only shows how a change in one variable affects the dependent variable. How do I get the implied values of Pi in my data following the first regression? As mentioned, it is my understanding that I need these to run the second regression.

      I hope I have been able to make myself clear. I appreciate your help and any input from any one else who cares to chime in.

      /N



      • #4
        Nils:
        1) as -xtoverid- does not reach statistical significance, you should go -re-, not -fe-.
        2) there's no evidence of misspecification in your step 1 model.
        3) If what you want to get from your first step model is the predicted value of -Pi-, you should type after the regression:
        Code:
        predict <choosethenameyoulike>, xb
        Kind regards,
        Carlo
        (Stata 18.0 SE)
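
        One caution on the option: after -xtreg-, -predict ..., xb- returns the linear prediction (_cons + b*BIDASKSP), not the bond-specific effect. The estimated unit effect u_i, which corresponds to Pi in this thread, is returned by the -u- option instead. A minimal sketch, assuming the data are already -xtset- and using the hypothetical variable names P_hat and one_per_bond:

        ```stata
        * Step 1 model with standard errors clustered on the bond identifier
        xtreg YIELDDIFF BIDASKSP, fe vce(cluster RIC_2)

        * Estimated bond-specific effect u_i (only filled in for the estimation sample)
        predict double P_hat, u

        * u_i is constant within a bond, so flag one observation per bond before summarizing
        egen byte one_per_bond = tag(RIC_2) if e(sample)
        summarize P_hat if one_per_bond
        ```

        Summarizing over all observations instead would weight each bond by its number of daily yields.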



        • #5
          Carlo Lazzaro
          1) This is surprising to me - the paper that I am drawing inspiration from for my bachelor thesis uses -fe- instead of -re- (and has similar data). The data I have does not hold for a broader category of bonds and I'm looking to show the bond-specific time invariant unobserved effect in my regression, i.e. Pi. From what I have been able to understand so far about fixed effect vs. random effect - wouldn't this point to using a fixed effect model? If so, why does -xtoverid- still show that I should go -re-?

          FYI, these are the results I get from both an -fe- and an -re- xtreg in my step 1 regression:
          -fe-:
          Code:
          xtreg YIELDDIFF BIDASKSP, fe robust
          
          Fixed-effects (within) regression               Number of obs     =     44,751
          Group variable: RIC_2                           Number of groups  =        166
          
          R-sq:                                           Obs per group:
               within  = 0.0485                                         min =         13
               between = 0.0059                                         avg =      269.6
               overall = 0.0504                                         max =      1,178
          
                                                          F(1,165)          =      11.22
          corr(u_i, Xb)  = 0.0180                         Prob > F          =     0.0010
          
                                          (Std. Err. adjusted for 166 clusters in RIC_2)
          ------------------------------------------------------------------------------
                       |               Robust
             YIELDDIFF |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
              BIDASKSP |  -.2839549   .0847902    -3.35   0.001    -.4513685   -.1165413
                 _cons |   .0158317   .0004935    32.08   0.000     .0148573    .0168062
          -------------+----------------------------------------------------------------
               sigma_u |  .15915535
               sigma_e |  .06431106
                   rho |   .8596394   (fraction of variance due to u_i)
          ------------------------------------------------------------------------------
          -re-:
          Code:
          xtreg YIELDDIFF BIDASKSP, re robust
          
          Random-effects GLS regression                   Number of obs     =     44,751
          Group variable: RIC_2                           Number of groups  =        166
          
          R-sq:                                           Obs per group:
               within  = 0.0485                                         min =         13
               between = 0.0059                                         avg =      269.6
               overall = 0.0504                                         max =      1,178
          
                                                          Wald chi2(1)      =      11.27
          corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0008
          
                                          (Std. Err. adjusted for 166 clusters in RIC_2)
          ------------------------------------------------------------------------------
                       |               Robust
             YIELDDIFF |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
              BIDASKSP |  -.2832264   .0843749    -3.36   0.001    -.4485982   -.1178546
                 _cons |    .001954   .0123481     0.16   0.874    -.0222477    .0261557
          -------------+----------------------------------------------------------------
               sigma_u |  .15764626
               sigma_e |  .06431106
                   rho |  .85732453   (fraction of variance due to u_i)
          ------------------------------------------------------------------------------
          The high p-value of the intercept (_cons) strikes me as surprising. How should that be interpreted?

          2. I have run predict to generate a variable for the predicted value of Pi. Using summarize I get the following values:
          Code:
          predict GREENPREMIUM, xb
          (248,862 missing values generated)
          
          . summarize GREENPREMIUM
          
              Variable |        Obs        Mean    Std. Dev.       Min        Max
          -------------+---------------------------------------------------------
          GREENPREMIUM |     54,397    .0134974    .0434491  -.2865282   .2611284
          To perform the step 2 regression, as outlined above, would I then run a regression with GREENPREMIUM being the dependent variable and the other bond characteristics being my predictors? This regression will have 4 predictors + a dummy variable for industry of the issuer firm.
          Last edited by Nils Edgren; 28 Apr 2019, 04:19.



          • #6
            Nils:
            1) it may well be that the Authors of the paper you mention went -fe- disregarding any test aimed at identifying the best specification for their data.
            The constant in -xtreg- is not that relevant; its high p-value in the -re- output only means that its confidence interval includes 0 (and negative values), but that's all. What should be of some concern is the very low between R-sq, which stems from having only one predictor in your Step 1 regression.
            2) Yes, you should go that way.
            As an aside, I would recommend that you take all your doubts up with your teacher/supervisor.
            Kind regards,
            Carlo
            (Stata 18.0 SE)



            • #7
              Carlo Lazzaro
              Thank you for your assistance. I have tried running the step 2 regression but am having trouble creating categorical variables for some of the bond characteristics. I've browsed the forum and saw that you replied to similar posts in the past but I couldn't quite make sense of the syntax required to achieve my desired outcome.

              A few of the variables in the Step 2 regressions are strings, namely Currency and Rating. I have generated new numerical variables using -encode name, generate (newname)-. Using -tabulate- to show the categories in the data for these variables I get the following result:
              Code:
              . tabulate RATING_n
              
                CORRATING |      Freq.     Percent        Cum.
              ------------+-----------------------------------
                        A |     78,561       25.90       25.90
                       A  |      1,827        0.60       26.51
                       AA |     63,944       21.08       47.59
                      AAA |     71,250       23.49       71.08
                      BBB |     36,540       12.05       83.13
                      N/A |     51,156       16.87      100.00
              ------------+-----------------------------------
                    Total |    303,278      100.00
              
              . tabulate CURRENCY_n
              
                 CURRENCY |      Freq.     Percent        Cum.
              ------------+-----------------------------------
                      AUD |     12,789        4.22        4.22
                      CAD |      3,654        1.20        5.42
                      CHF |      5,481        1.81        7.23
                      EUR |     89,523       29.52       36.75
                      HKD |      9,135        3.01       39.76
                      IDR |      1,827        0.60       40.36
                      INR |      3,654        1.20       41.57
                      NOK |      3,654        1.20       42.77
                      NZD |      3,654        1.20       43.98
                      SEK |     65,772       21.69       65.66
                      USD |    104,135       34.34      100.00
              ------------+-----------------------------------
                    Total |    303,278      100.00
              I am trying to make these variables into categorical variables with subgroups for each rating and each currency. The desired base levels (reference categories) are AAA for rating and USD for currency.

              1) How do I achieve this desired outcome so that I can regress GREENPREMIUM on these variables (and a few others)?

              Again, I appreciate your help. I have scheduled a meeting with my supervisor to discuss my method in the coming days but his experience with using Stata is unfortunately limited. Luckily I found this great forum!



              • #8
                Nils:
                a possible approach mirrors the following toy-example:
                Code:
                sysuse auto.dta
                tab rep78, gen(new_rep78)
                egen overall_dummies=rowtotal( new_rep78*)
                replace overall_dummies=1 if rep78==1
                replace overall_dummies=2 if rep78==2
                replace overall_dummies=3 if rep78==3
                replace overall_dummies=4 if rep78==4
                replace overall_dummies=5 if rep78==5
                replace overall_dummies=. if overall_dummies==0
                tab overall_dummies
                You should consider -label- and -fvvarlist- notation as well.
                Kind regards,
                Carlo
                (Stata 18.0 SE)
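
                With factor-variable notation no dummy variables need to be created by hand: the i. prefix tells Stata to treat an encoded variable as categorical, and ib#. picks the base level. A minimal sketch using the variable names from this thread; the codes 4 (AAA) and 11 (USD) are read off the -tabulate- output above, where -encode- numbered the categories alphabetically, and should be confirmed with -label list-:

                ```stata
                * Confirm the integer codes that -encode- attached to each category
                label list RATING_n CURRENCY_n

                * ib4. makes AAA the reference rating, ib11. makes USD the reference currency
                * (further predictors would simply be appended to the variable list)
                regress GREENPREMIUM ib4.RATING_n ib11.CURRENCY_n
                ```

                Note that "A" and "A " show up as separate rating categories, presumably because of a stray blank in the original string; trimming the string (for example with strtrim()) before running -encode- would merge them, but it would also change the integer codes.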



                • #9
                  Carlo Lazzaro
                  I read through the -help fvvarlist- to try to understand the proper way of achieving this but wasn't able to determine how you actually create the categories within each variable. As I understand it you don't actually create dummy variables but just specify subcategories within an already existing variable?

                  I tested the approach you gave with the following result:
                  Code:
                  egen overall_dummies=rowtotal(RATING_n)
                  
                  . 
                  . replace overall_dummies=1 if RATING_n==1
                  (0 real changes made)
                  
                  . 
                  . replace overall_dummies=2 if RATING_n==2
                  (0 real changes made)
                  
                  . 
                  . replace overall_dummies=3 if RATING_n==3
                  (0 real changes made)
                  
                  . 
                  . replace overall_dummies=4 if RATING_n==4
                  (0 real changes made)
                  
                  . 
                  . replace overall_dummies=5 if RATING_n==5
                  (0 real changes made)
                  
                  . 
                  . replace overall_dummies=6 if RATING_n==6
                  (0 real changes made)
                  . 
                  . replace overall_dummies=. if overall_dummies==0
                  (0 real changes made)
                  
                  . 
                  . tab overall_dummies
                  
                  overall_dum |
                         mies |      Freq.     Percent        Cum.
                  ------------+-----------------------------------
                            1 |     78,561       25.90       25.90
                            2 |      1,827        0.60       26.51
                            3 |     63,944       21.08       47.59
                            4 |     71,250       23.49       71.08
                            5 |     36,540       12.05       83.13
                            6 |     51,156       16.87      100.00
                  ------------+-----------------------------------
                        Total |    303,278      100.00
                  
                  . egen overall_dummies_CURRENCY=rowtotal(CURRENCY_n)
                  
                  . 
                  . replace overall_dummies_CURRENCY=1 if CURRENCY_n==1
                  (0 real changes made)
                  
                  . 
                  . replace overall_dummies_CURRENCY=2 if CURRENCY_n==2
                  (0 real changes made)
                  
                  . 
                  . replace overall_dummies_CURRENCY=3 if CURRENCY_n==3
                  (0 real changes made)
                  
                  . 
                  . replace overall_dummies_CURRENCY=4 if CURRENCY_n==4
                  (0 real changes made)
                  
                  . 
                  . replace overall_dummies_CURRENCY=5 if CURRENCY_n==5
                  (0 real changes made)
                  
                  . 
                  . replace overall_dummies_CURRENCY=6 if CURRENCY_n==6
                  (0 real changes made)
                  
                  . 
                  . replace overall_dummies_CURRENCY=7 if CURRENCY_n==7
                  (0 real changes made)
                  
                  . 
                  . replace overall_dummies_CURRENCY=8 if CURRENCY_n==8
                  (0 real changes made)
                  
                  . 
                  . replace overall_dummies_CURRENCY=9 if CURRENCY_n==9
                  (0 real changes made)
                  
                  . 
                  . replace overall_dummies_CURRENCY=10 if CURRENCY_n==10
                  (0 real changes made)
                  
                  . 
                  . replace overall_dummies_CURRENCY=11 if CURRENCY_n==11
                  (0 real changes made)
                  
                  . 
                  . replace overall_dummies_CURRENCY=. if overall_dummies_CURRENCY==0
                  (0 real changes made)
                  
                  . 
                  . tab overall_dummies_CURRENCY
                  
                  overall_dum |
                  mies_CURREN |
                           CY |      Freq.     Percent        Cum.
                  ------------+-----------------------------------
                            1 |     12,789        4.22        4.22
                            2 |      3,654        1.20        5.42
                            3 |      5,481        1.81        7.23
                            4 |     89,523       29.52       36.75
                            5 |      9,135        3.01       39.76
                            6 |      1,827        0.60       40.36
                            7 |      3,654        1.20       41.57
                            8 |      3,654        1.20       42.77
                            9 |      3,654        1.20       43.98
                           10 |     65,772       21.69       65.66
                           11 |    104,135       34.34      100.00
                  ------------+-----------------------------------
                        Total |    303,278      100.00
                  
                  . xtreg GREENPREMIUM overall_dummies overall_dummies_CURRENCY, fe robust
                  note: overall_dummies omitted because of collinearity
                  note: overall_dummies_CURRENCY omitted because of collinearity
                  
                  Fixed-effects (within) regression               Number of obs     =     54,402
                  Group variable: RIC_2                           Number of groups  =        166
                  
                  R-sq:                                           Obs per group:
                       within  = 0.0000                                         min =         18
                       between = 0.0350                                         avg =      327.7
                       overall =      .                                         max =      1,180
                  
                                                                  F(0,165)          =          .
                  corr(u_i, Xb)  =      .                         Prob > F          =          .
                  
                                                              (Std. Err. adjusted for 166 clusters in RIC_2)
                  ------------------------------------------------------------------------------------------
                                           |               Robust
                              GREENPREMIUM |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                  -------------------------+----------------------------------------------------------------
                           overall_dummies |          0  (omitted)
                  overall_dummies_CURRENCY |          0  (omitted)
                                     _cons |   .0134972          .        .       .            .           .
                  -------------------------+----------------------------------------------------------------
                                   sigma_u |  .03493133
                                   sigma_e |  .01669494
                                       rho |  .81405202   (fraction of variance due to u_i)
                  ------------------------------------------------------------------------------------------
                  I have two questions following this:
                  1) Did this successfully create subcategories within the two variables? Since the variables were omitted from the regression, I am unsure whether they are treated as "separate" subcategories that would each get their own beta.

                  2) Why are both of these variables removed due to collinearity? I understand that it is due to dependencies between the predictors but I don't quite understand why that happens?



                  • #10
                    Nils:
                    can you please post an excerpt of your data via -dataex-? Thanks.
                    Kind regards,
                    Carlo
                    (Stata 18.0 SE)



                    • #11
                      Carlo Lazzaro

                      Code:
                      * Example generated by -dataex-. To install: ssc install dataex
                      clear
                      input str13 RIC float GREENPREMIUM str3(CORRATING CURRENCY) double(ISSUEAMOUNT MATURITY)
                      "29874QCW2="          . "AAA" "USD" 6.500e+08  .5561643835616439
                      "29874QCW2="          . "AAA" "USD" 6.500e+08  .5561643835616439
                      "29874QDG6="          . "AAA" "USD" 5.000e+08 2.5397260273972604
                      "302154BZ1=" .015852291 "AA"  "USD" 4.000e+08  2.117808219178082
                      "30216BGU0=" .015900685 "AAA" "USD" 5.000e+08 1.4191780821917808
                      "50064YAN3="          . "AA"  "USD" 6.000e+08  4.567123287671233
                      "62630CAH4=" .017109547 "A"   "USD" 5.000e+08 2.7260273972602738
                      "690353C21="   .1342553 "N/A" "USD"  47300000 10.715068493150685
                      "690353E52="          . "N/A" "USD"  37400000 10.715068493150685
                      "89114QBT4="          . "AA"  "USD" 1.000e+09 1.6986301369863013
                      end
                      The data include more currencies than USD; these rows are simply the ones that came up in a random sample. I also made numeric versions of both currency and rating for use in regressions, but couldn't get those to work in -dataex-, so I used the string variables here instead. Again, I'm trying to turn rating (CORRATING) and currency (CURRENCY) into categorical variables, with AAA as the base level for rating and USD as the base level for currency.

                      The variables that are being removed due to collinearity are constant for each bond for the entire time-series, i.e. rating, currency, issueamount, maturity, etc. are the same for all observations of each respective bond. From reading on this forum it seems that this causes these variables to be excluded from the regression? The paper I mentioned previously used these variables as predictors for the dependent variable GREENPREMIUM, as I am trying to do. The variables should be significant in the regression, based on previous research, and so it seems strange to me that I cannot include them in the regression.
                      Last edited by Nils Edgren; 29 Apr 2019, 02:33.



                      • #12
                        Nils:
                        this is something you could have easily done yourself with -encode-:
                        Code:
                        . encode CORRATING, g(CORRATING_NUM)
                        
                        . encode CURRENCY , g( CURRENCY_NUM)
                        Get yourself familiar with -fvvarlist- (and related options) to set the reference categories you're interested in.
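                        As a rough sketch of what that could look like (not tested against your data): -encode- assigns codes in alphabetical order of the strings, so the code for "AAA" or "USD" has to be looked up rather than assumed. The locals rbase and cbase below are illustrative names only, and RIC_2 is taken to be your panel identifier.
                        Code:
                        * Look up which numeric code -encode- gave each category
                        label list CORRATING_NUM
                        label list CURRENCY_NUM
                        
                        * "text":labelname returns the numeric code attached to that label
                        local rbase = "AAA":CORRATING_NUM
                        local cbase = "USD":CURRENCY_NUM
                        
                        * ib#. makes category # the base (omitted) level of the factor variable
                        regress GREENPREMIUM ib`rbase'.CORRATING_NUM ib`cbase'.CURRENCY_NUM, vce(cluster RIC_2)
                        The i. (or ib#.) prefix is what tells Stata to expand a numeric variable into one indicator per category; without it the variable enters as a single continuous regressor.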
                        Kind regards,
                        Carlo
                        (Stata 18.0 SE)



                        • #13
                          Carlo Lazzaro

                          I probably should've made myself clearer: I have already used -encode- to make numeric variables. I just couldn't find a way to include them with their labels in -dataex-, so I used the "original" string variables instead.

                          I have read over -help fvvarlist- and have been able to understand it better now that I have some experience (albeit very limited) with Stata. I have managed to create the factor variables I was after.

                          However, I am running into a problem with collinearity. I am afraid I will have to expose my statistical illiteracy here; it's been a while since I took courses in statistics, so my knowledge needs some brushing up.

                          I am running xtreg with fixed effects as follows:
                          Code:
                          xtreg GREENPREMIUM logISSUEAMOUNT MATURITY i.CURRENCY_n i.RATING_n i.PROJECTTYPE_n, fe robust
                          note: logISSUEAMOUNT omitted because of collinearity
                          note: MATURITY omitted because of collinearity
                          note: 1.CURRENCY_n omitted because of collinearity
                          note: 2.CURRENCY_n omitted because of collinearity
                          note: 3.CURRENCY_n omitted because of collinearity
                          note: 4.CURRENCY_n omitted because of collinearity
                          note: 5.CURRENCY_n omitted because of collinearity
                          note: 6.CURRENCY_n omitted because of collinearity
                          note: 7.CURRENCY_n omitted because of collinearity
                          note: 8.CURRENCY_n omitted because of collinearity
                          note: 9.CURRENCY_n omitted because of collinearity
                          note: 10.CURRENCY_n omitted because of collinearity
                          note: 1.RATING_n omitted because of collinearity
                          note: 3.RATING_n omitted because of collinearity
                          note: 5.RATING_n omitted because of collinearity
                          note: 6.RATING_n omitted because of collinearity
                          note: 2.PROJECTTYPE_n omitted because of collinearity
                          note: 3.PROJECTTYPE_n omitted because of collinearity
                          note: 5.PROJECTTYPE_n omitted because of collinearity
                          note: 6.PROJECTTYPE_n omitted because of collinearity
                          note: 7.PROJECTTYPE_n omitted because of collinearity
                          note: 8.PROJECTTYPE_n omitted because of collinearity
                          
                          Fixed-effects (within) regression               Number of obs     =     54,402
                          Group variable: RIC_2                           Number of groups  =        166
                          
                          R-sq:                                           Obs per group:
                               within  = 0.0000                                         min =         18
                               between = 0.0350                                         avg =      327.7
                               overall =      .                                         max =      1,180
                          
                                                                          F(0,165)          =          .
                          corr(u_i, Xb)  =      .                         Prob > F          =          .
                          
                                                               (Std. Err. adjusted for 166 clusters in RIC_2)
                          -----------------------------------------------------------------------------------
                                            |               Robust
                               GREENPREMIUM |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                          ------------------+----------------------------------------------------------------
                             logISSUEAMOUNT |          0  (omitted)
                                   MATURITY |          0  (omitted)
                                            |
                                 CURRENCY_n |
                                       AUD  |          0  (omitted)
                                       CAD  |          0  (omitted)
                                       CHF  |          0  (omitted)
                                       EUR  |          0  (omitted)
                                       HKD  |          0  (omitted)
                                       IDR  |          0  (omitted)
                                       INR  |          0  (omitted)
                                       NOK  |          0  (omitted)
                                       NZD  |          0  (omitted)
                                       SEK  |          0  (omitted)
                                            |
                                   RATING_n |
                                         A  |          0  (omitted)
                                        AA  |          0  (omitted)
                                       BBB  |          0  (omitted)
                                       N/A  |          0  (omitted)
                                            |
                              PROJECTTYPE_n |
                                    Energy  |          0  (omitted)
                                  Land Use  |          0  (omitted)
                                       N/A  |          0  (omitted)
                            Transportation  |          0  (omitted)
                          Waste Management  |          0  (omitted)
                                     Water  |          0  (omitted)
                                            |
                                      _cons |   .0134972          .        .       .            .           .
                          ------------------+----------------------------------------------------------------
                                    sigma_u |  .03493133
                                    sigma_e |  .01669494
                                        rho |  .81405202   (fraction of variance due to u_i)
                          -----------------------------------------------------------------------------------
                          As you can see, all of my variables are being omitted from the regression. After reading the forum and browsing the Stata manual I suspect that this is some form of the dummy variable trap? These variables are constant for each bond and do not change over the time period. Since I am specifying a fixed-effects model, is this problem arising because there are no changes in either the dependent variable or any of the predictor variables? This makes intuitive sense: if I understand correctly, the fixed-effects specification means I am basically running a regression on variables that are all constant within each bond, which would explain the collinearity? Am I thinking correctly, and if so, how do I get around this?

                          Many thanks for your help Carlo, I really appreciate it.



                          • #14
                            Nils:
                            your intuition is correct: the -fe- machinery wipes out time-invariant predictors (and, unfortunately, currencies, like family names, are not expected to change as time goes by).
                            The last resort is to check whether the -re- specification fits your data better than -fe-.
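                            A quick way to see this in the data, sketched with the variable names from your posts and assuming the panel is already -xtset- on RIC_2: a predictor whose within standard deviation is zero has nothing left once the bond-specific mean is removed. MATURITY_mean and MATURITY_within are illustrative names only.
                            Code:
                            * A within s.d. of zero flags a time-invariant variable
                            xtsum logISSUEAMOUNT MATURITY CURRENCY_n RATING_n PROJECTTYPE_n
                            
                            * Mimic the demeaning step of -fe- for one predictor
                            bysort RIC_2: egen double MATURITY_mean = mean(MATURITY)
                            generate double MATURITY_within = MATURITY - MATURITY_mean
                            
                            * If MATURITY never changes within a bond, this is zero (up to rounding) everywhere
                            summarize MATURITY_within
                            A regressor that is identically zero after demeaning cannot be told apart from the bond-level intercepts, which is why -xtreg, fe- reports it as omitted.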
                            Kind regards,
                            Carlo
                            (Stata 18.0 SE)



                            • #15
                              Carlo Lazzaro
                              That does make sense, and is something I probably should've realized earlier. In the step 1 regression mentioned earlier I ran a fixed effect regression to obtain the variable GREENPREMIUM. This variable is constant for each bond over the time-series but varies between bonds. Would it be reasonable to use the -re- specification in the step 2 regression even if I used the -fe- specification to obtain the dependent variable in step 2 (i.e. GREENPREMIUM)? Or would a pooled OLS be a better approach? To illustrate, I tried performing both. Doing so yields the following result:

                              Code:
                              xtreg GREENPREMIUM logISSUEAMOUNT MATURITY i.CURRENCY_n i.RATING_n i.PROJECTTYPE_n, re robust
                              
                              Random-effects GLS regression                   Number of obs     =     54,402
                              Group variable: RIC_2                           Number of groups  =        166
                              
                              R-sq:                                           Obs per group:
                                   within  = 0.0000                                         min =         18
                                   between = 0.1362                                         avg =      327.7
                                   overall = 0.1798                                         max =      1,180
                              
                                                                              Wald chi2(21)     =          .
                              corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =          .
                              
                                                                   (Std. Err. adjusted for 166 clusters in RIC_2)
                              -----------------------------------------------------------------------------------
                                                |               Robust
                                   GREENPREMIUM |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                              ------------------+----------------------------------------------------------------
                                 logISSUEAMOUNT |  -.0034405   .0035123    -0.98   0.327    -.0103245    .0034435
                                       MATURITY |   .0031035   .0013315     2.33   0.020     .0004938    .0057131
                                                |
                                     CURRENCY_n |
                                           AUD  |   .0010443   .0071359     0.15   0.884    -.0129418    .0150303
                                           CAD  |   .0019359   .0057097     0.34   0.735    -.0092549    .0131266
                                           CHF  |  -.0314472    .014296    -2.20   0.028    -.0594669   -.0034274
                                           EUR  |   -.011928   .0073211    -1.63   0.103     -.026277     .002421
                                           HKD  |  -.0405451   .0277118    -1.46   0.143    -.0948592     .013769
                                           IDR  |   .0288521   .0197445     1.46   0.144    -.0098464    .0675506
                                           INR  |   .0031791   .0115173     0.28   0.783    -.0193944    .0257526
                                           NOK  |  -.0261247   .0133215    -1.96   0.050    -.0522344    -.000015
                                           NZD  |  -.0319061   .0128562    -2.48   0.013    -.0571039   -.0067083
                                           SEK  |  -.0020055   .0082697    -0.24   0.808    -.0182138    .0142028
                                                |
                                       RATING_n |
                                             A  |   .0014689   .0067198     0.22   0.827    -.0117016    .0146395
                                            AA  |  -.0008373   .0058051    -0.14   0.885     -.012215    .0105404
                                           BBB  |   .0024144   .0048281     0.50   0.617    -.0070485    .0118772
                                           N/A  |   .0061399   .0102482     0.60   0.549    -.0139461    .0262259
                                                |
                                  PROJECTTYPE_n |
                                        Energy  |   .0012968   .0087217     0.15   0.882    -.0157975    .0183911
                                      Land Use  |  -.0037861   .0132794    -0.29   0.776    -.0298132     .022241
                                           N/A  |   -.009492   .0116256    -0.82   0.414    -.0322778    .0132938
                                Transportation  |   .0035156   .0082554     0.43   0.670    -.0126646    .0196958
                              Waste Management  |   .0213694   .0275832     0.77   0.439    -.0326927    .0754315
                                         Water  |   .0393904     .01969     2.00   0.045     .0007987     .077982
                                                |
                                          _cons |   .0755063   .0697502     1.08   0.279    -.0612015    .2122142
                              ------------------+----------------------------------------------------------------
                                        sigma_u |  .03484352
                                        sigma_e |  .01669494
                                            rho |  .81328881   (fraction of variance due to u_i)
                              -----------------------------------------------------------------------------------
                              Pooled OLS:
                              Code:
                              regress GREENPREMIUM logISSUEAMOUNT MATURITY i.CURRENCY_n i.RATING_n i.PROJECTTYPE_n, vce(cluster RIC_2)
                              
                              Linear regression                               Number of obs     =     54,402
                                                                              F(20, 165)        =          .
                                                                              Prob > F          =          .
                                                                              R-squared         =     0.2052
                                                                              Root MSE          =     .03874
                              
                                                                   (Std. Err. adjusted for 166 clusters in RIC_2)
                              -----------------------------------------------------------------------------------
                                                |               Robust
                                   GREENPREMIUM |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                              ------------------+----------------------------------------------------------------
                                 logISSUEAMOUNT |  -.0038494   .0055061    -0.70   0.485    -.0147209    .0070222
                                       MATURITY |   .0047137   .0020394     2.31   0.022     .0006869    .0087405
                                                |
                                     CURRENCY_n |
                                           AUD  |  -.0003502   .0074541    -0.05   0.963    -.0150678    .0143675
                                           CAD  |   .0031494   .0084453     0.37   0.710    -.0135254    .0198242
                                           CHF  |  -.0332557   .0225441    -1.48   0.142    -.0777678    .0112564
                                           EUR  |  -.0202907   .0113747    -1.78   0.076    -.0427494     .002168
                                           HKD  |  -.0702176   .0319255    -2.20   0.029    -.1332528   -.0071824
                                           IDR  |    .039081   .0312817     1.25   0.213     -.022683     .100845
                                           INR  |   .0084365   .0187356     0.45   0.653    -.0285559    .0454289
                                           NOK  |   -.050783   .0208695    -2.43   0.016    -.0919887   -.0095772
                                           NZD  |  -.0248428   .0168512    -1.47   0.142    -.0581146    .0084291
                                           SEK  |   .0070028   .0106498     0.66   0.512    -.0140246    .0280303
                                                |
                                       RATING_n |
                                             A  |   .0060837   .0076685     0.79   0.429    -.0090573    .0212246
                                            AA  |  -.0010465   .0090199    -0.12   0.908    -.0188557    .0167628
                                           BBB  |   .0103699   .0088591     1.17   0.243    -.0071219    .0278616
                                           N/A  |   .0175343   .0124369     1.41   0.160    -.0070217    .0420903
                                                |
                                  PROJECTTYPE_n |
                                        Energy  |   .0162782   .0147251     1.11   0.271    -.0127957    .0453521
                                      Land Use  |   .0139873   .0244345     0.57   0.568    -.0342573     .062232
                                           N/A  |  -.0021629   .0177791    -0.12   0.903    -.0372667    .0329409
                                Transportation  |   .0109533   .0106392     1.03   0.305    -.0100533    .0319599
                              Waste Management  |   .0513677   .0341644     1.50   0.135     -.016088    .1188234
                                         Water  |   .0760223   .0273014     2.78   0.006     .0221171    .1299275
                                                |
                                          _cons |   .0645238   .1117128     0.58   0.564    -.1560471    .2850947
                              -----------------------------------------------------------------------------------
                              After reading your replies to earlier posts on the forum, I also ran -xttest0- with the following result:
                              Code:
                              xttest0
                              
                              Breusch and Pagan Lagrangian multiplier test for random effects
                              
                                      GREENPREMIUM[RIC_2,t] = Xb + u[RIC_2] + e[RIC_2,t]
                              
                                      Estimated results:
                                                       |       Var     sd = sqrt(Var)
                                              ---------+-----------------------------
                                             GREENPR~M |   .0018877       .0434479
                                                     e |   .0002787       .0166949
                                                     u |   .0012141       .0348435
                              
                                      Test:   Var(u) = 0
                                                           chibar2(01) =  8.9e+06
                                                        Prob > chibar2 =   0.0000
                              If I understood those replies correctly, this would point to -xtreg, re- being the better choice due to the statistical significance?
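                              For reference, the variance components that -xttest0- reports are the same ones behind sigma_u, sigma_e and rho in the -xtreg, re- output, and the link can be checked by hand. This is only a sketch, and it assumes the -xtreg, re- results are still the active estimates so that e(sigma_u) and e(sigma_e) are available:
                              Code:
                              * rho = Var(u) / (Var(u) + Var(e)), using results stored by -xtreg, re-
                              display e(sigma_u)^2 / (e(sigma_u)^2 + e(sigma_e)^2)
                              
                              * The same calculation from the printed sigma_u and sigma_e
                              display .03484352^2 / (.03484352^2 + .01669494^2)
                              Both lines should come out close to the reported rho of roughly .813, i.e. most of the variance in GREENPREMIUM is attributed to the bond-level component u.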
