  • Absurdly low R^2 (using xtreg)

    Hi all,

    I've encountered an issue with absurdly low R^2 values using xtreg. The output reads as follows:

    Code:
    . xtreg gdp_growth ln_street_hat gdp_de_growth lag_ln_gdp_pc lag_ln_rate_street , fe vce(r)
    
    Fixed-effects (within) regression               Number of obs     =      4,812
    Group variable: region_id                       Number of groups  =        401
    
    R-sq:                                           Obs per group:
         within  = 0.3094                                         min =         12
         between = 0.0011                                         avg =       12.0
         overall = 0.0613                                         max =         12
    
                                                    F(4,400)          =     269.08
    corr(u_i, Xb)  = -0.8802                        Prob > F          =     0.0000
    
                                      (Std. Err. adjusted for 401 clusters in region_id)
    ------------------------------------------------------------------------------------
                       |               Robust
            gdp_growth |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------------+----------------------------------------------------------------
         ln_street_hat |  -.3368946   .1511572    -2.23   0.026    -.6340565   -.0397327
         gdp_de_growth |   .8132571    .082375     9.87   0.000      .651315    .9751992
         lag_ln_gdp_pc |  -.2128467   .0568266    -3.75   0.000    -.3245629   -.1011305
    lag_ln_rate_street |   .1413698   .0806067     1.75   0.080    -.0170959    .2998355
                 _cons |    3.54519    1.07787     3.29   0.001     1.426192    5.664188
    -------------------+----------------------------------------------------------------
               sigma_u |  .04698124
               sigma_e |  .03586871
                   rho |  .63175799   (fraction of variance due to u_i)
    -----------------------------------------------------------------------------
    I don't understand how the R^2 values can be that low. The between R^2 seems especially odd, considering I'm using fixed effects, which should account for most of the between-variation, no? Also, when I use predict, the correlation between the predicted and actual values of the dependent variable is quite high, around 0.6. Given this, and since all my variables seem fairly significant, I don't understand what's going on here. I should note that this is quite robust to various specifications of the model: I added and removed variables, used RE instead of FE, etc., and the R^2 stayed low. Do any of you have an idea what the problem could be here?

    Best regards

    Tim

    Last edited by Tim Umbach; 04 Jun 2018, 04:22.

  • #2
    Tim:
    I do not see any problem there.
    -fe- machinery focuses on the within variation, which is the R-sq indicator you should consider.
    Should any problem exist, it might be that the -re- specification outperforms -fe- for your data, but I assume that you've already investigated that issue via the user-written programme -xtoverid- (as you invoked non-default standard errors, -hausman- is not an option there).
    Kind regards,
    Carlo
    (Stata 19.0)
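
    A minimal sketch of how the three R-sq figures relate to -predict-, assuming Stata's bundled -nlswork- data as a stand-in (its variables are not from this thread). After -xtreg, fe-, the overall R-sq is the squared correlation between y and the fitted xb, which leaves out the estimated fixed effects. Predictions that add u_i back in (-xbu-) usually correlate far more strongly with y, which may be why a correlation from -predict- can look high next to the reported R-sq.

    Code:
    * Assumption: nlswork (shipped with Stata) stands in for the poster's data
    webuse nlswork, clear
    xtset idcode year
    xtreg ln_wage age tenure, fe vce(robust)
    display "within  = " e(r2_w)
    display "between = " e(r2_b)
    display "overall = " e(r2_o)

    * xb excludes the fixed effects; xbu adds the estimated u_i back in
    predict double xb_hat  if e(sample), xb
    predict double xbu_hat if e(sample), xbu

    * Squared correlation with xb should be close to the overall R-sq
    corr ln_wage xb_hat
    display "corr(y, xb)^2  = " r(rho)^2

    * Squared correlation with xbu is usually much higher
    corr ln_wage xbu_hat
    display "corr(y, xbu)^2 = " r(rho)^2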

    • #3
      Hi Carlo,

      yes, I've tested whether I can use RE, and the answer is no. I understand that my explanatory power comes from within-variation, but I'm not sure whether I've specified my model correctly. If I haven't, I'm not sure what I could change or test, unless there is some magical variable I've omitted. On a related note: what exactly does corr(u_i, Xb) = -0.8802 indicate? Isn't it normal that the predicted values are correlated with the residuals?

      Regards,

      Tim

      • #4
        Tim:
        - you can skim through the literature in your research field and see how others specified their regression models when presented with the same research goal;
        - corr(u_i, Xb) = -0.8802 indicates the correlation between the panel-wise effect and the vector of regressors. As you can easily find out in any decent textbook on panel data econometrics, this is a form of weak endogeneity that the -fe- specification allows (contrast with -xtreg,re- requirements).
        Kind regards,
        Carlo
        (Stata 19.0)
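
        A minimal sketch of where the corr(u_i, Xb) figure comes from, again assuming the bundled -nlswork- data rather than anything in this thread. It predicts the estimated panel effect and the linear prediction separately and correlates them over the estimation sample; the result should be close to the value shown in the -xtreg, fe- header, which Stata also stores in e(corr).

        Code:
        * Assumption: nlswork stands in for the poster's data
        webuse nlswork, clear
        xtset idcode year
        xtreg ln_wage age tenure, fe vce(robust)

        * u = estimated panel-level effect, xb = fitted values without u_i
        predict double u_hat  if e(sample), u
        predict double xb_hat if e(sample), xb

        corr u_hat xb_hat
        display "reported corr(u_i, Xb) = " e(corr)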

        • #5
          Ah, right, that makes sense. Thank you, and best regards

          Tim

          • #6
            Tim:
            in my previous post I should have written limited endogeneity instead of weak endogeneity (see page 237 of the reference quoted below).
            For more details on this and other panel-related topics, I would refer you to: https://www.stata.com/bookstore/micr...metrics-stata/
            Kind regards,
            Carlo
            (Stata 19.0)

            • #7
              Hello All,
              I have been using xtreg, fe (robust) and find a very low R-squared (even the within R2 is 0.01). To the best of my knowledge, I have included most of the variables that have been used in my area of research. Still, the R-squared is absurdly low. This is also the best-fitting model that I could come up with after trying many combinations. I do not know whether I should even proceed with this model and, if so, how to justify such a low R-squared. Here are my results; the panel ID is the firm (Unique_Identifier), and I include year dummies as well.

              Code:
              . . xtreg  Growth_Real_SalesbyTA BLEV Square_DebtbyTA Log_Total_Assets Square_Log_TotalAssets PROF ADM InvestmentbyTA Log_
              > Age rdexpensesbyta i.Year, fe robust
              
              Fixed-effects (within) regression               Number of obs      =     20339
              Group variable: Unique_Ide~r                    Number of groups   =      2581
              
              R-sq:  within  = 0.0162                         Obs per group: min =         1
                     between = 0.0004                                        avg =       7.9
                     overall = 0.0025                                        max =        17
              
                                                              F(25,2580)         =     17.55
              corr(u_i, Xb)  = -0.2272                        Prob > F           =    0.0000
              
                                           (Std. Err. adjusted for 2581 clusters in Unique_Identifier)
              ----------------------------------------------------------------------------------------
                                     |               Robust
               Growth_Real_SalesbyTA |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
              -----------------------+----------------------------------------------------------------
                                BLEV |   .2585796    .130317     1.98   0.047     .0030432    .5141161
                     Square_DebtbyTA |  -.0590563   .1570279    -0.38   0.707    -.3669697    .2488572
                    Log_Total_Assets |  -.2504641   .0629661    -3.98   0.000    -.3739333   -.1269949
              Square_Log_TotalAssets |   .0090401   .0034819     2.60   0.009     .0022125    .0158677
                                PROF |   .6814883   .0851691     8.00   0.000     .5144816    .8484949
                                 ADM |   .9346583   .2609631     3.58   0.000     .4229399    1.446377
                      InvestmentbyTA |  -.0513566   .0212274    -2.42   0.016    -.0929811   -.0097321
                             Log_Age |   .0490875   .0420108     1.17   0.243    -.0332909    .1314659
                      rdexpensesbyta |   .7108202   .6144103     1.16   0.247     -.493967    1.915607
                                   
                                     |
                               _cons |   .8878394   .2276962     3.90   0.000     .4413537    1.334325
              -----------------------+----------------------------------------------------------------
                             sigma_u |  1.7106941
                             sigma_e |  .62142904
                                 rho |  .88342438   (fraction of variance due to u_i)
              ----------------------------------------------------------------------------------------

              I have attached only a part of the results so as to save space and be very specific.

              Could someone please help me here? How should I justify such low R2 values in an FE model? Is it appropriate? If anybody can suggest related reading, I will be really grateful.

              • #8
                Mohina:
                - you might have limited within panel variation in time-varying predictors.
                That said;
                - is your model specification correct (i.e., have you investigated potential non-linearity, missed predictors and/or interactions)?;
                - are you sure that (via the community-contributed programme -xtoverid-, as you have invoked non-default standard errors) -fe- fits your data better than -re-?
                Kind regards,
                Carlo
                (Stata 19.0)
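
                A minimal sketch of both checks, assuming the bundled -nlswork- data as a stand-in. -xtsum- splits each variable's standard deviation into between and within components, so a small within figure for a predictor points to limited within-panel variation. -xtoverid- is community-contributed and must be installed first; it may also need -ivreg2- from SSC.

                Code:
                * Assumption: nlswork stands in for the poster's data
                webuse nlswork, clear
                xtset idcode year

                * Compare the "between" and "within" rows for each variable
                xtsum ln_wage age tenure

                * One-time install (assumption: SSC is reachable):
                *   ssc install xtoverid
                *   ssc install ivreg2
                xtreg ln_wage age tenure, re vce(cluster idcode)
                xtoverid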

                • #9
                  Hello Sir,

                  Many thanks for the much needed reply.
                  Regarding the inclusion of other terms: I have also added squared as well as interaction terms to my model, but the R-squared does not change.
                  Also, although I had checked FE vs RE with the Hausman test, I had not run xtoverid before. I have now done so for the same model that I attached above, and cross-checked it for other models as well (one with squared and interaction terms). I am attaching the results for the model that I posted above.

                  Code:
                  . xi: xtreg  Growth_Real_SalesbyTA BLEV Square_DebtbyTA Log_Total_Assets Square_Log_TotalAssets PROF ADM InvestmentbyTA 
                  > Log_Age rdexpensesbyta, re
                  
                  Random-effects GLS regression                   Number of obs      =     20339
                  Group variable: Unique_Ide~r                    Number of groups   =      2581
                  
                  R-sq:  within  = 0.0136                         Obs per group: min =         1
                         between = 0.0003                                        avg =       7.9
                         overall = 0.0020                                        max =        17
                  
                                                                  Wald chi2(9)       =    233.34
                  corr(u_i, X)   = 0 (assumed)                    Prob > chi2        =    0.0000
                  
                  ----------------------------------------------------------------------------------------
                   Growth_Real_SalesbyTA |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                  -----------------------+----------------------------------------------------------------
                                    BLEV |   .1426503   .1173474     1.22   0.224    -.0873465     .372647
                         Square_DebtbyTA |   .0793734   .1523951     0.52   0.602    -.2193156    .3780624
                        Log_Total_Assets |  -.2210018   .0376791    -5.87   0.000    -.2948515   -.1471521
                  Square_Log_TotalAssets |   .0094487   .0022599     4.18   0.000     .0050193    .0138781
                                    PROF |   .6570864   .0699692     9.39   0.000     .5199493    .7942235
                                     ADM |   .8966035   .1714509     5.23   0.000      .560566    1.232641
                          InvestmentbyTA |   -.045998    .013351    -3.45   0.001    -.0721654   -.0198306
                                 Log_Age |   .1247992   .0305925     4.08   0.000     .0648391    .1847594
                          rdexpensesbyta |   .8146511   1.079389     0.75   0.450    -1.300913    2.930215
                                   _cons |   .5338655   .1543971     3.46   0.001     .2312528    .8364781
                  -----------------------+----------------------------------------------------------------
                                 sigma_u |  1.6639532
                                 sigma_e |  .62187267
                                     rho |  .87744262   (fraction of variance due to u_i)
                  ----------------------------------------------------------------------------------------
                  
                  . xtoverid
                  
                  Test of overidentifying restrictions: fixed vs random effects
                  Cross-section time-series model: xtreg re   
                  Sargan-Hansen statistic  42.878  Chi-sq(9)    P-value = 0.0000
                  So, it means I should go with the fixed-effects model.
                  Have I performed it correctly? Is there any step that I missed?
                  But again, if I go with FE, the R-squared is so low! How do I deal with it?

                  • #10
                    Mohina:
                    you did not include non-default standard error options in your last code, so -xtoverid- gives the very same indication as -hausman-. Please also note that, in your case, the -xi:- prefix is redundant, as you do not use -fvvarlist- notation in your code.
                    Again, the limited within R-sq is probably due to a limited within panel variation of time-varying predictors.
                    Kind regards,
                    Carlo
                    (Stata 19.0)

                    • #11
                      Many thanks, Carlo, for pointing out the mistake. I retried with cluster-robust standard errors. The result is shown below:

                      Code:
                      . xtreg  Growth_Real_SalesbyTA BLEV Square_DebtbyTA Log_Total_Assets Square_Log_TotalAssets PROF ADM InvestmentbyTA Log_
                      > Age rdexpensesbyta, re vce(robust)
                      
                      Random-effects GLS regression                   Number of obs      =     20339
                      Group variable: Unique_Ide~r                    Number of groups   =      2581
                      
                      R-sq:  within  = 0.0136                         Obs per group: min =         1
                             between = 0.0003                                        avg =       7.9
                             overall = 0.0020                                        max =        17
                      
                                                                      Wald chi2(9)       =    273.74
                      corr(u_i, X)   = 0 (assumed)                    Prob > chi2        =    0.0000
                      
                                                   (Std. Err. adjusted for 2581 clusters in Unique_Identifier)
                      ----------------------------------------------------------------------------------------
                                             |               Robust
                       Growth_Real_SalesbyTA |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                      -----------------------+----------------------------------------------------------------
                                        BLEV |   .1426503   .1291211     1.10   0.269    -.1104225     .395723
                             Square_DebtbyTA |   .0793734    .177186     0.45   0.654    -.2679048    .4266516
                            Log_Total_Assets |  -.2210018   .0539841    -4.09   0.000    -.3268088   -.1151949
                      Square_Log_TotalAssets |   .0094487   .0030794     3.07   0.002     .0034131    .0154843
                                        PROF |   .6570864   .0816103     8.05   0.000     .4971332    .8170396
                                         ADM |   .8966035   .2334635     3.84   0.000     .4390234    1.354184
                              InvestmentbyTA |   -.045998    .019354    -2.38   0.017    -.0839312   -.0080648
                                     Log_Age |   .1247992   .0269666     4.63   0.000     .0719457    .1776528
                              rdexpensesbyta |   .8146511   .5688084     1.43   0.152    -.3001928    1.929495
                                       _cons |   .5338655   .1886252     2.83   0.005     .1641669    .9035641
                      -----------------------+----------------------------------------------------------------
                                     sigma_u |  1.6639532
                                     sigma_e |  .62187267
                                         rho |  .87744262   (fraction of variance due to u_i)
                      ----------------------------------------------------------------------------------------
                      
                      . xtoverid
                      
                      Test of overidentifying restrictions: fixed vs random effects
                      Cross-section time-series model: xtreg re  robust cluster(Unique_Identifier)
                      Sargan-Hansen statistic 106.212  Chi-sq(9)    P-value = 0.0000
                      So, here too I would go for fixed-effects estimation. I am grateful to you for letting me know the underlying cause of such a low R-squared. My fear is: would such a model even be accepted by journals? I have included all the non-linear and interaction effects that I can think of from the existing literature in my area. Additionally, I have checked the significance of the time as well as the firm fixed effects through F-tests and found both significant.

                      The surprising part is that when I run the same model using OLS with time as well as firm dummies, using i.Year and i.Unique_Identifier (as Unique_Identifier is my panel ID), the R2 shoots up to 0.55. So the least squares dummy variable (LSDV) approach gives me a better fit, while fixed effects gives such a bad fit. As FE already incorporates firm effects and I am adding time dummies, it should yield a fit as good as the LSDV model (both models are doing essentially the same thing, except that FE uses the within variation for each panel ID). Aren't both models supposed to report qualitatively similar results?


                      Can you please point me to some papers that report a low R-squared and in which the authors have defended it?

                      • #12
                        Mohina:
                        you cannot compare -fe- vs (pooled) OLS results as the latter is more similar to -xtreg,re-.
                        That said, there's something I'm not able to find in your code, despite your clear description:
                        - you include a squared term for debt, but the linear one is missing;
                        - you include a logged (why? And: why so many logged predictors?) term for -age-, but did not test whether or not -squared_age- could make sense in your model;
                        - you state that you tested the statistical significance of the (so called) time fixed effect, but I'm not able to detect it in your regression model (just add -i.Year- and see what happens).
                        In addition, setting -fe- or -re- specification aside for a while, I fail to get what regression model you have in mind (and why): a linear-log (which is seldom used) one?
                        As a closing-out remark, you should leave behind creating interactions and categorical variables yourself and switch to -fvvarlist- notation:
                        Code:
                        . xtreg  Growth_Real_SalesbyTA BLEV c.DebtbyTA##c.DebtbyTA c.Log_Total_Assets##c.Log_Total_Assets PROF ADM InvestmentbyTA c.Age##c.Age i.Year rdexpensesbyta, re vce(robust)
                        *Unsolicited advice: I've taken the liberty to plug in -i.year- and replace -Log_Age- with -Age- with both linear and squared terms*
                        Kind regards,
                        Carlo
                        (Stata 19.0)
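
                        A minimal sketch of why an LSDV fit and -xtreg, fe- can report very different R-sq values for the same slopes, assuming the bundled -nlswork- data as a stand-in. -areg- absorbs the panel dummies, so its R-sq counts the variance they explain, whereas the within R-sq from -xtreg, fe- is computed on the demeaned data only. The slope coefficients from the two commands should match.

                        Code:
                        * Assumption: nlswork stands in for the poster's data
                        webuse nlswork, clear
                        xtset idcode year

                        xtreg ln_wage age tenure i.year, fe
                        display "xtreg within R-sq = " e(r2_w)

                        * Same slopes, but R-sq now includes the absorbed idcode dummies
                        areg ln_wage age tenure i.year, absorb(idcode)
                        display "areg (LSDV) R-sq  = " e(r2)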

                        • #13
                          Hello Carlo,
                          Many thanks to you again for keeping the discussion going. The variable BLEV is the linear debt term; as you suggested, I re-specified the model and attach it below.

                          Code:
                          xtreg  Growth_Real_SalesbyTA c.BLEV##c.BLEV  c.Log_Total_Assets##c.Log_Total_Assets PROF ADM InvestmentbyTA c.Age##c.A
                          > ge rdexpensesbyta c.BLEV#c.InvestmentbyTA i.Year, fe robust
                          note: 2017.Year omitted because of collinearity
                          
                          Fixed-effects (within) regression               Number of obs      =     20339
                          Group variable: Unique_Ide~r                    Number of groups   =      2581
                          
                          R-sq:  within  = 0.0162                         Obs per group: min =         1
                                 between = 0.0001                                        avg =       7.9
                                 overall = 0.0010                                        max =        17
                          
                                                                          F(26,2580)         =     16.49
                          corr(u_i, Xb)  = -0.3054                        Prob > F           =    0.0000
                          
                                                                      (Std. Err. adjusted for 2581 clusters in Unique_Identifier)
                          -------------------------------------------------------------------------------------------------------
                                                                |               Robust
                                          Growth_Real_SalesbyTA |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                          --------------------------------------+----------------------------------------------------------------
                                                           BLEV |    .256352   .1301793     1.97   0.049     .0010856    .5116185
                                                                |
                                                  c.BLEV#c.BLEV |  -.0563219   .1569274    -0.36   0.720    -.3640383    .2513945
                                                                |
                                               Log_Total_Assets |   -.253603   .0645069    -3.93   0.000    -.3800935   -.1271126
                                                                |
                          c.Log_Total_Assets#c.Log_Total_Assets |   .0092199   .0035762     2.58   0.010     .0022074    .0162325
                                                                |
                                                           PROF |   .6807332   .0851065     8.00   0.000     .5138493    .8476172
                                                            ADM |   .9309667   .2610907     3.57   0.000     .4189982    1.442935
                                                 InvestmentbyTA |  -.0684446   .0452489    -1.51   0.130    -.1571724    .0202833
                                                            Age |    .012371   .0030281     4.09   0.000     .0064332    .0183087
                                                                |
                                                    c.Age#c.Age |  -.0000301   .0000203    -1.48   0.139    -.0000699    9.77e-06
                                                                |
                                                 rdexpensesbyta |   .7114085   .6139257     1.16   0.247    -.4924285    1.915246
                                                                |
                                        c.BLEV#c.InvestmentbyTA |   .0508054   .0821618     0.62   0.536    -.1103043     .211915
                                                                |
                                                           Year |
                                                          2002  |  -.0119765   .0233427    -0.51   0.608    -.0577488    .0337958
                                                          2003  |   .0347805   .0208731     1.67   0.096    -.0061493    .0757102
                                                          2004  |   .0344911   .0198244     1.74   0.082    -.0043822    .0733643
                                                          2005  |   .0356632   .0261256     1.37   0.172    -.0155661    .0868924
                                                          2006  |  -.0317624   .0215083    -1.48   0.140    -.0739376    .0104129
                                                          2007  |   .0050354   .0237388     0.21   0.832    -.0415137    .0515845
                                                          2008  |   .0141739   .0246114     0.58   0.565    -.0340862     .062434
                                                          2009  |  -.0233156   .0142839    -1.63   0.103    -.0513246    .0046934
                                                          2010  |   .0045155   .0236838     0.19   0.849    -.0419257    .0509566
                                                          2011  |   .0481612   .0196804     2.45   0.014     .0095703    .0867522
                                                          2012  |  -.0001252   .0143361    -0.01   0.993    -.0282367    .0279863
                                                          2013  |   .0420947   .0433137     0.97   0.331    -.0428384    .1270279
                                                          2014  |  -.0003686   .0131846    -0.03   0.978    -.0262221    .0254848
                                                          2015  |   .0287295   .0139717     2.06   0.040     .0013326    .0561264
                                                          2016  |   .0527194   .0185105     2.85   0.004     .0164224    .0890164
                                                          2017  |          0  (omitted)
                                                                |
                                                          _cons |   .7793541   .2368661     3.29   0.001     .3148872    1.243821
                          --------------------------------------+----------------------------------------------------------------
                                                        sigma_u |  1.7185167
                                                        sigma_e |   .6214287
                                                            rho |  .88436091   (fraction of variance due to u_i)
                          -------------------------------------------------------------------------------------------------------
                          I have also tried Total_Assets without the log, but even then I fail to get a higher R-squared in almost any of my specifications. How should I handle this?

                          Please help!!!

                          • #14
                            Mohina:
                            some comments about your last post:
                            - squared age and squared debt are redundant, as they do not reach statistical significance;
                            - if the value
                            Code:
                            . di (.253603)/(2*.0092199)
                            13.753023
                            falls within the range of -Log_Total_Assets- you have a minimum for your dependent variable;
                            - you can also test whether the -i.Year- terms jointly reach statistical significance via -testparm i.Year-;
                            - in sum, I do think that the main issue with your data rests on a (too) limited within panel variation for time-varying predictors. Unfortunately, there's nothing you can do about that.
                            Kind regards,
                            Carlo
                            (Stata 19.0)
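
                            A minimal sketch of the turning-point and joint-significance checks, assuming the bundled -nlswork- data and its -tenure- variable as stand-ins. For a quadratic b1*x + b2*x^2 the turning point is x* = -b1/(2*b2); -nlcom- computes it from the stored coefficients and adds a delta-method confidence interval, which avoids retyping rounded numbers.

                            Code:
                            * Assumption: nlswork stands in for the poster's data
                            webuse nlswork, clear
                            xtset idcode year
                            xtreg ln_wage c.tenure##c.tenure i.year, fe vce(robust)

                            * Turning point x* = -b1/(2*b2), with a delta-method CI
                            nlcom -_b[tenure]/(2*_b[c.tenure#c.tenure])

                            * Joint test of the year dummies
                            testparm i.year

                            * Check whether the turning point lies inside the observed range
                            summarize tenure if e(sample)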

                            • #15
                              Many thanks, Carlo, for helping out in this process. I did run testparm i.Year and it was significant as well. But, as you said, unfortunately my data suffers from something that cannot be dealt with.

                              best regards,

                              Mohina
