  • First pooled OLS, then panel regression. What do you think of my approach?

    Dear all,

    I have an unbalanced panel data set, where N (110 companies) > T (5 Years). I first conducted a pooled OLS regression (-regress-). Later, I conducted panel regressions (-xtreg-), comparing the results as robustness checks. My model is as follows:

    ROA = c.Var1_c##i.Industry Var2 Var3 Var4 Var5 i.Year, where Var1 is CEO compensation, Var2-Var5 are control variables, and Industry is a categorical variable (coded 1 to 10 for the different industries).

    First, I winsorized my data at the 5th and 95th percentiles to limit the influence of outliers. I checked the OLS assumptions and, as a consequence, transformed some variables (linearity) and mean-centered my key independent variable (multicollinearity with the interaction term). As one would expect, I do have heteroscedasticity (-estat hettest-) and autocorrelation (with -gen time = _n-; -tsset time-; and -dwstat-) in my data.

    Question 1: How do I account for autocorrelation AND heteroscedasticity in pooled OLS? I understand that for the former I can use -prais ..., corc- and for the latter -regress ..., vce(robust)-, but I have failed to find a combined method.

    See the result of my pooled OLS regression below:
    Code:
    . regress ROA_new c.Var1_c##ib6.IndustryRank Var2 Var3 Var4 Var5 i.Year, vce(robust)
    
    Linear regression                               Number of obs     =        472
                                                    F(28, 443)        =      34.84
                                                    Prob > F          =     0.0000
                                                    R-squared         =     0.4664
                                                    Root MSE          =     .03384
    
    -----------------------------------------------------------------------------------------
                            |               Robust
                    ROA_new |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    ------------------------+----------------------------------------------------------------
                     Var1_c |   .0004192   .0001773     2.37   0.018     .0000709    .0007676
                            |
               IndustryRank |
    Communication Services  |   .0129088   .0072345     1.78   0.075    -.0013093     .027127
    Consumer Discretionary  |   .0188844   .0071505     2.64   0.009     .0048314    .0329375
          Consumer Staples  |   .0078738   .0148976     0.53   0.597    -.0214049    .0371525
                Financials  |  -.0030436   .0073608    -0.41   0.679    -.0175101    .0114228
               Health Care  |   .0181798   .0069513     2.62   0.009     .0045182    .0318415
    Information Technology  |   .0339861   .0076653     4.43   0.000     .0189212     .049051
                 Materials  |   .0007409   .0056497     0.13   0.896    -.0103627    .0118444
               Real Estate  |  -.0062964   .0064746    -0.97   0.331    -.0190212    .0064284
                 Utilities  |  -.0269126   .0061301    -4.39   0.000    -.0389604   -.0148648
                            |
      IndustryRank#c.Var1_c |
    Communication Services  |   -.000053   .0002856    -0.19   0.853    -.0006143    .0005083
    Consumer Discretionary  |   .0001062   .0002586     0.41   0.682    -.0004021    .0006145
          Consumer Staples  |   .0004539   .0005145     0.88   0.378    -.0005572    .0014651
                Financials  |  -.0002145   .0001982    -1.08   0.280    -.0006041    .0001751
               Health Care  |   .0003999    .000239     1.67   0.095    -.0000699    .0008697
    Information Technology  |   .0004263    .000286     1.49   0.137    -.0001358    .0009884
                 Materials  |   .0004491   .0002949     1.52   0.129    -.0001305    .0010288
               Real Estate  |    .000397    .000238     1.67   0.096    -.0000708    .0008648
                 Utilities  |  -.0001756    .000237    -0.74   0.459    -.0006415    .0002902
                            |
                       Var2 |  -.0339362   .0150897    -2.25   0.025    -.0635926   -.0042799
                       Var3 |   .0479924   .0242645     1.98   0.049     .0003044    .0956803
                       Var4 |  -.0126286   .0016375    -7.71   0.000    -.0158468   -.0094103
                       Var5 |   .0003169   .0011124     0.28   0.776    -.0018693    .0025032
                            |
                       Year |
                      2015  |    .001048    .005721     0.18   0.855    -.0101957    .0122917
                      2016  |    .001855   .0052554     0.35   0.724    -.0084736    .0121835
                      2017  |   .0068407   .0051132     1.34   0.182    -.0032084    .0168898
                      2018  |   .0058702   .0051116     1.15   0.251    -.0041758    .0159161
                      2019  |   .0056374   .0054482     1.03   0.301    -.0050702     .016345
                            |
                      _cons |   .2496132   .0273422     9.13   0.000     .1958766    .3033498
    -----------------------------------------------------------------------------------------
    Question 2: Would you consider this an appropriate model? Am I missing something?

    For my panel regressions I used the same winsorized/transformed data.

    Question 3: Is this normal practice, or should one use the original data, which are partly non-linear and not normally distributed?

    I then ran panel regressions (-xtreg, fe/re-) and tested for autocorrelation (-xtserial ..., output-, without categorical variables or the interaction term) and heteroscedasticity (-xttest3-). Having confirmed that both exist in my panel data, I accounted for them with -xtreg ..., re vce(cluster Company_ID)- after the Hausman test.

    Question 4: Is using -, re vce(cluster Company_ID)- correct in order to account for both, or should I conduct FGLS (-xtgls ..., p(h) c(ar1)-) or PCSE analyses (-xtpcse ..., het c(ar1)-)?

    Code:
    . xtset Company_ID Year
           panel variable:  Company_ID (unbalanced)
            time variable:  Year, 2014 to 2019, but with gaps
                    delta:  1 unit
    
    . xtreg ROA_new c.Var1_c##ib6.IndustryRank Var2 Var3 Var4 Var5 i.Year, re
    
    Random-effects GLS regression                   Number of obs     =        472
    Group variable: Company_ID                      Number of groups  =        106
    
    R-sq:                                           Obs per group:
         within  = 0.2685                                         min =          1
         between = 0.3871                                         avg =        4.5
         overall = 0.4329                                         max =          6
    
                                                    Wald chi2(28)     =     187.42
    corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000
    
    -----------------------------------------------------------------------------------------
                    ROA_new |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    ------------------------+----------------------------------------------------------------
                     Var1_c |     .00071   .0001607     4.42   0.000      .000395     .001025
                            |
               IndustryRank |
    Communication Services  |   .0097942   .0136338     0.72   0.473    -.0169275     .036516
    Consumer Discretionary  |   .0130758   .0124651     1.05   0.294    -.0113553    .0375069
          Consumer Staples  |   .0087058   .0202608     0.43   0.667    -.0310046    .0484163
                Financials  |  -.0081966   .0166936    -0.49   0.623    -.0409154    .0245223
               Health Care  |   .0134478   .0135422     0.99   0.321    -.0130943    .0399899
    Information Technology  |   .0324283   .0149377     2.17   0.030     .0031509    .0617057
                 Materials  |  -.0025767   .0128767    -0.20   0.841    -.0278146    .0226613
               Real Estate  |  -.0099463   .0232338    -0.43   0.669    -.0554838    .0355912
                 Utilities  |  -.0306672   .0206443    -1.49   0.137    -.0711293     .009795
                            |
      IndustryRank#c.Var1_c |
    Communication Services  |  -.0000123   .0003093    -0.04   0.968    -.0006185    .0005938
    Consumer Discretionary  |  -.0001672   .0002001    -0.84   0.403    -.0005595     .000225
          Consumer Staples  |  -.0006347   .0002995    -2.12   0.034    -.0012218   -.0000476
                Financials  |   -.000613   .0003204    -1.91   0.056    -.0012409     .000015
               Health Care  |  -.0004573   .0002939    -1.56   0.120    -.0010333    .0001188
    Information Technology  |  -.0002168   .0003547    -0.61   0.541     -.000912    .0004783
                 Materials  |   .0002684   .0002352     1.14   0.254    -.0001924    .0007293
               Real Estate  |   .0000822   .0006748     0.12   0.903    -.0012404    .0014048
                 Utilities  |   -.000399   .0003905    -1.02   0.307    -.0011645    .0003664
                            |
                       Var2 |  -.0373825   .0139206    -2.69   0.007    -.0646665   -.0100985
                       Var3 |   .0421203   .0217148     1.94   0.052      -.00044    .0846806
                       Var4 |  -.0106307   .0025466    -4.17   0.000    -.0156219   -.0056396
                       Var5 |  -.0007089   .0008282    -0.86   0.392    -.0023321    .0009143
                            |
                       Year |
                      2015  |   .0009376   .0024598     0.38   0.703    -.0038835    .0057588
                      2016  |   .0003204   .0024665     0.13   0.897    -.0045139    .0051546
                      2017  |   .0038309   .0024417     1.57   0.117    -.0009546    .0086165
                      2018  |   .0013017   .0025037     0.52   0.603    -.0036054    .0062088
                      2019  |  -.0014532   .0027073    -0.54   0.591    -.0067594    .0038531
                            |
                      _cons |   .2299991   .0409079     5.62   0.000      .149821    .3101771
    ------------------------+----------------------------------------------------------------
                    sigma_u |   .0357301
                    sigma_e |  .01435012
                        rho |  .86110172   (fraction of variance due to u_i)
    -----------------------------------------------------------------------------------------
    
    . est store re1
    
    . xtreg ROA_new c.Var1_c##ib6.IndustryRank Var2 Var3 Var4 Var5 i.Year, fe
    note: 1.IndustryRank omitted because of collinearity
    note: 2.IndustryRank omitted because of collinearity
    note: 3.IndustryRank omitted because of collinearity
    note: 4.IndustryRank omitted because of collinearity
    note: 5.IndustryRank omitted because of collinearity
    note: 7.IndustryRank omitted because of collinearity
    note: 8.IndustryRank omitted because of collinearity
    note: 9.IndustryRank omitted because of collinearity
    note: 10.IndustryRank omitted because of collinearity
    
    Fixed-effects (within) regression               Number of obs     =        472
    Group variable: Company_ID                      Number of groups  =        106
    
    R-sq:                                           Obs per group:
         within  = 0.2722                                         min =          1
         between = 0.2660                                         avg =        4.5
         overall = 0.3108                                         max =          6
    
                                                    F(19,347)         =       6.83
    corr(u_i, Xb)  = 0.0466                         Prob > F          =     0.0000
    
    -----------------------------------------------------------------------------------------
                    ROA_new |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    ------------------------+----------------------------------------------------------------
                     Var1_c |   .0007539     .00017     4.43   0.000     .0004195    .0010883
                            |
               IndustryRank |
    Communication Services  |          0  (omitted)
    Consumer Discretionary  |          0  (omitted)
          Consumer Staples  |          0  (omitted)
                Financials  |          0  (omitted)
               Health Care  |          0  (omitted)
    Information Technology  |          0  (omitted)
                 Materials  |          0  (omitted)
               Real Estate  |          0  (omitted)
                 Utilities  |          0  (omitted)
                            |
      IndustryRank#c.Var1_c |
    Communication Services  |   .0002298   .0003752     0.61   0.541    -.0005083    .0009678
    Consumer Discretionary  |  -.0002387   .0002111    -1.13   0.259    -.0006538    .0001765
          Consumer Staples  |  -.0007517   .0003146    -2.39   0.017    -.0013706   -.0001329
                Financials  |  -.0006464   .0003789    -1.71   0.089    -.0013916    .0000988
               Health Care  |  -.0006811   .0003347    -2.03   0.043    -.0013393   -.0000228
    Information Technology  |  -.0003157   .0004111    -0.77   0.443    -.0011243    .0004928
                 Materials  |   .0002191   .0002482     0.88   0.378    -.0002691    .0007072
               Real Estate  |   .0000297   .0007433     0.04   0.968    -.0014321    .0014916
                 Utilities  |  -.0004243   .0004024    -1.05   0.292    -.0012158    .0003672
                            |
                       Var2 |   -.048388   .0168456    -2.87   0.004    -.0815203   -.0152556
                       Var3 |   .0406569   .0238885     1.70   0.090    -.0063276    .0876414
                       Var4 |  -.0106254   .0056596    -1.88   0.061    -.0217569     .000506
                       Var5 |   -.000814   .0008762    -0.93   0.354    -.0025374    .0009094
                            |
                       Year |
                      2015  |   .0008862   .0024982     0.35   0.723    -.0040273    .0057997
                      2016  |   .0002122   .0025832     0.08   0.935    -.0048685     .005293
                      2017  |   .0035253   .0025814     1.37   0.173    -.0015518    .0086025
                      2018  |   .0008915   .0028079     0.32   0.751    -.0046312    .0064143
                      2019  |  -.0017244   .0031846    -0.54   0.589    -.0079881    .0045392
                            |
                      _cons |   .2380737   .0926853     2.57   0.011      .055778    .4203694
    ------------------------+----------------------------------------------------------------
                    sigma_u |  .03773385
                    sigma_e |  .01435012
                        rho |  .87364723   (fraction of variance due to u_i)
    -----------------------------------------------------------------------------------------
    F test that all u_i=0: F(105, 347) = 22.42                   Prob > F = 0.0000
    
    . est store fe1
    
    . xttest3
    
    Modified Wald test for groupwise heteroskedasticity
    in fixed effect regression model
    
    H0: sigma(i)^2 = sigma^2 for all i
    
    chi2 (106)  =   5.2e+31
    Prob>chi2 =      0.0000
    
    
    . xtserial ROA_new c.Var1_c##ib6.IndustryRank Var2 Var3 Var4 Var5 i.Year, output
    factor-variable and time-series operators not allowed
    r(101);
    
    . xtserial ROA_new Var1_c Var2 Var3 Var4 Var5, output
    
    Linear regression                               Number of obs     =        364
                                                    F(5, 95)          =      12.17
                                                    Prob > F          =     0.0000
                                                    R-squared         =     0.2013
                                                    Root MSE          =     .01611
    
                                (Std. Err. adjusted for 96 clusters in Company_ID)
    ------------------------------------------------------------------------------
                 |               Robust
       D.ROA_new |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          Var1_c |
             D1. |   .0004306    .000123     3.50   0.001     .0001865    .0006747
                 |
            Var2 |
             D1. |  -.0765561   .0178377    -4.29   0.000    -.1119683   -.0411439
                 |
            Var3 |
             D1. |   .0629007   .0284957     2.21   0.030     .0063295    .1194718
                 |
            Var4 |
             D1. |  -.0152056   .0053296    -2.85   0.005    -.0257861   -.0046251
                 |
            Var5 |
             D1. |  -.0006843   .0006922    -0.99   0.325    -.0020586      .00069
    ------------------------------------------------------------------------------
    
    Wooldridge test for autocorrelation in panel data
    H0: no first-order autocorrelation
        F(  1,      90) =     11.909
               Prob > F =      0.0009
    
    . hausman fe1 re1
    
                     ---- Coefficients ----
                 |      (b)          (B)            (b-B)     sqrt(diag(V_b-V_B))
                 |      fe1          re1         Difference          S.E.
    -------------+----------------------------------------------------------------
          Var1_c |    .0007539       .00071        .0000439        .0000555
    IndustryRank#|
        c.Var1_c |
              1  |    .0002298    -.0000123        .0002421        .0002125
              2  |   -.0002387    -.0001672       -.0000715        .0000671
              3  |   -.0007517    -.0006347        -.000117        .0000963
              4  |   -.0006464     -.000613       -.0000334        .0002023
              5  |   -.0006811    -.0004573       -.0002238        .0001601
              7  |   -.0003157    -.0002168       -.0000989        .0002079
              8  |    .0002191     .0002684       -.0000494        .0000795
              9  |    .0000297     .0000822       -.0000525        .0003116
             10  |   -.0004243     -.000399       -.0000253        .0000971
            Var2 |    -.048388    -.0373825       -.0110055        .0094863
            Var3 |    .0406569     .0421203       -.0014635        .0099562
            Var4 |   -.0106254    -.0106307        5.29e-06        .0050543
            Var5 |    -.000814    -.0007089       -.0001051        .0002861
            Year |
           2015  |    .0008862     .0009376       -.0000515        .0004362
           2016  |    .0002122     .0003204       -.0001081        .0007677
           2017  |    .0035253     .0038309       -.0003056        .0008378
           2018  |    .0008915     .0013017       -.0004102        .0012713
           2019  |   -.0017244    -.0014532       -.0002713         .001677
    ------------------------------------------------------------------------------
                               b = consistent under Ho and Ha; obtained from xtreg
                B = inconsistent under Ha, efficient under Ho; obtained from xtreg
    
        Test:  Ho:  difference in coefficients not systematic
    
                     chi2(19) = (b-B)'[(V_b-V_B)^(-1)](b-B)
                              =       12.71
                    Prob>chi2 =      0.8533
    
    . xtreg ROA_new c.Var1_c##ib6.IndustryRank Var2 Var3 Var4 Var5 i.Year, re vce(cluster Company_ID)
    
    Random-effects GLS regression                   Number of obs     =        472
    Group variable: Company_ID                      Number of groups  =        106
    
    R-sq:                                           Obs per group:
         within  = 0.2685                                         min =          1
         between = 0.3871                                         avg =        4.5
         overall = 0.4329                                         max =          6
    
                                                    Wald chi2(28)     =     313.61
    corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000
    
                                          (Std. Err. adjusted for 106 clusters in Company_ID)
    -----------------------------------------------------------------------------------------
                            |               Robust
                    ROA_new |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    ------------------------+----------------------------------------------------------------
                     Var1_c |     .00071   .0002914     2.44   0.015     .0001389     .001281
                            |
               IndustryRank |
    Communication Services  |   .0097942   .0152693     0.64   0.521     -.020133    .0397215
    Consumer Discretionary  |   .0130758   .0124774     1.05   0.295    -.0113794    .0375311
          Consumer Staples  |   .0087058   .0248933     0.35   0.727    -.0400842    .0574959
                Financials  |  -.0081966   .0115438    -0.71   0.478     -.030822    .0144289
               Health Care  |   .0134478   .0150852     0.89   0.373    -.0161187    .0430143
    Information Technology  |   .0324283   .0149935     2.16   0.031     .0030415     .061815
                 Materials  |  -.0025767   .0120635    -0.21   0.831    -.0262206    .0210673
               Real Estate  |  -.0099463   .0086922    -1.14   0.253    -.0269827    .0070901
                 Utilities  |  -.0306672    .012636    -2.43   0.015    -.0554333    -.005901
                            |
      IndustryRank#c.Var1_c |
    Communication Services  |  -.0000123   .0005877    -0.02   0.983    -.0011642    .0011395
    Consumer Discretionary  |  -.0001672   .0004009    -0.42   0.677    -.0009529    .0006185
          Consumer Staples  |  -.0006347   .0003247    -1.95   0.051    -.0012711    1.69e-06
                Financials  |   -.000613   .0003028    -2.02   0.043    -.0012065   -.0000194
               Health Care  |  -.0004573   .0003238    -1.41   0.158    -.0010919    .0001773
    Information Technology  |  -.0002168   .0003364    -0.64   0.519    -.0008762    .0004425
                 Materials  |   .0002684   .0004278     0.63   0.530      -.00057    .0011069
               Real Estate  |   .0000822   .0003729     0.22   0.826    -.0006487    .0008131
                 Utilities  |   -.000399   .0003621    -1.10   0.270    -.0011087    .0003107
                            |
                       Var2 |  -.0373825   .0152058    -2.46   0.014    -.0671854   -.0075796
                       Var3 |   .0421203   .0229078     1.84   0.066    -.0027781    .0870188
                       Var4 |  -.0106307   .0026749    -3.97   0.000    -.0158735    -.005388
                       Var5 |  -.0007089   .0009975    -0.71   0.477    -.0026639    .0012461
                            |
                       Year |
                      2015  |   .0009376   .0021478     0.44   0.662     -.003272    .0051473
                      2016  |   .0003204   .0021163     0.15   0.880    -.0038274    .0044682
                      2017  |   .0038309   .0028211     1.36   0.174    -.0016983    .0093602
                      2018  |   .0013017   .0031098     0.42   0.676    -.0047934    .0073969
                      2019  |  -.0014532   .0034477    -0.42   0.673    -.0082105    .0053041
                            |
                      _cons |   .2299991   .0442903     5.19   0.000     .1431917    .3168064
    ------------------------+----------------------------------------------------------------
                    sigma_u |   .0357301
                    sigma_e |  .01435012
                        rho |  .86110172   (fraction of variance due to u_i)
    -----------------------------------------------------------------------------------------
    Question 5: Would you consider this an appropriate approach? Am I missing something?

    Since Var1_c has the same sign and similar significance in pooled OLS and in random effects, I would conclude that the pooled OLS results seem reasonable and accept/reject my hypothesis on that basis.

    Question 6: Would this be a correct way to do this?

    Thank you very much for bearing with me so long. I am looking forward to your answers.

    Best regards,
    Pietro
    Last edited by Pietro Russo; 29 Jul 2021, 14:06.

  • #2
    Pietro:
    welcome to this forum.
    1), 2) and 6) if there's evidence of a panel-wise effect, pooled OLS is a sub-optimal approach. In addition, whenever we invoke some notion of "robustness" we should define it clearly (robustness to heteroskedasticity? autocorrelation? model misspecification? something else?). That said, if in any OLS you detect both heteroskedasticity and autocorrelation, you should go -vce(cluster clusterid)-. From your description, it seems that you were after a sort of sensitivity analysis (which I would not recommend, though);
    3) ruling out the trivial scenario of a data-entry mistake, outliers are simply a fact of life. For instance, in health economics (the research field I pretend to be an expert in), the total cost of a given health care programme follows a gamma distribution, which is positively skewed with a long right tail, as some patients need longer-than-average therapies and/or may experience adverse events that are expensive to manage. That said, by ruling out the so-called outliers, you are actually making up your original dataset, and nobody can tell you the direction and magnitude of the bias that you introduce into your analysis. In addition, normality is a weak requirement for the residual distribution only (and oftentimes an oversold one);
    4) if, as it seems from your description, you have an N>T panel dataset, you should go -xtreg- with robust or clustered standard errors if you detect heteroskedasticity and/or serial correlation. Please note that, unlike under -regress-, both options do the very same job under -xtreg-;
    5) it is not correct to go -hausman- with default standard errors and then invoke non-default standard errors after the -hausman- outcome. Just impose cluster-robust standard errors as soon as you detect heteroskedasticity and/or autocorrelation, and then test the -fe- vs -re- specification via the community-contributed module -xtoverid- (just type -search xtoverid- from within Stata to spot and install it). Being glorious but a bit old-fashioned, -xtoverid- does not support -fvvarlist- notation. The usual fix is to prefix your -xtreg- code with -xi:-.
    Kind regards,
    Carlo
    (Stata 18.0 SE)
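
    A minimal sketch of points 1), 4) and 5) above, assuming the variable names from #1 (ROA_new, Var1_c, IndustryRank, Company_ID, Year). It is illustrative rather than a definitive specification: clustering on the panel identifier gives standard errors that are robust to both heteroskedasticity and within-company serial correlation, in pooled OLS as well as in -xtreg-.

    ```stata
    * Pooled OLS with standard errors clustered on the panel identifier
    regress ROA_new c.Var1_c##ib6.IndustryRank Var2 Var3 Var4 Var5 i.Year, vce(cluster Company_ID)

    * Panel models with the same clustering
    * (under -xtreg-, vce(robust) and vce(cluster Company_ID) give the same standard errors)
    xtset Company_ID Year
    xtreg ROA_new c.Var1_c##ib6.IndustryRank Var2 Var3 Var4 Var5 i.Year, re vce(cluster Company_ID)
    xtreg ROA_new c.Var1_c##ib6.IndustryRank Var2 Var3 Var4 Var5 i.Year, fe vce(cluster Company_ID)
    ```

    The vce() option leaves the point estimates unchanged; only the standard errors (and the test statistics built on them) differ from the default output.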


    • #3
      Dear Carlo,

      thank you very much for your quick and comprehensive answer.

      I am examining the impact of compensation on company performance (e.g. ROA), so I'm not really conducting a sensitivity analysis (sorry for any misleading words from my part).

      As I have already tested for heteroscedasticity and autocorrelation in my OLS checks, following your suggestions I now start directly with -xtreg ..., fe/re vce(robust)- and want to use -xtoverid- to choose between random effects and fixed effects (-xtset Company_ID Year-).

      After running
      Code:
      xi: xtreg ROA c.Var1_c##ib6.IndustryRank Var2 Var3 Var4 Var5 i.Year, re vce(robust)
      I get the following error:
      Code:
      . xtoverid
      1:  operator invalid
      r(198);
      After removing parts of the model, it seems that -xtoverid- cannot handle interaction terms. Is there an alternative I could use, or a way around this problem?

      Thank you,
      Pietro


      • #4
        Pietro:
        I was probably unclear in my previous reply: I was under the impression that you ran -xtreg- to assess potential differences with the pooled OLS that I assumed to be your baseline analysis (that's why I thought of a sensitivity analysis).
        That said, as the community-contributed command -xtoverid- does not support -fvvarlist- notation, it does not support interactions either.
        The usual fix is to create them by hand and re-run -xtreg, re- and -xtoverid-.
        Kind regards,
        Carlo
        (Stata 18.0 SE)


        • #5
          Dear Carlo,

          as my interaction term contains a factor variable, I cannot simply go
          Code:
          gen interaction = Var1_c*IndustryRank
          xi: xtreg ROA_new c.Var1_c ib6.IndustryRank interaction Var2 Var3 Var4 Var5 i.Year, re vce(robust)
          because the within R^2 drops from 0.2936 to 0.2523. Question 1: Is that correct?

          When I go
          Code:
          encode Industry, gen(qualityrank)
          tab Industry, gen(qualityrank_separated)
          gen in1 = Var1_c*qualityrank_separated1
          ...
          gen in10 = Var1_c*qualityrank_separated10
          xi: xtreg ROA_new c.Var1_c ib6.IndustryRank in1 in2 in3 in4 in5 in6 in7 in8 in9 in10 Var2 Var3 Var4 Var5 i.Year, re vce(robust)
          Doing so, R^2 within stays at 0.2936.

          But again, I get the following error message:
          Code:
          . xtoverid
          1:  operator invalid
          r(198);
          Question 2: What am I doing wrong here?
          Last edited by Pietro Russo; 30 Jul 2021, 05:34.


          • #6
            Pietro:
            you did nothing wrong; it's -xtoverid- that, being glorious but a bit old-fashioned, does not support -fvvarlist- notation (which, in turn, has a role in creating categorical variables and interactions).
            Therefore, you have to create the interaction by hand, re-run -xtreg- and then run -xtoverid-.
            As an aside, the R-sq to monitor after -xtreg, re- is the -between- one.
            Kind regards,
            Carlo
            (Stata 18.0 SE)
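
            A hedged sketch of one way to build c.Var1_c##ib6.IndustryRank entirely by hand, assuming IndustryRank is coded 1 to 10 and Year runs from 2014 to 2019 as in #1. A likely cause of the persisting error in #5 is that the command still contains the factor-variable operators c. and ib6.; as far as the -xi- help file indicates, -xi:- only expands i. terms, so those operators would still reach -xtoverid-. In addition, in1 to in10 sum to Var1_c, so one of them is collinear with the main effect. The prefixes ind_, yr_ and v1xind_ are made-up names for this illustration.

            ```stata
            * Assumption: IndustryRank takes the integer values 1..10, so the k-th dummy is industry k
            tabulate IndustryRank, generate(ind_)      // creates ind_1 ... ind_10
            tabulate Year, generate(yr_)               // creates yr_1 ... yr_6

            * One interaction per industry except the base (6), mirroring ib6.IndustryRank
            foreach k of numlist 1/5 7/10 {
                generate v1xind_`k' = Var1_c * ind_`k'
            }

            * Drop the base-category dummies (industry 6 and the first year)
            drop ind_6 yr_1

            * No c., i., ib#. or ## operators in the command, and no xi: prefix
            xtreg ROA_new Var1_c ind_* v1xind_* Var2 Var3 Var4 Var5 yr_*, re vce(cluster Company_ID)
            xtoverid
            ```

            If the construction is right, the coefficients should match those from the factor-variable version of the same -xtreg, re- command, which is a quick way to check the hand-made terms before running -xtoverid-.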


            • #7
              Dear Carlo,

              sorry if my previous inquiry was not clear.

              I did try to create these interaction terms by hand, but have failed to do so correctly.

              1. After running -xi: xtreg interaction..., re vce(robust)-, the R^2 between diminished, indicating that the interaction term I manually constructed was wrong (see code 1 in #5).

              2. After running -xi: xtreg in1...in10..., re vce(robust)-, the R^2 between stayed the same, but -xtoverid- produced the error code -r(198)-, indicating that the interaction term I manually constructed was wrong as well (see code 2 & 3 in #5).

              Now I wanted to ask whether you (or somebody else on this forum) know a trick for correctly constructing by hand an interaction term c.Continuous##i.Factor (where i.Factor = (1, ..., 10))?


              • #8
                Pietro:
                you may want to try something along the lines of the following toy example (that is, the way things were managed before the availability of -fvvarlist- notation):
                Code:
                use "https://www.stata-press.com/data/r16/nlswork.dta"
                gen interaction=race*age
                . xi: xtreg ln_wage i.race age interaction , re vce(cluster idcode)
                i.race            _Irace_1-3          (naturally coded; _Irace_1 omitted)
                
                Random-effects GLS regression                   Number of obs     =     28,510
                Group variable: idcode                          Number of groups  =      4,710
                
                R-sq:                                           Obs per group:
                     within  = 0.1027                                         min =          1
                     between = 0.1031                                         avg =        6.1
                     overall = 0.0943                                         max =         15
                
                                                                Wald chi2(4)      =    1195.42
                corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000
                
                                             (Std. Err. adjusted for 4,710 clusters in idcode)
                ------------------------------------------------------------------------------
                             |               Robust
                     ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                -------------+----------------------------------------------------------------
                    _Irace_2 |  -.1522105   .0322721    -4.72   0.000    -.2154626   -.0889583
                    _Irace_3 |   .0345965   .0826917     0.42   0.676    -.1274762    .1966693
                         age |   .0171287   .0016264    10.53   0.000      .013941    .0203164
                 interaction |   .0010913   .0011492     0.95   0.342    -.0011611    .0033437
                       _cons |   1.163334   .0190422    61.09   0.000     1.126012    1.200656
                -------------+----------------------------------------------------------------
                     sigma_u |  .36586102
                     sigma_e |  .30347941
                         rho |  .59239607   (fraction of variance due to u_i)
                ------------------------------------------------------------------------------
                
                . xtoverid
                
                Test of overidentifying restrictions: fixed vs random effects
                Cross-section time-series model: xtreg re  robust cluster(idcode)
                Sargan-Hansen statistic  12.970  Chi-sq(2)    P-value = 0.0015
                
                .
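                One caveat on the toy example: -race*age- treats the three-level -race- as if it were continuous, so it adds a single product term rather than one per level, which is what -i.race##c.age- would produce. A minimal sketch of the per-level construction, assuming the same -nlswork- data and level 1 as the base (the d_race* and ix_age_race* names are hypothetical):

                ```stata
                use "https://www.stata-press.com/data/r16/nlswork.dta", clear

                * One dummy and one product term per non-base level of the factor
                * (race == 1 is the base; d_race* and ix_age_race* are hypothetical names)
                local terms
                foreach l in 2 3 {
                    generate byte d_race`l' = (race == `l') if !missing(race)
                    generate ix_age_race`l' = age * d_race`l'
                    local terms `terms' d_race`l' ix_age_race`l'
                }

                * Random-effects model with the hand-made terms
                xtreg ln_wage age `terms', re vce(cluster idcode)

                * Cross-check against factor-variable notation
                xtreg ln_wage i.race##c.age, re vce(cluster idcode)
                ```

                With a ten-level factor the same loop would run over the nine non-base levels. The two -xtreg- calls should report the same coefficients, and the community-contributed -xtoverid- can then be tried after the first one, since that call contains no factor-variable operators.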
                Kind regards,
                Carlo
                (Stata 18.0 SE)

                Comment


                • #9
                  Dear Carlo,

                  I first ran the "normal" command using the interaction term:
                  HTML Code:
                  . xi: xtreg ROA_new c.Var1_c##ib6.IndustryRank Var2_new Var3_new Var4_new Var5_new i.Year, re vce(cluster Company_ID)
                  i.Year            _IYear_2014-2019    (naturally coded; _IYear_2014 omitted)
                  
                  Random-effects GLS regression                   Number of obs     =        472
                  Group variable: Company_ID                      Number of groups  =        106
                  
                  R-sq:                                           Obs per group:
                       within  = 0.2681                                         min =          1
                       between = 0.3918                                         avg =        4.5
                       overall = 0.4378                                         max =          6
                  
                                                                  Wald chi2(28)     =     372.71
                  corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000
                  
                                                        (Std. Err. adjusted for 106 clusters in Company_ID)
                  -----------------------------------------------------------------------------------------
                                          |               Robust
                                  ROA_new |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                  ------------------------+----------------------------------------------------------------
                                   Var1_c |   .0007248   .0002898     2.50   0.012     .0001567    .0012929
                                          |
                             IndustryRank |
                  Communication Services  |   .0086102   .0152021     0.57   0.571    -.0211853    .0384057
                  Consumer Discretionary  |   .0137158   .0125981     1.09   0.276    -.0109759    .0384076
                        Consumer Staples  |   .0079909   .0248992     0.32   0.748    -.0408107    .0567924
                              Financials  |  -.0161127   .0107229    -1.50   0.133    -.0371292    .0049037
                             Health Care  |   .0138668   .0150024     0.92   0.355    -.0155374    .0432709
                  Information Technology  |   .0308543   .0153484     2.01   0.044      .000772    .0609366
                               Materials  |  -.0021546   .0120363    -0.18   0.858    -.0257453    .0214362
                             Real Estate  |  -.0098862   .0090362    -1.09   0.274    -.0275968    .0078245
                               Utilities  |  -.0302404   .0128507    -2.35   0.019    -.0554274   -.0050534
                                          |
                    IndustryRank#c.Var1_c |
                  Communication Services  |  -.0000116   .0005819    -0.02   0.984     -.001152    .0011288
                  Consumer Discretionary  |  -.0001697   .0003958    -0.43   0.668    -.0009456    .0006061
                        Consumer Staples  |  -.0006245   .0003207    -1.95   0.051     -.001253    3.95e-06
                              Financials  |  -.0006271   .0002985    -2.10   0.036    -.0012122   -.0000421
                             Health Care  |  -.0004604    .000324    -1.42   0.155    -.0010955    .0001747
                  Information Technology  |  -.0002254   .0003347    -0.67   0.501    -.0008813    .0004306
                               Materials  |   .0002611   .0004271     0.61   0.541    -.0005759    .0010981
                             Real Estate  |   .0000992   .0003681     0.27   0.788    -.0006222    .0008206
                               Utilities  |  -.0003898   .0003645    -1.07   0.285    -.0011042    .0003245
                                          |
                                 Var2_new |  -.0423081    .017189    -2.46   0.014     -.075998   -.0086182
                                 Var3_new |   .0275752   .0222929     1.24   0.216     -.016118    .0712684
                                 Var4_new |  -.0113187   .0026288    -4.31   0.000     -.016471   -.0061664
                                 Var5_new |  -.0009773   .0010466    -0.93   0.350    -.0030286     .001074
                              _IYear_2015 |   .0007947   .0021468     0.37   0.711    -.0034129    .0050023
                              _IYear_2016 |   .0002901   .0020943     0.14   0.890    -.0038146    .0043948
                              _IYear_2017 |   .0039183   .0027886     1.41   0.160    -.0015473     .009384
                              _IYear_2018 |   .0013146   .0030874     0.43   0.670    -.0047366    .0073657
                              _IYear_2019 |  -.0014263   .0033975    -0.42   0.675    -.0080853    .0052327
                                    _cons |   .2485472   .0438153     5.67   0.000     .1626709    .3344236
                  ------------------------+----------------------------------------------------------------
                                  sigma_u |  .03537235
                                  sigma_e |  .01435376
                                      rho |  .85861535   (fraction of variance due to u_i)
                  -----------------------------------------------------------------------------------------
                  I then generated the interaction manually according to your example. After that, I ran the -xtreg, re- again and did the -xtoverid-:
                  HTML Code:
                  . gen interaction = Var1_c*IndustryRank
                  . xi: xtreg ROA_new c.Var1_c i.IndustryRank interaction Var2_new Var3_new Var4_new Var5_new i.Year, re vce(cluster Company_ID)
                  i.IndustryRank    _IIndustryR_1-10    (naturally coded; _IIndustryR_1 omitted)
                  i.Year            _IYear_2014-2019    (naturally coded; _IYear_2014 omitted)
                  
                  Random-effects GLS regression                   Number of obs     =        472
                  Group variable: Company_ID                      Number of groups  =        106
                  
                  R-sq:                                           Obs per group:
                       within  = 0.2364                                         min =          1
                       between = 0.4012                                         avg =        4.5
                       overall = 0.4499                                         max =          6
                  
                                                                  Wald chi2(20)     =     190.28
                  corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000
                  
                                               (Std. Err. adjusted for 106 clusters in Company_ID)
                  --------------------------------------------------------------------------------
                                 |               Robust
                         ROA_new |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                  ---------------+----------------------------------------------------------------
                          Var1_c |   .0004021   .0002776     1.45   0.147     -.000142    .0009461
                   _IIndustryR_2 |   .0072804   .0153712     0.47   0.636    -.0228467    .0374074
                   _IIndustryR_3 |  -.0046551   .0252959    -0.18   0.854    -.0542342     .044924
                   _IIndustryR_4 |  -.0202826   .0135727    -1.49   0.135    -.0468847    .0063194
                   _IIndustryR_5 |   .0098822   .0171969     0.57   0.566     -.023823    .0435875
                   _IIndustryR_6 |  -.0080023   .0141833    -0.56   0.573     -.035801    .0197964
                   _IIndustryR_7 |    .023971   .0179354     1.34   0.181    -.0111816    .0591237
                   _IIndustryR_8 |  -.0088335   .0151883    -0.58   0.561     -.038602     .020935
                   _IIndustryR_9 |  -.0190164   .0143501    -1.33   0.185    -.0471421    .0091093
                  _IIndustryR_10 |  -.0397027   .0158814    -2.50   0.012    -.0708296   -.0085757
                     interaction |   .0000379   .0000492     0.77   0.441    -.0000586    .0001343
                        Var2_new |  -.0436658   .0176222    -2.48   0.013    -.0782047    -.009127
                        Var3_new |    .024368   .0215767     1.13   0.259    -.0179216    .0666576
                        Var4_new |  -.0116516   .0025955    -4.49   0.000    -.0167387   -.0065645
                        Var5_new |  -.0009422   .0010592    -0.89   0.374    -.0030183    .0011338
                     _IYear_2015 |   .0005011   .0021292     0.24   0.814     -.003672    .0046742
                     _IYear_2016 |   -.000304   .0020983    -0.14   0.885    -.0044167    .0038086
                     _IYear_2017 |   .0037341   .0028087     1.33   0.184    -.0017709    .0092392
                     _IYear_2018 |   .0008108   .0031412     0.26   0.796    -.0053459    .0069675
                     _IYear_2019 |  -.0024814   .0033101    -0.75   0.453    -.0089692    .0040063
                           _cons |   .2628982    .043474     6.05   0.000     .1776906    .3481057
                  ---------------+----------------------------------------------------------------
                         sigma_u |  .03410628
                         sigma_e |  .01450921
                             rho |  .84675796   (fraction of variance due to u_i)
                  --------------------------------------------------------------------------------
                  
                  . xtoverid
                  
                  Test of overidentifying restrictions: fixed vs random effects
                  Cross-section time-series model: xtreg re  robust cluster(Company_ID)
                  Sargan-Hansen statistic  29.803  Chi-sq(11)   P-value = 0.0017
                  This result would indicate that I should use the fixed-effects model -xtreg, fe vce(cluster Company_ID)-.
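                  As a rough sketch of what that switch involves, assuming the -nlswork- data from #8 as a stand-in for the company panel: under -fe-, the main effects of a time-invariant factor (here -race-, by analogy the industry dummies) are collinear with the panel effects and are dropped, while its interaction with a time-varying regressor can still be estimated.

                  ```stata
                  use "https://www.stata-press.com/data/r16/nlswork.dta", clear

                  * Fixed-effects counterpart of the random-effects toy model;
                  * expect the race main effects to be flagged as omitted
                  xtreg ln_wage i.race##c.age, fe vce(cluster idcode)
                  ```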

                  The question I have is this: as you can see, all three R^2 values changed, and with them all the coefficients. This suggests that the manually constructed interaction term is not correct. What do you think? Should I (even with this problem) still accept the result and use fixed effects?

                  Thank you,
                  Pietro

                  Comment


                  • #10
                    Pietro:
                    a quick comparison between the two models highlights that:
                    1) the between R-sq values are close: 0.3918 vs. 0.4012 (this is not an issue);
                    2) the first model reports 5 coefficients reaching statistical significance vs. 2 in the second (this is not an interesting difference, indeed).
                    That said, it would seem that the interaction has no role here and you may want to consider a more parsimonious specification.
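                    Rather than counting individually significant coefficients, the role of the interaction can also be checked with a joint Wald test. A minimal sketch, assuming the -nlswork- toy data from #8 rather than Pietro's data set:

                    ```stata
                    use "https://www.stata-press.com/data/r16/nlswork.dta", clear

                    * Random-effects model with the full factor-by-continuous interaction
                    xtreg ln_wage i.race##c.age, re vce(cluster idcode)

                    * Joint test that all interaction coefficients are zero
                    testparm i.race#c.age
                    ```

                    For the first model in #9 the analogous call would presumably be -testparm i.IndustryRank#c.Var1_c-.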
                    Kind regards,
                    Carlo
                    (Stata 18.0 SE)

                    Comment


                    • #11
                      Hi Carlo,

                      thank you very much for your great help!

                      Do you, by chance, have a citation for the -xtoverid- command? Following -help xtoverid-, I only get the following reference:

                      Schaffer, M.E., Stillman, S. 2010. xtoverid: Stata module to calculate tests of overidentifying restrictions after xtreg, xtivreg, xtivreg2 and xthtaylor http://ideas.repec.org/c/boc/bocode/s456779.html

                      Best regards,
                      Pietro

                      Comment


                      • #12
                        Pietro:
                        the link https://ideas.repec.org/c/boc/bocode/s456779.html will give you the suggested citation of the community-contributed module -xtoverid-.
                        You can safely quote it in your paper/research report/whatever.
                        Kind regards,
                        Carlo
                        (Stata 18.0 SE)

                        Comment
