First pooled OLS, then panel regression. What do you think of my approach?

Pietro Russo

Join Date: Jul 2021
Posts: 7

29 Jul 2021, 14:02

Dear all,

I have an unbalanced panel data set, where N (110 companies) > T (5 Years). I first conducted a pooled OLS regression (-regress-). Later, I conducted panel regressions (-xtreg-), comparing the results as robustness checks. My model is as follows:

ROA = c.Var1_c##i.Industry Var2 Var3 Var4 Var5 i.Year, with Var1 being compensation to the CEO, Var2-5 being control variables, and Industry being a categorical variable (coded 1 to 10 for the different industries).

First, I winsorized my data at the 5th and 95th percentiles to limit the influence of outliers. I checked the OLS assumptions and, as a consequence, transformed some variables (for linearity) and mean-centered my key independent variable (to reduce multicollinearity with the interaction term). As one would expect, I do have heteroscedasticity (-estat hettest-) and autocorrelation (with -gen time = _n-; -tsset time-; and -dwstat-) in my data.
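For anyone wanting to reproduce these preparation steps, a minimal sketch follows. It uses only built-in commands and winsorizes over the pooled sample, which is an assumption. The names Var1_raw, Var1_w, Var1_wc, p5 and p95 are hypothetical and do not come from my actual do-file.

```stata
* Sketch only: Var1_raw, Var1_w, Var1_wc, p5 and p95 are hypothetical names
* Store the 5th and 95th percentiles of the raw variable
_pctile Var1_raw, percentiles(5 95)
scalar p5  = r(r1)
scalar p95 = r(r2)

* Winsorize: pull values beyond the cut-offs back to the cut-offs
generate double Var1_w = Var1_raw
replace Var1_w = scalar(p5)  if Var1_raw < scalar(p5)
replace Var1_w = scalar(p95) if Var1_raw > scalar(p95) & !missing(Var1_raw)

* Mean-center the winsorized variable before building the interaction
summarize Var1_w, meanonly
generate double Var1_wc = Var1_w - r(mean)
```

The explicit -!missing()- guard matters because Stata treats missing values as larger than any number, so without it missing observations would be set to the 95th percentile.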

Question 1: How do I account for autocorrelation AND heteroscedasticity in pooled OLS? I understand that for the former I can use -prais ..., corc-, and for the latter -regress ..., vce(robust)-, but I have failed to find a combined method.

See the result of my pooled OLS regression below:

Code:

. regress ROA_new c.Var1_c##ib6.IndustryRank Var2 Var3 Var4 Var5 i.Year, vce(robust)

Linear regression                               Number of obs     =        472
                                                F(28, 443)        =      34.84
                                                Prob > F          =     0.0000
                                                R-squared         =     0.4664
                                                Root MSE          =     .03384

-----------------------------------------------------------------------------------------
                        |               Robust
                ROA_new |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
------------------------+----------------------------------------------------------------
                 Var1_c |   .0004192   .0001773     2.37   0.018     .0000709    .0007676
                        |
           IndustryRank |
Communication Services  |   .0129088   .0072345     1.78   0.075    -.0013093     .027127
Consumer Discretionary  |   .0188844   .0071505     2.64   0.009     .0048314    .0329375
      Consumer Staples  |   .0078738   .0148976     0.53   0.597    -.0214049    .0371525
            Financials  |  -.0030436   .0073608    -0.41   0.679    -.0175101    .0114228
           Health Care  |   .0181798   .0069513     2.62   0.009     .0045182    .0318415
Information Technology  |   .0339861   .0076653     4.43   0.000     .0189212     .049051
             Materials  |   .0007409   .0056497     0.13   0.896    -.0103627    .0118444
           Real Estate  |  -.0062964   .0064746    -0.97   0.331    -.0190212    .0064284
             Utilities  |  -.0269126   .0061301    -4.39   0.000    -.0389604   -.0148648
                        |
  IndustryRank#c.Var1_c |
Communication Services  |   -.000053   .0002856    -0.19   0.853    -.0006143    .0005083
Consumer Discretionary  |   .0001062   .0002586     0.41   0.682    -.0004021    .0006145
      Consumer Staples  |   .0004539   .0005145     0.88   0.378    -.0005572    .0014651
            Financials  |  -.0002145   .0001982    -1.08   0.280    -.0006041    .0001751
           Health Care  |   .0003999    .000239     1.67   0.095    -.0000699    .0008697
Information Technology  |   .0004263    .000286     1.49   0.137    -.0001358    .0009884
             Materials  |   .0004491   .0002949     1.52   0.129    -.0001305    .0010288
           Real Estate  |    .000397    .000238     1.67   0.096    -.0000708    .0008648
             Utilities  |  -.0001756    .000237    -0.74   0.459    -.0006415    .0002902
                        |
                   Var2 |  -.0339362   .0150897    -2.25   0.025    -.0635926   -.0042799
                   Var3 |   .0479924   .0242645     1.98   0.049     .0003044    .0956803
                   Var4 |  -.0126286   .0016375    -7.71   0.000    -.0158468   -.0094103
                   Var5 |   .0003169   .0011124     0.28   0.776    -.0018693    .0025032
                        |
                   Year |
                  2015  |    .001048    .005721     0.18   0.855    -.0101957    .0122917
                  2016  |    .001855   .0052554     0.35   0.724    -.0084736    .0121835
                  2017  |   .0068407   .0051132     1.34   0.182    -.0032084    .0168898
                  2018  |   .0058702   .0051116     1.15   0.251    -.0041758    .0159161
                  2019  |   .0056374   .0054482     1.03   0.301    -.0050702     .016345
                        |
                  _cons |   .2496132   .0273422     9.13   0.000     .1958766    .3033498
-----------------------------------------------------------------------------------------

Question 2: Would you consider this an appropriate model? Am I missing something?

For my panel regressions I used the same winsorized/transformed data.

Question 3: Is this considered normal, or would one use the original (partly non-linear, non-normally distributed) data?

I then ran panel regressions (-xtreg, fe/re-) and tested for autocorrelation (-xtserial ..., output-, without categorical variables/interaction term) and heteroscedasticity (-xttest3-). Having confirmed that both exist in my panel data, I accounted for them by going -xtreg ..., re vce(cluster Company_ID)- after the Hausman test.

Question 4: Is using - ,re vce(cluster Company_ID)- correct in order to account for both, or should I conduct FGLS (-xtgls ..., p(h) c(ar1)-) or PCSE analyses (-xtpcse ..., het c(ar1)-)?

Code:

. xtset Company_ID Year
       panel variable:  Company_ID (unbalanced)
        time variable:  Year, 2014 to 2019, but with gaps
                delta:  1 unit

. xtreg ROA_new c.Var1_c##ib6.IndustryRank Var2 Var3 Var4 Var5 i.Year, re

Random-effects GLS regression                   Number of obs     =        472
Group variable: Company_ID                      Number of groups  =        106

R-sq:                                           Obs per group:
     within  = 0.2685                                         min =          1
     between = 0.3871                                         avg =        4.5
     overall = 0.4329                                         max =          6

                                                Wald chi2(28)     =     187.42
corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000

-----------------------------------------------------------------------------------------
                ROA_new |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
------------------------+----------------------------------------------------------------
                 Var1_c |     .00071   .0001607     4.42   0.000      .000395     .001025
                        |
           IndustryRank |
Communication Services  |   .0097942   .0136338     0.72   0.473    -.0169275     .036516
Consumer Discretionary  |   .0130758   .0124651     1.05   0.294    -.0113553    .0375069
      Consumer Staples  |   .0087058   .0202608     0.43   0.667    -.0310046    .0484163
            Financials  |  -.0081966   .0166936    -0.49   0.623    -.0409154    .0245223
           Health Care  |   .0134478   .0135422     0.99   0.321    -.0130943    .0399899
Information Technology  |   .0324283   .0149377     2.17   0.030     .0031509    .0617057
             Materials  |  -.0025767   .0128767    -0.20   0.841    -.0278146    .0226613
           Real Estate  |  -.0099463   .0232338    -0.43   0.669    -.0554838    .0355912
             Utilities  |  -.0306672   .0206443    -1.49   0.137    -.0711293     .009795
                        |
  IndustryRank#c.Var1_c |
Communication Services  |  -.0000123   .0003093    -0.04   0.968    -.0006185    .0005938
Consumer Discretionary  |  -.0001672   .0002001    -0.84   0.403    -.0005595     .000225
      Consumer Staples  |  -.0006347   .0002995    -2.12   0.034    -.0012218   -.0000476
            Financials  |   -.000613   .0003204    -1.91   0.056    -.0012409     .000015
           Health Care  |  -.0004573   .0002939    -1.56   0.120    -.0010333    .0001188
Information Technology  |  -.0002168   .0003547    -0.61   0.541     -.000912    .0004783
             Materials  |   .0002684   .0002352     1.14   0.254    -.0001924    .0007293
           Real Estate  |   .0000822   .0006748     0.12   0.903    -.0012404    .0014048
             Utilities  |   -.000399   .0003905    -1.02   0.307    -.0011645    .0003664
                        |
                   Var2 |  -.0373825   .0139206    -2.69   0.007    -.0646665   -.0100985
                   Var3 |   .0421203   .0217148     1.94   0.052      -.00044    .0846806
                   Var4 |  -.0106307   .0025466    -4.17   0.000    -.0156219   -.0056396
                   Var5 |  -.0007089   .0008282    -0.86   0.392    -.0023321    .0009143
                        |
                   Year |
                  2015  |   .0009376   .0024598     0.38   0.703    -.0038835    .0057588
                  2016  |   .0003204   .0024665     0.13   0.897    -.0045139    .0051546
                  2017  |   .0038309   .0024417     1.57   0.117    -.0009546    .0086165
                  2018  |   .0013017   .0025037     0.52   0.603    -.0036054    .0062088
                  2019  |  -.0014532   .0027073    -0.54   0.591    -.0067594    .0038531
                        |
                  _cons |   .2299991   .0409079     5.62   0.000      .149821    .3101771
------------------------+----------------------------------------------------------------
                sigma_u |   .0357301
                sigma_e |  .01435012
                    rho |  .86110172   (fraction of variance due to u_i)
-----------------------------------------------------------------------------------------

. est store re1

. xtreg ROA_new c.Var1_c##ib6.IndustryRank Var2 Var3 Var4 Var5 i.Year, fe
note: 1.IndustryRank omitted because of collinearity
note: 2.IndustryRank omitted because of collinearity
note: 3.IndustryRank omitted because of collinearity
note: 4.IndustryRank omitted because of collinearity
note: 5.IndustryRank omitted because of collinearity
note: 7.IndustryRank omitted because of collinearity
note: 8.IndustryRank omitted because of collinearity
note: 9.IndustryRank omitted because of collinearity
note: 10.IndustryRank omitted because of collinearity

Fixed-effects (within) regression               Number of obs     =        472
Group variable: Company_ID                      Number of groups  =        106

R-sq:                                           Obs per group:
     within  = 0.2722                                         min =          1
     between = 0.2660                                         avg =        4.5
     overall = 0.3108                                         max =          6

                                                F(19,347)         =       6.83
corr(u_i, Xb)  = 0.0466                         Prob > F          =     0.0000

-----------------------------------------------------------------------------------------
                ROA_new |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
------------------------+----------------------------------------------------------------
                 Var1_c |   .0007539     .00017     4.43   0.000     .0004195    .0010883
                        |
           IndustryRank |
Communication Services  |          0  (omitted)
Consumer Discretionary  |          0  (omitted)
      Consumer Staples  |          0  (omitted)
            Financials  |          0  (omitted)
           Health Care  |          0  (omitted)
Information Technology  |          0  (omitted)
             Materials  |          0  (omitted)
           Real Estate  |          0  (omitted)
             Utilities  |          0  (omitted)
                        |
  IndustryRank#c.Var1_c |
Communication Services  |   .0002298   .0003752     0.61   0.541    -.0005083    .0009678
Consumer Discretionary  |  -.0002387   .0002111    -1.13   0.259    -.0006538    .0001765
      Consumer Staples  |  -.0007517   .0003146    -2.39   0.017    -.0013706   -.0001329
            Financials  |  -.0006464   .0003789    -1.71   0.089    -.0013916    .0000988
           Health Care  |  -.0006811   .0003347    -2.03   0.043    -.0013393   -.0000228
Information Technology  |  -.0003157   .0004111    -0.77   0.443    -.0011243    .0004928
             Materials  |   .0002191   .0002482     0.88   0.378    -.0002691    .0007072
           Real Estate  |   .0000297   .0007433     0.04   0.968    -.0014321    .0014916
             Utilities  |  -.0004243   .0004024    -1.05   0.292    -.0012158    .0003672
                        |
                   Var2 |   -.048388   .0168456    -2.87   0.004    -.0815203   -.0152556
                   Var3 |   .0406569   .0238885     1.70   0.090    -.0063276    .0876414
                   Var4 |  -.0106254   .0056596    -1.88   0.061    -.0217569     .000506
                   Var5 |   -.000814   .0008762    -0.93   0.354    -.0025374    .0009094
                        |
                   Year |
                  2015  |   .0008862   .0024982     0.35   0.723    -.0040273    .0057997
                  2016  |   .0002122   .0025832     0.08   0.935    -.0048685     .005293
                  2017  |   .0035253   .0025814     1.37   0.173    -.0015518    .0086025
                  2018  |   .0008915   .0028079     0.32   0.751    -.0046312    .0064143
                  2019  |  -.0017244   .0031846    -0.54   0.589    -.0079881    .0045392
                        |
                  _cons |   .2380737   .0926853     2.57   0.011      .055778    .4203694
------------------------+----------------------------------------------------------------
                sigma_u |  .03773385
                sigma_e |  .01435012
                    rho |  .87364723   (fraction of variance due to u_i)
-----------------------------------------------------------------------------------------
F test that all u_i=0: F(105, 347) = 22.42                   Prob > F = 0.0000

. est store fe1

. xttest3

Modified Wald test for groupwise heteroskedasticity
in fixed effect regression model

H0: sigma(i)^2 = sigma^2 for all i

chi2 (106)  =   5.2e+31
Prob>chi2 =      0.0000


. xtserial ROA_new c.Var1_c##ib6.IndustryRank Var2 Var3 Var4 Var5 i.Year, output
factor-variable and time-series operators not allowed
r(101);

. xtserial ROA_new Var1_c Var2 Var3 Var4 Var5, output

Linear regression                               Number of obs     =        364
                                                F(5, 95)          =      12.17
                                                Prob > F          =     0.0000
                                                R-squared         =     0.2013
                                                Root MSE          =     .01611

                            (Std. Err. adjusted for 96 clusters in Company_ID)
------------------------------------------------------------------------------
             |               Robust
   D.ROA_new |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      Var1_c |
         D1. |   .0004306    .000123     3.50   0.001     .0001865    .0006747
             |
        Var2 |
         D1. |  -.0765561   .0178377    -4.29   0.000    -.1119683   -.0411439
             |
        Var3 |
         D1. |   .0629007   .0284957     2.21   0.030     .0063295    .1194718
             |
        Var4 |
         D1. |  -.0152056   .0053296    -2.85   0.005    -.0257861   -.0046251
             |
        Var5 |
         D1. |  -.0006843   .0006922    -0.99   0.325    -.0020586      .00069
------------------------------------------------------------------------------

Wooldridge test for autocorrelation in panel data
H0: no first-order autocorrelation
    F(  1,      90) =     11.909
           Prob > F =      0.0009

. hausman fe1 re1

                 ---- Coefficients ----
             |      (b)          (B)            (b-B)     sqrt(diag(V_b-V_B))
             |      fe1          re1         Difference          S.E.
-------------+----------------------------------------------------------------
      Var1_c |    .0007539       .00071        .0000439        .0000555
IndustryRank#|
    c.Var1_c |
          1  |    .0002298    -.0000123        .0002421        .0002125
          2  |   -.0002387    -.0001672       -.0000715        .0000671
          3  |   -.0007517    -.0006347        -.000117        .0000963
          4  |   -.0006464     -.000613       -.0000334        .0002023
          5  |   -.0006811    -.0004573       -.0002238        .0001601
          7  |   -.0003157    -.0002168       -.0000989        .0002079
          8  |    .0002191     .0002684       -.0000494        .0000795
          9  |    .0000297     .0000822       -.0000525        .0003116
         10  |   -.0004243     -.000399       -.0000253        .0000971
        Var2 |    -.048388    -.0373825       -.0110055        .0094863
        Var3 |    .0406569     .0421203       -.0014635        .0099562
        Var4 |   -.0106254    -.0106307        5.29e-06        .0050543
        Var5 |    -.000814    -.0007089       -.0001051        .0002861
        Year |
       2015  |    .0008862     .0009376       -.0000515        .0004362
       2016  |    .0002122     .0003204       -.0001081        .0007677
       2017  |    .0035253     .0038309       -.0003056        .0008378
       2018  |    .0008915     .0013017       -.0004102        .0012713
       2019  |   -.0017244    -.0014532       -.0002713         .001677
------------------------------------------------------------------------------
                           b = consistent under Ho and Ha; obtained from xtreg
            B = inconsistent under Ha, efficient under Ho; obtained from xtreg

    Test:  Ho:  difference in coefficients not systematic

                 chi2(19) = (b-B)'[(V_b-V_B)^(-1)](b-B)
                          =       12.71
                Prob>chi2 =      0.8533

. xtreg ROA_new c.Var1_c##ib6.IndustryRank Var2 Var3 Var4 Var5 i.Year, re vce(cluster Company_ID)

Random-effects GLS regression                   Number of obs     =        472
Group variable: Company_ID                      Number of groups  =        106

R-sq:                                           Obs per group:
     within  = 0.2685                                         min =          1
     between = 0.3871                                         avg =        4.5
     overall = 0.4329                                         max =          6

                                                Wald chi2(28)     =     313.61
corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000

                                      (Std. Err. adjusted for 106 clusters in Company_ID)
-----------------------------------------------------------------------------------------
                        |               Robust
                ROA_new |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
------------------------+----------------------------------------------------------------
                 Var1_c |     .00071   .0002914     2.44   0.015     .0001389     .001281
                        |
           IndustryRank |
Communication Services  |   .0097942   .0152693     0.64   0.521     -.020133    .0397215
Consumer Discretionary  |   .0130758   .0124774     1.05   0.295    -.0113794    .0375311
      Consumer Staples  |   .0087058   .0248933     0.35   0.727    -.0400842    .0574959
            Financials  |  -.0081966   .0115438    -0.71   0.478     -.030822    .0144289
           Health Care  |   .0134478   .0150852     0.89   0.373    -.0161187    .0430143
Information Technology  |   .0324283   .0149935     2.16   0.031     .0030415     .061815
             Materials  |  -.0025767   .0120635    -0.21   0.831    -.0262206    .0210673
           Real Estate  |  -.0099463   .0086922    -1.14   0.253    -.0269827    .0070901
             Utilities  |  -.0306672    .012636    -2.43   0.015    -.0554333    -.005901
                        |
  IndustryRank#c.Var1_c |
Communication Services  |  -.0000123   .0005877    -0.02   0.983    -.0011642    .0011395
Consumer Discretionary  |  -.0001672   .0004009    -0.42   0.677    -.0009529    .0006185
      Consumer Staples  |  -.0006347   .0003247    -1.95   0.051    -.0012711    1.69e-06
            Financials  |   -.000613   .0003028    -2.02   0.043    -.0012065   -.0000194
           Health Care  |  -.0004573   .0003238    -1.41   0.158    -.0010919    .0001773
Information Technology  |  -.0002168   .0003364    -0.64   0.519    -.0008762    .0004425
             Materials  |   .0002684   .0004278     0.63   0.530      -.00057    .0011069
           Real Estate  |   .0000822   .0003729     0.22   0.826    -.0006487    .0008131
             Utilities  |   -.000399   .0003621    -1.10   0.270    -.0011087    .0003107
                        |
                   Var2 |  -.0373825   .0152058    -2.46   0.014    -.0671854   -.0075796
                   Var3 |   .0421203   .0229078     1.84   0.066    -.0027781    .0870188
                   Var4 |  -.0106307   .0026749    -3.97   0.000    -.0158735    -.005388
                   Var5 |  -.0007089   .0009975    -0.71   0.477    -.0026639    .0012461
                        |
                   Year |
                  2015  |   .0009376   .0021478     0.44   0.662     -.003272    .0051473
                  2016  |   .0003204   .0021163     0.15   0.880    -.0038274    .0044682
                  2017  |   .0038309   .0028211     1.36   0.174    -.0016983    .0093602
                  2018  |   .0013017   .0031098     0.42   0.676    -.0047934    .0073969
                  2019  |  -.0014532   .0034477    -0.42   0.673    -.0082105    .0053041
                        |
                  _cons |   .2299991   .0442903     5.19   0.000     .1431917    .3168064
------------------------+----------------------------------------------------------------
                sigma_u |   .0357301
                sigma_e |  .01435012
                    rho |  .86110172   (fraction of variance due to u_i)
-----------------------------------------------------------------------------------------

Question 5: Would you consider this an appropriate approach? Am I missing something?

Given that Var1_c has the same sign and similar significance in pooled OLS and random effects, I would conclude that the pooled OLS results seem reasonable and accept or reject my hypothesis on that basis.

Question 6: Would this be a correct way to do this?

Thank you very much for bearing with me so long. I am looking forward to your answers.

Best regards,
Pietro

Last edited by Pietro Russo; 29 Jul 2021, 14:06.

Tags: categorical, fixed effects, interaction, panel data, regression

Carlo Lazzaro

Join Date: Apr 2014

Posts: 17851
#2

30 Jul 2021, 01:14

Pietro:
welcome to this forum.
1), 2) and 6) if there's evidence of a panel-wise effect, going pooled OLS is a sub-optimal approach. In addition, when we talk about whatever notion of "robustness" we should clearly define it (robustness to heteroskedasticity? autocorrelation? model misspecification? else?). That said, if in any OLS you detect both heteroskedasticity and autocorrelation, you should go -vce(cluster clusterid)-. As per your description, it seems that you were looking for a sort of sensitivity analysis (that I would not sponsor, though);
3) ruling out the trivial scenario of a mistaken data entry, outliers are simply a fact of life. For instance, in health economics (the research field I pretend to be expert about), the total costs of a given health care programme follow a gamma distribution, which is positively skewed, with a long right tail, as some patients need longer than average therapies and/or may experience adverse events that are expensive to manage. That said, by ruling out the so-called outliers, you're actually making up your original dataset and nobody can tell you the direction and the magnitude of the bias that you impose on your analysis. In addition, normality is a weak requirement for the residual distribution only (and oftentimes an oversold one).
4) if, as it seems from your description, you have a N>T panel dataset, you should go -xtreg- with robust or clustered standard errors if you detect heteroskedasticity and/or serial correlation. Please note that, unlike -regress-, both options do the very same job under -xtreg-;
5) it is not correct to go -hausman- with default standard errors and then invoke non-default standard errors after the -hausman- outcome. Just impose cluster-robust standard errors as soon as you detect heteroskedasticity and/or autocorrelation and then test -fe- vs -re- specification via the community-contributed module -xtoverid- (just type -search xtoverid- from within Stata to spot and install it). Being glorious but a bit old-fashioned, -xtoverid- does not support -fvvarlist- notation. The usual fix is to prefix your -xtreg- code with -xi:-.
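
As a rough sketch of points 1), 4) and 5), using the variable names from your post (I have not run this on your data, so please take it as an illustration rather than a tested recipe):

```stata
* Pooled OLS: cluster on the panel identifier to allow for
* heteroskedasticity and within-company serial correlation
regress ROA_new c.Var1_c##ib6.IndustryRank Var2 Var3 Var4 Var5 i.Year, ///
    vce(cluster Company_ID)

* Panel models: request cluster-robust standard errors from the start,
* that is, before any -fe- vs -re- comparison is made
xtset Company_ID Year
xtreg ROA_new c.Var1_c##ib6.IndustryRank Var2 Var3 Var4 Var5 i.Year, ///
    re vce(cluster Company_ID)
xtreg ROA_new c.Var1_c##ib6.IndustryRank Var2 Var3 Var4 Var5 i.Year, ///
    fe vce(cluster Company_ID)
```

Under -xtreg-, typing -vce(robust)- instead of -vce(cluster Company_ID)- should give the same standard errors, which is the equivalence mentioned in point 4).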

Kind regards,
Carlo
(Stata 19.0)

Pietro Russo

Join Date: Jul 2021

Posts: 7
#3

30 Jul 2021, 03:29

Dear Carlo,

thank you very much for your quick and comprehensive answer.

I am examining the impact of compensation on company performance (e.g. ROA), so I'm not really conducting a sensitivity analysis (sorry for any misleading words from my part).

As I had already found heteroscedasticity and autocorrelation in my OLS tests, I followed your suggestion and started directly with -xtreg ..., fe/re vce(robust)-, planning to use -xtoverid- to choose between random effects and fixed effects (-xtset Company_ID Year-).

After running

Code:

xi: xtreg ROA c.Var1_c##ib6.IndustryRank Var2 Var3 Var4 Var5 i.Year, re vce(robust)

I get the following error:

Code:

. xtoverid
1:  operator invalid
r(198);

After deleting some part of the equations, it seems that xtoverid cannot handle interaction terms. Is there an alternative I could use or a way around this problem?

Thank you,
Pietro

Carlo Lazzaro

Join Date: Apr 2014

Posts: 17851
#4

30 Jul 2021, 03:42

Pietro:
I was probably unclear in my previous reply: I was under the impression that you ran -xtreg- to assess potential differences with the pooled OLS that I assumed to be your baseline analysis (that's why I thought of a sensitivity analysis).
That said, as the community-contributed command -xtoverid- does not support -fvvarlist- notation, it does not support interactions either.
The usual fix is creating them by hand and re-running -xtreg, re- and -xtoverid-.

Kind regards,
Carlo
(Stata 19.0)

Pietro Russo

Join Date: Jul 2021
Posts: 7
#5

30 Jul 2021, 05:19

Dear Carlo,

as my interaction term contains a factor variable, I cannot simply go

Code:

gen interaction = Var1_c*IndustryRank
xi: xtreg ROA_new c.Var1_c ib6.IndustryRank interaction Var2 Var3 Var4 Var5 i.Year, re vce(robust)

as R^2 within diminishes from 0.2936 to 0.2523. Question 1: Is that correct?

When I go

Code:

encode Industry, gen(qualityrank)
tab Industry, gen(qualityrank_separated)
gen in1 = Var1_c*qualityrank_separated1
...
gen in10 = Var1_c*qualityrank_separated10
xi: xtreg ROA_new c.Var1_c ib6.IndustryRank in1 in2 in3 in4 in5 in6 in7 in8 in9 in10 Var2 Var3 Var4 Var5 i.Year, re vce(robust)

Doing so, R^2 within stays at 0.2936.

But again, I get the following error message:

Code:

. xtoverid
1:  operator invalid
r(198);

Question 2: What am I doing wrong here?

Last edited by Pietro Russo; 30 Jul 2021, 05:34.

Carlo Lazzaro

Join Date: Apr 2014

Posts: 17851
#6

30 Jul 2021, 05:56

Pietro:
you did nothing wrong; it's -xtoverid- that, being glorious but a bit old-fashioned, does not support -fvvarlist- notation (which, in turn, has a role in creating categorical variables and interactions).
Therefore, you have to create the interaction by hand, re-run -xtreg- and then -xtoverid-.
As an aside, the R-sq to monitor after -xtreg, re- is the -between- one.

Kind regards,
Carlo
(Stata 19.0)

Pietro Russo

Join Date: Jul 2021

Posts: 7
#7

30 Jul 2021, 06:14

Dear Carlo,

sorry if my previous inquiry was not clear.

I did try to create these interaction terms by hand, but have failed to correctly do so.

1. After running -xi: xtreg interaction..., re vce(robust)-, the R^2 between diminished, indicating that the interaction term I manually constructed was wrong (see code 1 in #5).

2. After running -xi: xtreg in1...in10..., re vce(robust)-, the R^2 between stayed the same, but -xtoverid- produced the error code -r(198)-, indicating that the interaction term I manually constructed was wrong as well (see code 2 & 3 in #5).

Now I wanted to ask whether you (or somebody else on this forum) know a trick for correctly constructing by hand an interaction term c.Continuous##i.Factor (where i.Factor = (1, ..., 10))?

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17851
#8

30 Jul 2021, 06:51

Pietro:
you may want to try something along the lines of the following toy example (that is, the way things were managed before the availability of -fvvarlist- notation):

Code:

use "https://www.stata-press.com/data/r16/nlswork.dta"
gen interaction=race*age
. xi: xtreg ln_wage i.race age interaction , re vce(cluster idcode)
i.race            _Irace_1-3          (naturally coded; _Irace_1 omitted)

Random-effects GLS regression                   Number of obs     =     28,510
Group variable: idcode                          Number of groups  =      4,710

R-sq:                                           Obs per group:
     within  = 0.1027                                         min =          1
     between = 0.1031                                         avg =        6.1
     overall = 0.0943                                         max =         15

                                                Wald chi2(4)      =    1195.42
corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000

                             (Std. Err. adjusted for 4,710 clusters in idcode)
------------------------------------------------------------------------------
             |               Robust
     ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    _Irace_2 |  -.1522105   .0322721    -4.72   0.000    -.2154626   -.0889583
    _Irace_3 |   .0345965   .0826917     0.42   0.676    -.1274762    .1966693
         age |   .0171287   .0016264    10.53   0.000      .013941    .0203164
 interaction |   .0010913   .0011492     0.95   0.342    -.0011611    .0033437
       _cons |   1.163334   .0190422    61.09   0.000     1.126012    1.200656
-------------+----------------------------------------------------------------
     sigma_u |  .36586102
     sigma_e |  .30347941
         rho |  .59239607   (fraction of variance due to u_i)
------------------------------------------------------------------------------

. xtoverid

Test of overidentifying restrictions: fixed vs random effects
Cross-section time-series model: xtreg re  robust cluster(idcode)
Sargan-Hansen statistic  12.970  Chi-sq(2)    P-value = 0.0015

.
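
Translated to the model in this thread, the same pre-factor-variable idea could be sketched as follows. This is untested on your data and assumes that IndustryRank is numeric with all values 1 to 10 present, that Year takes the six values 2014 to 2019, and that the new names ind*, v1_ind* and yr* (all hypothetical) are not already in use. No -i.-, -ib6.-, -c.- or -##- appears in the estimation command, so -xi:- is not needed either.

```stata
* Industry dummies: -tabulate, generate()- creates ind1 ... ind10
tabulate IndustryRank, generate(ind)

* Interactions of the centered regressor with each industry dummy
forvalues k = 1/10 {
    generate double v1_ind`k' = Var1_c * ind`k'
}

* Year dummies: yr1 (2014) ... yr6 (2019)
tabulate Year, generate(yr)

* Base categories are dropped by hand: industry 6 (as with ib6.) and 2014
xtset Company_ID Year
xtreg ROA_new Var1_c ind1-ind5 ind7-ind10            ///
      v1_ind1-v1_ind5 v1_ind7-v1_ind10               ///
      Var2 Var3 Var4 Var5 yr2-yr6,                   ///
      re vce(cluster Company_ID)

* Robust test of fixed vs random effects (community-contributed)
xtoverid
```

The coefficients should match those from -c.Var1_c##ib6.IndustryRank- with -i.Year-; if they do not, the dummy numbering most likely does not line up with the IndustryRank codes, which is worth checking with -tabulate IndustryRank ind6-.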

Kind regards,
Carlo
(Stata 19.0)

Pietro Russo

Join Date: Jul 2021
Posts: 7
#9

30 Jul 2021, 08:07

Dear Carlo,

I first ran the "normal" command using the interaction term:

Code:

. xi: xtreg ROA_new c.Var1_c##ib6.IndustryRank Var2_new Var3_new Var4_new Var5_new i.Year, re vce(cluster Company_ID)
i.Year            _IYear_2014-2019    (naturally coded; _IYear_2014 omitted)

Random-effects GLS regression                   Number of obs     =        472
Group variable: Company_ID                      Number of groups  =        106

R-sq:                                           Obs per group:
     within  = 0.2681                                         min =          1
     between = 0.3918                                         avg =        4.5
     overall = 0.4378                                         max =          6

                                                Wald chi2(28)     =     372.71
corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000

                                      (Std. Err. adjusted for 106 clusters in Company_ID)
-----------------------------------------------------------------------------------------
                        |               Robust
                ROA_new |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
------------------------+----------------------------------------------------------------
                 Var1_c |   .0007248   .0002898     2.50   0.012     .0001567    .0012929
                        |
           IndustryRank |
Communication Services  |   .0086102   .0152021     0.57   0.571    -.0211853    .0384057
Consumer Discretionary  |   .0137158   .0125981     1.09   0.276    -.0109759    .0384076
      Consumer Staples  |   .0079909   .0248992     0.32   0.748    -.0408107    .0567924
            Financials  |  -.0161127   .0107229    -1.50   0.133    -.0371292    .0049037
           Health Care  |   .0138668   .0150024     0.92   0.355    -.0155374    .0432709
Information Technology  |   .0308543   .0153484     2.01   0.044      .000772    .0609366
             Materials  |  -.0021546   .0120363    -0.18   0.858    -.0257453    .0214362
           Real Estate  |  -.0098862   .0090362    -1.09   0.274    -.0275968    .0078245
             Utilities  |  -.0302404   .0128507    -2.35   0.019    -.0554274   -.0050534
                        |
  IndustryRank#c.Var1_c |
Communication Services  |  -.0000116   .0005819    -0.02   0.984     -.001152    .0011288
Consumer Discretionary  |  -.0001697   .0003958    -0.43   0.668    -.0009456    .0006061
      Consumer Staples  |  -.0006245   .0003207    -1.95   0.051     -.001253    3.95e-06
            Financials  |  -.0006271   .0002985    -2.10   0.036    -.0012122   -.0000421
           Health Care  |  -.0004604    .000324    -1.42   0.155    -.0010955    .0001747
Information Technology  |  -.0002254   .0003347    -0.67   0.501    -.0008813    .0004306
             Materials  |   .0002611   .0004271     0.61   0.541    -.0005759    .0010981
           Real Estate  |   .0000992   .0003681     0.27   0.788    -.0006222    .0008206
             Utilities  |  -.0003898   .0003645    -1.07   0.285    -.0011042    .0003245
                        |
               Var2_new |  -.0423081    .017189    -2.46   0.014     -.075998   -.0086182
               Var3_new |   .0275752   .0222929     1.24   0.216     -.016118    .0712684
               Var4_new |  -.0113187   .0026288    -4.31   0.000     -.016471   -.0061664
               Var5_new |  -.0009773   .0010466    -0.93   0.350    -.0030286     .001074
            _IYear_2015 |   .0007947   .0021468     0.37   0.711    -.0034129    .0050023
            _IYear_2016 |   .0002901   .0020943     0.14   0.890    -.0038146    .0043948
            _IYear_2017 |   .0039183   .0027886     1.41   0.160    -.0015473     .009384
            _IYear_2018 |   .0013146   .0030874     0.43   0.670    -.0047366    .0073657
            _IYear_2019 |  -.0014263   .0033975    -0.42   0.675    -.0080853    .0052327
                  _cons |   .2485472   .0438153     5.67   0.000     .1626709    .3344236
------------------------+----------------------------------------------------------------
                sigma_u |  .03537235
                sigma_e |  .01435376
                    rho |  .85861535   (fraction of variance due to u_i)
-----------------------------------------------------------------------------------------

I then generated the interaction manually, following your example. After that, I ran -xtreg, re- again, followed by -xtoverid-:

HTML Code:

. gen interaction = Var1_c*IndustryRank
. xi: xtreg ROA_new c.Var1_c i.IndustryRank interaction Var2_new Var3_new Var4_new Var5_new i.Year, re vce(cluster Company_ID)
i.IndustryRank    _IIndustryR_1-10    (naturally coded; _IIndustryR_1 omitted)
i.Year            _IYear_2014-2019    (naturally coded; _IYear_2014 omitted)

Random-effects GLS regression                   Number of obs     =        472
Group variable: Company_ID                      Number of groups  =        106

R-sq:                                           Obs per group:
     within  = 0.2364                                         min =          1
     between = 0.4012                                         avg =        4.5
     overall = 0.4499                                         max =          6

                                                Wald chi2(20)     =     190.28
corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000

                             (Std. Err. adjusted for 106 clusters in Company_ID)
--------------------------------------------------------------------------------
               |               Robust
       ROA_new |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
---------------+----------------------------------------------------------------
        Var1_c |   .0004021   .0002776     1.45   0.147     -.000142    .0009461
 _IIndustryR_2 |   .0072804   .0153712     0.47   0.636    -.0228467    .0374074
 _IIndustryR_3 |  -.0046551   .0252959    -0.18   0.854    -.0542342     .044924
 _IIndustryR_4 |  -.0202826   .0135727    -1.49   0.135    -.0468847    .0063194
 _IIndustryR_5 |   .0098822   .0171969     0.57   0.566     -.023823    .0435875
 _IIndustryR_6 |  -.0080023   .0141833    -0.56   0.573     -.035801    .0197964
 _IIndustryR_7 |    .023971   .0179354     1.34   0.181    -.0111816    .0591237
 _IIndustryR_8 |  -.0088335   .0151883    -0.58   0.561     -.038602     .020935
 _IIndustryR_9 |  -.0190164   .0143501    -1.33   0.185    -.0471421    .0091093
_IIndustryR_10 |  -.0397027   .0158814    -2.50   0.012    -.0708296   -.0085757
   interaction |   .0000379   .0000492     0.77   0.441    -.0000586    .0001343
      Var2_new |  -.0436658   .0176222    -2.48   0.013    -.0782047    -.009127
      Var3_new |    .024368   .0215767     1.13   0.259    -.0179216    .0666576
      Var4_new |  -.0116516   .0025955    -4.49   0.000    -.0167387   -.0065645
      Var5_new |  -.0009422   .0010592    -0.89   0.374    -.0030183    .0011338
   _IYear_2015 |   .0005011   .0021292     0.24   0.814     -.003672    .0046742
   _IYear_2016 |   -.000304   .0020983    -0.14   0.885    -.0044167    .0038086
   _IYear_2017 |   .0037341   .0028087     1.33   0.184    -.0017709    .0092392
   _IYear_2018 |   .0008108   .0031412     0.26   0.796    -.0053459    .0069675
   _IYear_2019 |  -.0024814   .0033101    -0.75   0.453    -.0089692    .0040063
         _cons |   .2628982    .043474     6.05   0.000     .1776906    .3481057
---------------+----------------------------------------------------------------
       sigma_u |  .03410628
       sigma_e |  .01450921
           rho |  .84675796   (fraction of variance due to u_i)
--------------------------------------------------------------------------------

. xtoverid

Test of overidentifying restrictions: fixed vs random effects
Cross-section time-series model: xtreg re  robust cluster(Company_ID)
Sargan-Hansen statistic  29.803  Chi-sq(11)   P-value = 0.0017

This result would indicate that the fixed-effects model, -xtreg, fe vce(cluster Company_ID)-, should be used.
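
A minimal sketch of that fixed-effects counterpart, assuming the same variable names as in the -xtreg, re- run above and that the data are already -xtset-. If industry never changes within a company (an assumption, not checked here), the IndustryRank main effects will be omitted, while the interaction slopes remain identified:

```stata
* Assumption: same variables as the random-effects run above; panel already declared with -xtset-.
* Time-invariant regressors (e.g. industry, if constant within company) are dropped by -fe-.
xtreg ROA_new c.Var1_c##ib6.IndustryRank Var2_new Var3_new Var4_new Var5_new i.Year, fe vce(cluster Company_ID)
```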

My question is this: as you can see, all three R-sq values changed, and with them all the coefficients. This suggests that the manually constructed interaction term is not built correctly. What do you think? Would I (even with this problem) still accept the result and use fixed effects?

Thank you,
Pietro

Carlo Lazzaro

Join Date: Apr 2014

Posts: 17851
#10

30 Jul 2021, 08:19

Pietro:
a quick comparison between the two models highlights that:
1) the between R-sq values are close: 0.3918 vs. 0.4012 (this is not an issue);
2) the first model reports 5 coefficients (vs. 2 in the latter) that reach statistical significance (this is not an interesting difference, indeed).
That said, it would seem that the interaction has no role here and you may want to consider a more parsimonious specification.
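
On the manually built interaction: multiplying Var1_c by the numeric IndustryRank code yields a single slope that scales with the code value, which is not the same model as c.Var1_c##i.IndustryRank (one slope per industry), so different R-sq values and coefficients are to be expected. A hedged sketch of an -xi- specification that should mirror the factor-variable model, assuming the variable names from the posts above:

```stata
* Assumption: variable names as in Pietro's posts; panel already declared with -xtset-.
* Make industry 6 the omitted category for -xi-, matching ib6. in the factor-variable run
char IndustryRank[omit] 6
* i.IndustryRank*Var1_c expands to the industry dummies, Var1_c itself,
* and one Var1_c interaction per non-omitted industry
xi: xtreg ROA_new i.IndustryRank*Var1_c Var2_new Var3_new Var4_new Var5_new i.Year, re vce(cluster Company_ID)
* Community-contributed command; install once with: ssc install xtoverid
xtoverid
```

Whether the interaction is worth keeping at all is a separate question from how it is coded.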

Kind regards,
Carlo
(Stata 19.0)

Pietro Russo

Join Date: Jul 2021

Posts: 7
#11

03 Aug 2021, 11:43

Hi Carlo,

thank you very much for your great help!

Do you, by chance, have a citation for the -xtoverid- command? Following -help xtoverid-, I only get the following reference:

Schaffer, M.E., Stillman, S. 2010. xtoverid: Stata module to calculate tests of overidentifying restrictions after xtreg, xtivreg, xtivreg2 and xthtaylor http://ideas.repec.org/c/boc/bocode/s456779.html

Best regards,
Pietro

Carlo Lazzaro

Join Date: Apr 2014

Posts: 17851
#12

03 Aug 2021, 11:59

Pietro:
the link https://ideas.repec.org/c/boc/bocode/s456779.html will give you the suggested citation of the community-contributed module -xtoverid-.
You can safely quote it in your paper/research report/whatever.

Kind regards,
Carlo
(Stata 19.0)