R-squared rockets when adding a categorical variable in the random-effects model

Cuong Hoang

Join Date: Jan 2018
Posts: 13

07 Feb 2019, 13:56

Dear readers of Statalist,

I have an unbalanced panel of companies in one country, together with regional/industrial variables: population, the industry's employment share in the region, the number of industries (diversity), competition, and two control variables (one at company level, one at regional level).
The variables of interest are the first two.
I am trying to regress company productivity (measured as the company's total factor productivity) on these regional/industrial variables.
I set up the data for the regression like this:


xtset id year, yearly
       panel variable:  id (unbalanced)
        time variable:  year, 2011 to 2016, but with gaps
                delta:  1 year

The output of xtsum for the explanatory variables shows that within-variation is much smaller than between-variation. The regional/industrial variables change slowly over time, which makes the standard errors of the fixed-effects (FE) estimates very high. Hence, besides FE estimation, I also run random-effects (RE) estimation to exploit the between-variation, even though the results from xtoverid favor FE over RE. Clark and Linzer (2012), at "https://datajobs.com/data-science-repo/Fixed-Effects-Models-[Clark-and-Linzer].pdf", argue that with large N, small T and slowly changing explanatory variables, the RE model can even be preferable to the FE model, as long as the correlation between the unit effects and the explanatory variables is low. I believe that argument applies to my case.
The basic results for the RE model (GLS estimator, Stata 15.1) are as follows (I removed the results for the year dummies to save space):
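The steps described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the variable names are taken from the commands in this post, the dummy names yr1 to yr6 are hypothetical, all six years 2011 to 2016 are assumed to occur in the data, and the year dummies are built by hand because xtoverid (community-contributed, installed with `ssc install xtoverid`) may not accept factor-variable notation such as i.year.

```stata
* Sketch only; variable names follow the commands shown in this post.
* 1. Compare within- and between-variation of the explanatory variables.
xtsum lnPopulation IndustryShare Diversity Competition Control1 Control2

* 2. Build year dummies by hand (yr1-yr6 are hypothetical names; assumes
*    all six years 2011-2016 are present in the data).
tabulate year, generate(yr)

* 3. Random-effects model with cluster-robust standard errors.
xtreg lnProductivity lnPopulation IndustryShare Diversity Competition ///
    Control1 Control2 yr2-yr6, re vce(cluster Region_Industry)

* 4. Test of the overidentifying restrictions implied by RE
*    (a rejection speaks in favor of FE).
xtoverid
```

The `///` line continuation works in a do-file; when typing interactively, the xtreg command has to go on a single line.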


xtreg lnProductivity lnPopulation IndustryShare Diversity Competition Control1 Control2 i.year, re vce(cluster Region_Industry)

Random-effects GLS regression                   Number of obs     =     82,557
Group variable: id                              Number of groups  =     28,722

R-sq:                                           Obs per group:
within  = 0.0507                                         min =          1
between = 0.0404                                         avg =        2.9
overall = 0.0449                                         max =          6

Wald chi2(11)     =     694.47
corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000

(Std. Err. adjusted for 978 clusters in Region_Industry)

                             Robust
lnProductiv~y       Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]

lnPopulation     .1219238    .027064     4.51   0.000     .0688793    .1749683
IndustryShare     .060369   .0277869     2.17   0.030     .0059078    .1148302
Diversity        .0668951   .0376893     1.77   0.076    -.0069745    .1407647
Competition     -.0721706   .0303292    -2.38   0.017    -.1316148   -.0127265
Control1         .0002489   .0001706     1.46   0.145    -.0000855    .0005833
Control2        -.0003625   .0003653    -0.99   0.321    -.0010785    .0003536

To control for industry fixed effects, I add the industry variable (the industry classification of each company) to the command and obtain:


xtreg lnProductivity lnPopulation IndustryShare Diversity Competition Control1 Control2 i.Industry i.year, re vce(cluster Region_Industry)

Random-effects GLS regression                   Number of obs     =     82,557
Group variable: id                              Number of groups  =     28,722

R-sq:                                           Obs per group:
within  = 0.0515                                         min =          1
between = 0.5351                                         avg =        2.9
overall = 0.5430                                         max =          6

Wald chi2(69)     =   15884.90
corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000

(Std. Err. adjusted for 978 clusters in Region_Industry)

                             Robust
lnProductiv~y       Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]

lnPopulation      .1224042   .0080391    15.23   0.000     .1066478    .1381606
IndustryShare    .0442533   .0140208     3.16   0.002     .0167729    .0717336
Diversity       .0323125   .0091825     3.52   0.000     .0143152    .0503098
Competition     -.0450796   .0103554    -4.35   0.000    -.0653757   -.0247835
Control1         .0003052   .0001753     1.74   0.082    -.0000384    .0006488
Control2        -.0007511   .0005478    -1.37   0.170    -.0018248    .0003225

Industry
102         .1498892   .0985636     1.52   0.128    -.0432918    .3430703
103         .1355773   .1181954     1.15   0.251    -.0960815     .367236
104         .3746264   .1751559     2.14   0.032     .0313272    .7179256
105         .4152645   .2612369     1.59   0.112    -.0967504    .9272794
106         .1885796   .1911881     0.99   0.324    -.1861422    .5633014
107         .0894018   .1089424     0.82   0.412    -.1241214    .3029251
108         .5530862   .1164927     4.75   0.000     .3247646    .7814078
110         -1.04191   .1067401    -9.76   0.000    -1.251117   -.8327036

There are about 60 industries, but I show only a few of them to keep the results table short.
As you can see, the overall R-squared jumps from 0.0449 to 0.5430 and the between R-squared from 0.0404 to 0.5351, while the within R-squared barely changes (0.0507 to 0.0515).
So I am confused about the reason behind this jump, and I wonder whether I should include industry fixed effects in the RE model at all.

Thank you very much in advance for your time and advice!

Last edited by Cuong Hoang; 07 Feb 2019, 14:00.


Phil Bromiley

Join Date: Apr 2014

Posts: 4348
#2

08 Feb 2019, 16:49

I don't see a problem. Measured productivity varies massively across industries so adding industry dummies adds a lot of explanation.

One question is at what level you have defined your panel: if it were the firm, then you probably wouldn't get parameter estimates on Industry. I'm used to seeing panel work that defines the panel at the firm level.

You might look at Schunck (2013) and a later paper, along with xthybrid.
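The point about productivity differing across industries can be checked directly. The following is a minimal sketch, assuming the variables are named as in post #1: regress productivity on the industry dummies alone and see how much of the pooled variation they account for, then test the dummies jointly in the RE model. If the first R-squared is already close to 0.5, the jump in the between and overall R-squared mostly reflects industry-level differences in measured productivity.

```stata
* Sketch only; variable names follow post #1.
* Share of the pooled variation in productivity explained by industry alone.
regress lnProductivity i.Industry
display "R-squared from industry dummies alone: " %6.4f e(r2)

* Joint test of the industry dummies in the random-effects model.
xtreg lnProductivity lnPopulation IndustryShare Diversity Competition ///
    Control1 Control2 i.Industry i.year, re vce(cluster Region_Industry)
testparm i.Industry
```

The within R-squared is not expected to move much in this comparison, because Industry is a firm-level classification that changes little or not at all over time within a firm.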
Cuong Hoang

Join Date: Jan 2018

Posts: 13
#3

17 Feb 2019, 07:25

Thank you, Phil Bromiley, for your comment and advice.
I figured out the problem. It probably comes from the fact that I calculate each firm's productivity using elasticities estimated separately for each industry. When I then add industry fixed effects, the R-squared goes up because these effects explain a lot of the variation in firm productivity in my model: they absorb the productivity level that firms in the same industry share.