  • R-squared rockets when adding a categorical variable in the random-effects model

    Dear readers of Statalist,

    I have an unbalanced panel of companies in one country, together with regional/industrial variables: population, the industry's employment share in the region, the number of industries (diversity), competition, and two control variables (one at the company level, one at the regional level).
    The variables of interest are the first two.
    I am trying to regress each company's productivity (measured as total factor productivity) on these regional/industrial variables.
    I set up the panel like this:
    HTML Code:
    xtset id year, yearly
        panel variable:  id (unbalanced)
        time variable:   year, 2011 to 2016, but with gaps
        delta:           1 year
    The xtsum output for the explanatory variables shows that within variation is much smaller than between variation. The regional/industrial variables change slowly over time, which makes the standard errors of the fixed-effects (FE) estimates very high. Hence, besides FE, I also run random-effects (RE) estimation to exploit the between variation, even though the xtoverid results favor FE over RE. Clark and Linzer (2012), at "https://datajobs.com/data-science-repo/Fixed-Effects-Models-[Clark-and-Linzer].pdf", point out that with large N, small T, and slowly changing explanatory variables, the RE model can even be preferable to the FE model, as long as the correlation between the unit effects and the explanatory variables is low. I believe my case fits that argument.
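For readers who want to see what the within/between comparison looks like, here is a minimal sketch on simulated data. It is not the poster's dataset: the variable names (`x`, `x_level`, `x_mean`, `x_dev`), the number of firms, and the variances are all assumptions chosen so that the regressor moves slowly over time.

```stata
* Simulated stand-in for a slowly changing regressor (all names hypothetical)
clear
set seed 12345
set obs 500                          // 500 hypothetical firms
gen id = _n
gen x_level = rnormal(0, 1)          // firm-specific level: between variation
expand 6                             // six yearly observations per firm
bysort id: gen year = 2010 + _n      // years 2011 to 2016
gen x = x_level + rnormal(0, 0.1)    // small year-to-year movement: within variation
xtset id year

* xtsum reports overall, between, and within standard deviations
xtsum x

* The same split by hand: firm means versus deviations from firm means
bysort id: egen double x_mean = mean(x)
gen double x_dev = x - x_mean
summarize x_mean x_dev
```

With a regressor built this way, the standard deviation of `x_dev` is a small fraction of that of `x_mean`, which is the pattern described above: FE relies only on `x_dev`, so its standard errors are large.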
    The basic results for the RE model (GLS) in Stata 15.1 are as follows (I removed the year-dummy results to save space):
    HTML Code:
    xtreg lnProductivity lnPopulation IndustryShare Diversity Competition Control1 Control2 i.year, re vce(cluster Region_Industry)
    
    Random-effects GLS regression                   Number of obs     =     82,557
    Group variable: id                              Number of groups  =     28,722
    
    R-sq:                                           Obs per group:
    within  = 0.0507                                         min =          1
    between = 0.0404                                         avg =        2.9
    overall = 0.0449                                         max =          6
    
    Wald chi2(11)     =     694.47
    corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000
    
    (Std. Err. adjusted for 978 clusters in Region_Industry)
    
    Robust
    lnProductiv~y       Coef.   Std. Err.      z    P>z     [95% Conf. Interval]
    
    lnPopulation     .1219238    .027064     4.51   0.000     .0688793    .1749683
    IndustryShare     .060369   .0277869     2.17   0.030     .0059078    .1148302
    Diversity        .0668951   .0376893     1.77   0.076    -.0069745    .1407647
    Competition     -.0721706   .0303292    -2.38   0.017    -.1316148   -.0127265
    Control1         .0002489   .0001706     1.46   0.145    -.0000855    .0005833
    Control2        -.0003625   .0003653    -0.99   0.321    -.0010785    .0003536
    To control for industry fixed effects, I add the industry variable (the companies' industry classification) to the command and obtain:
    HTML Code:
    xtreg lnProductivity lnPopulation IndustryShare Diversity Competition Control1 Control2 i.Industry i.year, re vce(cluster Region_Industry)
    
    Random-effects GLS regression                   Number of obs     =     82,557
    Group variable: id                              Number of groups  =     28,722
    
    R-sq:                                           Obs per group:
    within  = 0.0515                                         min =          1
    between = 0.5351                                         avg =        2.9
    overall = 0.5430                                         max =          6
    
    Wald chi2(69)     =   15884.90
    corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000
    
    (Std. Err. adjusted for 978 clusters in Region_Industry)
    
    Robust
    lnProductiv~y       Coef.   Std. Err.      z    P>z     [95% Conf. Interval]
    
    lnPopulation      .1224042   .0080391    15.23   0.000     .1066478    .1381606
    IndustryShare    .0442533   .0140208     3.16   0.002     .0167729    .0717336
    Diversity       .0323125   .0091825     3.52   0.000     .0143152    .0503098
    Competition     -.0450796   .0103554    -4.35   0.000    -.0653757   -.0247835
    Control1         .0003052   .0001753     1.74   0.082    -.0000384    .0006488
    Control2        -.0007511   .0005478    -1.37   0.170    -.0018248    .0003225
    
    Industry
    102         .1498892   .0985636     1.52   0.128    -.0432918    .3430703
    103         .1355773   .1181954     1.15   0.251    -.0960815     .367236
    104         .3746264   .1751559     2.14   0.032     .0313272    .7179256
    105         .4152645   .2612369     1.59   0.112    -.0967504    .9272794
    106         .1885796   .1911881     0.99   0.324    -.1861422    .5633014
    107         .0894018   .1089424     0.82   0.412    -.1241214    .3029251
    108         .5530862   .1164927     4.75   0.000     .3247646    .7814078
    110         -1.04191   .1067401    -9.76   0.000    -1.251117   -.8327036
    There are about 60 industries; I truncated the table to keep it short.
    As you can see, the overall R-squared rises above 0.5, far higher than the roughly 0.05 in the previous regression.
    So I'm confused about the reason behind this jump, and I wonder whether I should include industry fixed effects in the RE model.

    Thank you very much in advance for your time and advice!
    Last edited by Cuong Hoang; 07 Feb 2019, 14:00.

  • #2
    I don't see a problem. Measured productivity varies massively across industries, so adding industry dummies adds a lot of explanatory power.
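The point can be illustrated with a small simulation. This is only a sketch under stated assumptions: the data are artificial, the 20 industries, the size of the industry shift (`ind_shift`), and all variable names are hypothetical, and no year dummies or clustering are included.

```stata
* Simulated panel where the outcome level differs strongly by industry
clear
set seed 2019
set obs 2000                             // 2,000 hypothetical firms
gen id = _n
gen industry = ceil(20 * runiform())     // 20 hypothetical industries
gen ind_shift = 0.3 * (industry - 10.5)  // assumed industry-level shift in the outcome
gen u = rnormal(0, 0.5)                  // firm random effect
expand 5                                 // five years per firm
bysort id: gen year = 2010 + _n
gen x = rnormal()                        // time-varying regressor
gen y = 0.1 * x + ind_shift + u + rnormal(0, 0.5)
xtset id year

* RE model without industry dummies
quietly xtreg y x, re
scalar r2o_base = e(r2_o)
scalar r2w_base = e(r2_w)

* RE model with industry dummies
quietly xtreg y x i.industry, re
scalar r2o_ind = e(r2_o)
scalar r2w_ind = e(r2_w)

display "overall R2: " %5.3f scalar(r2o_base) " -> " %5.3f scalar(r2o_ind)
display "within  R2: " %5.3f scalar(r2w_base) " -> " %5.3f scalar(r2w_ind)
```

Because industry is constant within a firm, the dummies can only explain differences between firms. In this setup the overall R-squared jumps while the within R-squared stays essentially the same, mirroring the pattern in the two tables in the first post (within 0.0507 versus 0.0515, overall 0.0449 versus 0.5430).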

    One question is at what level you define your panel: if it were the firm, you probably wouldn't get parameter estimates on Industry. I'm used to seeing panel work that sets the panel at the firm level.

    You might look at Schunck (2013) and a later paper, along with xthybrid.
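The hybrid idea discussed in that literature can be built by hand, which may help before turning to xthybrid. The sketch below uses simulated data and hypothetical names (`a`, `x_level`, `x_mean`, `x_dev`); it does not call xthybrid itself and is not the poster's model. The regressor is split into its firm mean and the deviation from that mean, and both parts enter an RE regression.

```stata
* Hybrid (within-between) specification by hand on simulated data
clear
set seed 4711
set obs 1000                         // 1,000 hypothetical firms
gen id = _n
gen a = rnormal()                    // unobserved firm effect
gen x_level = 0.8 * a + rnormal()    // regressor level correlated with the firm effect
expand 5                             // balanced panel, five years per firm
bysort id: gen year = 2010 + _n
gen x = x_level + rnormal()
gen y = 1 * x + a + rnormal()        // assumed true within effect of 1
xtset id year

* Split x into a between part (firm mean) and a within part (deviation)
bysort id: egen double x_mean = mean(x)
gen double x_dev = x - x_mean

* RE regression on both parts: x_dev carries the within effect
quietly xtreg y x_dev x_mean, re
scalar b_hybrid = _b[x_dev]

* Plain FE regression for comparison
quietly xtreg y x, fe
scalar b_fe = _b[x]

display "within effect, hybrid: " %8.5f scalar(b_hybrid)
display "within effect, FE:     " %8.5f scalar(b_fe)
```

In this balanced example the coefficient on `x_dev` matches the FE estimate up to rounding, while the model remains an RE model, so time-invariant regressors such as industry dummies could still be added and estimated.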


    • #3
      Thank you, Phil Bromiley, for your comment and advice.
      I figured out the problem: it probably comes from the fact that I calculate each firm's productivity using industry-specific elasticities. When I then add industry fixed effects, R-squared goes up because they capture much of the variation in productivity that firms in the same industry share.
