
  • Pooled OLS with interaction on almost all explanatory variables

    Dear Statalist,

    I have panel data on 763 firms over 15 years, taken from an industry consortium. I want to estimate how a firm's product certifications are affected by changes in its memberships across competing industry consortia, the number of simultaneous affiliations, its role within the focal consortium, and whether it provides a platform product (time-invariant). The basic model looks like this:

    productcerts_t = beta0 + beta1 * changemem_t-1 + beta2 * simulmem_t-1 + beta3 * role_t-1 + beta4 * platform + controls


    While the model is rather straightforward, the issue I am facing is that firms must be members in order to certify products. I therefore included a dummy variable member_t and its interactions with all other variables, except for role, which already requires member_t to be 1. However, in a more complete model with all control variables, this causes multicollinearity and, because of the interactions, produces a very long results table. The model then looks like this:

    productcerts_t = beta0 + beta1 * changemem_t-1 + beta2 * simulmem_t-1 + beta3 * role_t-1 + beta4 * platform + beta5 * member_t + beta6 * member_t * changemem_t-1 + beta7 * member_t * simulmem_t-1 + beta8 * member_t * platform + controls


    I was wondering if there is a more elegant way that yields consistent results. Intuitively, I thought about filtering the observations, i.e. excluding all records where member_t == 0, and running a pooled OLS with time dummies and standard errors clustered on id. But I am not sure whether that is an appropriate approach.


    Here are some results I computed:

    1) pooled OLS with interactions and clustered standard errors
    Code:
    . reg productcerts i.member##c.L1.changemem i.member##c.L1.simulmem L1.role i.member##i.platform i.year, cluster(id)
    
    Linear regression                               Number of obs     =     10,682
                                                    F(21, 762)        =       5.34
                                                    Prob > F          =     0.0000
                                                    R-squared         =     0.0789
                                                    Root MSE          =     2.1651
    
                                                (Std. Err. adjusted for 763 clusters in id)
    ---------------------------------------------------------------------------------------
                          |               Robust
             productcerts |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    ----------------------+----------------------------------------------------------------
                 1.member |   .4415453   .0704454     6.27   0.000     .3032552    .5798353
                          |
                changemem |
                      L1. |   .5532723   .5693309     0.97   0.331    -.5643709    1.670915
                          |
      member#cL.changemem |
                       1  |  -1.356654   .5850375    -2.32   0.021    -2.505131   -.2081776
                          |
                 simulmem |
                      L1. |   .2238948   .1329231     1.68   0.093    -.0370441    .4848337
                          |
       member#cL.simulmem |
                       1  |   -.125708   .2387898    -0.53   0.599    -.5944721     .343056
                          |
                     role |
                      L1. |   3.496489   1.625868     2.15   0.032     .3047766    6.688201
                          |
               1.platform |   .1912549   .0916024     2.09   0.037      .011432    .3710779
                          |
          member#platform |
                     1 1  |    1.72839   .6695502     2.58   0.010     .4140079    3.042772
                          |
                     year |
                    2007  |   .0248361    .064558     0.38   0.701    -.1018965    .1515687
                    2008  |  -.0068504   .0425496    -0.16   0.872    -.0903788     .076678
                    2009  |   .0131293   .0814032     0.16   0.872    -.1466718    .1729304
                    2010  |  -.0718353   .0605614    -1.19   0.236    -.1907222    .0470516
                    2011  |   .0194899   .0722105     0.27   0.787    -.1222652     .161245
                    2012  |  -.0246552    .063305    -0.39   0.697    -.1489281    .0996177
                    2013  |   .0549405   .0778698     0.71   0.481    -.0979243    .2078054
                    2014  |  -.0239332    .068818    -0.35   0.728    -.1590286    .1111623
                    2015  |   .1155241   .1268944     0.91   0.363      -.13358    .3646283
                    2016  |   .1556162   .0833659     1.87   0.062     -.008038    .3192703
                    2017  |   .2129104   .1003894     2.12   0.034     .0158378     .409983
                    2018  |   .0882369   .0852473     1.04   0.301    -.0791104    .2555843
                    2019  |   .2275257   .1756277     1.30   0.196    -.1172458    .5722973
                          |
                    _cons |  -.0412679   .0527809    -0.78   0.435    -.1448811    .0623454
    ---------------------------------------------------------------------------------------

    2) pooled OLS model with filtered observations, excluding records where member_t == 0
    Code:
    . reg productcerts c.L1.changemem c.L1.simulmem L1.role i.platform i.year if member, cluster(id)
    
    Linear regression                               Number of obs     =      3,189
                                                    F(17, 762)        =       4.06
                                                    Prob > F          =     0.0000
                                                    R-squared         =     0.0612
                                                    Root MSE          =     3.7748
    
                                        (Std. Err. adjusted for 763 clusters in id)
    -------------------------------------------------------------------------------
                  |               Robust
     productcerts |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    --------------+----------------------------------------------------------------
        changemem |
              L1. |  -.8704477    .463063    -1.88   0.061    -1.779478     .038583
                  |
         simulmem |
              L1. |   .0518086   .2144131     0.24   0.809    -.3691018    .4727191
                  |
             role |
              L1. |   3.817154   1.762396     2.17   0.031     .3574265    7.276881
                  |
       1.platform |   1.894842   .6702957     2.83   0.005     .5789965    3.210687
                  |
             year |
            2007  |   .0040985   .5051178     0.01   0.994    -.9874892    .9956862
            2008  |  -.1097982   .3155404    -0.35   0.728      -.72923    .5096335
            2009  |  -.1420047    .441294    -0.32   0.748    -1.008301    .7242916
            2010  |  -.3944345    .415614    -0.95   0.343    -1.210319      .42145
            2011  |    .057463   .4554769     0.13   0.900    -.8366756    .9516016
            2012  |  -.2025696   .4350108    -0.47   0.642    -1.056531    .6513922
            2013  |   .1460105   .4491477     0.33   0.745    -.7357033    1.027724
            2014  |  -.0861011   .4196607    -0.21   0.837    -.9099294    .7377273
            2015  |   .3009069   .4922737     0.61   0.541    -.6654667    1.267281
            2016  |   .3548497   .4243433     0.84   0.403    -.4781711     1.18787
            2017  |   .3556781   .4230937     0.84   0.401    -.4748896    1.186246
            2018  |   .1858268   .4198763     0.44   0.658    -.6384248    1.010078
            2019  |   .5825402   .5546162     1.05   0.294    -.5062169    1.671297
                  |
            _cons |   .3286396   .4078965     0.81   0.421    -.4720947    1.129374
    -------------------------------------------------------------------------------

    The second model shows slight changes in the coefficient estimates.
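
    To make that comparison concrete, here is a minimal sketch, assuming the data are already -xtset id year- and that the coefficient names below match what -coeflegend- reports (worth checking before relying on them). In the interaction model, the slope for members is the main effect plus the interaction term, which -lincom- computes together with a cluster-robust standard error.

    ```stata
    * Model 1 from above (use in a do-file: /// continues the line)
    regress productcerts i.member##c.L1.changemem i.member##c.L1.simulmem ///
        L1.role i.member##i.platform i.year, vce(cluster id)

    * Replay the results with coefficient names, to confirm the names used below
    regress, coeflegend

    * Implied effects for member == 1: main effect + interaction
    * (names are assumed from the output table; adjust to the legend if they differ)
    lincom L.changemem + 1.member#cL.changemem
    lincom L.simulmem  + 1.member#cL.simulmem
    lincom 1.platform  + 1.member#1.platform
    ```

    From the output above, these sums are about -0.80 for changemem (.5532723 - 1.356654), 0.10 for simulmem and 1.92 for platform, against -0.87, 0.05 and 1.89 in the members-only regression. The two sets would coincide exactly only if every remaining regressor (here the year dummies and, where it varies among non-members, L1.role) were also interacted with member, because only then does the pooled model let all coefficients differ between the two groups.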




    Code:
    . quietly: xtreg productcerts i.member##c.L1.changemem i.member##c.L1.simulmem L1.role i.member##i.platform i.year
    . xttest0
    Breusch and Pagan Lagrangian multiplier test for random effects
    
            productcerts[id,t] = Xb + u[id] + e[id,t]
    
            Estimated results:
                             |       Var     sd = sqrt(Var)
                    ---------+-----------------------------
                   product~s |   5.079473       2.253769
                           e |     4.0121       2.003023
                           u |   .6070492       .7791336
    
            Test:   Var(u) = 0
                                 chibar2(01) =  1353.13
                              Prob > chibar2 =   0.0000

    Further, the Breusch-Pagan LM test favors a random-effects model over pooled OLS. The Hausman test and suest cannot be run on these data/models.



    I would appreciate any suggestions on how to handle this problem. Would you recommend sticking with an RE/FE model and using the interactions? Is it legitimate, under some assumptions, to filter observations for pooled OLS? Or is there another approach?


    Best,
    Sven




  • #2
    Sven:
    welcome to this forum.
    The more interactions you include on the right-hand side of your regression equation (panel or not), the (exponentially) harder it becomes to convey the results of your research. The safest choice is to focus on the predictors that can give a true and fair view of the data generating process that you're investigating.
    That said, whenever you invoke non-default standard errors, -hausman- is not your friend for comparing -fe- vs -re-, as it supports default standard errors only.
    Hence, you should switch to the community-contributed command -xtoverid-, which, being a bit old-fashioned, does not support -fvvarlist- notation. The usual fix is to prefix your -xtreg- code with -xi:- and/or to create the interactions by hand (like in the old days).
    I would not consider pooled OLS due to the evidence of a panelwise effect.
    In addition, filtering observations can be seen as a way of making up data (hence, it may be difficult to defend in front of a reviewer).
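
    A minimal sketch of that workflow, assuming the data are -xtset id year- and that -xtoverid- can be installed from SSC. The generated variable names (L_changemem, mem_x_change and so on) are made up for illustration, and building the lags by hand is only a precaution in case the command does not accept time-series operators either.

    ```stata
    * Community-contributed command; it may also need -ivreg2- and -ranktest- (SSC)
    ssc install xtoverid

    * Lags created by hand (hypothetical names), relying on -xtset id year-
    generate L_changemem = L1.changemem
    generate L_simulmem  = L1.simulmem
    generate L_role      = L1.role

    * Interactions created by hand instead of factor-variable notation
    generate mem_x_change   = member * L_changemem
    generate mem_x_simul    = member * L_simulmem
    generate mem_x_platform = member * platform

    * -xi:- expands i.year into old-style _Iyear_* dummies
    xi: xtreg productcerts member L_changemem mem_x_change L_simulmem ///
        mem_x_simul L_role platform mem_x_platform i.year, re vce(cluster id)

    * Cluster-robust test of fixed vs. random effects
    xtoverid
    ```

    A rejection of the null in the -xtoverid- output points toward -fe-. Note that time-invariant predictors such as platform cannot be estimated under -fe-.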
    Last edited by Carlo Lazzaro; 16 Sep 2020, 10:23.
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      Thank you very much. It helped me a lot!

      Best,
      Sven

