Interaction terms in panel data

Carlo Lazzaro

Join Date: Apr 2014

Posts: 17685
#16

18 Jul 2019, 04:59

Stephan:

Code:

reghdfe Q2ln_w WOB_w firmsize2_w lev2_w boardsizeln_w, absorb(YEAR ID)

looks good to me.

Kind regards,
Carlo
(Stata 19.0)
Comment
Kye Lippold

Join Date: Jun 2019

Posts: 67
#17

22 Jul 2019, 15:41

Stephan: I saw your other post about a Diff-in-Diff approach and replied in that thread. But looking at this thread, I understand your problem better. Your model from a previous post

Originally posted by Stephan Yan

Code:

reghdfe TOBINSQ_w WOMENONBOARD_w c.WOMENONBOARD_w#i.Crisis c.WOMENONBOARD_w#i.PostCrisis BOARDSIZE_w FIRMSIZELNAT_w CASHHOLDINGSCHEAT_w SHORTTERMDEBT_w, absorb(DataYearFiscal#SIC_group FIRM) vce(cluster FIRM)

seems to me to be the best way to approach your problem if you are specifically interested in how the diversity-performance link changes during the crisis. This is a treatment intensity diff-in-diff with controls and fixed effects for industry-year and firm.

The reason your model from the very first post was dropping a coefficient is that your postCrisis * WoB indicator was collinear with the industry-year fixed effects. The quoted model includes only the interaction terms, so the coefficients can be computed.

Carlo brought up the point about multicollinearity in this model. The collinear variables are your women on board variables measured at different times--so they should be highly correlated! In other words, the fact that you can't separate out the effects of the variables is not too surprising--that is just saying that the effects are identified on the small number of firms with changes in their boards during the crisis period. Thus, the imprecision is a feature of the model (it is telling you that you don't have much variation to work with).

So if you really want to know how the effects of board gender diversity changed during the crisis, this model seems like the best way to get your answer. (But the main takeaway from the results is that you don't have enough precision to tell how things changed).

Now, if you aren't interested in the question of changes during the crisis, but just on how well board gender diversity predicts performance in general, then I agree with the model Carlo recommends:

Code:

reghdfe Q2ln_w WOB_w firmsize2_w lev2_w boardsizeln_w, absorb(YEAR ID)

but be aware that this is just telling you how diversity affects performance across your entire sample, and doesn't test any hypothesis about the relationship being different during the crisis.

To clarify with a point from your previous questions:

Originally posted by Stephan Yan

Now I would like to ask your advice regarding the fixed effects in my model:

1) If I use the following model (time and industry fixed effects, without firm fixed effects):

reghdfe depvar indepvar somecontrolvariables, absorb(YEAR#SIC_group)

I get a positive and significant coefficient; however, the R-squared is low, around 0.16.

2) If I use the same model and add the firm fixed effect, I get a positive but insignificant coefficient, with an R-squared of 0.86:

reghdfe depvar indepvar somecontrolvariables, absorb(YEAR#SIC_group ID)

3) If I use the same model but with only industry and firm fixed effects (without the time fixed effect), I get a negative and significant coefficient, with an R-squared of 0.80:

reghdfe depvar indepvar somecontrolvariables, absorb(SIC_group ID)

Knowing that the R-squared in option 1 is quite low, should I go for the 2nd or 3rd option? And do you think it is important to include the time fixed effect, knowing that there is the subprime crisis (Crisis = 2008 & 2009)? My sample consists of US firms from the S&P 500 from 2005-2015 (I extended it from 2012 to 2015).

Option 2 here seems best to me (which is what Carlo and I effectively recommend). Option 3 is not good because it doesn't have year FE, so you are ignoring changes over time. And option 1 is not ideal because firms with different composition of their board likely differ in many ways other than their industry--so including firm fixed effects, which nets out differences between the firms that don't change over time, is a lot more convincing.
Comment

Stephan Yan

Join Date: Jul 2019
Posts: 17

#18

23 Jul 2019, 08:15

Dear Carlo, Dear Kye,

Thanks a lot for your answers.

I actually changed my sample: I took a longer time period and managed to gather more data for my panel data analysis. I currently have 422 firms over the 2005-2015 period (4,642 observations).

I want to test the following hypotheses:
1) There is a positive relationship between female board membership and firm performance.
2) Firms with more women on their board of directors performed better than firms with fewer women (comparing the top quartile with the bottom quartile).
3) Firms with more women performed better during the subprime crisis (were less affected by it) than firms with fewer women.

That is why I am using the following model for 1), including time#industry and firm fixed effects:

Code:

. reghdfe lnQ2_w WOB1_w BOARDSIZELN_w FIRMSIZE2_w LEVERAGE_w , absorb(YEAR#SIC_group ID)
(MWFE estimator converged in 2 iterations)

HDFE Linear regression                            Number of obs   =      4,642
Absorbing 2 HDFE groups                           F(   4,   4166) =     468.71
                                                  Prob > F        =     0.0000
                                                  R-squared       =     0.8808
                                                  Adj R-squared   =     0.8672
                                                  Within R-sq.    =     0.3104
                                                  Root MSE        =     0.1595

-------------------------------------------------------------------------------
       lnQ2_w |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
       WOB1_w |   .1050496   .0466927     2.25   0.025     .0135069    .1965922
BOARDSIZELN_w |   -.164978   .0202199    -8.16   0.000    -.2046199   -.1253362
  FIRMSIZE2_w |   .3159946   .0074314    42.52   0.000      .301425    .3305642
   LEVERAGE_w |  -.0736081   .0388835    -1.89   0.058    -.1498406    .0026243
        _cons |   -2.14501   .0839331   -25.56   0.000    -2.309564   -1.980457
-------------------------------------------------------------------------------

Absorbed degrees of freedom:
----------------------------------------------------------+
      Absorbed FE | Categories  - Redundant  = Num. Coefs |
------------------+---------------------------------------|
   YEAR#SIC_group |        55           0          55     |
               ID |       422           5         417     |
----------------------------------------------------------+

For 2), i.WOB_quart is WOB1_w divided into quartiles (1 is the bottom quartile, 4 is the top):

Code:

. reghdfe lnQ2_w i.WOB_quart BOARDSIZELN_w FIRMSIZE2_w LEVERAGE_w , absorb(YEAR#SIC_group ID)
(MWFE estimator converged in 2 iterations)

HDFE Linear regression                            Number of obs   =      4,642
Absorbing 2 HDFE groups                           F(   6,   4164) =     312.26
                                                  Prob > F        =     0.0000
                                                  R-squared       =     0.8808
                                                  Adj R-squared   =     0.8671
                                                  Within R-sq.    =     0.3103
                                                  Root MSE        =     0.1595

-------------------------------------------------------------------------------
       lnQ2_w |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
    WOB_quart |
           2  |   .0135082   .0094028     1.44   0.151    -.0049262    .0319427
           3  |   .0205504   .0103757     1.98   0.048     .0002085    .0408924
           4  |   .0240279   .0117178     2.05   0.040     .0010548    .0470011
              |
BOARDSIZELN_w |  -.1642663   .0203373    -8.08   0.000    -.2041381   -.1243944
  FIRMSIZE2_w |   .3158427   .0074318    42.50   0.000     .3012724     .330413
   LEVERAGE_w |   -.074306   .0388944    -1.91   0.056    -.1505597    .0019478
        _cons |  -2.143927   .0841364   -25.48   0.000    -2.308879   -1.978974
-------------------------------------------------------------------------------

Absorbed degrees of freedom:
----------------------------------------------------------+
      Absorbed FE | Categories  - Redundant  = Num. Coefs |
------------------+---------------------------------------|
   YEAR#SIC_group |        55           0          55     |
               ID |       422           5         417     |
----------------------------------------------------------+

Finally, for 3), I tried the following model:

Code:

. reghdfe lnQ2_w c.WOB1_w#i.Crisis c.WOB1_w#i.PostCrisis BOARDSIZELN_w FIRMSIZE2_w LEVERAGE_w ,
> absorb(YEAR#SIC_group ID)
(MWFE estimator converged in 2 iterations)
note: 1.PostCrisis#c.WOB1_w omitted because of collinearity

HDFE Linear regression                            Number of obs   =      4,642
Absorbing 2 HDFE groups                           F(   6,   4164) =     314.27
                                                  Prob > F        =     0.0000
                                                  R-squared       =     0.8810
                                                  Adj R-squared   =     0.8674
                                                  Within R-sq.    =     0.3117
                                                  Root MSE        =     0.1594

-------------------------------------------------------------------------------------
             lnQ2_w |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------------+----------------------------------------------------------------
    Crisis#c.WOB1_w |
                 0  |     .15635    .052044     3.00   0.003     .0543159    .2583841
                 1  |   .3004347   .0937002     3.21   0.001     .1167323    .4841372
                    |
PostCrisis#c.WOB1_w |
                 0  |    -.17446    .062448    -2.79   0.005    -.2968914   -.0520285
                 1  |          0  (omitted)
                    |
      BOARDSIZELN_w |  -.1649864   .0202123    -8.16   0.000    -.2046133   -.1253596
        FIRMSIZE2_w |   .3154398   .0074287    42.46   0.000     .3008756     .330004
         LEVERAGE_w |  -.0725675   .0388583    -1.87   0.062    -.1487506    .0036155
              _cons |  -2.141082   .0838965   -25.52   0.000    -2.305564     -1.9766
-------------------------------------------------------------------------------------

Absorbed degrees of freedom:
----------------------------------------------------------+
      Absorbed FE | Categories  - Redundant  = Num. Coefs |
------------------+---------------------------------------|
   YEAR#SIC_group |        55           0          55     |
               ID |       422           5         417     |
----------------------------------------------------------+

But as you said, the omitted coefficient comes from the PostCrisis * WoB interaction being collinear with the industry#year fixed effects.

So, if I understand correctly, would it be better to go with the following model, adding WOB1_w before the interaction terms?

Code:

. reghdfe lnQ2_w WOB1_w c.WOB1_w#i.Crisis c.WOB1_w#i.PostCrisis BOARDSIZELN_w FIRMSIZE2_w LEVERA
> GE_w , absorb(YEAR#SIC_group ID)
(MWFE estimator converged in 2 iterations)

HDFE Linear regression                            Number of obs   =      4,642
Absorbing 2 HDFE groups                           F(   6,   4164) =     314.27
                                                  Prob > F        =     0.0000
                                                  R-squared       =     0.8810
                                                  Adj R-squared   =     0.8674
                                                  Within R-sq.    =     0.3117
                                                  Root MSE        =     0.1594

-------------------------------------------------------------------------------------
             lnQ2_w |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------------+----------------------------------------------------------------
             WOB1_w |    -.01811   .0638666    -0.28   0.777    -.1433226    .1071027
                    |
    Crisis#c.WOB1_w |
                 1  |   .1440847   .0766019     1.88   0.060    -.0060958    .2942653
                    |
PostCrisis#c.WOB1_w |
                 1  |     .17446    .062448     2.79   0.005     .0520285    .2968914
                    |
      BOARDSIZELN_w |  -.1649864   .0202123    -8.16   0.000    -.2046133   -.1253596
        FIRMSIZE2_w |   .3154398   .0074287    42.46   0.000     .3008756     .330004
         LEVERAGE_w |  -.0725675   .0388583    -1.87   0.062    -.1487506    .0036155
              _cons |  -2.141082   .0838965   -25.52   0.000    -2.305564     -1.9766
-------------------------------------------------------------------------------------

Absorbed degrees of freedom:
----------------------------------------------------------+
      Absorbed FE | Categories  - Redundant  = Num. Coefs |
------------------+---------------------------------------|
   YEAR#SIC_group |        55           0          55     |
               ID |       422           5         417     |
----------------------------------------------------------+

However, how should I interpret these results, given that the WOB1_w coefficient is now negative and insignificant, whereas in model 1) it is positive and significant?

Thanks a lot for your help!

Stephan

Comment

Kye Lippold

Join Date: Jun 2019
Posts: 67

#19

23 Jul 2019, 14:08

Regarding your specific models, the ones you list all seem appropriate to answer your questions:

Originally posted by Stephan Yan

I want to test the following hypotheses:
1) There is a positive relationship between female board membership and firm performance.

Code:

reghdfe lnQ2_w WOB1_w BOARDSIZELN_w FIRMSIZE2_w LEVERAGE_w , absorb(YEAR#SIC_group ID)

2) Firms with more women on their board of directors performed better than firms with fewer women (comparing the top quartile with the bottom quartile).

Code:

. reghdfe lnQ2_w i.WOB_quart BOARDSIZELN_w FIRMSIZE2_w LEVERAGE_w , absorb(YEAR#SIC_group ID)

3) Firms with more women performed better during the subprime crisis (were less affected by it) than firms with fewer women.

Code:

. reghdfe lnQ2_w WOB1_w c.WOB1_w#i.Crisis c.WOB1_w#i.PostCrisis BOARDSIZELN_w FIRMSIZE2_w LEVERAGE_w , absorb(YEAR#SIC_group ID)

Model 2) is just a discretized version of model 1, testing for nonlinearity in the effect of WoB on Tobin's Q.
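
For reference, a quartile variable of this kind can be built with -xtile-. The following is only a sketch on simulated data: the names wob and wob_quart are made up for the example, and how WOB_quart was actually constructed is not shown in this thread.

```stata
* Simulated example: cut a continuous share into quartile groups with xtile
clear
set seed 2019
set obs 400
generate double wob = runiform()        // hypothetical share of women on the board

* 1 = bottom quartile, ..., 4 = top quartile
xtile wob_quart = wob, nquantiles(4)
tabulate wob_quart

* The groups would then enter the regression as i.wob_quart,
* with the bottom quartile (1) as the omitted base category
```

With real board data there are often many firm-years with exactly the same share (for example zero), so the four groups may come out unequal in size; the tabulate output is worth checking.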

Regarding model 3)--you asked about the difference in results compared to the model with the dropped coefficient, which I will call model 3a):

Code:

reghdfe lnQ2_w c.WOB1_w#i.Crisis c.WOB1_w#i.PostCrisis BOARDSIZELN_w FIRMSIZE2_w LEVERAGE_w ,absorb(YEAR#SIC_group ID)

I was actually mistaken about the problem being collinearity with the year fixed effects. The issue is that model 3a) is using a different reference category for the coefficients (where post-crisis is the reference). If we add up the coefficients to see the effect in each time period, we get this table:

Coefficient for WOB1_w on lnQ2_w, by period:

Period     Model 3)                          Model 3a)
2005-07    -.01811                           .15635 - .17446 = -.01811
2008-09    -.01811 + .1440847 = .1259747     .3004347 - .17446 = .1259747
2010-15    -.01811 + .17446 = .15635         .15635

So note that the two models tell you exactly the same thing: all the effects are identical. But model 3) is much preferable to 3a), because the effects of interest are easier to read off when the reference category is the pre-crisis period. For model 3a), you have to do more arithmetic to find the effects (for example, the crisis-period effect is the 1.Crisis#c.WOB1_w coefficient plus the negative 0.PostCrisis#c.WOB1_w coefficient).
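
The period-by-period sums can be checked directly, and -lincom- should return the same sums with standard errors attached. This is a sketch: the scalars are simply the model 3) coefficients posted above, and the second part assumes the dataset from this thread is in memory (it is skipped otherwise).

```stata
* Period-specific slopes implied by the model 3) coefficients posted above
scalar b_pre    = -.01811       // WOB1_w: slope in the pre-crisis reference period
scalar d_crisis =  .1440847     // 1.Crisis#c.WOB1_w: change in slope during the crisis
scalar d_post   =  .17446       // 1.PostCrisis#c.WOB1_w: change in slope after the crisis

display "Pre-crisis slope:  " %9.7f (b_pre)
display "Crisis slope:      " %9.7f (b_pre + d_crisis)
display "Post-crisis slope: " %9.7f (b_pre + d_post)

* With the data in memory, lincom reports each sum with a standard error and p-value
capture confirm variable lnQ2_w WOB1_w Crisis PostCrisis
if _rc == 0 {
    reghdfe lnQ2_w WOB1_w c.WOB1_w#i.Crisis c.WOB1_w#i.PostCrisis ///
        BOARDSIZELN_w FIRMSIZE2_w LEVERAGE_w, absorb(YEAR#SIC_group ID)
    lincom WOB1_w + 1.Crisis#c.WOB1_w        // slope during the crisis
    lincom WOB1_w + 1.PostCrisis#c.WOB1_w    // slope after the crisis
}
```

The lincom results are what tell you whether the slope within a given period differs from zero; the interaction coefficients alone only test the change relative to the pre-crisis period.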

Increasing your sample size by a factor of 2 has helped you a lot here--note that you get many more significant results than before.

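To interpret the results: Model 1) tells you that female board membership predicts higher firm performance over your entire sample period (2005-2015). Model 3) tells you how that relationship changes over time: there is no significant relationship between female membership and firm performance in the pre-crisis period (-.018, p = 0.777), and the relationship is more positive during the crisis (+.144 relative to pre-crisis, p = 0.060) and after it (+.174 relative to pre-crisis, p = 0.005). Keep in mind that the interaction coefficients are changes relative to the pre-crisis slope, not the slopes themselves; the post-crisis slope itself (.156) is significant, as the 0.Crisis#c.WOB1_w row of model 3a) shows (p = 0.003).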

I think this is a really interesting result, but I would recommend that you try to gather more data for years before 2005. Only 3 years before the crisis seems a little short; it would be interesting to see if the lack of significance in the pre period is just due to low sample size there, or reflects a structural change that happened during the crisis.

The results also highlight my point from the other thread about timing--it seems that the coefficients for the crisis and post-crisis periods are very similar, and you would likely get similar results if you just included a dummy for 2008-2015 instead of separating into crisis and post-crisis periods. It would be worthwhile to look at how the relationship changes in each year, as part of testing for pre-trends (code for this in the other thread).
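
A rough sketch of both ideas follows; it is not the code from the other thread. It assumes the dataset from this thread is in memory, that YEAR is a numeric calendar year, and that 2007 is the last pre-crisis year (Crisis = 2008 & 2009 as stated earlier). Post2008 is a made-up variable name.

```stata
capture confirm variable lnQ2_w WOB1_w YEAR SIC_group ID
if _rc == 0 {
    * (a) One pooled indicator for 2008-2015 instead of separate crisis/post-crisis dummies
    generate byte Post2008 = (YEAR >= 2008) if !missing(YEAR)   // hypothetical name
    reghdfe lnQ2_w WOB1_w c.WOB1_w#i.Post2008 ///
        BOARDSIZELN_w FIRMSIZE2_w LEVERAGE_w, absorb(YEAR#SIC_group ID)

    * (b) Year-by-year changes in the slope, relative to the base year 2007
    reghdfe lnQ2_w WOB1_w c.WOB1_w#ib2007.YEAR ///
        BOARDSIZELN_w FIRMSIZE2_w LEVERAGE_w, absorb(YEAR#SIC_group ID)

    * Pre-trend check: the 2005 and 2006 slopes should not differ from the 2007 slope
    test 2005.YEAR#c.WOB1_w 2006.YEAR#c.WOB1_w
}
```

In (b), WOB1_w is the slope in 2007 and each interaction is the difference from it, which mirrors how model 3) is set up.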

Comment

Stephan Yan

Join Date: Jul 2019

Posts: 17
#20

24 Jul 2019, 06:07

Dear Kye,

Thank you so much for your help and explanations.

Originally posted by Kye Lippold

So note that the two models tell you exactly the same thing: all the effects are identical. But model 3) is much preferable to 3a), because the effects of interest are easier to read off when the reference category is the pre-crisis period. For model 3a), you have to do more arithmetic to find the effects (for example, the crisis-period effect is the 1.Crisis#c.WOB1_w coefficient plus the negative 0.PostCrisis#c.WOB1_w coefficient).

I see now the differences between the two!

So it's safe to say that:

Model 1) There is a significant and positive link between the share of women on the board and firm value (the natural logarithm of Tobin's Q). This means that firms with higher shares of female directors performed better over the entire 2005-2015 period.

Model 2) There is a significant and positive link for the 3rd and 4th quartiles relative to the 1st (bottom) quartile, which means that firms with the highest percentage of women on the board of directors (4th quartile) performed better than firms in the bottom quartile (1st). However, the 2nd-quartile coefficient is positive but insignificant relative to the 1st quartile, so we cannot say much more about it. Perhaps one could at least speak of a critical mass needed to affect firm performance.

Model 3) We see the relationship in the three periods. For the pre-crisis period, there is a negative and insignificant link between the share of female directors and firm performance, so we cannot say that firms with more women on their board performed better before the crisis. During the crisis, there is a positive relationship, significant at the 10% level, between the share of female directors and firm performance --> in the crisis period, firms with more women on their board performed better. Finally, in the post-crisis period there is a positive and significant link too --> firms with the highest percentage of female directors performed better during the post-crisis period.

Originally posted by Kye Lippold

I think this is a really interesting result, but I would recommend that you try to gather more data for years before 2005. Only 3 years before the crisis seems a little short; it would be interesting to see if the lack of significance in the pre period is just due to low sample size there, or reflects a structural change that happened during the crisis.

Unfortunately, I cannot gather more data from before 2005, because data on the percentage of women on boards are lacking (I took the data from Datastream, ASSET4 US universe).

Thanks!

Stephan
Comment
Kye Lippold

Join Date: Jun 2019

Posts: 67
#21

24 Jul 2019, 15:22

I generally agree with your conclusions, but a few clarifications:

All your models include firm fixed effects. So the identifying variation comes from firms that *changed* their share of women on their board relative to the average (by firm and industry-year). Thus, for model 1 you can even make a stronger statement; it is not just that "firms with higher shares of female directors performed better", but that "firms that *increased* their share of female directors performed better".
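
One way to see how much of this within-firm variation there is would be something like the following. It is a sketch that assumes the dataset from this thread is in memory and that ID and YEAR are numeric; wob_sd and firm_tag are made-up helper names.

```stata
capture confirm variable WOB1_w ID YEAR
if _rc == 0 {
    xtset ID YEAR
    xtsum WOB1_w        // compare the "within" and "between" standard deviations

    * Count firms whose share of women never changes over the sample
    bysort ID: egen double wob_sd = sd(WOB1_w)     // hypothetical helper variable
    egen byte firm_tag = tag(ID)                   // marks one observation per firm
    count if firm_tag & wob_sd == 0
}
```

Firms counted in the last line have no variation in WOB1_w over time, so with firm fixed effects they do not help pin down the WOB1_w coefficient.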

For model 2, I wouldn't read too much into the results as saying that there is a "critical mass" effect. Note that you can't statistically distinguish the effects for the 2nd and 3rd quartile (as their confidence intervals overlap), or the 3rd and 4th quartile. So while you can say for sure that 3rd/4th quartile firms do better than 1st quartile firms, you can't say for sure how the 2nd quartile compares to the others (or that 3rd is lower than 4th). A better way to test for nonlinearity might be to include WoB and its squared value in the regression, and use -margins- to plot how the predicted relationship varies with the share of women.
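
A sketch of both checks, assuming the dataset from this thread is in memory. The at() grid assumes WOB1_w is a share between 0 and 1, which is not confirmed in the thread, and how -margins- behaves after reghdfe should be checked against the reghdfe help file for your version.

```stata
capture confirm variable lnQ2_w WOB1_w WOB_quart
if _rc == 0 {
    * Formal tests of whether adjacent quartile coefficients differ
    reghdfe lnQ2_w i.WOB_quart BOARDSIZELN_w FIRMSIZE2_w LEVERAGE_w, ///
        absorb(YEAR#SIC_group ID)
    test 2.WOB_quart = 3.WOB_quart
    test 3.WOB_quart = 4.WOB_quart

    * Quadratic specification: WOB1_w and its square
    reghdfe lnQ2_w c.WOB1_w##c.WOB1_w BOARDSIZELN_w FIRMSIZE2_w LEVERAGE_w, ///
        absorb(YEAR#SIC_group ID)

    * Marginal effect of WOB1_w at several values of the share (grid is an assumption)
    margins, dydx(WOB1_w) at(WOB1_w = (0(.1).4))
    marginsplot
}
```

The -test- lines are a more direct check than comparing confidence intervals by eye, since overlapping intervals do not by themselves settle whether two coefficients differ.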

Gathering new primary data is always a challenge, but could be an interesting area for future research. There does seem to be a time series back to 1995 for female % of boards for Fortune 500 companies (https://www.pewsocialtrends.org/fact...women-leaders/), meaning someone does have this data.
Comment
Ayub UOM

Join Date: Feb 2018

Posts: 83
#22

26 Aug 2019, 04:22

Hello Stephan Yan and Kye Lippold
I read your posts in detail and was inspired. I am also working on a similar topic, and I want to follow your methodology and instructions and use hdfe or DID. I am using Stata 13, and I have successfully installed hdfe:

package name: hdfe.pkg
from: http://fmwww.bc.edu/RePEc/bocode/h/

checking hdfe consistency and verifying not already installed...
all files already exist and are up to date.

But I am unable to run this command, and I don't know why:

reghdfe tobinq WOB_dummy c.WOB_dummy#i.Crisis c.WOB_dummy#i.postCrisis `controls' , absorb( Industry_year code)

because I get this message:

unrecognized command: reghdfe
r(199);

Second question: for the quartile dummies, should I run the regression on the original quartile variable WOB_quartile (1, 2, 3, 4), or should I generate separate dummies for the Q4 and Q1 quartiles? If the latter, how do I deal with Q2 and Q3?
WOB_Q4 = 1 if WOB in Q4 and 0 if WOB in Q1
WOB_Q1 = 1 if WOB in Q1 and 0 if WOB in Q4
But what about Q2 and Q3? When I generate these two dummy variables, the Q2 and Q3 observations become missing, and my total number of observations is reduced.

Thank you in advance for your cooperation.
Ayub

Last edited by Ayub UOM; 26 Aug 2019, 04:42.
Comment
Kye Lippold

Join Date: Jun 2019

Posts: 67
#23

26 Aug 2019, 16:29

-hdfe- is not the same as -reghdfe-. Type

Code:

ssc install reghdfe

to get the correct program installed (you can find more instructions here if needed: http://scorreia.com/software/reghdfe/install.html)
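
For completeness, a slightly fuller install sketch: current versions of reghdfe also depend on the ftools package, so both are installed here if Stata cannot find them. This needs an internet connection.

```stata
* Install reghdfe and its ftools dependency from SSC if they are not already present
capture which ftools
if _rc ssc install ftools
capture which reghdfe
if _rc ssc install reghdfe

which reghdfe        // confirms that Stata can now find the command
```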

The correct way to generate the dummies would be like this:
WOB_Q1=1 if WOB in Q1 and 0 otherwise (so 0 if WOB in Q2, Q3, or Q4)

In other words, you want all observations to be coded, without missing values for those in Q2 and Q3. You should include all the dummies in the regression, except one (the baseline category, as is standard for dummy variables).
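
A minimal sketch of that coding on made-up data (only the variable name WOB_quartile is taken from the question above):

```stata
* Made-up data: eight observations with quartiles 1,2,3,4,1,2,3,4
clear
set obs 8
generate byte WOB_quartile = mod(_n - 1, 4) + 1

* One 0/1 dummy per quartile, coded for every observation (WOB_Q1 ... WOB_Q4)
tabulate WOB_quartile, generate(WOB_Q)

* Nothing is missing, and each observation belongs to exactly one quartile
assert !missing(WOB_Q1, WOB_Q2, WOB_Q3, WOB_Q4)
assert WOB_Q1 + WOB_Q2 + WOB_Q3 + WOB_Q4 == 1

* In the regression, include WOB_Q2 WOB_Q3 WOB_Q4 and leave WOB_Q1 out as the baseline,
* or equivalently use i.WOB_quartile and let Stata omit the base category
```

Using i.WOB_quartile avoids creating the dummies by hand and keeps all observations in the estimation sample.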

If you have more questions about this, I would recommend starting a new thread.
Comment
Ayub UOM

Join Date: Feb 2018

Posts: 83
#24

27 Aug 2019, 17:34

Kye Lippold
Thank you, sir, for your time; I will start a new thread for my questions very soon.
Best regards.