equality of coefficients test with reghdfe: subsample analysis or interaction terms?

Qing Ye

Join Date: Jan 2019

Posts: 15
#1

22 Jan 2019, 18:17

I am trying to test the impact that margin trading status has on stock liquidity. I regress a liquidity measure on a dummy variable (mt_csmar), which indicates whether a stock is margin eligible, and on some other control variables. I control for firm fixed effects and time effects by estimating with reghdfe. This is fine.

But I would also like to see whether the coefficient on this mt_csmar dummy differs in bull or bear markets compared with normal periods. I could run the same specification on subsamples (bull periods, bear periods and normal periods). However, it seems that, with reghdfe, I cannot test whether the coefficients on mt_csmar are statistically different across subsamples. Hence I decided to use interaction terms, interacting bull and bear dummies with mt_csmar.

I report the results of the subsample analysis and of the specifications using interaction terms below. In the subsample analysis, the coefficient on mt_csmar is positive in bear markets and negative in both bull and normal periods, but it is larger in absolute terms during normal periods. However, if I use interaction terms, the results are very different: the coefficients on mt_csmar are more negative during bull and bear markets than during normal periods.

My questions are:
1) Which set of results should I trust? Why does the interaction-term analysis produce different conclusions from the subsample analysis?
2) If I should trust the subsample results, how do I test the equality of coefficients after running reghdfe?
3) The bull and bear dummies are collinear with the time effects, so they were dropped in the reghdfe regressions that include time fixed effects. Is this a problem? Should I give up controlling for time fixed effects and use xtreg instead?

Thank you very much for any comments. I have been going back and forth on these points for a while.

[Four screenshots of reghdfe output followed here, in this order: bull market subsample, bear market subsample, normal periods subsample, and the specification with interaction terms.]
Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#2

22 Jan 2019, 18:50

Please read the Forum FAQ for advice about how to show Stata output in ways that are helpful. Your screenshots here are barely readable, as is often the case, which is why people are specifically asked not to use them.

That said, your interaction model is done wrong, so you should not attempt to interpret its results or compare them to anything else. You should not be using separate variables for bull and bear markets. You should have a single variable that takes on three values, for example, 0 = normal times, 1 = bear market, 2 = bull market. Let's call this variable condition. Then your regression command should be

Code:

reghdfe outcome_variable i.condition##c.mt_csmar other_variables, absorb(stkcode date_n) cluster(stkcode date_n)

And then after that you should do:

Code:

margins condition, dydx(mt_csmar)

to get the marginal effects of mt_csmar under each of the three conditions.

As for contrasting those marginal effects, you can do that by testing the interaction coefficients from the -reghdfe- equation:

Code:

test 1.condition#c.mt_csmar 2.condition#c.mt_csmar

for an omnibus test of the null hypothesis that the effect of mt_csmar is the same in all three conditions. If you want to specifically contrast, say, bull vs. bear, then that would be

Code:

test 1.condition#c.mt_csmar = 2.condition#c.mt_csmar
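
A minimal sketch of building such a three-level variable, assuming the data already hold mutually exclusive 0/1 indicators. The names bull and bear, the label name, and the toy data are hypothetical and only stand in for the real panel:

```stata
* Toy data standing in for the real panel; bull and bear are hypothetical names.
clear
input bull bear
0 0
1 0
0 1
0 0
end

* The coding below only makes sense if no period is both bull and bear.
assert !(bull == 1 & bear == 1)

generate byte condition = 0                  // 0 = normal times
replace condition = 1 if bear == 1           // 1 = bear market
replace condition = 2 if bull == 1           // 2 = bull market

label define condition_lbl 0 "normal" 1 "bear" 2 "bull"
label values condition condition_lbl
tabulate condition
```

With the value labels attached, the -margins- output identifies each condition by name rather than by number.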
Qing Ye
#3

22 Jan 2019, 20:16

Thank you, Clyde, for your suggestions. To post the Stata output, I copied the tables as pictures and saved them as PNG files, since PNG files seem to be encouraged. I then used the image icon to upload these pictures to the server. I guess I tried to put everything in one big picture, which made it harder to read. I will try to show the code lines and the results separately this time.

I have tried what you suggested: I created another variable, conditions, which is 0, 1, 2 for normal, bear and bull markets, and ran reghdfe with the interaction of conditions and my dummy variable mt_csmar.

Firstly, the conditions main effects are still automatically dropped due to collinearity with the time fixed effects (the date_n variable). Secondly, when I try to get the marginal effect of mt_csmar, nothing can be estimated. I have posted the results below.

In addition, I do not understand why I cannot create two separate dummy variables and look at the coefficients on the interaction terms between these two dummies and mt_csmar to judge whether mt_csmar's impact on the outcome variable is stronger or weaker than in normal periods. Could you please explain further? Thank you.

Code:

reghdfe ln_efsnew i.conditions##c.mt_csmar control-variables , absorb(stkcd date_n) cluster(stkcd date_n)

Result:

Then I do: margins conditions, dydx(mt_csmar)
Here is the result:

I am not sure what went wrong.
Qing Ye
#4

22 Jan 2019, 23:35

I noticed that using conditions produces the same results (coefficients) as using two dummies (bull and bear). So I guess both methods are fine. This brings me back to my previous question: why does the subsample analysis produce different conclusions from the analysis using interaction terms? Does anyone have any ideas?
Clyde Schechter
#5

23 Jan 2019, 09:30

The best way to show Stata output is to copy it from your Results window or log file and paste it directly into the Forum editor, surrounded by code delimiters. If you are not familiar with code delimiters, read Forum FAQ #12. If you do that, there will be no readability issues. Yes, .png files are better than other images, but they are still not as good--your results are, again, barely readable, though the margins output is easily read. But with copy/paste and code delimiters there will never be a problem.

In this particular model, there is no problem from using separate bull and bear indicators instead of a three-level one. But in more complicated models, it could make a difference. My focus on this is to have you code the model in a way that supports the use of -margins-. With only the complexity inherent in your model, -margins- will get it right with bull and bear separate. But in more complicated models, -margins- will get it wrong. Rather than having to think about every model you run to figure out whether -margins- will work properly with it, it is better to just get into the habit of doing it the right way every time, even if that isn't necessary in some particular case.

Your original model, however, would still be incorrect as originally coded because you used c.bull and c.bear, whereas you must use i.bull and i.bear. -margins- uses a different way of calculating the marginal effects for continuous and discrete variables, so it is important to get this right.

I'm sorry about the non-estimability. I forgot to add that you need the -noestimcheck- option in this command. In fixed effects models, many of the parameters one might try to estimate with -margins- are in fact not identifiable, and -margins- tells you so with the (not estimable) result. But -margins- overdoes this. The marginal effects are, in fact, identifiable. The -noestimcheck- option allows you to override -margins-' difficulties here. (But do not do this with abandon: here you really need to know which parameters are identifiable and which are not. Anything that is a function of the fixed effects themselves is non-identifiable.)

The omission of 1.conditions and 2.conditions is due to their collinearity with the date_n (time) fixed effects and is expected; it is not a problem. In fact, if this did not happen, it would be an indication that there is something wrong in the data!

As for the difference between what you are getting from the interaction approach and the separate samples, this arises from the other variables in the model. When you do the interaction model, you are constraining the coefficients of all the other variables to be independent of bull/bear/normal. When you do the separate samples, you get separate estimates of those other variables' coefficients in each sample. Since those other variables are, themselves, correlated with outcome or with bull/bear/normal, this results in changes in the bull/bear/normal effects. I haven't carefully reviewed all of the coefficients in the three separate samples output you show (and, in fact, I can't read some of them at all), but even just a casual review shows that there are some very substantial differences in these other coefficients across those three models.

This suggests that the implicit constraint of equality of coefficients imposed by the interaction model is not suitable for this data. You can relax this constraint by adding to your interaction model more terms that provide for interaction between bull/bear/normal and the other variables:

Code:

reghdfe ln_efsnew i.conditions##(c.mt_csmar control-variables) , absorb(stkcd date_n) cluster(stkcd date_n)

will do that and will give you results that are the same as you got from your separate samples (with perhaps some small differences due to numerical issues). And then you can do

Code:

margins conditions, dydx(mt_csmar) noestimcheck // MARGINAL EFFECT OF mt_csmar
test 1.conditions#c.mt_csmar 2.conditions#c.mt_csmar // OMNIBUS TEST OF INTERACTION
test 1.conditions#c.mt_csmar = 2.conditions#c.mt_csmar // TEST OF bull = bear
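
The equivalence between a fully interacted model and separate-sample regressions can be checked on a small example. The sketch below uses the auto dataset that ships with Stata and plain -regress- rather than -reghdfe-, so it only illustrates the algebra and is not the model in this thread. The scalar names are arbitrary:

```stata
sysuse auto, clear

* Restricted model: the weight coefficient is forced to be equal in both groups.
regress price i.foreign##c.mpg weight
display "restricted model, mpg slope for foreign cars: " _b[mpg] + _b[1.foreign#c.mpg]

* Fully interacted model: every slope and the intercept may differ by group.
regress price i.foreign##(c.mpg c.weight)
scalar b_dom = _b[mpg]
scalar b_for = _b[mpg] + _b[1.foreign#c.mpg]

* Separate samples reproduce the group-specific slopes of the full model.
regress price mpg weight if foreign == 0
assert reldif(_b[mpg], b_dom) < 1e-6
regress price mpg weight if foreign == 1
assert reldif(_b[mpg], b_for) < 1e-6
```

The restricted model generally gives a different mpg slope for foreign cars than the subsample does, which is the same kind of discrepancy described above.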
Qing Ye
#6

23 Jan 2019, 23:29

This is very helpful. I have tried the above codes, and indeed, the interaction models now produce similar conclusions to those using separate samples. I am really grateful for the help!
Qing Ye

25 Jan 2019, 06:57

Hi, Statalist

I came across a similar problem again while trying a different specification. This time, even allowing the coefficients of the other control variables to differ across groups of stocks still produces results in the interaction model that are inconsistent with those from the subsamples.
The following is my code and results for the interaction model:

Code:

. reghdfe ln_efsnew i.IO_3group##(c.finance_turnover c.short_turnover c.lag_ln_efsnew c.ln_firmsize c.ln_volatility_20 c.ln_volume c.ln_price c.
> return c.ln_ownerratio_csmar i.HS300 i.PL ), absorb(stkcd date_n) cluster(stkcd date_n)
(MWFE estimator converged in 8 iterations)
Warning: VCV matrix was non-positive semi-definite; adjustment from Cameron, Gelbach & Miller applied.
note: 2.IO_3group#1.HS300 omitted because of collinearity

HDFE Linear regression                            Number of obs   =    734,090
Absorbing 2 HDFE groups                           F(  34,    960) =     262.23
Statistics robust to heteroskedasticity           Prob > F        =     0.0000
                                                  R-squared       =     0.7072
                                                  Adj R-squared   =     0.7062
Number of clusters (stkcd)   =        961         Within R-sq.    =     0.2879
Number of clusters (date_n)  =      1,634         Root MSE        =     0.2543

                                            (Std. Err. adjusted for 961 clusters in stkcd date_n)
-------------------------------------------------------------------------------------------------
                                |               Robust
                      ln_efsnew |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------------------------+----------------------------------------------------------------
                      IO_3group |
                             1  |   3.905128   .9999033     3.91   0.000      1.94288    5.867376
                             2  |   3.743145   .9979922     3.75   0.000     1.784647    5.701643
                                |
               finance_turnover |   .2115961   .0978413     2.16   0.031     .0195886    .4036035
                 short_turnover |   1.132953    1.19845     0.95   0.345    -1.218931    3.484836
                  lag_ln_efsnew |   .1486854    .038023     3.91   0.000     .0740676    .2233032
                    ln_firmsize |   .0479427   .0561804     0.85   0.394    -.0623079    .1581934
               ln_volatility_20 |   .0127298   .0169635     0.75   0.453      -.02056    .0460197
                      ln_volume |   .0055563   .0155432     0.36   0.721    -.0249463     .036059
                       ln_price |  -.5521989   .0558957    -9.88   0.000    -.6618907    -.442507
                         return |   .0499057   .1776742     0.28   0.779    -.2987688    .3985803
            ln_ownerratio_csmar |   .0008377     .01502     0.06   0.956    -.0286381    .0303135
                        1.HS300 |  -.0527273   .0082782    -6.37   0.000    -.0689729   -.0364818
                           1.PL |    .173827   .0490673     3.54   0.000     .0775355    .2701185
                                |
   IO_3group#c.finance_turnover |
                             1  |   -.166646    .098774    -1.69   0.092    -.3604839    .0271918
                             2  |  -.5111654    .101006    -5.06   0.000    -.7093835   -.3129473
                                |
     IO_3group#c.short_turnover |
                             1  |  -.9870743   1.208885    -0.82   0.414    -3.359437    1.385288
                             2  |  -.7433824   1.213513    -0.61   0.540    -3.124827    1.638062
                                |
      IO_3group#c.lag_ln_efsnew |
                             1  |   .1423081   .0390265     3.65   0.000      .065721    .2188952
                             2  |   .1851955   .0388168     4.77   0.000       .10902     .261371
                                |
        IO_3group#c.ln_firmsize |
                             1  |   -.135036   .0563768    -2.40   0.017     -.245672   -.0244001
                             2  |  -.1079761   .0563595    -1.92   0.056    -.2185782    .0026261
                                |
   IO_3group#c.ln_volatility_20 |
                             1  |   .0168273   .0173054     0.97   0.331    -.0171336    .0507881
                             2  |   .0564783   .0175161     3.22   0.001     .0221041    .0908525
                                |
          IO_3group#c.ln_volume |
                             1  |  -.0157246   .0157157    -1.00   0.317    -.0465656    .0151165
                             2  |   -.027011   .0158036    -1.71   0.088    -.0580246    .0040026
                                |
           IO_3group#c.ln_price |
                             1  |   .2099068   .0550157     3.82   0.000     .1019418    .3178718
                             2  |   .3246516   .0552185     5.88   0.000     .2162887    .4330145
                                |
             IO_3group#c.return |
                             1  |   .2230658   .1808798     1.23   0.218    -.1318995    .5780312
                             2  |  -.1364572   .1809457    -0.75   0.451     -.491552    .2186375
                                |
IO_3group#c.ln_ownerratio_csmar |
                             1  |    .045964   .0148954     3.09   0.002     .0167328    .0751953
                             2  |   .0365486   .0160123     2.28   0.023     .0051255    .0679718
                                |
                IO_3group#HS300 |
                           0 1  |          0  (empty)
                           1 1  |   .0295765   .0095904     3.08   0.002     .0107559    .0483972
                           2 1  |          0   1.54e-13     0.00   1.000    -3.02e-13    3.02e-13
                                |
                   IO_3group#PL |
                           1 1  |  -.0485219   .0498779    -0.97   0.331    -.1464041    .0493604
                           2 1  |  -.0535655   .0505084    -1.06   0.289    -.1526852    .0455541
                                |
                          _cons |  -5.377585   .9948678    -5.41   0.000    -7.329952   -3.425218
-------------------------------------------------------------------------------------------------

Absorbed degrees of freedom:
-----------------------------------------------------+
 Absorbed FE | Categories  - Redundant  = Num. Coefs |
-------------+---------------------------------------|
       stkcd |       961         961           0    *|
      date_n |      1634        1634           0    *|
-----------------------------------------------------+
* = FE nested within cluster; treated as redundant for DoF computation

. margins IO_3group , dydx(short_turnover) noestimcheck

Average marginal effects                        Number of obs     =    734,090
Model VCE    : Robust

Expression   : Linear prediction, predict()
dy/dx w.r.t. : short_turnover

--------------------------------------------------------------------------------
               |            Delta-method
               |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
---------------+----------------------------------------------------------------
short_turnover |
     IO_3group |
            0  |   1.132953    1.19845     0.95   0.344    -1.215966    3.481871
            1  |   .1458785   .1863425     0.78   0.434    -.2193461    .5111031
            2  |   .3895703   .2076505     1.88   0.061    -.0174171    .7965578
--------------------------------------------------------------------------------

And these are my results for the subsamples:

Code:

 reghdfe ln_efsnew finance_turnover short_turnover lag_ln_efsnew ln_firmsize  ln_volatility_20 ln_volume   ln_price return   ln_ownerratio_csma
> r  HS300  PL if  IO_3group==0 , absorb(stkcd date_n) cluster(stkcd date_n)
(dropped 26 singleton observations)
(MWFE estimator converged in 10 iterations)
Warning: VCV matrix was non-positive semi-definite; adjustment from Cameron, Gelbach & Miller applied.
note: HS300 omitted because of collinearity

HDFE Linear regression                            Number of obs   =      3,260
Absorbing 2 HDFE groups                           F(  10,     21) =      25.70
Statistics robust to heteroskedasticity           Prob > F        =     0.0000
                                                  R-squared       =     0.6965
                                                  Adj R-squared   =     0.6424
Number of clusters (stkcd)   =         22         Within R-sq.    =     0.1148
Number of clusters (date_n)  =        462         Root MSE        =     0.2125

                                 (Std. Err. adjusted for 22 clusters in stkcd date_n)
-------------------------------------------------------------------------------------
                    |               Robust
          ln_efsnew |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------------+----------------------------------------------------------------
   finance_turnover |   .2395584   .1211596     1.98   0.061    -.0124067    .4915236
     short_turnover |   2.010175   1.259615     1.60   0.125    -.6093366    4.629687
      lag_ln_efsnew |   .1043233   .0293104     3.56   0.002     .0433691    .1652775
        ln_firmsize |  -.5677539   .2445591    -2.32   0.030    -1.076342   -.0591655
   ln_volatility_20 |  -.0249109   .0208915    -1.19   0.246    -.0683572    .0185354
          ln_volume |   .0326493   .0176223     1.85   0.078    -.0039982    .0692968
           ln_price |    .118167   .3625202     0.33   0.748     -.635735    .8720691
             return |  -.2815352   .2154586    -1.31   0.205    -.7296059    .1665356
ln_ownerratio_csmar |   .3814521    .142721     2.67   0.014     .0846476    .6782566
              HS300 |          0   1.91e-18     0.00   1.000    -3.97e-18    3.97e-18
                 PL |   .1828604   .0504898     3.62   0.002     .0778612    .2878597
              _cons |   6.364814   4.886876     1.30   0.207       -3.798    16.52763
-------------------------------------------------------------------------------------

Absorbed degrees of freedom:
-----------------------------------------------------+
 Absorbed FE | Categories  - Redundant  = Num. Coefs |
-------------+---------------------------------------|
       stkcd |        22          22           0    *|
      date_n |       462         462           0    *|
-----------------------------------------------------+
* = FE nested within cluster; treated as redundant for DoF computation

. reghdfe ln_efsnew finance_turnover short_turnover lag_ln_efsnew ln_firmsize  ln_volatility_20 ln_volume   ln_price return   ln_ownerratio_csma
> r  HS300  PL if  IO_3group==1 , absorb(stkcd date_n) cluster(stkcd date_n)
(dropped 1 singleton observations)
(MWFE estimator converged in 9 iterations)
Warning: VCV matrix was non-positive semi-definite; adjustment from Cameron, Gelbach & Miller applied.

HDFE Linear regression                            Number of obs   =    366,006
Absorbing 2 HDFE groups                           F(  11,    704) =     475.43
Statistics robust to heteroskedasticity           Prob > F        =     0.0000
                                                  R-squared       =     0.7655
                                                  Adj R-squared   =     0.7639
Number of clusters (stkcd)   =        705         Within R-sq.    =     0.3350
Number of clusters (date_n)  =      1,634         Root MSE        =     0.2230

                                (Std. Err. adjusted for 705 clusters in stkcd date_n)
-------------------------------------------------------------------------------------
                    |               Robust
          ln_efsnew |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------------+----------------------------------------------------------------
   finance_turnover |   .0273482   .0213712     1.28   0.201    -.0146108    .0693071
     short_turnover |   .4839065   .1740851     2.78   0.006     .1421183    .8256947
      lag_ln_efsnew |   .2818307   .0091497    30.80   0.000     .2638667    .2997947
        ln_firmsize |  -.0780929   .0144979    -5.39   0.000    -.1065572   -.0496286
   ln_volatility_20 |    .019398   .0045142     4.30   0.000     .0105351    .0282609
          ln_volume |   .0020262   .0033809     0.60   0.549    -.0046116     .008664
           ln_price |  -.3919089   .0161147   -24.32   0.000    -.4235475   -.3602703
             return |   .2091176   .0523144     4.00   0.000     .1064066    .3118286
ln_ownerratio_csmar |   .0372926   .0060865     6.13   0.000     .0253426    .0492425
              HS300 |  -.0148609   .0081905    -1.81   0.070    -.0309417    .0012198
                 PL |   .1278662   .0109048    11.73   0.000     .1064565     .149276
              _cons |   -1.90087   .2927042    -6.49   0.000    -2.475548   -1.326192
-------------------------------------------------------------------------------------

Absorbed degrees of freedom:
-----------------------------------------------------+
 Absorbed FE | Categories  - Redundant  = Num. Coefs |
-------------+---------------------------------------|
       stkcd |       705         705           0    *|
      date_n |      1634        1634           0    *|
-----------------------------------------------------+
* = FE nested within cluster; treated as redundant for DoF computation

. reghdfe ln_efsnew finance_turnover short_turnover lag_ln_efsnew ln_firmsize  ln_volatility_20 ln_volume   ln_price return   ln_ownerratio_csma
> r  HS300  PL if  IO_3group==2 , absorb(stkcd date_n) cluster(stkcd date_n)
(MWFE estimator converged in 9 iterations)
Warning: VCV matrix was non-positive semi-definite; adjustment from Cameron, Gelbach & Miller applied.

HDFE Linear regression                            Number of obs   =    364,797
Absorbing 2 HDFE groups                           F(  11,    738) =     277.00
Statistics robust to heteroskedasticity           Prob > F        =     0.0000
                                                  R-squared       =     0.6281
                                                  Adj R-squared   =     0.6257
Number of clusters (stkcd)   =        739         Within R-sq.    =     0.1863
Number of clusters (date_n)  =      1,634         Root MSE        =     0.2789

                                (Std. Err. adjusted for 739 clusters in stkcd date_n)
-------------------------------------------------------------------------------------
                    |               Robust
          ln_efsnew |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------------+----------------------------------------------------------------
   finance_turnover |  -.2956001   .0262775   -11.25   0.000    -.3471876   -.2440126
     short_turnover |    .142516   .2380393     0.60   0.550    -.3247988    .6098308
      lag_ln_efsnew |   .3078628   .0084778    36.31   0.000     .2912193    .3245063
        ln_firmsize |  -.0813076   .0141996    -5.73   0.000    -.1091841   -.0534311
   ln_volatility_20 |   .0705702   .0056198    12.56   0.000     .0595375    .0816028
          ln_volume |  -.0239334   .0033377    -7.17   0.000    -.0304859    -.017381
           ln_price |  -.1848707   .0187478    -9.86   0.000    -.2216761   -.1480652
             return |  -.0717832   .0565686    -1.27   0.205    -.1828378    .0392714
ln_ownerratio_csmar |   .0474514   .0086297     5.50   0.000     .0305097    .0643931
              HS300 |  -.0457838   .0098787    -4.63   0.000    -.0651775   -.0263901
                 PL |   .1160117   .0117391     9.88   0.000     .0929657    .1390576
              _cons |  -1.373251    .302937    -4.53   0.000    -1.967972   -.7785304
-------------------------------------------------------------------------------------

Absorbed degrees of freedom:
-----------------------------------------------------+
 Absorbed FE | Categories  - Redundant  = Num. Coefs |
-------------+---------------------------------------|
       stkcd |       739         739           0    *|
      date_n |      1634        1634           0    *|
-----------------------------------------------------+
* = FE nested within cluster; treated as redundant for DoF computation

You can see that, for example, the impact of short_turnover is not significant in the subsample IO_3group==2, but it is significantly positive in the interaction model. Also, the impact of short_turnover is significantly positive in the subsample IO_3group==1, but it is not in the interaction model.

I suspected the result might be different if I added the vce(unconditional) option to margins, as this allows for the clustered standard errors.

However, Stata told me that vce(unconditional) cannot be computed.

Code:

. margins IO_3group , dydx(short_turnover) noestimcheck vce(unconditional)
cannot compute vce(unconditional);
predict after could not compute scores

So I am stuck again. I am not sure why, this time, allowing the coefficients of the control variables to vary across groups does not work. In theory, it should produce results very similar to those from the subsamples. Does anyone have any ideas? Thanks
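
For reference, the default delta-method standard errors reported by -margins- are built from the covariance matrix stored by the estimation command, which is why the output above shows "Model VCE: Robust". The sketch below, on simulated data with plain -regress- and hypothetical variable names, compares the -margins- standard error for one group with the one -lincom- gives for the same sum of coefficients:

```stata
* Simulated data; id, grp, x and y are hypothetical names.
clear
set seed 1
set obs 1000
generate int id = ceil(_n / 10)              // 100 clusters of 10 observations
generate byte grp = mod(id, 3)               // three groups: 0, 1, 2
generate double x = rnormal()
generate double y = (1 + 0.5*grp)*x + rnormal()

regress y i.grp##c.x, vce(cluster id)

* Slope of x in each group, with delta-method SEs based on the clustered e(V).
margins grp, dydx(x)
matrix V = r(V)
scalar se_margins = sqrt(V[3,3])             // third entry corresponds to grp == 2

* Same quantity for grp == 2 as a linear combination of coefficients.
lincom x + 2.grp#c.x
scalar se_lincom = r(se)

display "margins SE = " se_margins "   lincom SE = " se_lincom
```

If the two standard errors agree, the group-specific slopes can be tested with -lincom- or -test- directly, without needing vce(unconditional).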


FernandoRios

Join Date: Apr 2014

Posts: 2470
#8

25 Jan 2019, 07:13

Dear Qing
I think the problem here is the false assumption that estimating your fixed effect models using subsamples should provide the same results as estimating the same model with interactions.
The premise is actually correct. If you interact every single explanatory variable, the results should be the same as estimating separate models. However, based on the results you provide, you are not interacting the fixed effects. This implies you are comparing two different sets of models.
The change in the magnitudes of the marginal effects or in their statistical significance across the two strategies could be explained precisely by these differences in the approaches.
HTH
Fernando
Qing Ye

Join Date: Jan 2019

Posts: 15
#9

25 Jan 2019, 07:20

Originally posted by FernandoRios

Dear Qing
I think the problem here is the assumption that estimating your fixed effects models on subsamples should give the same results as estimating a single model with interactions.
The premise is correct in principle: if you interact every single explanatory variable, the results should be the same as estimating separate models. However, based on the results you provide, you are not interacting the fixed effects. This implies you are comparing two different sets of models.
The changes in the magnitudes or statistical significance of the marginal effects across the two strategies could be explained precisely by this difference in approach.
HTH
Fernando

Thanks, Fernando, for your comments; that makes a lot of sense. Do you have any idea why vce(unconditional) cannot be calculated? I was still hoping it might resolve the inconsistency, or at least reduce the difference. The interaction approach is my preferred one, as it allows me to compare the coefficients of short_turnover across groups. I don't know whether there is any other way to achieve this after the reghdfe estimation.
Qing Ye

Join Date: Jan 2019

Posts: 15
#10

25 Jan 2019, 07:24

Actually, is there a way to interact all explanatory variables, including the fixed effects? And does it make sense to do so?
FernandoRios

Join Date: Apr 2014

Posts: 2470
#11

25 Jan 2019, 07:37

My best guess as to why that isn't working is that reghdfe has not been programmed to do so yet. Perhaps the author has more details on his GitHub page, where he keeps the most up-to-date version of reghdfe.
In any case, the difference between the two sets of estimates will remain the same; only the standard errors will change.
Now, I'm not sure about the validity of the following exercise, in particular because of the clusters, but it may show you that the results are comparable. Create additional variables, say stkcd2 and date_n2, that combine the original stkcd and date_n with your group variable (perhaps something like egen stkcd2=group(stkcd IO_3group)), and use these new variables as the fixed effects when estimating the models.
One more point. It may not be too important, but I think your baseline group is too small: you have 3,260 obs for about 500 explanatory variables. I would be careful making inferences from that first subsample model.
Fernando
Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#12

25 Jan 2019, 12:30

The discrepancy between the interaction results and subset-specific results you are seeing in #7 is because the estimation samples are not the same. If you look at your outputs carefully, you will see that in some of the sample-specific outputs there is a note that certain observations are being dropped because they are singletons. By contrast, no observations are dropped in the interaction model. (I guess those observations become singletons when they are restricted to the subsamples.)

I do not know the inner workings of -reghdfe- and only use it occasionally myself. But I imagine that the dropping of singleton observations is occasioned by your use of clustered VCE in these models. I suspect if you use the ordinary VCE, then -reghdfe- will not omit any observations, and the interaction and sample-specific results will be consistent with each other.
Qing Ye

Join Date: Jan 2019

Posts: 15
#13

28 Jan 2019, 03:12

Originally posted by FernandoRios

My best guess as to why that isn't working is that reghdfe has not been programmed to do so yet. Perhaps the author has more details on his GitHub page, where he keeps the most up-to-date version of reghdfe.
In any case, the difference between the two sets of estimates will remain the same; only the standard errors will change.
Now, I'm not sure about the validity of the following exercise, in particular because of the clusters, but it may show you that the results are comparable. Create additional variables, say stkcd2 and date_n2, that combine the original stkcd and date_n with your group variable (perhaps something like egen stkcd2=group(stkcd IO_3group)), and use these new variables as the fixed effects when estimating the models.
One more point. It may not be too important, but I think your baseline group is too small: you have 3,260 obs for about 500 explanatory variables. I would be careful making inferences from that first subsample model.
Fernando

Hi, Fernando
I tried your suggestion: I created stkcd2 and date_n2 and used them as the fixed effects in the interaction model. The results are now closer to those from the subsamples. Thanks.
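
For anyone following along, here is a minimal sketch of that setup. It assumes the group variable is IO_3group, as in the margins call above, and that the clustering mirrors the earlier models. The names y, x1 and x2 are placeholders, not the actual outcome and control variables.

Code:

* group-specific fixed effects: one category per stock-by-group and date-by-group cell
egen stkcd2 = group(stkcd IO_3group)
egen date_n2 = group(date_n IO_3group)

* fully interacted model (y, x1 and x2 are hypothetical names)
reghdfe y i.IO_3group##(c.short_turnover c.x1 c.x2), absorb(stkcd2 date_n2) vce(cluster stkcd date_n)

* group-specific slopes, then a joint test that they are equal across groups
margins IO_3group, dydx(short_turnover) noestimcheck
testparm i.IO_3group#c.short_turnover

The main effects of IO_3group are collinear with the new fixed effects, so reghdfe should report them as omitted.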
Qing Ye

Join Date: Jan 2019

Posts: 15
#14

28 Jan 2019, 03:30

Originally posted by Clyde Schechter

The discrepancy between the interaction results and subset-specific results you are seeing in #7 is because the estimation samples are not the same. If you look at your outputs carefully, you will see that in some of the sample-specific outputs there is a note that certain observations are being dropped because they are singletons. By contrast, no observations are dropped in the interaction model. (I guess those observations become singletons when they are restricted to the subsamples.)

I do not know the inner workings of -reghdfe- and only use it occasionally myself. But I imagine that the dropping of singleton observations is occasioned by your use of clustered VCE in these models. I suspect if you use the ordinary VCE, then -reghdfe- will not omit any observations, and the interaction and sample-specific results will be consistent with each other.

Thanks, Clyde, for the comments. Using the ordinary VCE does indeed reduce the number of singletons dropped. However, it does not change the conclusions: the results are very similar to those with the cluster option.
Jacob Helium

Join Date: May 2020

Posts: 10
#15

26 May 2020, 18:12

Originally posted by Clyde Schechter

The best way to show Stata output is to copy it from your Results window or log file and paste it directly into the Forum editor, surrounded by code delimiters. If you are not familiar with code delimiters, read Forum FAQ #12. If you do that, there will be no readability issues. Yes, .png files are better than other images, but are still not as good--your results are, again, barely readable, though the margins output is easily read. But with copy/paste and code delimiters there will never be a problem.

In this particular model, there is no problem from using separate bull and bear indicators instead of a three-level one. But in more complicated models, it could make a difference. My focus on this is to have you code the model in a way that supports the use of -margins-. With only the complexity inherent in your model, -margins- will get it right with bull and bear separate. But in more complicated models, -margins- will get it wrong. Rather than having to think about every model you run to figure out whether -margins- will work properly with it, it is better to just get into the habit of doing it the right way every time, even if that isn't necessary in some particular case.

Your original model, however, would still be incorrect as originally coded because you used c.bull and c.bear, whereas you must use i.bull and i.bear. -margins- uses a different way of calculating the marginal effects for continuous and discrete variables, so it is important to get this right.

I'm sorry about the non-estimability. I forgot to add that you need the -noestimcheck- option in this command. In fixed effects models, many of the parameters one might try to estimate with -margins- are in fact not identifiable, and -margins- tells you so with the (not estimable) result. But -margins- overdoes this. The marginal effects are, in fact, identifiable. The -noestimcheck- option allows you to override -margins-' difficulties here. (But do not do this with abandon: here you really need to know which parameters are identifiable and which are not. Anything that is a function of the fixed effects themselves is non-identifiable.)

The omission of 1.conditions and 2.conditions is due to their collinearity with the stkcd fixed effects and is expected; it is not a problem. In fact, if this did not happen, it would be an indication that there is something wrong in the data!

As for the difference between what you are getting from the interaction approach and the separate samples, this arises from the other variables in the model. When you do the interaction model, you are constraining the coefficients of all the other variables to be independent of bull/bear/normal. When you do the separate samples, you get separate estimates of those other variables' coefficients in each sample. Since those other variables are, themselves, correlated with outcome or with bull/bear/normal, this results in changes in the bull/bear/normal effects. I haven't carefully reviewed all of the coefficients in the three separate samples output you show (and, in fact, I can't read some of them at all), but even just a casual review shows that there are some very substantial differences in these other coefficients across those three models.

This suggests that the implicit constraint of equality of coefficients imposed by the interaction model is not suitable for this data. You can relax this constraint by adding to your interaction model more terms that provide for interaction between bull/bear/normal and the other variables:

Code:

reghdfe ln_efsnew i.conditions##(c.mt_csmar control-variables) , absorb(stkcd date_n) cluster(stkcd date_n)

will do that and will give you results that are the same as you got from your separate samples (with perhaps some small differences due to numerical issues). And then you can do

Code:

margins conditions, dydx(mt_csmar) noestimcheck // MARGINAL EFFECT OF mt_csmar
test 1.conditions#c.mt_csmar 2.conditions#c.mt_csmar // OMNIBUS TEST OF INTERACTION
test 1.conditions#c.mt_csmar = 2.conditions#c.mt_csmar // TEST OF bull = bear

Hi Clyde, do you know how to deal with the fixed effects variables inside absorb() in this case? It seems that in my case, because the fixed effects variables are not interacted with the conditions variable, this stacked regression still gives different estimates from those I got from my separate samples.
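
A possible sketch of what interacting the absorbed fixed effects could look like, assuming reghdfe's factor-variable syntax for interactions inside absorb(). Here x1 and x2 are placeholder control variables, and the sketch is untested on this data, so it may not reproduce the separate-sample estimates exactly:

Code:

* let each conditions level have its own stock and date fixed effects
* (x1 and x2 are hypothetical names standing in for the control variables)
reghdfe ln_efsnew i.conditions##(c.mt_csmar c.x1 c.x2), absorb(i.conditions#i.stkcd i.conditions#i.date_n) vce(cluster stkcd date_n)

* joint test that the mt_csmar slope is equal across conditions
testparm i.conditions#c.mt_csmar

If conditions is determined entirely by the date, the conditions#date_n cells coincide with date_n, so in effect only the stock fixed effects change.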

Thanks,
Jacob