  • Sector difference in a regression

    Hi,
    I'm doing research on determinants for a buy-out of a company and the sectoral differences between determinants. I have data on 138 buyouts across 7 different sectors. The number of observations per sector varies from 5 to 30. Furthermore, I have a lot of observations of non-buyouts.

    I want to test the effect of ROA, size and D/A ratio on the odds of a buyout across different sectors using a regression. Furthermore I have divided ROA in quartiles so that the highest and lowest quartile are both an independent variable. I have come across two methods to compare across sectors.

    1) Estimate a separate regression for each industry and then compare the coefficients.

    2) Estimate one regression with interaction variables.

    I was wondering if anyone knows which is the best method to use in my situation with the limited number of buyout observations?

    Kind regards,
    Joris

  • #2
    The two approaches are, for practical purposes, equivalent. Properly done, they will give the exact same coefficients. The standard errors are computed a bit differently in the two approaches, but seldom differ by enough to be of concern. So choosing between them boils down to matters of convention, taste, and convenience. Here are some things to consider.
    1. There may be a convention in your field that one of these is used and the other is not. Check the literature to see what others in your field do. If both approaches can be found in journals, then this factor does not apply to you.
    2. A formal comparison of the results of two separate regressions is done in Stata using the -suest- command. There are some regression commands, in particular the -xt- regressions, that are not compatible with the -suest- command. So if you are planning to do panel-regressions with the -xt- commands, your only option is to use the interaction method. (Well, -xtreg- can be emulated with -regress ... i.panel_var-, which is acceptable to -suest-. But with -xtlogit- or other such models, -suest- is not available to you.)
    3. -suest- directly outputs the coefficients for each level of your moderating variable. The interaction approach does not: you have to post-calculate them from the regression output. This is not hard to do with the -margins- command, but it is an extra step.
    Whichever you do, make sure you do it correctly in detail. For example, if you need robust or clustered variance estimation, with the -suest- approach you use ordinary variance estimation in the regressions, and then have -suest- apply the robust or clustered versions. If you do the interaction approach, you must interact the moderating variable with all of the regressors, not just the one(s) of main interest, in order to get the same results that separate regressions would give you.

    Added: As far as I know, the limited number of buyouts in your data, though it poses difficulties for answering your research question, bites equally hard with either of these approaches.
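
    As a minimal sketch of the fully interacted approach described above (the variable names here are hypothetical placeholders for the poster's data, not names from the thread):

    ```stata
    * Sketch: interact sector with every regressor, not just the focal ones,
    * so the model reproduces what separate per-sector regressions would give.
    logit buyout i.sector i.sector#(c.roa_q1 c.roa_q4 c.da_ratio c.size), vce(robust)

    * Per-sector marginal effects of one regressor can then be recovered
    * with -margins-:
    margins sector, dydx(roa_q1)
    ```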



    • #3
      Thank you for helping me. If I have run the separate regressions and used the -suest- command afterwards, can I compare all the coefficients directly, or do I need to run tests to compare them?



      • #4
        Joris:
        after -suest- you can jointly test all the coefficients (including the constant), or test a single coefficient:
        Code:
        . use "C:\Program Files\Stata18\ado\base\a\auto.dta"
        (1978 automobile data)
        
        
        . regress price mpg if foreign==0
        
              Source |       SS           df       MS      Number of obs   =        52
        -------------+----------------------------------   F(1, 50)        =     17.05
               Model |   124392956         1   124392956   Prob > F        =    0.0001
            Residual |   364801844        50  7296036.89   R-squared       =    0.2543
        -------------+----------------------------------   Adj R-squared   =    0.2394
               Total |   489194801        51  9592054.92   Root MSE        =    2701.1
        
        ------------------------------------------------------------------------------
               price | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
        -------------+----------------------------------------------------------------
                 mpg |  -329.2551   79.74034    -4.13   0.000    -489.4183   -169.0919
               _cons |   12600.54   1624.773     7.76   0.000     9337.085    15863.99
        ------------------------------------------------------------------------------
        
        . estimates store A
        
        . regress price mpg if foreign==1
        
              Source |       SS           df       MS      Number of obs   =        22
        -------------+----------------------------------   F(1, 20)        =     13.25
               Model |  57534941.7         1  57534941.7   Prob > F        =    0.0016
            Residual |  86828271.1        20  4341413.55   R-squared       =    0.3985
        -------------+----------------------------------   Adj R-squared   =    0.3685
               Total |   144363213        21   6874438.7   Root MSE        =    2083.6
        
        ------------------------------------------------------------------------------
               price | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
        -------------+----------------------------------------------------------------
                 mpg |  -250.3668   68.77435    -3.64   0.002    -393.8276    -106.906
               _cons |   12586.95   1760.689     7.15   0.000     8914.217    16259.68
        ------------------------------------------------------------------------------
        
        . estimates store B
        
        . suest A B
        
        Simultaneous results for A, B                               Number of obs = 74
        
        ------------------------------------------------------------------------------
                     |               Robust
                     | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
        -------------+----------------------------------------------------------------
        A_mean       |
                 mpg |  -329.2551   80.16093    -4.11   0.000    -486.3676   -172.1425
               _cons |   12600.54   1755.108     7.18   0.000     9160.589    16040.49
        -------------+----------------------------------------------------------------
        A_lnvar      |
               _cons |   15.80284   .2986031    52.92   0.000     15.21759    16.38809
        -------------+----------------------------------------------------------------
        B_mean       |
                 mpg |  -250.3668   84.69387    -2.96   0.003    -416.3637   -84.36987
               _cons |   12586.95   2258.417     5.57   0.000     8160.534    17013.37
        -------------+----------------------------------------------------------------
        B_lnvar      |
               _cons |   15.28371   .2310235    66.16   0.000     14.83091    15.73651
        ------------------------------------------------------------------------------
        
        . test [A_mean  = B_mean ], cons
        
         ( 1)  [A_mean]mpg - [B_mean]mpg = 0
         ( 2)  [A_mean]_cons - [B_mean]_cons = 0
        
                   chi2(  2) =   10.17
                 Prob > chi2 =    0.0062
        
        . test [A_mean]mpg = [B_mean]mpg
        
         ( 1)  [A_mean]mpg - [B_mean]mpg = 0
        
                   chi2(  1) =    0.46
                 Prob > chi2 =    0.4987
        
        .
        Kind regards,
        Carlo
        (Stata 19.0)



        • #5
          Hi, thank you for your reply! Is this also possible if you have 5 different regressions, so that you can check whether all the coefficients are statistically the same or different? Or do you need to compare all the possible pairs of regressions separately?



          • #6
            Yes, you can. Here's an example with four subsets. (The auto.dta's rep78 variable has only two observations with rep78 = 1, so regressing in that subset yields no standard errors and comparisons are not possible. So we just look at the subsets for rep78 = 2, 3, 4, 5.)

            Code:
            . sysuse auto, clear
            (1978 automobile data)
            
            . drop if missing(rep78)
            (5 observations deleted)
            
            .
            . forvalues i = 2/5 {
              2.         regress price mpg if rep78 == `i'
              3.         estimates store r`i'
              4. }
            
                  Source |       SS           df       MS      Number of obs   =         8
            -------------+----------------------------------   F(1, 6)         =      4.69
                   Model |    39346087         1    39346087   Prob > F        =    0.0735
                Residual |  50336476.9         6  8389412.82   R-squared       =    0.4387
            -------------+----------------------------------   Adj R-squared   =    0.3452
                   Total |  89682563.9         7  12811794.8   Root MSE        =    2896.4
            
            ------------------------------------------------------------------------------
                   price | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
            -------------+----------------------------------------------------------------
                     mpg |   -630.823    291.288    -2.17   0.074    -1343.579    81.93295
                   _cons |   18032.12   5664.222     3.18   0.019     4172.264    31891.97
            ------------------------------------------------------------------------------
            
                  Source |       SS           df       MS      Number of obs   =        30
            -------------+----------------------------------   F(1, 28)        =     14.65
                   Model |   123788982         1   123788982   Prob > F        =    0.0007
                Residual |   236582733        28  8449383.33   R-squared       =    0.3435
            -------------+----------------------------------   Adj R-squared   =    0.3201
                   Total |   360371715        29  12426610.9   Root MSE        =    2906.8
            
            ------------------------------------------------------------------------------
                   price | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
            -------------+----------------------------------------------------------------
                     mpg |  -498.8875   130.3389    -3.83   0.001    -765.8747   -231.9003
                   _cons |   16124.28    2587.92     6.23   0.000     10823.17    21425.39
            ------------------------------------------------------------------------------
            
                  Source |       SS           df       MS      Number of obs   =        18
            -------------+----------------------------------   F(1, 16)        =      0.90
                   Model |  2643203.86         1  2643203.86   Prob > F        =    0.3572
                Residual |  47043724.6        16  2940232.79   R-squared       =    0.0532
            -------------+----------------------------------   Adj R-squared   =   -0.0060
                   Total |  49686928.5        17   2922760.5   Root MSE        =    1714.7
            
            ------------------------------------------------------------------------------
                   price | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
            -------------+----------------------------------------------------------------
                     mpg |  -79.90338   84.27343    -0.95   0.357    -258.5551    98.74832
                   _cons |    7802.74   1870.119     4.17   0.001     3838.264    11767.22
            ------------------------------------------------------------------------------
            
                  Source |       SS           df       MS      Number of obs   =        11
            -------------+----------------------------------   F(1, 9)         =      7.88
                   Model |  31950588.4         1  31950588.4   Prob > F        =    0.0204
                Residual |  36471559.6         9  4052395.52   R-squared       =    0.4670
            -------------+----------------------------------   Adj R-squared   =    0.4077
                   Total |    68422148        10   6842214.8   Root MSE        =    2013.1
            
            ------------------------------------------------------------------------------
                   price | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
            -------------+----------------------------------------------------------------
                     mpg |  -204.6947   72.89925    -2.81   0.020    -369.6042   -39.78513
                   _cons |   11514.19   2085.085     5.52   0.000       6797.4    16230.98
            ------------------------------------------------------------------------------
            
            .
            . suest r2 r3 r4 r5
            
            Simultaneous results for r2, r3, r4, r5                     Number of obs = 67
            
            ------------------------------------------------------------------------------
                         |               Robust
                         | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
            -------------+----------------------------------------------------------------
            r2_mean      |
                     mpg |   -630.823   299.0619    -2.11   0.035    -1216.974   -44.67249
                   _cons |   18032.12   6436.568     2.80   0.005     5416.673    30647.56
            -------------+----------------------------------------------------------------
            r2_lnvar     |
                   _cons |   15.94248    .375964    42.40   0.000     15.20561    16.67936
            -------------+----------------------------------------------------------------
            r3_mean      |
                     mpg |  -498.8875   107.9205    -4.62   0.000    -710.4079   -287.3672
                   _cons |   16124.28   2197.721     7.34   0.000     11816.83    20431.73
            -------------+----------------------------------------------------------------
            r3_lnvar     |
                   _cons |    15.9496   .4024218    39.63   0.000     15.16087    16.73834
            -------------+----------------------------------------------------------------
            r4_mean      |
                     mpg |  -79.90338   63.39549    -1.26   0.208    -204.1563     44.3495
                   _cons |    7802.74   1313.624     5.94   0.000     5228.084     10377.4
            -------------+----------------------------------------------------------------
            r4_lnvar     |
                   _cons |     14.894   .2903858    51.29   0.000     14.32485    15.46315
            -------------+----------------------------------------------------------------
            r5_mean      |
                     mpg |  -204.6947   80.62313    -2.54   0.011    -362.7131   -46.67624
                   _cons |   11514.19    2526.28     4.56   0.000     6562.773    16465.61
            -------------+----------------------------------------------------------------
            r5_lnvar     |
                   _cons |   15.21482   .3253477    46.76   0.000     14.57715    15.85249
            ------------------------------------------------------------------------------
            
            .
            . test [r2_mean = r3_mean], notest
            
             ( 1)  [r2_mean]mpg - [r3_mean]mpg = 0
            
            . test [r3_mean = r4_mean], accum notest
            
             ( 1)  [r2_mean]mpg - [r3_mean]mpg = 0
             ( 2)  [r3_mean]mpg - [r4_mean]mpg = 0
            
            . test [r4_mean = r5_mean], accum
            
             ( 1)  [r2_mean]mpg - [r3_mean]mpg = 0
             ( 2)  [r3_mean]mpg - [r4_mean]mpg = 0
             ( 3)  [r4_mean]mpg - [r5_mean]mpg = 0
            
                       chi2(  3) =   13.33
                     Prob > chi2 =    0.0040
            It works equally well when there are also multiple regressors.

            Let me add something to Carlo's excellent advice, where he has shown you how to code what you asked for. What is your actual research question? If your research question asks whether every variable in your equation has the same marginal effect in both (or all 5) subsets, then this mass testing is the appropriate way to answer it. But if your regression involves other variables that are included just to adjust for their extraneous variance or reduce confounding (aka omitted variable bias), then there is no reason to test for equality of the coefficients of these "control variables." And there is good reason not to. It may be that there are no statistically significant differences in the coefficients of the focal variables in your research question but there are some in the covariates. That may lead to the omnibus test of every coefficient being equal across subsets giving a statistically significant result. That's a true answer, but it answers the wrong question and it is misleading with respect to the question of whether the coefficients of the focal variables differ across models.

            I will not say much about the issue of adjusting your p-values for multiple hypothesis testing other than to remind you that some people would say you must do it. This in turn affects the issue raised in the preceding paragraph: the more hypotheses you jointly test, the more stringent the multiplicity adjustment becomes. So that is another reason not to run tests that are unnecessary and do not actually answer the research question.
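
            Concretely, the restricted joint test can reuse the pattern shown above, naming only the focal coefficient and leaving the controls untested (here `roa_q1` is a hypothetical placeholder for one of the poster's variables of interest):

            ```stata
            * Sketch: after -suest r2 r3 r4 r5-, test equality of only the
            * focal coefficient across subsets; control-variable coefficients
            * are deliberately excluded from the joint test.
            test [r2_mean]roa_q1 = [r3_mean]roa_q1, notest
            test [r3_mean]roa_q1 = [r4_mean]roa_q1, accum notest
            test [r4_mean]roa_q1 = [r5_mean]roa_q1, accum
            ```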



            • #7
              Thank you for this elaborate example and for helping me with my research. My research question is: do the determinants for a buyout of a company differ per sector? In my regression, Buyout is the dependent variable and ROA_q1, ROA_q4 and FinDep are the main variables of interest. Firmsize and log(Assets) are added as control variables. My main goal is to find out whether the method of value creation for a buyout differs per sector. The 3 variables of interest reflect 3 different methods, so I want to see if a method is used more in one sector compared to others.

              By reading your advice, which helps me a lot, I think I only have to test the equality of the coefficients for the variables of interest. What is not clear to me is how the significance of a coefficient affects my conclusions, because some of the coefficients are significant while others are not. I was wondering if you could conclude that there are differences between sectors if a coefficient is significant in one sector while it is not in another, as this would show that one method is significantly present in one sector and not in another.



              • #8
                I think I only have to test the equality of the coefficients for the variables of interest.
                That's right.

                What is not clear to me is how the significance of a coefficient affects my conclusions, because some of the coefficients are significant while others are not. I was wondering if you could conclude that there are differences between sectors if a coefficient is significant in one sector while it is not in another, as this would show that one method is significantly present in one sector and not in another.
                For those who believe in and use the concept of statistical significance, it is very important to remember that the difference between a statistically significant result and a statistically not significant result is, itself, not statistically significant. It is entirely possible for result 1 to be statistically significant, result 2 not statistically significant, yet the difference between them may nevertheless not be statistically significant. Similarly, results 1 and 2 may both be statistically significant, but their difference may not be. Or results 1 and 2 may both be not statistically significant, yet their difference may be. To draw conclusions about whether result 1 differs from result 2 to a statistically significant extent you must actually do a significance test on the difference between them. Nothing else will do.

                If this is counterintuitive it's because the concept of statistical significance is itself counterintuitive (and, some would argue, deeply flawed). The most understandable situation is where result 1 is statistically significant and result 2 is not. Put significance aside and just look at the coefficients themselves. They could conceivably be the same or very nearly so. This is because the sample sizes may be different for each of them, or the amount of outcome variation may differ in the corresponding samples. Even when those are the same, it may be that result 1's p-value is just barely below 0.05 and result 2's just barely above. In studies like this where you are focusing on differences of regression coefficients across subsets, I think it is very important to present, as context, the actual coefficients themselves, along with their confidence intervals, in each group. You can report statistical significance tests if you wish or must, but remember that for your research question, the only significance test that matters is the test of the difference between result 1 and result 2. And in presenting that, I would use -lincom- rather than -test- because -lincom- will show you the actual difference between them along with a confidence interval for that, as well as a p-value for the difference. (-test- will give you only the p-value).
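
                For instance, continuing Carlo's -suest- example above (with stored estimates A and B), a minimal sketch of the -lincom- approach:

                ```stata
                * Sketch: unlike -test-, -lincom- reports the difference itself,
                * with a confidence interval, alongside the p-value.
                lincom [A_mean]mpg - [B_mean]mpg
                ```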



                • #9
                  Hi, thanks for all the help. I used the -test- command in my thesis to test whether the coefficients of my regressions were the same. I thought this was called a Wald test, but I'm not sure. Does anyone know what these tests are called, so I can refer to them in my thesis?



                  • #10
                    Yes, the tests that -test- computes are Wald tests.
