Compare regression coefficients between 2 groups

Maggio Marco

Join Date: May 2016

Posts: 36
#1

15 May 2016, 16:37

Hi,

I am very confused about the interpretation of the Wald test in STATA.

Let's say that I have data on height, weight and sex (female dummy). I would like to know the effect of height on weight by sex. When I run a regression of weight on height for females, I get a positive, statistically significant coefficient. Conversely, when I run the same regression for males, I obtain a negative, NOT significant coefficient. I then rerun the regression with the interaction term, Weight=a+b1height+b2Female+b3Female*Male, and I "ttest" the interaction variable. With p=0.898, I conclude that the regression coefficients between height and weight do NOT significantly differ across sex groups.

Here is my confusion: what does "significantly differ across sex groups" mean? And different from zero (assuming it is two-sided)? When I run "weight=a+b1height" by sex, I obtain two different coefficients. I know that they are not zero and they are not equal. What is the Wald test adding to my analysis?

Plus, now I know that the regression coefficients between height and weight do NOT significantly differ across sex groups. So what do I do? Delete male? (clearly not logical).

Thank you for your help

Maggio

Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#2

15 May 2016, 16:46

Hello Maggio,

Welcome to the Stata Forum.

To start, if you want to compare males versus females, you should just include one dummy: say, "females" if you want to compare females versus males, or "males" for the reverse.

I fear you did, basically, a subgroup analysis instead of the main analysis. What is more, I didn't understand how you performed the interaction between males and females. Overall, I fear it is not needed and may be misleading. A binary variable, like "gender", can do the trick for you. That said, you may add an interaction term between "sex" and height, for example. Also, you may check a quadratic term for height.

In forthcoming messages, please present the Stata commands as well as the output, as recommended in the FAQ.

To end, I kindly suggest writing "Stata", with just one capital letter. Thank you.

Best,

Marcos

Best regards,

Marcos

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17714
#3

16 May 2016, 00:14

Marco:
I do share Marcos' previous comments.
First off, I'm not clear why you're running two separate regressions for males and females; this way, you cannot answer one of your research questions (is there any gender-related difference in the -depvar-, other things being equal?). You can work around this by including -i.sex- among your predictors.
However, including -i.sex- without an interaction gives you different intercepts for males and females only (in other words, you impose the same slope coefficient on both), as you can see from the following example:

Code:

. sysuse auto.dta
(1978 Automobile Data)

. reg price mpg i.foreign

      Source |       SS           df       MS      Number of obs   =        74
-------------+----------------------------------   F(2, 71)        =     14.07
       Model |   180261702         2  90130850.8   Prob > F        =    0.0000
    Residual |   454803695        71  6405685.84   R-squared       =    0.2838
-------------+----------------------------------   Adj R-squared   =    0.2637
       Total |   635065396        73  8699525.97   Root MSE        =    2530.9

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -294.1955   55.69172    -5.28   0.000    -405.2417   -183.1494
             |
     foreign |
    Foreign  |   1767.292    700.158     2.52   0.014     371.2169    3163.368
       _cons |   11905.42   1158.634    10.28   0.000     9595.164    14215.67
------------------------------------------------------------------------------


. predict xb_noi, xb

. bysort foreign: list price mpg xb_noi if [_n]==1

--------------------------------------------------------------------------------------------------------------------------------------
-> foreign = Domestic

     +------------------------+
     | price   mpg     xb_noi |
     |------------------------|
  1. | 4,099    22   5433.114 |
     +------------------------+

--------------------------------------------------------------------------------------------------------------------------------------
-> foreign = Foreign

     +------------------------+
     | price   mpg     xb_noi |
     |------------------------|
  1. | 9,690    17   8671.384 |
     +------------------------+


. di 11905.42 + (22*-294.1955)
5433.119

. di (11905.42+1767.292) + (17*-294.1955)
8671.3885

In order to investigate whether the slope coefficients differ across gender, as per Marcos' remark, an interaction between height and gender is needed, as you can see from the following example, which elaborates on the previous one:

Code:

. reg price c.mpg##i.foreign

      Source |       SS           df       MS      Number of obs   =        74
-------------+----------------------------------   F(3, 70)        =      9.48
       Model |   183435281         3  61145093.6   Prob > F        =    0.0000
    Residual |   451630115        70  6451858.79   R-squared       =    0.2888
-------------+----------------------------------   Adj R-squared   =    0.2584
       Total |   635065396        73  8699525.97   Root MSE        =    2540.1

-------------------------------------------------------------------------------
        price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
          mpg |  -329.2551   74.98545    -4.39   0.000    -478.8088   -179.7013
              |
      foreign |
     Foreign  |  -13.58741   2634.664    -0.01   0.996    -5268.258    5241.084
              |
foreign#c.mpg |
     Foreign  |   78.88826   112.4812     0.70   0.485    -145.4485     303.225
              |
        _cons |   12600.54   1527.888     8.25   0.000     9553.261    15647.81
-------------------------------------------------------------------------------

. predict xb, xb


. bysort foreign: list price mpg xb if [_n]==1

--------------------------------------------------------------------------------------------------------------------------------------
-> foreign = Domestic

     +------------------------+
     | price   mpg         xb |
     |------------------------|
  1. | 4,099    22   5356.926 |
     +------------------------+

--------------------------------------------------------------------------------------------------------------------------------------
-> foreign = Foreign

     +------------------------+
     | price   mpg         xb |
     |------------------------|
  1. | 9,690    17   8330.715 |
     +------------------------+


. di 12600.54 + (22*-329.2551)
5356.9278

. di (12600.54-13.58741) + (17*(-329.2551+78.88826))
8330.7163

You can then test your coefficients via -test- and/or -testparm-.
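As a minimal sketch of such tests (assuming the interacted model above is refit on the auto data, and that -testparm- is the joint-test command meant here), the equality of slopes and the joint equality of intercepts and slopes could be checked along these lines:

Code:

sysuse auto, clear
regress price c.mpg##i.foreign

* H0: the -mpg- slope is the same for Domestic and Foreign cars
test 1.foreign#c.mpg

* Joint H0: same intercept AND same slope in both groups
testparm i.foreign i.foreign#c.mpg

The first test restates the t test on the interaction row of the output table (with one numerator degree of freedom, F equals t squared), so it formalises the hypothesis without adding new information. The joint test asks a different question: whether the two groups can share one regression line altogether.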

In summary, you get different results from your regressions because you're running different models: no white (or black) magic is lurking behind those outcomes.

Last edited by Carlo Lazzaro; 16 May 2016, 00:42.

Kind regards,
Carlo
(Stata 19.0)

Maggio Marco

Join Date: May 2016

Posts: 36
#4

16 May 2016, 08:01

Dear Dr. Lazzaro and Dr. Almeida,

Thank you for responding.

I have no problem in dealing with the dummy. I know that I only need a dummy for gender in this case (1=Female, 0=Male) and that by including the interaction term (gender*height) I will look at the difference between females and males.

My code is:

sort gender
by gender: reg weight height

* I ran this because I was curious to see the difference between the groups separately. I find that the coefficient for males is statistically significant and the one for females is not

xi: reg weight height i.gender*height

Here I find that the interaction term is not statistically significant, and neither is the dummy.

My issue is with the "ttest" of the interaction term. If I fail to reject the null, it means that the coefficients between height and weight do NOT significantly differ across gender groups. I don't understand this.
What does this mean in "practical terms"? That if I delete all the females the results won't change? That the coefficients are not equal? I know that they are not, because I ran the two regressions separately.

Thank you so much

Excited to be part of the Stata community

Marco

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17714
#5

16 May 2016, 08:54

Marco:
- in your first model, you do not adjust -height- for gender;
- in your second model, if you have Stata 14 (if you use an older version, you should tell the list), -xi- is pleonastic, as -fvvarlist- will do all the nitty-gritty for you (categorical variable and interaction, too).
Here, the dummy refers to the intercept, whereas the interaction affects the slope: in brief, neither the intercept coefficients nor the slope ones show evidence of a statistically significant difference for males vs females.
Repeating to myself that "the absence of evidence is not evidence of absence" (for more details on this topic my favourite reference is http://www.ncbi.nlm.nih.gov/pubmed/0007647644), your coefficients might well differ significantly had you collected more data.
What is written above can hopefully be made clearer with the help of a toy example:

Code:

. sysuse auto.dta
(1978 Automobile Data)

* Model 1 - Two separate regressions; -mpg- is not adjusted for -foreign-

. by foreign, sort: reg price mpg

----------------------------------------------------------------------------------------------------------------------------------------------
-> foreign = Domestic

      Source |       SS           df       MS      Number of obs   =        52
-------------+----------------------------------   F(1, 50)        =     17.05
       Model |   124392956         1   124392956   Prob > F        =    0.0001
    Residual |   364801844        50  7296036.89   R-squared       =    0.2543
-------------+----------------------------------   Adj R-squared   =    0.2394
       Total |   489194801        51  9592054.92   Root MSE        =    2701.1

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -329.2551   79.74034    -4.13   0.000    -489.4183   -169.0919
       _cons |   12600.54   1624.773     7.76   0.000     9337.085    15863.99
------------------------------------------------------------------------------

----------------------------------------------------------------------------------------------------------------------------------------------
-> foreign = Foreign

      Source |       SS           df       MS      Number of obs   =        22
-------------+----------------------------------   F(1, 20)        =     13.25
       Model |  57534941.7         1  57534941.7   Prob > F        =    0.0016
    Residual |  86828271.1        20  4341413.55   R-squared       =    0.3985
-------------+----------------------------------   Adj R-squared   =    0.3685
       Total |   144363213        21   6874438.7   Root MSE        =    2083.6

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -250.3668   68.77435    -3.64   0.002    -393.8276    -106.906
       _cons |   12586.95   1760.689     7.15   0.000     8914.217    16259.68
------------------------------------------------------------------------------

* Model 2 - One regression; two intercepts but the same slope

. reg price mpg i.foreign

      Source |       SS           df       MS      Number of obs   =        74
-------------+----------------------------------   F(2, 71)        =     14.07
       Model |   180261702         2  90130850.8   Prob > F        =    0.0000
    Residual |   454803695        71  6405685.84   R-squared       =    0.2838
-------------+----------------------------------   Adj R-squared   =    0.2637
       Total |   635065396        73  8699525.97   Root MSE        =    2530.9

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -294.1955   55.69172    -5.28   0.000    -405.2417   -183.1494
             |
     foreign |
    Foreign  |   1767.292    700.158     2.52   0.014     371.2169    3163.368
       _cons |   11905.42   1158.634    10.28   0.000     9595.164    14215.67
------------------------------------------------------------------------------

. mat list e(b)

e(b)[1,4]
                        0b.          1.           
           mpg     foreign     foreign       _cons
y1  -294.19553           0   1767.2922   11905.415

. test _cons=_cons+1.foreign

 ( 1)  - 1.foreign = 0

       F(  1,    71) =    6.37
            Prob > F =    0.0138

* The two intercepts do differ

* Model 3 - One regression; two intercepts and two slopes

. reg price c.mpg##i.foreign

      Source |       SS           df       MS      Number of obs   =        74
-------------+----------------------------------   F(3, 70)        =      9.48
       Model |   183435281         3  61145093.6   Prob > F        =    0.0000
    Residual |   451630115        70  6451858.79   R-squared       =    0.2888
-------------+----------------------------------   Adj R-squared   =    0.2584
       Total |   635065396        73  8699525.97   Root MSE        =    2540.1

-------------------------------------------------------------------------------
        price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
          mpg |  -329.2551   74.98545    -4.39   0.000    -478.8088   -179.7013
              |
      foreign |
     Foreign  |  -13.58741   2634.664    -0.01   0.996    -5268.258    5241.084
              |
foreign#c.mpg |
     Foreign  |   78.88826   112.4812     0.70   0.485    -145.4485     303.225
              |
        _cons |   12600.54   1527.888     8.25   0.000     9553.261    15647.81
-------------------------------------------------------------------------------

. mat list e(b)

e(b)[1,6]
                          0b.           1.  0b.foreign#   1.foreign#            
            mpg      foreign      foreign       co.mpg        c.mpg        _cons
y1   -329.25507            0   -13.587408            0    78.888255    12600.538

. test _cons=_cons+1.foreign

 ( 1)  - 1.foreign = 0

       F(  1,    70) =    0.00
            Prob > F =    0.9959

. test mpg=mpg+1.foreign#c.mpg

 ( 1)  - 1.foreign#c.mpg = 0

       F(  1,    70) =    0.49
            Prob > F =    0.4854

* Neither the intercepts nor the slopes differ
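
The link between Model 1 and Model 3 can also be made explicit. A short sketch, assuming Model 3 is refit on the same auto data: -margins- with dydx() reports the -mpg- slope at each level of -foreign-, and -lincom- rebuilds the Foreign slope by hand.

Code:

sysuse auto, clear
regress price c.mpg##i.foreign

* Slope of -mpg- within each level of -foreign-
margins foreign, dydx(mpg)

* Foreign slope = base (Domestic) slope + interaction term
lincom mpg + 1.foreign#c.mpg

The point estimates should match the two separate regressions of Model 1 (-329.2551 + 78.88826 = -250.3668 for Foreign). The standard errors need not match, because the single regression pools the residual variance across the two groups.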

Kind regards,
Carlo
(Stata 19.0)

Maggio Marco

Join Date: May 2016

Posts: 36
#6

16 May 2016, 09:50

Thank you so much Dr. Lazzaro.

This was useful. However, my main question is: what does "neither the intercepts, nor the slopes show evidence of a statistically significant difference for males vs females" mean? That's the only thing I am confused about. What does that mean in "practical terms"?

Thank you again for your time.

Best,
Marco

Carlo Lazzaro

Join Date: Apr 2014

Posts: 17714
#7

16 May 2016, 10:08

Marco (please, call me Carlo):
it may mean two different things:
- the first one (the most probable): your sample is too small to detect any statistical difference between males and females in both intercepts and slopes, in intercepts only, or in slopes only; this outcome should be taken as a matter of fact; it may well be that a difference exists, but you simply cannot detect it with your data.
In this case, the usual recommendation sounds like: go and collect more data, come back to your desk, widen your database and re-run it all over again.
Unfortunately, this approach is often unfeasible for different reasons: small populations (e.g., a rare disease with an incidence of 5 new patients per year will never allow you to reach statistically significant results in the comparison of two drugs aimed at improving patients' health state), organizational inefficiencies, red tape and, last but not least, budget constraints;
- there's really no difference across gender for the parameters you're interested in (i.e., you cannot find any difference in the population from which the sample was drawn): this seldom happens (in my experience, at least).
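
The first scenario can be illustrated with a small simulation. This is only a sketch under made-up assumptions: the program name simint, the sample size, the true slope difference of 80 and the error standard deviation are hypothetical values loosely inspired by the auto example, not estimates from anyone's data. It builds a population in which the slopes truly differ and counts how often the interaction test rejects at the 5% level.

Code:

clear all
set seed 12345

capture program drop simint
program define simint, rclass
    // n() = sample size, diff() = true difference in slopes (both hypothetical)
    syntax [, n(integer 74) diff(real 80)]
    drop _all
    set obs `n'
    generate byte g = runiform() < 0.3      // group dummy, about 30% in group 1
    generate double x = rnormal(21, 6)      // continuous predictor
    generate double y = 12000 - 330*x + `diff'*g*x + rnormal(0, 2500)
    quietly regress y c.x##i.g
    quietly test 1.g#c.x                    // H0: equal slopes
    return scalar p = r(p)
end

* Repeat the experiment 500 times and keep the p-value of each run
simulate p = r(p), reps(500) nodots: simint, n(74) diff(80)

* Share of runs in which the (true) difference was detected
generate byte reject = p < 0.05
summarize reject

The mean of -reject- estimates the power of the test under those assumptions. Rerunning with a larger n() shows how more data makes the same true difference easier to detect.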

Kind regards,
Carlo
(Stata 19.0)

Maggio Marco

Join Date: May 2016

Posts: 36
#8

17 May 2016, 08:51

Thank you Carlo.

Sergio Rivaroli

Join Date: Sep 2016

Posts: 13
#9

15 Nov 2018, 10:44

Dear all,

Thank you very much indeed for these posts.
I have a question: how should one correctly and formally report these results (e.g. the absence or the presence of differences between two groups for several variables) in a scientific paper?
Best,

Sergio

Carlo Lazzaro

Join Date: Apr 2014

Posts: 17714
#10

15 Nov 2018, 10:49

Sergio:
if you're going to use regression for your research, you can report the outcome table and comment on it in the Results section of the paper (or, at least, this is what I usually do).
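
One possible way to build such an outcome table, sketched here on the auto data used earlier in the thread (the stored names same_slope and two_slopes are arbitrary labels chosen for illustration), is to store each model and print them side by side with -estimates table-:

Code:

sysuse auto, clear

* Model with a common slope and group-specific intercepts
quietly regress price mpg i.foreign
estimates store same_slope

* Model with group-specific intercepts and slopes
quietly regress price c.mpg##i.foreign
estimates store two_slopes

* Coefficients, standard errors, sample size and R-squared in one table
estimates table same_slope two_slopes, b(%9.2f) se(%9.2f) stats(N r2)

Reporting the interaction coefficient together with its standard error or confidence interval, and not only its p-value, lets readers judge how large a group difference the data can or cannot rule out.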

Kind regards,
Carlo
(Stata 19.0)

Javier Gutierrez

Join Date: May 2018

Posts: 14
#11

21 May 2019, 09:56

Dear Carlo,
Is it possible to check differences between genders if the variable I would like to interact with gender is endogenous and instrumented?

Carlo Lazzaro

Join Date: Apr 2014

Posts: 17714
#12

21 May 2019, 11:00

Javier:
unfortunately, I have nothing to add to https://www.statalist.org/forums/for...ented-variable

Kind regards,
Carlo
(Stata 19.0)