  • Compare regression coefficients between 2 groups

    Hi,

    I am very confused about the interpretation of the Wald test in STATA.

    Let's say that I have data on height, weight and sex (female dummy). I would like to know the effect of height on weight by sex. When I run a regression of weight on height for females, I get a positive, statistically significant coefficient. Conversely, when I run the same regression for males, I obtain a negative, NOT significant coefficient. I then rerun the regression with the interaction term, Weight=a+b1height+b2Female+b3Female*Male, and I "ttest" the interaction variable. With p=0.898, I conclude that the regression coefficients between height and weight do NOT significantly differ across sex groups.

    Here is my confusion: what does "significantly differ across sex groups" mean? Different from zero (assuming the test is two-sided)? When I run "weight=a+b1height" by sex I obtain two different coefficients. I know that they are not zero and that they are not equal. What is the Wald test adding to my analysis?

    Plus, now I know that the regression coefficients between height and weight do NOT significantly differ across sex groups. So what do I do? Delete male? (Clearly not logical.)

    Thank you for your help


    Maggio

  • #2
    Hello Maggio,

    Welcome to the Stata Forum.

    To start, if you want to compare males and females, you just need to include one dummy: say, "female" if males are the reference group, or "male" if females are.

    I fear you did, basically, a subgroup analysis instead of the main analysis. What is more, I didn't understand how you performed the interaction between males and females. Overall, I fear it is not needed and may be misleading. A binary variable, like "gender", can do the trick for you. That said, you may add an interaction term between "sex" and height, for example. Also, you may check a quadratic term for "height".

    In forthcoming messages, please present the Stata commands as well as the output, as recommended in the FAQ.

    To end, I kindly suggest that you write "Stata", with just one capital letter. Thank you.

    Best,

    Marcos



    • #3
      Marco:
      I share Marcos' previous comments.
      First off, it's not clear to me why you're running two separate regressions for males and females; this way, you cannot answer one of your research questions (is there any gender-related difference in the -depvar-, other things being equal?). You can work around this by including -i.sex- among your predictors.
      However, including -i.sex- without an interaction gives you different intercepts for males and females only (in other words, you impose the same slope coefficient on both groups), as you can see from the following example:
      Code:
      . sysuse auto.dta
      (1978 Automobile Data)
      
      . reg price mpg i.foreign
      
            Source |       SS           df       MS      Number of obs   =        74
      -------------+----------------------------------   F(2, 71)        =     14.07
             Model |   180261702         2  90130850.8   Prob > F        =    0.0000
          Residual |   454803695        71  6405685.84   R-squared       =    0.2838
      -------------+----------------------------------   Adj R-squared   =    0.2637
             Total |   635065396        73  8699525.97   Root MSE        =    2530.9
      
      ------------------------------------------------------------------------------
             price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
               mpg |  -294.1955   55.69172    -5.28   0.000    -405.2417   -183.1494
                   |
           foreign |
          Foreign  |   1767.292    700.158     2.52   0.014     371.2169    3163.368
             _cons |   11905.42   1158.634    10.28   0.000     9595.164    14215.67
      ------------------------------------------------------------------------------
      
      
      . predict xb_noi, xb
      
      . bysort foreign: list price mpg xb_noi if [_n]==1
      
      --------------------------------------------------------------------------------------------------------------------------------------
      -> foreign = Domestic
      
           +------------------------+
           | price   mpg     xb_noi |
           |------------------------|
        1. | 4,099    22   5433.114 |
           +------------------------+
      
      --------------------------------------------------------------------------------------------------------------------------------------
      -> foreign = Foreign
      
           +------------------------+
           | price   mpg     xb_noi |
           |------------------------|
        1. | 9,690    17   8671.384 |
           +------------------------+
      
      
      . di 11905.42 + (22*-294.1955)
      5433.119
      
      . di (11905.42+1767.292) + (17*-294.1955)
      8671.3885
      In order to investigate whether the slope coefficients differ across gender, as per Marcos' remark, an interaction between height and gender is welcome, as you can see from the following example, which elaborates on the previous one:
      Code:
      . reg price c.mpg##i.foreign
      
            Source |       SS           df       MS      Number of obs   =        74
      -------------+----------------------------------   F(3, 70)        =      9.48
             Model |   183435281         3  61145093.6   Prob > F        =    0.0000
          Residual |   451630115        70  6451858.79   R-squared       =    0.2888
      -------------+----------------------------------   Adj R-squared   =    0.2584
             Total |   635065396        73  8699525.97   Root MSE        =    2540.1
      
      -------------------------------------------------------------------------------
              price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      --------------+----------------------------------------------------------------
                mpg |  -329.2551   74.98545    -4.39   0.000    -478.8088   -179.7013
                    |
            foreign |
           Foreign  |  -13.58741   2634.664    -0.01   0.996    -5268.258    5241.084
                    |
      foreign#c.mpg |
           Foreign  |   78.88826   112.4812     0.70   0.485    -145.4485     303.225
                    |
              _cons |   12600.54   1527.888     8.25   0.000     9553.261    15647.81
      -------------------------------------------------------------------------------
      
      . predict xb, xb
      
      
      . bysort foreign: list price mpg xb if [_n]==1
      
      --------------------------------------------------------------------------------------------------------------------------------------
      -> foreign = Domestic
      
           +------------------------+
           | price   mpg         xb |
           |------------------------|
        1. | 4,099    22   5356.926 |
           +------------------------+
      
      --------------------------------------------------------------------------------------------------------------------------------------
      -> foreign = Foreign
      
           +------------------------+
           | price   mpg         xb |
           |------------------------|
        1. | 9,690    17   8330.715 |
           +------------------------+
      
      
      . di 12600.54 + (22*-329.2551)
      5356.9278
      
      . di (12600.54-13.58741) + (17*(-329.2551+78.88826))
      8330.7163
      You can then test your coefficients via -test- and/or -testparm-.
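As a minimal sketch of that step, the commands below run Wald tests on the interaction model from the example above. Variable names come from auto.dta; the joint test in the last line is an addition for illustration and does not appear in the output above.

```stata
sysuse auto, clear
regress price c.mpg##i.foreign

* Wald test: is the slope of mpg the same for domestic and foreign cars?
test 1.foreign#c.mpg

* The same restriction, written with -testparm-
testparm i.foreign#c.mpg

* Joint test: do intercept and slope both coincide across the two groups?
test 1.foreign 1.foreign#c.mpg
```

The first two commands impose the same single restriction, so they should report the same F statistic, which is the square of the t statistic on the interaction row of the regression table (0.70 squared is roughly 0.49).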

      In summary, you get different results from your regressions because you're running different models: no white (or black) magic is lurking behind those outcomes.
      Last edited by Carlo Lazzaro; 16 May 2016, 00:42.
      Kind regards,
      Carlo
      (Stata 18.0 SE)



      • #4
        Dear Dr. Lazzaro and Dr. Almeida,

        Thank you for responding.

        I have no problem dealing with the dummy. I know that I only need one dummy for gender in this case (1=Female, 0=Male) and that by including the interaction term (gender*height) I will look at the difference between females and males.

        My code is:

        sort gender
        by gender: reg weight height

        * I ran this because I was curious to see the groups separately. I obtain that the coefficient for males is statistically significant and the one for females is not.

        xi: reg weight height i.gender*height

        Here I obtain that the interaction term is not statistically significant, and neither is the dummy.


        My issue is with the "ttest" of the interaction term. If I fail to reject the null, it means that the coefficients between height and weight do NOT significantly differ across gender groups. I don't understand this.
        What does this mean in "practical terms"? That if I delete all the females the results won't change? That the coefficients are not equal? I know that they are not, because I ran the two regressions separately.

        Thank you so much

        Excited to be part of the Stata community

        Marco



        • #5
          Marco:
          - in your first model, you do not adjust -height- for gender;
          - in your second model, if you have Stata 14 (if you use an older version, you should tell the list), -xi- is pleonastic, as -fvvarlist- will do all the nitty-gritty for you (categorical variable and interaction, too).
          Here, the dummy refers to the intercept, whereas the interaction affects the slope: in brief, neither the intercept coefficients nor the slope ones show evidence of a statistically significant difference for males vs females.
          Repeating to myself that "the absence of evidence is not evidence of absence" (for more details on this topic my favourite reference is http://www.ncbi.nlm.nih.gov/pubmed/0007647644), your coefficients might well differ significantly had you collected more data.
          What is written above can hopefully be made clearer with the help of a toy example:
          Code:
          . sysuse auto.dta
          (1978 Automobile Data)
          
          * Model 1 - Two separate regressions; -mpg- is not adjusted for -foreign-
          
          . by foreign, sort: reg price mpg
          
          ----------------------------------------------------------------------------------------------------------------------------------------------
          -> foreign = Domestic
          
                Source |       SS           df       MS      Number of obs   =        52
          -------------+----------------------------------   F(1, 50)        =     17.05
                 Model |   124392956         1   124392956   Prob > F        =    0.0001
              Residual |   364801844        50  7296036.89   R-squared       =    0.2543
          -------------+----------------------------------   Adj R-squared   =    0.2394
                 Total |   489194801        51  9592054.92   Root MSE        =    2701.1
          
          ------------------------------------------------------------------------------
                 price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
                   mpg |  -329.2551   79.74034    -4.13   0.000    -489.4183   -169.0919
                 _cons |   12600.54   1624.773     7.76   0.000     9337.085    15863.99
          ------------------------------------------------------------------------------
          
          ----------------------------------------------------------------------------------------------------------------------------------------------
          -> foreign = Foreign
          
                Source |       SS           df       MS      Number of obs   =        22
          -------------+----------------------------------   F(1, 20)        =     13.25
                 Model |  57534941.7         1  57534941.7   Prob > F        =    0.0016
              Residual |  86828271.1        20  4341413.55   R-squared       =    0.3985
          -------------+----------------------------------   Adj R-squared   =    0.3685
                 Total |   144363213        21   6874438.7   Root MSE        =    2083.6
          
          ------------------------------------------------------------------------------
                 price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
                   mpg |  -250.3668   68.77435    -3.64   0.002    -393.8276    -106.906
                 _cons |   12586.95   1760.689     7.15   0.000     8914.217    16259.68
          ------------------------------------------------------------------------------
          
          * Model 2 - One regression; two intercepts but the same slope
          
          . reg price mpg i.foreign
          
                Source |       SS           df       MS      Number of obs   =        74
          -------------+----------------------------------   F(2, 71)        =     14.07
                 Model |   180261702         2  90130850.8   Prob > F        =    0.0000
              Residual |   454803695        71  6405685.84   R-squared       =    0.2838
          -------------+----------------------------------   Adj R-squared   =    0.2637
                 Total |   635065396        73  8699525.97   Root MSE        =    2530.9
          
          ------------------------------------------------------------------------------
                 price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
                   mpg |  -294.1955   55.69172    -5.28   0.000    -405.2417   -183.1494
                       |
               foreign |
              Foreign  |   1767.292    700.158     2.52   0.014     371.2169    3163.368
                 _cons |   11905.42   1158.634    10.28   0.000     9595.164    14215.67
          ------------------------------------------------------------------------------
          
          . mat list e(b)
          
          e(b)[1,4]
                                  0b.          1.           
                     mpg     foreign     foreign       _cons
          y1  -294.19553           0   1767.2922   11905.415
          
          . test _cons=_cons+1.foreign
          
           ( 1)  - 1.foreign = 0
          
                 F(  1,    71) =    6.37
                      Prob > F =    0.0138
          
          * The two intercepts do differ
          
          * Model 3 - One regression; two intercepts and two slopes
          
          . reg price c.mpg##i.foreign
          
                Source |       SS           df       MS      Number of obs   =        74
          -------------+----------------------------------   F(3, 70)        =      9.48
                 Model |   183435281         3  61145093.6   Prob > F        =    0.0000
              Residual |   451630115        70  6451858.79   R-squared       =    0.2888
          -------------+----------------------------------   Adj R-squared   =    0.2584
                 Total |   635065396        73  8699525.97   Root MSE        =    2540.1
          
          -------------------------------------------------------------------------------
                  price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          --------------+----------------------------------------------------------------
                    mpg |  -329.2551   74.98545    -4.39   0.000    -478.8088   -179.7013
                        |
                foreign |
               Foreign  |  -13.58741   2634.664    -0.01   0.996    -5268.258    5241.084
                        |
          foreign#c.mpg |
               Foreign  |   78.88826   112.4812     0.70   0.485    -145.4485     303.225
                        |
                  _cons |   12600.54   1527.888     8.25   0.000     9553.261    15647.81
          -------------------------------------------------------------------------------
          
          . mat list e(b)
          
          e(b)[1,6]
                                    0b.           1.  0b.foreign#   1.foreign#            
                      mpg      foreign      foreign       co.mpg        c.mpg        _cons
          y1   -329.25507            0   -13.587408            0    78.888255    12600.538
          
          . test _cons=_cons+1.foreign
          
           ( 1)  - 1.foreign = 0
          
                 F(  1,    70) =    0.00
                      Prob > F =    0.9959
          
          . test mpg=mpg+1.foreign#c.mpg
          
           ( 1)  - 1.foreign#c.mpg = 0
          
                 F(  1,    70) =    0.49
                      Prob > F =    0.4854
          
          * Neither the intercepts nor the slopes differ
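The link between Model 1 and Model 3 can be checked directly: the interaction coefficient in Model 3 is the difference between the two slopes from the separate regressions, -250.3668 - (-329.2551) = 78.8883. The sketch below reproduces that arithmetic on auto.dta; the scalar names b_dom and b_for are arbitrary and chosen only for this illustration.

```stata
sysuse auto, clear

* Separate regressions, keeping each group's slope on mpg
regress price mpg if foreign == 0
scalar b_dom = _b[mpg]
regress price mpg if foreign == 1
scalar b_for = _b[mpg]

* Fully interacted model (Model 3)
regress price c.mpg##i.foreign
display "difference of separate slopes: " b_for - b_dom
display "interaction coefficient:       " _b[1.foreign#c.mpg]

* The slope difference with its standard error and confidence interval
lincom 1.foreign#c.mpg

* The slope of mpg for foreign cars, recovered from Model 3
lincom mpg + 1.foreign#c.mpg
```

So the test on the interaction term asks whether that difference of about 78.9 is distinguishable from zero given its standard error. The point estimates match the separate regressions, while the standard errors differ because Model 3 uses a single pooled residual variance (compare 79.74 with 74.99 for the domestic slope in the output above).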
          Kind regards,
          Carlo
          (Stata 18.0 SE)



          • #6
            Thank you so much Dr. Lazzaro.

            This was useful. However, my main question is: what does "neither the intercepts nor the slopes show evidence of a statistically significant difference for males vs females" mean? That's the only thing I am confused about. What does that mean in "practical terms"?

            Thank you again for your time.

            Best,
            Marco



            • #7
              Marco (please, call me Carlo):
              it may mean two different things:
              - the first one (the most probable): your sample is too small to detect any statistical difference between males and females, whether in both intercepts and slopes, in intercepts only, or in slopes only; this outcome should be taken as a matter of fact; it may well be that a difference exists, but you simply cannot detect it with your data.
              In this case, the usual recommendation sounds like: go and collect more data, come back to your desk, widen your database and re-run it all over again.
              Unfortunately, this approach is often unfeasible for various reasons: the condition under study (e.g., a rare disease with an incidence of 5 new patients per year will never allow you to reach statistically significant results when comparing two drugs aimed at improving patients' health state), organizational inefficiencies, red tape and, last but not least, budget constraints;
              - the second one: there's really no difference across gender for the parameters you're interested in (i.e., no difference exists in the population from which the sample was drawn): this seldom happens (in my experience, at least).
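The first scenario can be illustrated with a small simulation. This is only a sketch on made-up data: the variable names, the true slopes (0.8 for males, 1.0 for females), the noise level and the two sample sizes are all assumptions chosen for illustration, not values from this thread.

```stata
* Hypothetical simulation: every number below is invented for illustration
clear all
set seed 12345
local reps = 200

foreach n in 50 2000 {
    local rejections = 0
    forvalues r = 1/`reps' {
        quietly {
            clear
            set obs `n'
            generate byte female = mod(_n, 2)
            generate double height = rnormal(170, 10)
            * The true slope difference between females and males is 0.2
            generate double weight = 70 + 0.8*(height - 170) ///
                + 0.2*female*(height - 170) + rnormal(0, 10)
            regress weight c.height##i.female
            test 1.female#c.height
            * Count the samples in which the interaction is significant at 5%
            local rejections = `rejections' + (r(p) < 0.05)
        }
    }
    scalar power_`n' = `rejections' / `reps'
    display "n = `n': interaction significant in " %5.3f power_`n' " of samples"
}
```

The slopes differ in every simulated population, yet under these assumptions the small samples should flag the difference far less often than the large ones. A non-significant interaction in a small sample is therefore weak evidence that the slopes are equal.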
              Kind regards,
              Carlo
              (Stata 18.0 SE)



              • #8
                Thank you Carlo.



                • #9
                  Dear all,

                  Thank you very much indeed for these posts.
                  I have a question: how should one correctly and formally report these results (e.g. the absence or presence of differences between two groups for several variables) in a scientific paper?
                  Best,

                  Sergio



                  • #10
                    Sergio:
                    if you're going to use regression for your research, you can report the outcome table and comment on it in the Results section of the paper (or, at least, this is what I usually do).
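One possible way to assemble such a table, sketched here on the auto.dta models from earlier in the thread, is to store each fit and list them side by side. The stored names m_common and m_interact are arbitrary, and the display formats are only a suggestion.

```stata
sysuse auto, clear

* Model with two intercepts and a common slope
regress price mpg i.foreign
estimates store m_common

* Model with two intercepts and two slopes
regress price c.mpg##i.foreign
estimates store m_interact

* Coefficients and standard errors side by side, plus N and R-squared
estimates table m_common m_interact, b(%9.2f) se(%9.2f) stats(N r2)
```

Reporting the coefficient with its standard error or confidence interval, and not the p-value alone, lets readers see how precisely a difference between groups was estimated.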
                    Kind regards,
                    Carlo
                    (Stata 18.0 SE)



                    • #11
                      Dear Carlo,
                      Is it possible to check differences between genders if the variable I would like to interact with gender is endogenous and instrumented?



                      • #12
                        Javier:
                        unfortunately, I have nothing to add to https://www.statalist.org/forums/for...ented-variable
                        Kind regards,
                        Carlo
                        (Stata 18.0 SE)

