
  • Compare regression coefficients across different subsamples

    Dear Statalisters,

    I am new to this forum and I am looking for help: after hours of searching the internet, I am still confused about the question I will now describe.

    I am running a regression to understand the impact of a specific characteristic of the investor, represented by the dummy variable INDIP, on the invested company's performance (DIPV), controlling for other factors. One of these factors is, say, another dummy variable, CONTR.

    I want to understand whether the impact of INDIP on DIPV differs between the two subgroups identified by CONTR=0 and CONTR=1.

    I therefore run two identical regressions on the two subsamples in order to compare the coefficients obtained for the INDIP variable (both positive and significant).

    My questions are:
    1) Is this the proper way to compare the impact of a dummy variable across two independent subgroups?

    2) What is the proper statistical test to evaluate whether the difference between the two coefficients is significantly different from zero?

    I found that a Z test constructed as follows could be a solution:
    Z = (b1 - b2) / sqrt(SEb1^2 + SEb2^2),
    where b1 and b2 are the two coefficients and SEb1 and SEb2 are their respective standard errors.

    - I run the regressions with robust standard errors; is it correct to use those standard errors in this formula (see the sketch below)?
    - Theoretically speaking, should it be a t test instead, since the population variance is unknown? If so, how can I run such a test in Stata?
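
    For concreteness, here is a sketch of how I would compute this statistic in Stata, using the variable names above (only an illustration of the formula; the regressions would also include the other control variables):

    Code:
    regress DIPV INDIP if CONTR==0, vce(robust)   // subsample with CONTR==0
    scalar b1  = _b[INDIP]
    scalar se1 = _se[INDIP]
    regress DIPV INDIP if CONTR==1, vce(robust)   // subsample with CONTR==1
    scalar b2  = _b[INDIP]
    scalar se2 = _se[INDIP]
    scalar z = (b1 - b2) / sqrt(se1^2 + se2^2)
    display "Z = " z "   two-sided p-value = " 2*normal(-abs(z))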

    Thank you for the support and I'm sorry for my little comprehension on this matter.

  • #2
    Hi Francesco. The first example on this UCLA webpage should be helpful. As it shows, the coefficient for the interaction between a dichotomous explanatory variable and a continuous explanatory variable shows the difference between the two slopes.
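
    A quick sketch of that idea using the auto dataset (my own toy example, not the one on the UCLA page):

    Code:
    sysuse auto, clear
    regress mpg i.foreign##c.weight
    * _b[1.foreign#c.weight] is the difference between the weight slopes
    * for foreign and domestic cars, as the two separate regressions confirm
    regress mpg weight if foreign==0
    regress mpg weight if foreign==1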

    HTH.
    --
    Bruce Weaver
    Email: [email protected]
    Web: http://sites.google.com/a/lakeheadu.ca/bweaver/
    Version: Stata/MP 18.0 (Windows)



    • #3
      Hello Bruce, thank you for your help.

      I now understand that using an interaction term could be useful. Regarding this approach:
      - Can I use it even if both variables are dummies?
      - What are the pros and cons of this approach compared to the "regressions on subsamples" approach? Is the latter theoretically wrong?



      • #4
        Francesco:
        1) yes, you can interact categorical variables (see -help fvvarlist- for more details);
        2) -fvvarlist- notation has the enormous benefit of tight integration with -margins- and -marginsplot- (see the related help files for more details, and the sketch below).
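
        For instance, something along these lines (a sketch using the thread's variable names; it assumes INDIP and CONTR are 0/1 dummies and A, B, C are continuous controls):

        Code:
        regress DIPV i.INDIP##i.CONTR c.A c.B c.C, vce(robust)
        * _b[1.INDIP#1.CONTR] is the difference between the effect of INDIP
        * when CONTR==1 and its effect when CONTR==0
        margins CONTR, dydx(INDIP)   // effect of INDIP within each CONTR group
        marginsplot                  // plot the two effects with confidence intervals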
        Kind regards,
        Carlo
        (Stata 18.0 SE)



        • #5
          Thank you Carlo,

          regarding the first approach: does it make no sense to compare the coefficients for the INDIP variable from the regressions run on the two subsamples (CONTR=0 and CONTR=1)?



          • #6
            Francesco, when both variables are dichotomous, the coefficient for the interaction term is equal to what some folks call the difference in differences. Try the following example to see what I mean.

            Code:
            clear *
            sysuse auto
            tab rep78
            keep if inrange(rep78,3,4) // make rep78 dichotomous
            tab rep78
            
            regress mpg foreign##rep78
            local int = _b[1.foreign#4.rep78]
            regress mpg foreign if rep78==3, noheader
            local f3 = _b[foreign]
            regress mpg foreign if rep78==4, noheader
            local f4 = _b[foreign]
            display as text "Difference in differences = " as result `f4'-`f3'
            display as text "Coefficient for the interaction = " as result `int'
            Here is my output from the two -display- commands at the end.

            Code:
            . display as text "Difference in differences = " as result `f4'-`f3'
            Difference in differences = 2.1111111
            
            . display as text "Coefficient for the interaction = " as result `int'
            Coefficient for the interaction = 2.1111111
            --
            Bruce Weaver
            Email: [email protected]
            Web: http://sites.google.com/a/lakeheadu.ca/bweaver/
            Version: Stata/MP 18.0 (Windows)



            • #7
              Bruce,
              to make sure I understood the example: the coefficient for the interaction term in the full sample should equal the difference between the coefficients obtained for the dummy of interest in the regressions on the two subsamples.
              However, this does not hold in my regressions, maybe because I also include other variables in the analysis.





              • #8
                "However, this does not hold in my regressions, maybe because I also include other variables in the analysis."
                Do you mean that both models include (for example) A, B and C in addition to the two interacting variables, or that one model includes A, B and C while the other includes X, Y and Z?
                --
                Bruce Weaver
                Email: [email protected]
                Web: http://sites.google.com/a/lakeheadu.ca/bweaver/
                Version: Stata/MP 18.0 (Windows)



                • #9
                  The models applied to the two subsamples are the same: they include the same "other" control variables, say A, B, and C, in addition to INDIP and CONTR.

                  Let me explain:

                  MODEL1 (reg DIPV INDIP A B C if CONTR==0, vce(robust)) -> INDIP coefficient = 5
                  MODEL2 (reg DIPV INDIP A B C if CONTR==1, vce(robust)) -> INDIP coefficient = 8
                  MODEL3 (reg DIPV INDIP A B C CONTR INDIP##CONTR, vce(robust)) -> INDIP#CONTR interaction coefficient = 2 (while, according to the example above, it should equal 3, the difference between the coefficients in MODEL1 and MODEL2)



                  • #10
                    It would help, I think, if you posted a reproducible example. See item 12 in the FAQ for details about using -dataex- and posting your exact Stata commands between CODE delimiters.

                    Meanwhile, you could try this:

                    Code:
                    regress DIPV c.INDIP##c.CONTR A B C, vce(robust)
                    generate byte c0 = e(sample) & CONTR==0
                    generate byte c1 = e(sample) & CONTR==1
                    regress DIPV INDIP A B C if c0, vce(robust)
                    regress DIPV INDIP A B C if c1, vce(robust)
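
                    Those two extra regressions rerun the subsample models on exactly the estimation sample of the interacted model, so any remaining discrepancy is not a sample issue. If the numbers still disagree, one likely reason is that the subsample regressions let the coefficients of A, B, and C (and the constant) differ across the CONTR groups, while the model above holds them fixed. A fully interacted pooled model reproduces the subsample point estimates (a sketch, assuming A, B, and C are continuous):

                    Code:
                    regress DIPV i.CONTR##(i.INDIP c.A c.B c.C), vce(robust)
                    * the CONTR#INDIP interaction coefficient now equals the difference
                    * between the INDIP coefficients from the two subsample regressions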
                    --
                    Bruce Weaver
                    Email: [email protected]
                    Web: http://sites.google.com/a/lakeheadu.ca/bweaver/
                    Version: Stata/MP 18.0 (Windows)



                    • #11
                      Thank you Bruce,
                      using your code I get exactly the same regression outputs, with a coefficient for the interaction that differs from the difference in differences.

                      Anyway, the most important questions now are (since I built my entire study on the regressions on the subsamples, and the deadline is approaching):
                      - Is it WRONG to use this method when I have the same variables in both regressions?
                      - Can I somehow justify this approach, compared to the one with the interaction term?
                      - Using this approach, is there a way to compare the coefficients and test whether a significant difference exists between them?

                      Thank you for your kind support, it is really appreciated!



                      • #12
                        Francesco:
                        although I prefer the interaction approach, Stata offers a way (a Chow-type test) to calculate what (I think) you're after via -suest-, as you can see from the following toy example:
                        Code:
                        . use "C:\Program Files (x86)\Stata15\ado\base\a\auto.dta"
                        (1978 Automobile Data)
                        
                        . regress price mpg foreign if foreign==0
                        note: foreign omitted because of collinearity
                        
                              Source |       SS           df       MS      Number of obs   =        52
                        -------------+----------------------------------   F(1, 50)        =     17.05
                               Model |   124392956         1   124392956   Prob > F        =    0.0001
                            Residual |   364801844        50  7296036.89   R-squared       =    0.2543
                        -------------+----------------------------------   Adj R-squared   =    0.2394
                               Total |   489194801        51  9592054.92   Root MSE        =    2701.1
                        
                        ------------------------------------------------------------------------------
                               price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                        -------------+----------------------------------------------------------------
                                 mpg |  -329.2551   79.74034    -4.13   0.000    -489.4183   -169.0919
                             foreign |          0  (omitted)
                               _cons |   12600.54   1624.773     7.76   0.000     9337.085    15863.99
                        ------------------------------------------------------------------------------
                        
                        . estimates store A
                        
                        . regress price mpg foreign if foreign==1
                        note: foreign omitted because of collinearity
                        
                              Source |       SS           df       MS      Number of obs   =        22
                        -------------+----------------------------------   F(1, 20)        =     13.25
                               Model |  57534941.7         1  57534941.7   Prob > F        =    0.0016
                            Residual |  86828271.1        20  4341413.55   R-squared       =    0.3985
                        -------------+----------------------------------   Adj R-squared   =    0.3685
                               Total |   144363213        21   6874438.7   Root MSE        =    2083.6
                        
                        ------------------------------------------------------------------------------
                               price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                        -------------+----------------------------------------------------------------
                                 mpg |  -250.3668   68.77435    -3.64   0.002    -393.8276    -106.906
                             foreign |          0  (omitted)
                               _cons |   12586.95   1760.689     7.15   0.000     8914.217    16259.68
                        ------------------------------------------------------------------------------
                        
                        . estimates store B
                        
                        . suest A B
                        
                        Simultaneous results for A, B
                        
                                                                        Number of obs     =         74
                        
                        ------------------------------------------------------------------------------
                                     |               Robust
                                     |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                        -------------+----------------------------------------------------------------
                        A_mean       |
                                 mpg |  -329.2551   80.16093    -4.11   0.000    -486.3676   -172.1425
                             foreign |          0  (omitted)
                               _cons |   12600.54   1755.108     7.18   0.000     9160.589    16040.49
                        -------------+----------------------------------------------------------------
                        A_lnvar      |
                               _cons |   15.80284   .2986031    52.92   0.000     15.21759    16.38809
                        -------------+----------------------------------------------------------------
                        B_mean       |
                                 mpg |  -250.3668   84.69387    -2.96   0.003    -416.3637   -84.36987
                             foreign |          0  (omitted)
                               _cons |   12586.95   2258.417     5.57   0.000     8160.534    17013.37
                        -------------+----------------------------------------------------------------
                        B_lnvar      |
                               _cons |   15.28371   .2310235    66.16   0.000     14.83091    15.73651
                        ------------------------------------------------------------------------------
                        
                        . test [A_mean = B_mean]
                        
                         ( 1)  [A_mean]mpg - [B_mean]mpg = 0
                         ( 2)  [A_mean]o.foreign - [B_mean]o.foreign = 0
                               Constraint 2 dropped
                        
                                   chi2(  1) =    0.46
                                 Prob > chi2 =    0.4987
                        
                        .
                        Kind regards,
                        Carlo
                        (Stata 18.0 SE)



                        • #13
                          Carlo,
                          thank you for your suggestion.
                          However, as I understand it, the Chow test concerns the equality of ALL the coefficients in the two regressions.
                          In my case, I am only interested in the difference between the two coefficients of the INDIP variable, disregarding the A, B, and C variables.



                          • #14
                            Francesco:
                            do you mean something along the following lines?
                            Code:
                            . test [A_mean]mpg - [B_mean]mpg = 0
                            
                             ( 1)  [A_mean]mpg - [B_mean]mpg = 0
                            
                                       chi2(  1) =    0.46
                                     Prob > chi2 =    0.4987
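
                            In terms of the variables in #9 this would look something like the following (a sketch; -suest- reports robust standard errors by default):

                            Code:
                            regress DIPV INDIP A B C if CONTR==0
                            estimates store A
                            regress DIPV INDIP A B C if CONTR==1
                            estimates store B
                            suest A B
                            test [A_mean]INDIP = [B_mean]INDIP   // Wald test of equal INDIP coefficients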
                            Kind regards,
                            Carlo
                            (Stata 18.0 SE)



                            • #15
                              If that test compares only the coefficients of mpg obtained in A and B, and excludes all the other variables (which are not present in this example), then YES. If the p-value is < 0.1, I can conclude that the coefficients are not equal, right?

