  • Regressing by subsample

    Dear Statalist,

    I am working on an analysis of the effects of ESG ratings on stock returns, and I want to test whether those effects differ across industries.

    I have tried two versions of the code: one with an interaction effect, the other with an if qualifier.
    Code:
    qui global h4en i.Energy##c.ESG_Q TAT_Q TAG_Q TA_Q EBIT_Q ROA_Q FL_Q BTM_Q
    reghdfe $ylist $h4en, absorb(n_ID Quarter) vce(cluster n_ID)
    test 1.Energy#c.ESG_Q = ESG_Q
    If I do this, the test is significant, which I take to mean that the ESG_Q coefficient for the energy sector differs from the one for my general sample.

    However, when I perform this code:
    Code:
    reghdfe $ylist $h2 if Energy == 1, absorb(n_ID Quarter) vce(cluster n_ID)
    I get a different answer. $h2 is the same as $h4en, except that it contains neither the interaction effect nor an industry variable.

    Does anyone know the reason for this? My guess is that the two regressions do not use the same sample, because I restrict the second one with if Energy == 1, but I am not sure.

  • #2
    Maarten:
    interaction is the way to go here.
    Kind regards,
    Carlo
    (Stata 19.0)


    • #3
      You're using different samples, with different numbers of observations. Also, standard errors tend to shrink as the number of observations grows, so, ceteris paribus, when the sample shrinks the standard errors rise and your regressors may no longer be significant.
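
      The role of the sample can be illustrated with a minimal sketch on Stata's shipped auto data (an assumption for illustration; these are not the poster's data, and regress stands in for reghdfe). In OLS, a subsample regression gives the same slope as a pooled model only when the group indicator is interacted with every regressor. If only one regressor is interacted, the remaining controls are forced to share a common coefficient across groups, and the implied group slope generally differs from the subsample estimate.

      ```stata
      * Sketch on the auto data; foreign plays the role of Energy, mpg of ESG_Q
      sysuse auto, clear

      * Subsample regression: mpg slope among foreign cars only
      regress price mpg weight if foreign == 1
      scalar b_sub = _b[mpg]

      * Pooled model, foreign interacted with every regressor
      regress price i.foreign##(c.mpg c.weight)
      scalar b_full = _b[mpg] + _b[1.foreign#c.mpg]

      * Pooled model, only mpg interacted; weight keeps one common slope
      regress price i.foreign##c.mpg weight
      scalar b_part = _b[mpg] + _b[1.foreign#c.mpg]

      * b_sub and b_full should agree up to rounding; b_part generally will not
      display b_sub, b_full, b_part
      ```

      In the original question, the controls and the absorbed Quarter effects are common to all industries in the interaction model, whereas with if Energy == 1 they are estimated from energy firms alone, so some difference between the two results is to be expected.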


      • #4
        And I completely agree with Carlo, by the way: always maximise your degrees of freedom.


        • #5
          Thank you both Carlo and Maxence.

          I have another question. With the interaction effect in the model, to see whether the ESG_Q coefficient really differs for that industry, is a simple
          Code:
          test 1.Energy#c.ESG_Q = ESG_Q
          enough?

          I personally thought that, to compare the two ESG_Q coefficients, I should use something along the lines of:

          Code:
          test sum(1.Energy#c.ESG_Q + ESG_Q) = ESG_Q
          I know that test sum(...) isn't possible, but is this what I should be working towards?


          Kind regards,
          Maarten Loomans.
          Last edited by Maarten Loomans; 22 Jun 2022, 07:12.


          • #6
            Just look at the significance of the interaction term; that is what researchers tend to do in practice.
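
            Why the interaction term alone answers the question can be sketched as follows, again on the auto data as a stand-in (an assumption; foreign replaces Energy and mpg replaces ESG_Q). With a main effect and an interaction, the slope in the base group is the main-effect coefficient and the slope in the other group is the main effect plus the interaction. The difference between the two slopes is therefore the interaction coefficient itself, and the relevant null hypothesis is that it equals zero.

            ```stata
            sysuse auto, clear
            regress price i.foreign##c.mpg

            * H0: same mpg slope in both groups, i.e. interaction coefficient = 0
            test 1.foreign#c.mpg

            * mpg slope for foreign cars: main effect plus interaction
            lincom mpg + 1.foreign#c.mpg

            * For contrast: this tests interaction = main effect,
            * i.e. that the foreign slope is twice the domestic slope
            test 1.foreign#c.mpg = mpg
            ```

            Read this way, a test of the form test 1.Energy#c.ESG_Q = ESG_Q does not ask whether the two slopes are equal; the t statistic on the interaction term in the regression table does.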


            • #7
              Thank you Maxence.


              • #8
                Maarten:
                you may want to consider something along the following lines:
                Code:
                . use "C:\Program Files\Stata17\ado\base\a\auto.dta"
                (1978 automobile data)
                
                . regress price i.foreign##i.rep78
                note: 1.foreign#1b.rep78 identifies no observations in the sample.
                note: 1.foreign#2.rep78 identifies no observations in the sample.
                note: 1.foreign#5.rep78 omitted because of collinearity.
                
                      Source |       SS           df       MS      Number of obs   =        69
                -------------+----------------------------------   F(7, 61)        =      0.39
                       Model |    24684607         7  3526372.43   Prob > F        =    0.9049
                    Residual |   552112352        61  9051022.16   R-squared       =    0.0428
                -------------+----------------------------------   Adj R-squared   =   -0.0670
                       Total |   576796959        68  8482308.22   Root MSE        =    3008.5
                
                -------------------------------------------------------------------------------
                        price | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
                --------------+----------------------------------------------------------------
                      foreign |
                     Foreign  |   2088.167   2351.846     0.89   0.378     -2614.64    6790.974
                              |
                        rep78 |
                           2  |   1403.125   2378.422     0.59   0.557    -3352.823    6159.073
                           3  |   2042.574   2204.707     0.93   0.358    -2366.011    6451.159
                           4  |   1317.056   2351.846     0.56   0.578    -3385.751    6019.863
                           5  |       -360   3008.492    -0.12   0.905    -6375.851    5655.851
                              |
                foreign#rep78 |
                   Foreign#1  |          0  (empty)
                   Foreign#2  |          0  (empty)
                   Foreign#3  |  -3866.574   2980.505    -1.30   0.199    -9826.462    2093.314
                   Foreign#4  |  -1708.278   2746.365    -0.62   0.536    -7199.973    3783.418
                   Foreign#5  |          0  (omitted)
                              |
                        _cons |     4564.5   2127.325     2.15   0.036      310.651    8818.349
                -------------------------------------------------------------------------------
                
                . mat list e(b)
                
                e(b)[1,18]
                             0b.           1.          1b.           2.           3.           4.           5.  0b.foreign#  0b.foreign#  0b.foreign#
                        foreign      foreign        rep78        rep78        rep78        rep78        rep78     1b.rep78     2o.rep78     3o.rep78
                y1            0    2088.1667            0     1403.125    2042.5741    1317.0556         -360            0            0            0
                
                     0b.foreign#  0b.foreign#  1o.foreign#  1o.foreign#   1.foreign#   1.foreign#  1o.foreign#             
                       4o.rep78     5o.rep78     1b.rep78     2o.rep78      3.rep78      4.rep78     5o.rep78        _cons
                y1            0            0            0            0   -3866.5741   -1708.2778            0       4564.5
                
                . lincom(2.rep78+3.rep78)-1.foreign#3.rep78
                
                 ( 1)  2.rep78 + 3.rep78 - 1.foreign#3.rep78 = 0
                
                ------------------------------------------------------------------------------
                       price | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
                -------------+----------------------------------------------------------------
                         (1) |   7312.273   5396.527     1.35   0.180    -3478.749     18103.3
                ------------------------------------------------------------------------------
                
                .
                Kind regards,
                Carlo
                (Stata 19.0)


                • #9
                  Carlo:
                  I currently have the following:
                  Code:
                  reghdfe $ylist i.Renewable#c.ESG_Q TAT_Q TAG_Q TA_Q EBIT_Q ROA_Q FL_Q BTM_Q i.Fossilfuels#c.ESG_Q i.Insurance#c.ESG_Q i.Fish_Farm#c.ESG_Q i.FoodProc#c.ESG_Q, absorb(n_ID Quarter) vce(cluster n_ID)
                  test 1.Renewable#c.ESG_Q = ESG_Q
                  test 1.Fossilfuels#c.ESG_Q = ESG_Q 
                  test 1.Insurance#c.ESG_Q = ESG_Q 
                  test 1.Fish_Farm#c.ESG_Q = ESG_Q 
                  test 1.FoodProc#c.ESG_Q = ESG_Q
                  This gives me outcomes that seem logical. The only thing I do not understand is the results themselves. I expected that a higher ESG rating (i.e. being 'better') would go with higher stock returns. So far that has not been the case; the opposite holds: a higher ESG rating is associated with lower stock returns (not stated in statistically correct terms, but you know what I mean). I also have quite high kurtosis and skewness (3 and 10 I believe, or the other way around), so I will probably log some variables to get my data closer to normal.
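
                  When several industries are compared at once, one possible alternative is a single categorical industry variable interacted with the continuous regressor, followed by a joint test of all interaction terms. The sketch below assumes mutually exclusive groups and uses rep78 from the auto data as a purely illustrative stand-in for such an industry variable; it is not the poster's specification.

                  ```stata
                  sysuse auto, clear

                  * rep78 stands in for a hypothetical industry identifier, mpg for ESG_Q
                  regress price c.mpg##i.rep78

                  * Joint H0: the mpg slope is the same at every level of rep78
                  testparm i.rep78#c.mpg

                  * Slope at one particular level: main effect plus that level's interaction
                  lincom mpg + 3.rep78#c.mpg
                  ```

                  The main effect of the continuous variable is included here through the ## operator, so each interaction coefficient is a difference in slope relative to the base level.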


                  • #10
                    Maarten:
                    logging or not, the issue is that you're focusing your attention on a very limited set of predictors, whose coefficients are adjusted for the remaining ones.
                    In addition, without seeing your results, interested listers have a hard time replying helpfully to your query.
                    Kind regards,
                    Carlo
                    (Stata 19.0)


                    • #11
                      Carlo:
                      I do not fully understand your first remark. I tried to follow the general literature on stock returns and to use the same controls and predictors. As for the second remark, I will try to include my results in future.


                      • #12
                        Maarten:
                        what you report is surely correct; my comment was simply that the coefficients you tested are adjusted for the other predictors (as in each and every regression).
                        Kind regards,
                        Carlo
                        (Stata 19.0)
