  • Change over time

    Dear Statalist,

    I don't know if this question has been asked before. I, at least, couldn't find it.

    I am currently analyzing the effects of ESG ratings on stock returns. One of the goals of my study is to figure out whether, over a timespan running from 2003 until 2018, the effect of ESG on stock returns has changed. I am unsure how I should tackle this problem. I have divided my sample into multiple parts by means of dummies for which American president was in office; e.g. Bush = 1 if Bush was in office during time t, Obama = 1 if Obama was in office and Trump = 1 if Trump was in office.

    My question is how I should specify my regression. I currently have a panel regression (xtreg) with fixed effects, with standard errors clustered by TRBC business sector (the Refinitiv business classification, which contains 29 business sectors). I was thinking of either just adding the president variables to my original regression and, after running it, performing an F-test on the presidents:
    test Bush = Obama = Trump.

    I was also thinking of working with interaction effects:

    xtreg lnreturn $normal regression ESG_Bush ESG_Obama ESG_Trump, fe cluster(busseccode).
    test ESG_Bush = ESG_Obama = ESG_Trump.

    If I opt for the interaction effects, should I also include the plain president variables?

    And then finally, in my other regressions I also had a time dummy which removed time-fixed effects, i.Quarter. Should I also include that in this regression? My own intuition was that I should not include it, as then the interaction effects and president variables show purely the effect of that president, and not the fact that they are not in the same time period. When I ran the testparm i.Quarter on the variable, it gave me significant results meaning that I should include it, but my intuition tells me not to.

    I know that including the presidents does give some inconsistency, as for example during Obama there was the big crisis, but that is something I am willing to accept as I did not know how to correct for this.

    I am hoping someone can either point me in the right direction by either answering my questions or referring to another thread in which this problem was discussed. Kind regards, Maarten Loomans.

  • #2
    Maarten:
    some comments about your query:
    - why create categorical variables by hand when -fvvarlist- notation can do that for you? In addition, your goal should be to have a single three-level categorical variable named -president- (Bush=0; Obama=1; Trump=2); see also -label-;
    - this way, your possible interaction will be:
    Code:
    c.ESG##i.president
    - not sure I got you right about "time dummy which removed time-fixed effects". That said, please note that -fe- usually wants -timevar- among predictors;
    - clustering your standard errors with 27 panels only may produce biased results. I would recommend comparing them with their default counterparts.

    As an aside, as per the FAQ, posting what you typed and what Stata gave you back (via CODE delimiters, please) can increase your chances of getting (more) helpful replies. Thanks.
    Kind regards,
    Carlo
    (Stata 19.0)
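
    The interaction suggested above could be fitted and tested roughly as follows. This is only a sketch: it assumes a numeric variable -president- coded as described (Bush=0; Obama=1; Trump=2), data already declared with -xtset-, and firms nested within business sectors (otherwise -xtreg, fe- refuses the cluster variable). $controls is a hypothetical global macro standing in for the remaining regressors, which the thread does not list.

    ```stata
    * Sketch only: president is assumed numeric (0 = Bush, 1 = Obama, 2 = Trump)
    * $controls is a hypothetical global macro holding the other regressors
    xtreg lnreturn c.ESG##i.president $controls, fe vce(cluster busseccode)

    * Joint test that the ESG slope does not differ across the three presidents
    testparm i.president#c.ESG
    ```

    With the first level as the base category, the -testparm- line tests whether the two interaction coefficients are jointly zero, which is the same hypothesis as the three-way equality test on hand-made interaction variables.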



    • #3
      Carlo:
      Thanks for your reply! I will work my way through your points.
      1. As for creating categorical variables by hand:
      - I did not know that I could do this by fvvarlist. However, since there are only three and I do not think that it will make a substantial difference, I will leave it how it is currently. However, next time, I will make use of the fvvarlist.

      2. As for time fixed effect.
      - please explain to me what you do not understand about the time fixed effect. By integrating the quarter dummy (my data is measured quarterly), I control for unexpected variation. You said that -fe- usually wants -timevar- among predictors. Could you please delve more into this, as I am unsure what you mean.

      3. as for the clustering
      - I will indeed not have many clusters; however, my dataset consists of the Russell 3000, which, after cleaning, will have a lot of observations per cluster. This removes any bias, if I am correct.

      4. as for writing my code and output
      - I know that that is how it is usually done on this forum. I, however, currently am in a predicament where I can only access my dofile and not my data, such that I cannot show you what the output is in Stata.

      and I want to end with a question to you to get to my main question:
      I want to see if the effects of ESG on stock returns change over time. Is integrating an interaction effect into my model a good idea for that? And if so, should I only include the interaction effects, or also the president dummies without any interaction effect in my model? Or is a multivariate ARMA or GARCH a better approach?

      Kind regards,
      Maarten Loomans.



      • #4
        Maarten:
        1) you're obviously welcome, but your way of creating categorical variables is not only inefficient (which may well be your choice), but, much more substantively, makes the use of -margins- and -marginsplot- unfeasible;
        2) there's nothing I do not understand about -i.time- bar the way you phrased it. That said, if you have -i.quarter- plug it in the right-hand side of your regression equation;
        3) what is relevant for clustered standard errors is the number of clusters. Under -xtreg- cluster-robust standard errors take both heteroskedasticity and autocorrelation into account: have you already checked these issues?;
        4) you can interact -ESG- with -i.president-. The interaction is more difficult to create if you maintain your 3 categorical variables about US presidents.
        Kind regards,
        Carlo
        (Stata 19.0)
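
        As an illustration of point 1), once the model has been fitted with -c.ESG##i.president-, the ESG slope under each president can be obtained with -margins-. A minimal sketch, assuming the regression has just been run and -president- is the single categorical variable discussed in this thread:

        ```stata
        * Run directly after: xtreg lnreturn c.ESG##i.president ..., fe
        margins president, dydx(ESG)   // ESG slope under each president
        marginsplot                    // plot the slopes with confidence intervals
        ```

        This does not work with three hand-made interaction variables, because Stata then has no way of knowing that they are products of ESG and a president indicator.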



        • #5
          Carlo:
          Thanks for your response. I am indeed plugging in i.Quarter into the right-hand side of the regression. Should I also do this when analyzing a change over time?
          I know that I need heteroskedasticity-robust SEs since I first did a Hausman test, then I did Mundlak's approach. They both came to the conclusion that I should use robust SEs. I am planning on checking with xtoverid. I do not exactly understand what you mean by a problem regarding autocorrelation. How should I check for this? I was taught about it, but as of yet, I have not seen anyone really check for it.

          Regarding the last part: could I not just do an F-test, e.g. ESG_Bush=ESG_Obama=ESG_Trump? Anyhow, I will look into creating that fvvarlist tonight. Thank you!



          • #6
            Maarten:
            1) under -fe- -i.timevar- will give you the contribution of time (as a categorical variable) to within-panel variation (when adjusted for the other predictors) of the regressand;
            2) the -hausman- test does not support non-default standard errors (unlike the Mundlak approach). Hence, I'm not clear how the heteroskedasticity issue crept up with your data. That said, please note that, being a bit old-fashioned, the community-contributed module -xtoverid- does not support -fvvarlist- notation. It is also important to stress that the null of -xtoverid- is that -re- is the way to go (something that you can easily test via the "handmade" version of the Mundlak correction);
            3) within-panel autocorrelation of the epsilon error is taken into account by cluster-robust standard errors once invoked to deal with heteroskedasticity (search for the community-contributed module -xtserial- anyway);
            4) I wrongly assumed that you have 27 panels; at least 30 (others say 50) is probably more reassuring as far as cluster-robust standard errors are concerned (but what was said above about heteroskedasticity still holds);
            5) regarding the last part, with a unique three-level categorical variable you can test its joint statistical significance (via F-test) just by typing -testparm- (a useful postestimation command indeed).
            Kind regards,
            Carlo
            (Stata 19.0)
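
            The "handmade" Mundlak check mentioned in point 2) can be sketched as below. Assumptions: -ID- is the panel identifier (the name used later in this thread), the data are already -xtset-, and ESG is the only time-varying regressor shown; in practice one panel mean is needed for every time-varying regressor.

            ```stata
            * Panel means of the time-varying regressors (only ESG shown here)
            bysort ID: egen mean_ESG = mean(ESG)

            * Random-effects model augmented with the panel means, cluster-robust SEs
            xtreg lnreturn ESG mean_ESG i.Quarter, re vce(cluster ID)

            * H0: the coefficients on the panel means are jointly zero (re is adequate)
            test mean_ESG
            ```

            Rejecting the null points towards -fe-. Unlike -hausman-, this test remains valid with non-default standard errors.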



            • #7
              Carlo:
              Thank you for your elaborate response.
              I have a few more questions.
              1. When testing if the effect (or coefficient) of my var of interest changes over time, should I include i.timevar? or Not?
              2. There is not a real heteroskedasticity problem. When I performed the Mundlak approach (i.e. running a regression with the vars and the means of the time-varying vars, then doing an F-test on those means), I found out that I should use heteroskedasticity-robust SEs.
              3. Could you please explain how I should use the -xtserial- command? I already looked into it, but didn't really find anything that would be helpful. Perhaps I missed something.
              4. regarding the clustering of SEs. I keep seeing people saying that you should use at least thirty clusters. However, when I read this well-cited paper :

              Arceneaux, K., & Nickerson, D. W. (2009). Modeling certainty with clustered data: A comparison of methods. Political Analysis, 17(2), 177-190.

              I found out that twenty clusters is the minimum. What are the downsides to clustering on an individual level versus clustering on an industry level? I am currently clustering on an industry level.

              5. I know this is a dumb question, or at least a question that can easily be found, but as it is late and I want this done and over with I can't seem to find it, because as you know, when you are looking for something you cannot find it. How would I code the three presidents into a single variable in which Bush equals 1, Obama equals 2 and Trump equals 3?

              Again, I want to thank you for your amazing help. My current supervisor has helped me less than you have, and for that I want to thank you.



              • #8
                Maarten:
                1) yes, you may want to interact -timevar- with the variable that is expected to lead the change of the regressand;
                2) if you have neither heteroskedasticity issues nor autocorrelation problems, stick with default standard errors. That said, I'm still unclear about the relationship between Mundlak and non-default standard errors;
                3) I would recommend that you take a look at the -xtserial- helpfile;
                4) provided that you go -fe- and -i.industry- is, as frequently happens, a time-invariant predictor, you'll get no coefficient at all. Clustering at -i.industry- level is, in general, less advisable, because each industry includes more panels and, as such, may be less fine-grained than clustering at -panelid- level. The minimum # of clusters is a matter of tribal rules, but less than 30 is probably hazardous;
                5) assuming that you already have a -string- variable named -president-, you can go as follows:
                Code:
                destring president, gen(president_num)
                Then you can plug the predictor -i.president_num- in the right-hand side of your regression equation.

                I do hope this helps.
                Kind regards,
                Carlo
                (Stata 19.0)
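
                For point 3), a typical -xtserial- call looks roughly like the sketch below. It assumes -ID- is the panel identifier and -Quarter- the time variable (names taken from elsewhere in this thread). Being an older community-contributed command, it may not accept factor-variable notation, so only plain variable names are used here.

                ```stata
                * xtserial is community-contributed; locate and install it first
                search xtserial

                * The data must be declared as a panel with a time variable
                xtset ID Quarter

                * Wooldridge test; H0: no first-order autocorrelation
                xtserial lnreturn ESG
                ```

                A small p-value is evidence of serial correlation in the idiosyncratic error, which is one reason to prefer cluster-robust standard errors.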



                • #9
                  Carlo:
                  Thank you for your answers.
                  Should I still include i.Quarter if I am not planning on using i.Quarter to show the change over time? I was thinking of using the presidents to see the change over time.
                  as for Mundlak with robust SE, please check #10 of : https://www.statalist.org/forums/for...an-test-result and https://blog.stata.com/2015/10/29/fi...dlak-approach/. I cannot use the Hausman approach if I think that I am going to use robust SE. Thus to determine which model to use, re or fe, I did a Mundlak approach.

                  Thank you for the other help.



                  • #10
                    Maarten:
                    1) I'd still include -i.Quarter- without interaction and OK for -i.president- alone if you do not think that interaction with -i.timevar- is relevant (something that should be checked, though);
                    2) if you need non-default standard errors (which are not mandatory in Mundlak or the community-contributed module -xtoverid-) Mundlak as per https://blog.stata.com/2015/10/29/fi...dlak-approach/ is the way to go (I'm under the impression that cluster-robust standard errors were used there to stress the difference with -hausman-).
                    Kind regards,
                    Carlo
                    (Stata 19.0)



                    • #11
                      Carlo:
                      I am currently trying to make the categorical variable in Stata, but it just isn't working. I have one variable that equals "Bush" if Bush was in office, "Obama" if Obama was in office and "Trump" if Trump was in office, but I cannot make a categorical var out of it.
                      Last edited by Maarten Loomans; 22 Jun 2022, 01:59.



                      • #12
                        Maarten:
                        you may want to try:
                        Code:
                        . set obs 3
                        Number of observations (_N) was 0, now 3.
                        
                        . g president_string="Bush" in 1
                        
                        
                        . replace president_string="Obama" in 2
                        
                        
                        . replace president_string="Trump" in 3
                        
                        
                        . encode president_string, g(num_president_string)
                        
                        . list
                        
                             +---------------------+
                             | presid~g   num_pr~g |
                             |---------------------|
                          1. |     Bush       Bush |
                          2. |    Obama      Obama |
                          3. |    Trump      Trump |
                             +---------------------+
                        
                        .
                        I would recommend that you take a look at the -encode- entry in the Stata .pdf manual, as it may behave nastily in some instances (different from the one you're interested in).
                        Kind regards,
                        Carlo
                        (Stata 19.0)



                        • #13
                          Carlo:
                          I currently have the following:
                          Code:
                          by ID: gen president = "Bush" if Quarter <= 24
                          by ID: replace president = "Obama" if Quarter > 24
                          by ID: replace president = "Trump" if Quarter > 56
                          encode president, gen(president_n)
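
                          A possible alternative that skips the string variable altogether is sketched below. -encode- assigns codes in alphabetical order of the strings, which here happens to coincide with chronological order (Bush, Obama, Trump); coding the numeric variable directly makes the order explicit. The toy data, the variable name -president_num- and the label name -presidentlbl- are hypothetical; the quarter cut-offs are the ones used above.

                          ```stata
                          * Toy data: one row per quarter, 64 quarters (hypothetical)
                          clear
                          set obs 64
                          generate Quarter = _n

                          * Numeric coding with explicit bounds and a guard against missing values
                          generate byte president_num = 1 if Quarter <= 24
                          replace president_num = 2 if Quarter > 24 & Quarter <= 56
                          replace president_num = 3 if Quarter > 56 & !missing(Quarter)

                          * Attach readable labels so output shows names instead of 1/2/3
                          label define presidentlbl 1 "Bush" 2 "Obama" 3 "Trump"
                          label values president_num presidentlbl
                          tabulate president_num
                          ```

                          The -!missing(Quarter)- guard matters because Stata treats a missing numeric value as larger than any number, so a bare -Quarter > 56- would also be true for observations with a missing quarter.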
