  • moderator omitted because of collinearity

    Dear all,

    I'm writing my thesis for my Master's degree. I'm using a fixed effects model to assess the relationship between Corporate Social Performance (CSP) and Corporate Financial Performance (CFP, measured as ROE), and how that relationship is moderated by a country's long-term orientation (LTO). The control variables are SIZE and RISK.

    However, when I run the regression in Stata 14.0, it automatically omits the moderating variable. I've included the independent variable, the moderating variable, and the interaction term in the regression. Unfortunately, I can't install -dataex- on my university's computer, but here are the results:


    [Image: Stata regression output with the moderator omitted (Untitled.png)]


    As you can see, standardizing the variables didn't help: the moderator was still omitted because of collinearity. I've tried googling this for a long time, but I don't know what to do anymore.
    Thank you very much in advance for your time.

    Kind regards,
    Tom Bosse
    Attached Files

  • #2
    There is nothing wrong. This is exactly what should happen. Your variable LTO is a time-invariant attribute of each firm (or whatever your unit of analysis is). In a fixed effects model, any time-invariant variable will be collinear with the fixed effects. Indeed, this property is precisely what lets fixed effects automatically remove bias due to unobserved time-invariant properties of the analytic units.
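The following is a minimal sketch of this point on a hypothetical simulated panel. The firm count, years, seed and coefficients are all invented for illustration and are not Tom's data. LTO is drawn once per firm, so its within-firm standard deviation is zero, and -xtreg, fe- reports it as omitted.

```stata
* Hypothetical simulated panel, NOT the thesis data: 50 firms x 15 years
clear
set seed 2017
set obs 50
generate firm  = _n
generate LTO   = floor(101*runiform())  // 0-100, fixed within each firm
generate alpha = rnormal(0, 3)          // unobserved firm effect
expand 15
bysort firm: generate year = 2002 + _n
xtset firm year
generate CSP  = 100*runiform()          // 0-100, varies from year to year
generate SIZE = rnormal(10, 2)
generate RISK = rnormal(1, 0.3)
* Assumed true CSP effect: -1.5 + 0.02*LTO
generate ROE = alpha - 1.5*CSP + 0.02*CSP*LTO + 0.5*SIZE - 2*RISK + rnormal(0, 5)

* "within" row: the standard deviation of LTO over time inside a firm is 0
xtsum LTO

* LTO is collinear with the firm fixed effects and is reported as omitted
xtreg ROE c.CSP##c.LTO SIZE RISK, fe
```

The "between" row of -xtsum- still shows plenty of variation in LTO. That between-firm variation is exactly what the fixed effects absorb.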

    Now, you also have to think about how an interaction model works. In this interaction model, the coefficient of LTO (if it were not omitted due to collinearity) would be the expected difference in CFP between firms with and without LTO, conditional on CSP = 0. That condition, CSP = 0, may not even be possible, depending on how CSP is calculated. Even if it is possible, it may or may not have any meaning or interest. The most that one might extract from a coefficient of LTO (were it possible to get one) is a basis of comparison for the interaction coefficient itself. But this is not terribly important, because the coefficient of CSP can also serve as a basis of comparison: it is the expected difference in CFP associated with a unit increase in CSP, conditional on LTO = 0.
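Here are the model and the two conditional effects described above, written out in symbols. The notation is introduced here for illustration: firm fixed effects are written as \alpha_i, and ROE is the CFP measure.

```latex
\[
\begin{aligned}
\mathrm{ROE}_{it} &= \alpha_i + \beta_1\,\mathrm{CSP}_{it} + \beta_2\,\mathrm{LTO}_i
  + \beta_3\,(\mathrm{CSP}_{it}\times\mathrm{LTO}_i)
  + \gamma_1\,\mathrm{SIZE}_{it} + \gamma_2\,\mathrm{RISK}_{it} + \varepsilon_{it} \\
\frac{\partial\,E[\mathrm{ROE}_{it}]}{\partial\,\mathrm{CSP}_{it}} &= \beta_1 + \beta_3\,\mathrm{LTO}_i
  \qquad (\beta_1 \text{ is the CSP effect at } \mathrm{LTO}_i = 0) \\
\frac{\partial\,E[\mathrm{ROE}_{it}]}{\partial\,\mathrm{LTO}_i} &= \beta_2 + \beta_3\,\mathrm{CSP}_{it}
  \qquad (\beta_2 \text{ is the LTO effect at } \mathrm{CSP}_{it} = 0)
\end{aligned}
\]
```

Because \alpha_i + \beta_2 LTO_i is a single constant for each firm, the data cannot separate \beta_2 from \alpha_i. That is the collinearity Stata reports. \beta_1 and \beta_3 remain estimable.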

    So this is a non-problem. Focus on interpreting the interaction, which is the DID estimator of the moderation effect. It may be that you are interested also in expected CFP associated with various values of CSP and LTO. You can do this, but it is more complicated than it needs to be because you did not use factor-variable notation in your regression. So I would re-run this regression as:

    Code:
    xtreg ROE c.CSP##i.LTO SIZE RISK, fe
    You will still find that LTO is omitted due to collinearity. Again, this is not a problem. Then you can get predicted values and average marginal effects by commands like:

    Code:
    margins LTO, at(CSP = (list_of_interesting_values_of_CSP)) // PREDICTED VALUES
    margins, dydx(LTO) // AVERAGE MARGINAL EFFECT OF LTO
    margins, dydx(LTO) at(CSP = (list_of_interesting_values_of_CSP)) // MARGINAL EFFECTS AT INTERESTING VALUES
    Note: I have assumed that CSP is a continuous variable and LTO is a dichotomous 0/1 variable. Change the code accordingly if this is not true. Continuous variables need a c. prefix. Categorical variables should have an i. prefix, although they don't absolutely need one, because inside an interaction Stata treats unprefixed variables as categorical. Do read -help fvvarlist- for details about factor-variable notation. It is one of modern Stata's best features: it saves a ton of work and prevents a lot of mistakes by enabling the use of the -margins- command for post-estimation tasks. You would probably also want to read about the -margins- command in Richard Williams' excellent handout, https://www3.nd.edu/~rwilliam/stats/Margins01.pdf. There you will find a crystal clear explanation of -margins- and detailed examples of its application to problems just like yours. When you have absorbed that, you can learn about more advanced features of -margins- by reading the PDF documentation on that command.
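The sketch below shows what factor-variable notation buys you. It runs on a hypothetical simulated panel in which every number is invented. It compares a hand-made product term with -c.CSP##c.LTO-, which is the continuous-by-continuous form that turns out to be needed later in this thread. Both specifications give the same coefficients. The difference is that only the second tells -margins- that the product changes whenever CSP or LTO changes.

```stata
* Hypothetical simulated panel, NOT the thesis data: 50 firms x 15 years
clear
set seed 2017
set obs 50
generate firm  = _n
generate LTO   = floor(101*runiform())  // 0-100, fixed within each firm
generate alpha = rnormal(0, 3)          // unobserved firm effect
expand 15
bysort firm: generate year = 2002 + _n
xtset firm year
generate CSP  = 100*runiform()
generate SIZE = rnormal(10, 2)
generate RISK = rnormal(1, 0.3)
generate ROE = alpha - 1.5*CSP + 0.02*CSP*LTO + 0.5*SIZE - 2*RISK + rnormal(0, 5)

* (1) Hand-made product term: -margins- treats CSPxLTO as unrelated to CSP
generate double CSPxLTO = CSP*LTO
xtreg ROE CSP LTO CSPxLTO SIZE RISK, fe
scalar b_manual = _b[CSPxLTO]

* (2) Factor-variable notation: same model, but the product is recomputed
*     by -margins- whenever CSP or LTO is set through at()
xtreg ROE c.CSP##c.LTO SIZE RISK, fe
scalar b_fv = _b[c.CSP#c.LTO]

display "hand-made: " scalar(b_manual) "   factor-variable: " scalar(b_fv)
```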

    Added: As an aside, standardizing variables does not solve collinearity problems. If the original variables are collinear, the standardized versions will be as well, since they are just linear functions of the original variables.
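Here is the same point for the standardized moderator. The symbols \overline{LTO} and s_{LTO} (its sample mean and standard deviation) are introduced here for illustration.

```latex
\[
z_{it} = \frac{\mathrm{LTO}_i - \overline{\mathrm{LTO}}}{s_{\mathrm{LTO}}}
       = a + b\,\mathrm{LTO}_i,
\qquad
a = -\frac{\overline{\mathrm{LTO}}}{s_{\mathrm{LTO}}},\quad
b = \frac{1}{s_{\mathrm{LTO}}}
\]
```

LTO_i does not change within firm i, and so neither does z_it. It is still a firm-specific constant, and the fixed effects absorb it exactly as they absorb LTO.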
    Last edited by Clyde Schechter; 26 Dec 2017, 09:31.



    • #3
      Dear Mr. Schechter,

      Thank you so much for your quick reply.
      Both my independent variable (CSP) and my moderator (LTO) are scores between 0 and 100. A higher CSP score means better social performance, and a higher LTO score means a more long-term (rather than short-term) orientation. So my guess is that they are both continuous variables?
      Should I change the code to:


      xtreg ROE c.CSP##c.LTO SIZE RISK, fe

      Attached you can see the new regression. I don't understand the margins part, though. Can I already interpret these results?
      My hypotheses are:
      1: CSP has a positive influence on CFP (this is obviously wrong, because the coefficient is negative...)
      2: A country’s long-term (short-term) orientation amplifies (weakens) the relationship between CSP and CFP (this is proven by the results, right?)

      Thank you so much again for your time, you have no idea how helpful and lifesaving this advice is.

      Kind regards,
      Tom Bosse
      Attached Files



      • #4
        Because LTO is a continuous variable, the syntax I gave for the -margins- commands is not appropriate. Also, when I put "list_of_interesting_values_of_CSP" in the code, I did not mean for you to type that literally: I intended for you to choose some interesting values of the CSP variable and place that list there. You will have to do the same for LTO in this case. Let's say, for the sake of illustration, that interesting values of LTO are 25, 50, and 75, and that interesting values of CSP are 0, 20, 40, 60, 80, and 100. Then the syntax would be:

        Code:
        margins, at(LTO = (25 50 75) CSP = (0(20)100))
        marginsplot
        margins, dydx(LTO) at(LTO = (25 50 75) CSP = (0(20)100)) // MARGINAL EFFECTS OF LTO AT INTERESTING VALUES
        margins, dydx(CSP) at(LTO = (25 50 75) CSP = (0(20)100)) // MARGINAL EFFECTS OF CSP AT INTERESTING VALUES
        margins, dydx(CSP) at(CSP = (0(20)100)) // MARGINAL EFFECTS OF CSP AVERAGED OVER LTO
        margins, dydx(LTO) at(LTO = (25 50 75)) // MARGINAL EFFECTS OF LTO AVERAGED OVER CSP
        I added the -marginsplot- command because interactions between continuous variables are really hard to interpret and understand, and graphing the results makes it much, much clearer what is going on. You can also run -marginsplot- after each of the -margins, dydx(...)- commands, and I think it makes things easier to see there, too.

        As for your interpretations of your hypotheses:

        1. Incorrect. That negative coefficient of CSP only describes the relationship between CSP and ROE when LTO = 0, which may or may not be of any interest. Even if it is of interest, it is by no means the whole story about the CSP-ROE relationship. LTO can be at most 100, and when LTO = 100 the net coefficient for CSP becomes -1.468194 + 100*0.0196405 = 0.495856, which is clearly a positive number. This sort of thing is characteristic of interaction models: for sufficiently low values of LTO the effect of CSP will be negative, but for sufficiently large ones it will be positive. Here we see that over the 0-100 range of LTO, the CSP effect varies between -1.468194 and +0.495856. Setting -1.468194 + 0.0196405*LTO = 0 and solving shows that LTO = 74.753392 is the crossover point. That is, when LTO < 74.753392 the effect of CSP is negative, and when LTO > 74.753392 it is positive. (When LTO = 74.753392, the effect of CSP is zero.)

        2. Also incorrect. The interaction coefficient, 0.0196405, is positive. So the larger LTO is, the less negative or more positive CSP's effect is. This will be very clear if you look at the graphs that -marginsplot- produces after the -margins, dydx()- commands.
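The arithmetic behind point 1 is shown below, using the coefficients quoted there from Tom's interaction regression. LTO^* denotes the crossover value.

```latex
\[
\frac{\partial\,\widehat{\mathrm{ROE}}}{\partial\,\mathrm{CSP}} = -1.468194 + 0.0196405\,\mathrm{LTO}
\]
\[
\mathrm{LTO}=0:\; -1.468194,
\qquad
\mathrm{LTO}=100:\; -1.468194 + 1.96405 = 0.495856
\]
\[
-1.468194 + 0.0196405\,\mathrm{LTO}^{*} = 0
\;\Longrightarrow\;
\mathrm{LTO}^{*} = \frac{1.468194}{0.0196405} \approx 74.753392
\]
```

The interaction coefficient is positive, so this effect rises steadily with LTO, which is the content of point 2.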

        It is really treacherous to try to interpret the output of interaction regressions directly unless you are experienced at doing this and good at algebra. It is much better (easier, and it avoids mistakes) to work with the output of -margins-. In the case of continuous-by-continuous interactions, as here, it is better still to graph it all. Really, the only thing you should interpret directly from the regression output itself is the statistical significance of the interaction coefficient, assuming you are interested in its statistical significance at all. Everything else in the regression output is obscure and does not mean what you probably think it means.
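Sometimes you want the net CSP effect at particular LTO values, or the crossover point itself, together with standard errors. One option besides -margins- is to use -lincom- and -nlcom- after the interaction model. The sketch below runs on a hypothetical simulated panel, because Tom's data are not available. The firm count, seed and true effects are invented, and the true crossover is set at LTO = 75. On real data, only the commands after the setup are needed.

```stata
* Hypothetical simulated panel, NOT the thesis data: 50 firms x 15 years
clear
set seed 2017
set obs 50
generate firm  = _n
generate LTO   = floor(101*runiform())  // 0-100, fixed within each firm
generate alpha = rnormal(0, 3)          // unobserved firm effect
expand 15
bysort firm: generate year = 2002 + _n
xtset firm year
generate CSP  = 100*runiform()
generate SIZE = rnormal(10, 2)
generate RISK = rnormal(1, 0.3)
generate ROE = alpha - 1.5*CSP + 0.02*CSP*LTO + 0.5*SIZE - 2*RISK + rnormal(0, 5)

xtreg ROE c.CSP##c.LTO SIZE RISK, fe

* Net CSP effect at chosen LTO values: _b[CSP] + LTO*_b[c.CSP#c.LTO]
lincom _b[CSP] + 25*_b[c.CSP#c.LTO]
lincom _b[CSP] + 75*_b[c.CSP#c.LTO]
lincom _b[CSP] + 100*_b[c.CSP#c.LTO]
scalar me_at_100 = r(estimate)

* LTO value where the CSP effect crosses zero (a ratio of coefficients),
* with a delta-method standard error
nlcom -_b[CSP] / _b[c.CSP#c.LTO]
matrix crossover = r(b)
```

The crossover is a nonlinear function of two coefficients, so -nlcom- is used for it rather than -lincom-.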
        Last edited by Clyde Schechter; 26 Dec 2017, 11:07.



        • #5
          Dear Mr. Schechter,

          Again many thanks for your reply.
          I have to say that you've lost me already. When I follow the code for the margins, -marginsplot- doesn't show anything...

          This is why I think my first conclusion holds (reject hypothesis 1, because I find evidence of a NEGATIVE relationship):

          [Image: regression output for the model without the interaction term (Untitled 3.png)]



          When I tested my second hypothesis by adding the interaction term, I noticed that Stata omitted the plain LTO score (not the interaction term).
          I'm still not sure what to do or how to interpret the results.

          Thank you again for advising such an amateur; I appreciate the effort very much.

          Kind regards,
          Tom Bosse



          • #6
            I guess the LTO score could also be omitted because it doesn't change over the years for a given company (it is one of Hofstede's cultural dimensions).
            The CSP score varies from year to year, and so does the interaction term (CSP*LTO), but the LTO score itself does not vary over time within a company.



            • #7
              Well, the best you can say for the non-interaction model you show in #5 is that, taken on its own terms, you have interpreted it correctly. If there were a single effect of CSP that did not vary according to LTO, this would be a reasonable estimate of it. But if you have a hypothesis that the CSP effect is moderated by LTO, then it is inappropriate to use a non-interaction model to estimate anything about the CSP effect. Had you found null results for the interaction term in the interaction model, you would be quite justified in going back to a non-interaction model.

              But you didn't: you got an interaction term that is large enough to swing the effect of CSP from strongly negative to pretty strongly positive over the range of LTO, and, if you believe in null hypothesis significance testing, it's highly statistically significant, too. So I would just say that the non-interaction model really isn't valid for this data and shouldn't be interpreted at all.

              I have never had a problem with -marginsplot- producing no output, so I can only assume that you did something wrong. But without seeing your actual code, and the output of the -margins- command that immediately preceded -marginsplot-, I can't give more specific advice. Also, once I see the -margins- output, I will be happy to show you how to interpret it.

              But, really, I do advise you to read https://www3.nd.edu/~rwilliam/stats/Margins01.pdf. It won't take long to read, it is very clearly written, and it's very much on point for the kind of problem you are working on.

              Added: Crossed with #6. Yes, what you say in #6 is correct. That is precisely the point I was trying to make in the first paragraph of #2.



              • #8
                I tried everything and followed https://www3.nd.edu/~rwilliam/stats/Margins01.pdf, but I still can't get the margins right in Stata or plot them in a graph. Attached you can see the result.


                Edit: I've used the exact code of #4 (minus the //)

                My only guess is that it has something to do with the values: both LTO and CSP lie within 0-100, but the LTO values are whole numbers, while the CSP values have three decimals.
                Attached Files
                Last edited by Tom Bosse; 28 Dec 2017, 08:28.



                • #9
                  The problem is not with -marginsplot-, but with -margins-. This often crops up when -margins- is applied after a fixed effects regression in which a predictor of interest is collinear with the fixed effects. -marginsplot- is producing an empty graph because -margins- has not produced any numbers for it to plot. The problem can be fixed by adding the -noestimcheck- option to the -margins- command:

                  Code:
                  margins, at(LTO = (25 50 75) CSP = (0(20)100)) noestimcheck
                  You will need this option for the other -margins- commands here as well.

                  The -noestimcheck- option should not be used indiscriminately whenever -margins- says things are not estimable. But it is safe to use it in this situation where you are taking margins over an interaction involving a variable that is omitted due to colinearity with the fixed effects. Re-run -margins- this way and you will get results, which -marginsplot- will then graph for you.
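Here is a sketch of the full sequence with -noestimcheck- added to each -margins- call, run on a hypothetical simulated panel. All numbers are invented, and the true CSP effect is assumed to be -1.5 + 0.02*LTO. With real data, the setup lines are replaced by the actual dataset.

```stata
* Hypothetical simulated panel, NOT the thesis data: 50 firms x 15 years
clear
set seed 2017
set obs 50
generate firm  = _n
generate LTO   = floor(101*runiform())  // 0-100, fixed within each firm
generate alpha = rnormal(0, 3)          // unobserved firm effect
expand 15
bysort firm: generate year = 2002 + _n
xtset firm year
generate CSP  = 100*runiform()
generate SIZE = rnormal(10, 2)
generate RISK = rnormal(1, 0.3)
generate ROE = alpha - 1.5*CSP + 0.02*CSP*LTO + 0.5*SIZE - 2*RISK + rnormal(0, 5)

xtreg ROE c.CSP##c.LTO SIZE RISK, fe

* Predicted ROE over a grid of LTO and CSP values
margins, at(LTO = (25 50 75) CSP = (0(20)100)) noestimcheck

* Marginal effect of CSP at three LTO values, then graph it
margins, dydx(CSP) at(LTO = (25 50 75)) noestimcheck
matrix me_CSP = r(b)
marginsplot
```

The three marginal effects should rise with LTO, since the interaction coefficient is positive in this simulated example.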

                  By the way, -marginsplot- accepts pretty much all graphing options available in -graph twoway-, so if you don't like the look of the graph as it is first created for you, you can customize it to your preferences.



                  • #10
                    Incredible! noestimcheck solved the problem:
                    Attached Files



                    • #11
                      Originally posted by Tom Bosse:
                      Incredible! noestimcheck solved the problem:
                      Now, lastly, interpretation is of vital importance for my results. Looking at your comment in #7, I think I know more or less how to interpret the results for my hypotheses:

                      H1: CSP has a positive influence on CFP
                      H2: A country’s long-term (short) orientation amplifies (weakens) the relationship between CSP and CFP

                      When I run the regression (xtreg) without the moderator, I get a significant negative result for H1 instead of the hypothesized positive result.
                      However, when I add the moderator and the interaction term, the effect of CSP on ROE is NEGATIVE only up to a certain point (LTO < 74.753392). The moderator is omitted because of collinearity, since it is time-invariant, but this is not a problem. Because the interaction effect is also significant, looking at the first regression without the moderator makes no sense. Therefore, I should look at the following regression for BOTH of my hypotheses:
                      Attached Files

                      Comment


                      • #12
                        I agree with your interpretations, except for #1. You cannot really characterize the effect of CSP on CFP as either positive or negative, because it can be either, depending on the value of LTO. For LTO < about 75, the CSP effect is negative, and for LTO > about 75 it is positive. Now, if the distribution of LTO in your data is such that nearly all the values of LTO are > 75, and the ones < 75 are just a few stragglers/outliers, then you could point out that, in practice, this means that the effect of CSP on CFP is positive in nearly all situations. But if 75 is more or less central in the distribution of LTO, then you really can't characterize it as one way or the other. The correct interpretation, and, in my view, a far more interesting answer, is "it depends on LTO."
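One way to check where the crossover falls in the LTO distribution is sketched below on a hypothetical simulated panel. The numbers are invented, and the crossover here is computed from the simulated fit rather than taken from the 74.753392 in this thread. LTO is constant within a firm, so it is worth counting firms as well as firm-years. If the data contain a country identifier, tagging countries the same way would be even more to the point. That identifier is an assumption and does not appear below.

```stata
* Hypothetical simulated panel, NOT the thesis data: 50 firms x 15 years
clear
set seed 2017
set obs 50
generate firm  = _n
generate LTO   = floor(101*runiform())  // 0-100, fixed within each firm
generate alpha = rnormal(0, 3)          // unobserved firm effect
expand 15
bysort firm: generate year = 2002 + _n
xtset firm year
generate CSP  = 100*runiform()
generate SIZE = rnormal(10, 2)
generate RISK = rnormal(1, 0.3)
generate ROE = alpha - 1.5*CSP + 0.02*CSP*LTO + 0.5*SIZE - 2*RISK + rnormal(0, 5)

xtreg ROE c.CSP##c.LTO SIZE RISK, fe

* LTO value at which the estimated CSP effect changes sign
scalar crossover = -_b[CSP] / _b[c.CSP#c.LTO]
display "crossover LTO = " %6.2f scalar(crossover)

* Where does that value sit in the LTO distribution?
summarize LTO, detail
count if LTO > scalar(crossover) & !missing(LTO)
scalar share_obs = r(N) / _N

* LTO is fixed within firm, so also count firms (one tagged row per firm)
egen byte first = tag(firm)
count if first & LTO > scalar(crossover) & !missing(LTO)
display "share of firm-years above: " %5.3f scalar(share_obs) "   firms above: " r(N)
```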



                        • #13
                          Regarding hypothesis 1:
                          As only 50 of 750 observations (6.7% of the data) have an LTO score > 74, I guess it is fair to say that CSP has a NEGATIVE impact on CFP, with the exception of countries that have an LTO score > 74?

                          Thus, I cannot accept hypothesis 1, as I find a NEGATIVE instead of a positive relationship. (is it smart to show the first regression without the moderator and interaction effect to show that CSP has a negative significant impact (see #5) or doesn't that add value? Since I have to describe all the results in a chapter, my guess is that it was indeed a useful regression, UNTIL I found that the interaction effect had a significant impact on the relationship between CSP and CFP.)


                          Regarding hypothesis 2:
                          Even though I found a negative relationship between CSP and CFP, I did find evidence that the relationship between CSP and CFP is amplified (weakened) by a high (low) LTO, right?
                          Lastly, just to double check: the attached file of #11 is basically my whole outcome for my thesis, right? Just to be 100% sure, it is not a problem that the LTO score is omitted.
                          Attached Files



                          • #14
                            (is it smart to show the first regression without the moderator and interaction effect to show that CSP has a negative significant impact (see #5) or doesn't that add value?
                            I wouldn't show it. It is a mis-specified model: the interaction term has a very strong effect, and the model without it is, at best, elliptical and, at worst, misleading.

                            Even though I found a negative relationship between CSP and CFP, I did find evidence that the relationship between CSP and CFP is amplified (weakened) by a high (low) LTO, right?
                            I don't like using words like "amplified" and "weakened" because their meaning is unclear. Suppose you start out with a negative relationship between CSP and CFP at a level of, say, LTO = 45. Then you go up to LTO = 50, and the relationship is now less negative, but still not positive. Has the relationship been "weakened" or "strengthened"? Some people would say that it is "weakened" because the magnitude of the association is smaller. Others would say that it is "strengthened" because, as a numerical measure, it has increased. So I tend to avoid those words and just describe things using words with unambiguous meanings. I would say that the relationship between CSP and CFP is negative for most of the distribution of LTO. As LTO increases, the marginal effect of CSP increases towards zero and passes through zero into positive territory at about LTO = 75. It then becomes larger still as LTO increases to the largest observed values.

                            Lastly, just to double check: the attached file of #11 is basically my whole outcome for my thesis, right?
                            It is the primary result of your primary analysis, and the full story ultimately stems from it. But for readers and reviewers who are not experienced at working with interaction models, the relationships among CSP, LTO and CFP are difficult to see in that table; they are more apparent in the -marginsplot- graph. The table requires a lot of mental calculation. So, as a matter of clarity, I would also show some version of that graph, or the -margins- output table, to drive the point home. Actually, I would probably rerun -marginsplot- with the -xdimension(CSP)- option. The graph you have shows the CSP effect as different lines in the plot and the LTO effect on the horizontal axis. You seem to have a clear conceptualization that CSP is the effect of interest and LTO is the moderator (not the other way around), so having CSP on the horizontal axis would better reflect the way you think about it. With -xdimension(CSP)- you'll have a separate line for each value of LTO, and CSP will be on the horizontal axis. Then it will be easy to spot that the lines corresponding to low LTO values slope down to the right, whereas those corresponding to LTO values > about 75 slope up to the right. I think that's easier to understand than the graph as currently drawn.
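Here is a sketch of the suggested graph, on a hypothetical simulated panel with invented numbers. It uses the same grid as in #4, adds -noestimcheck- as in #9, and applies the -xdimension(CSP)- option named above. The -ytitle()- text is only an example of the -graph twoway- options that -marginsplot- accepts.

```stata
* Hypothetical simulated panel, NOT the thesis data: 50 firms x 15 years
clear
set seed 2017
set obs 50
generate firm  = _n
generate LTO   = floor(101*runiform())  // 0-100, fixed within each firm
generate alpha = rnormal(0, 3)          // unobserved firm effect
expand 15
bysort firm: generate year = 2002 + _n
xtset firm year
generate CSP  = 100*runiform()
generate SIZE = rnormal(10, 2)
generate RISK = rnormal(1, 0.3)
generate ROE = alpha - 1.5*CSP + 0.02*CSP*LTO + 0.5*SIZE - 2*RISK + rnormal(0, 5)

xtreg ROE c.CSP##c.LTO SIZE RISK, fe

* Predicted ROE over the grid: 3 LTO values x 6 CSP values
margins, at(LTO = (25 50 75) CSP = (0(20)100)) noestimcheck
matrix pred = r(b)

* CSP on the horizontal axis, one line per LTO value
marginsplot, xdimension(CSP) ytitle("Linear prediction of ROE")
```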

                            Moreover, I can only say that it is the essence of your study of the relationships among CSP, LTO and CFP. If your thesis project had additional goals not discussed in this thread, then there may be more that you need to show in relation to those.

                            Just to be 100% sure, it is not a problem that the LTO score is omitted
                            Correct, it is not a problem statistically. It may be a problem in terms of explaining to a non-technical audience. This is another reason why I think that reporting the -margins- output or the -marginsplot- graph is helpful, because it shows that even though LTO's coefficient is omitted from the regression output, LTO's effects are very much reflected in the regression model.
