Interaction term in regression - Interpretation? Misspecification?

Pia Schmidt

Join Date: Jul 2018

Posts: 7
#1

30 Aug 2018, 08:59

Hello Statalist users,

I am struggling to interpret the result for an interaction term.

First, I want to find out whether a decrease in a firm's quarterly reporting leads to a decrease in its analyst following after mandatory quarterly reporting was repealed in Europe in 2015.
Therefore, I created two binary variables: quarterlyreporting, which is 1 if a firm reports quarterly and 0 otherwise, and Zeitraum1617, which is 1 if the observation is after 2015 and 0 if it is before 2015.

I also include three control variables, but their interpretations are clear, so I will not report them here.

I specified the following model in Stata 14:

Code:

xtreg analystfollowing i.quarterlyreporting##i.Zeitraum1617

I expected quarterlyreporting to have a positive influence and Zeitraum1617 to have a negative influence on analystfollowing, which is consistent with the results of my regression. But the interaction term turns out to be negative, and I do not know how to interpret this, or even whether the specification of the regression is the right one. I also added fixed effects later, but I do not know whether the interaction term is correctly specified, because I got a different result for the interaction when I added it as a newly generated variable or with just one # between the variables.

I am really desperate about this regression, and I need it for my master's thesis, so please help me if you have any ideas about my problem.
Thank you in advance!
Clyde Schechter

Join Date: Apr 2014

Posts: 30065
#2

30 Aug 2018, 10:17

The command you show in #1 looks correct to me. I cannot comment on whether this is a reasonable model of the phenomena in question, but the code correctly expresses a random effects regression of the outcome analystfollowing on quarterly reporting, the Zeitraum1617 dichotomy, and their interaction.

It is not surprising that results would differ with a fixed-effects regression. Usually the differences are small, but they do not have to be. Even the signs can change! This is because -fe- is a within-entity estimator, whereas -re- is a mixture of within- and between-entity effects.

Specifying the command with just one # between the variables is usually incorrect. If you wrote

Code:

xtreg analystfollowing i.quarterlyreporting#i.Zeitraum1617

that is a mis-specified model (whether -fe- or -re- or any other type of regression) and its results should be discarded. If you prefer to use the # notation over the ## notation, then you have to also spell out the constituents:

Code:

xtreg analystfollowing i.quarterlyreporting i.Zeitraum1617 i.quarterlyreporting#i.Zeitraum1617

would be a properly specified model that you could use. It would also be exactly equivalent to the one you show in #1 that relies on the ## operator. Since the ## operator involves a lot less typing, I do not see any reason to use the # version.

You will probably get a better understanding of what your model is telling you if you follow it with the following commands (which, really, are the whole point of using factor variable notation):

Code:

margins quarterlyreporting#Zeitraum1617
margins Zeitraum1617, dydx(quarterlyreporting)
margins quarterlyreporting, dydx(Zeitraum1617)

The statistic in the -xtreg- output you should be focusing on is the coefficient of 1.quarterlyreporting#1.Zeitraum1617. That is your difference-in-differences estimate of the effect of quarterly reporting on analystfollowing. Do not interpret the coefficients of quarterlyreporting and Zeitraum1617 themselves as effects of quarterly reporting and of the change of era from pre- to post-2015. That is a common mistake.

Turn your attention next to the output of the first -margins- command. These outputs show you the expected values of analystfollowing in both the quarterly reporting and non-quarterly reporting groups in both the pre- and post-2015 eras.

The output of the second -margins- command will show you the differences in analyst following between the quarterly reporting and non-reporting groups in each of the pre- and post- eras.

The output of the third -margins- command will show you the changes in analyst following from before to after 2015 in the quarterly reporting and non-reporting groups.

Those are the complete outputs of this analysis.

As far as the negative sign of the interaction term goes, if your expectations about the effects of quarterly reporting and the passage of year 2015 are correct, what you will see is that the changes in analyst following from before to after 2015 in both groups are negative, but the change in the quarterly reporting group is more negative. Similarly, the differences in analyst following between the quarterly reporting and non-quarterly reporting groups will be positive both pre- and post-2015, but the post-2015 difference will be smaller.

Of course, it is also possible that your expectations about these things are wrong, or at least are not supported by your data.

If you need additional guidance interpreting your results, it would be best to post back showing the actual output. Be sure to use code delimiters so that the output aligns in a nice readable way. If you are not familiar with code delimiters, please read Forum FAQ #12 for instructions.

Added: If you are running a fixed effects model, you should expect Stata to warn you that the quarterlyreporting variable is omitted due to collinearity; that is not a problem. You may also find that the -margins- results show "not estimable." If that happens, add the -noestimcheck- option to the -margins- command.
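
As a sketch of that fixed-effects variant, assuming the data are already -xtset- and leaving out the control variables, the sequence could look like this:

```stata
* fixed-effects (within-firm) version of the model in #1
xtreg analystfollowing i.quarterlyreporting##i.Zeitraum1617, fe

* same three -margins- commands, with -noestimcheck- in case
* Stata reports the margins as "not estimable"
margins quarterlyreporting#Zeitraum1617, noestimcheck
margins Zeitraum1617, dydx(quarterlyreporting) noestimcheck
margins quarterlyreporting, dydx(Zeitraum1617) noestimcheck
```

The -noestimcheck- option is only needed if the plain -margins- commands come back as "not estimable."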
Pia Schmidt

Join Date: Jul 2018

Posts: 7
#3

31 Aug 2018, 05:09

Thank you so, so much for your extensive help, Clyde!

Pia Schmidt

Join Date: Jul 2018

Posts: 7
#4

03 Sep 2018, 05:58

As you offered, I would be very grateful for some additional guidance on my interpretations.
EPS1NE is the number of analysts, so it is a proxy for analystfollowing.

This is my first code:

Code:

xtreg EPS1NE i.quarterlyreporting##i.Zeitraum1617

and the output in Stata 14:

Code:

Random-effects GLS regression                   Number of obs     =     17,658
Group variable: i                               Number of groups  =      1,962

R-sq:                                           Obs per group:
     within  = 0.0112                                         min =          9
     between = 0.0050                                         avg =        9.0
     overall = 0.0002                                         max =          9

                                                Wald chi2(3)      =     170.10
corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000

-------------------------------------------------------------------------------------------------
                         EPS1NE |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
--------------------------------+----------------------------------------------------------------
           1.quarterlyreporting |        ,29        ,07     4.08   0.000          ,15         ,43
                 1.Zeitraum1617 |       -,33       ,051    -6.42   0.000         -,42        -,23
                                |
quarterlyreporting#Zeitraum1617 |
                           1 1  |      -,066       ,067    -0.98   0.328          -,2        ,066
                                |
                          _cons |        3,7        ,14    27.01   0.000          3,5           4
--------------------------------+----------------------------------------------------------------
                        sigma_u |  5.7600026
                        sigma_e |   1.751542
                            rho |  .91535786   (fraction of variance due to u_i)
-------------------------------------------------------------------------------------------------

I would interpret it like this:

First: The constant reports the value of expected analyst following if quarterlyreporting=0 and Zeitraum1617=0
quarterlyreporting is +0,29 and reports a positive difference for quarterlyreporting=0 to quarterlyreporting=1 if Zeitraum1617=0
Zeitraum1617 is -0,33 and reports a negative difference in Zeitraum1617=1 to Zeitraum1617=0 if quarterlyreporting=0

And, in consequence, the interaction term is the difference between quarterlyreporting=1 & Zeitraum1617=1 and quarterlyreporting=0 & Zeitraum1617=0?

If my interpretations are right, is it possible to say that if a firm changes from quarterlyreporting in Zeitraum1617=0 to no quarterlyreporting in Zeitraum1617=1, the analyst following would change by -0,33 -0,29 ? Because this is the main effect of my analysis.

The margins-command shows me:

Code:

 margins quarterlyreporting#Zeitraum1617, noestimcheck

Adjusted predictions                            Number of obs     =     17,658
Model VCE    : Conventional

Expression   : Linear prediction, predict()

-------------------------------------------------------------------------------------------------
                                |            Delta-method
                                |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
--------------------------------+----------------------------------------------------------------
quarterlyreporting#Zeitraum1617 |
                           0 0  |        3,7        ,14    27.01   0.000          3,5           4
                           0 1  |        3,4        ,14    24.51   0.000          3,1         3,7
                           1 0  |          4        ,13    30.27   0.000          3,8         4,3
                           1 1  |        3,6        ,14    26.41   0.000          3,4         3,9
-------------------------------------------------------------------------------------------------

As the interaction term is not significant in my xtreg output above, what does the P>|z| in the margins output tell me?

Thank you so much in advance!


Clyde Schechter

Join Date: Apr 2014

Posts: 30065
#5

03 Sep 2018, 10:04

First: The constant reports the value of expected analyst following if quarterlyreporting=0 and Zeitraum1617=0
quarterlyreporting is +0,29 and reports a positive difference for quarterlyreporting=0 to quarterlyreporting=1 if Zeitraum1617=0
Zeitraum1617 is -0,33 and reports a negative difference in Zeitraum1617=1 to Zeitraum1617=0 if quarterlyreporting=0

These are correct.

And, in consequence, the interaction term is the difference between quarterlyreporting=1 & Zeitraum1617=1 and quarterlyreporting=0 & Zeitraum1617=0?

No. The interaction term is not the difference between any pair of group outcomes. It is a difference in differences. Specifically, it is the difference between [(quarterlyreporting = 1 & Zeitraum1617 = 1) - (quarterlyreporting = 1 & Zeitraum1617 = 0)] and [(quarterlyreporting = 0 & Zeitraum1617 = 1) - (quarterlyreporting = 0 & Zeitraum1617 = 0)]. So it reflects the "pure" effect of quarterly reporting on the outcome, because it is the difference made by quarterly reporting with the difference that is just due to the Zeitraum1617 timepoint passing subtracted out. It is called the difference in differences (DID) estimate of the effect.
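
As a rough check, the same calculation can be done by hand with the margins posted in #4. This is only a sketch: the displayed margins are rounded to two digits, so the result can only approximate the coefficient.

```stata
* change from pre to post among quarterly reporters: (1 1) - (1 0)
display 3.6 - 4.0

* change from pre to post among non-reporters: (0 1) - (0 0)
display 3.4 - 3.7

* difference in differences
display (3.6 - 4.0) - (3.4 - 3.7)
```

The last line comes out at about -0.1, which is in line with the interaction coefficient of -0.066 once the rounding of the displayed margins is taken into account.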

is it possible to say that if a firm changes from quarterlyreporting in Zeitraum1617=0 to no quarterlyreporting in Zeitraum1617=1, the analyst following would change by -0,33 -0,29 ? Because this is the main effect of my analysis.

No. In fact, you can't really say anything about that. In your design, assuming you have created the variables in the data correctly, the quarterlyreporting variable can never change over time within a given firm. (If that is not the case in your data, then your data is not correct for this analysis.) The quarterlyreporting variable should take on the value 1 in the group that, after Zeitraum1617, does quarterly reporting, but it is still 1 in that group during its pre-Zeitraum1617 observations. Similarly, it takes on the value 0 in the group that after Zeitraum1617 does not do quarterly reporting, and it remains 0 in the pre-Zeitraum1617 observations for those firms. Thus your data should have no firms meeting the description of changing from quarterlyreporting in Zeitraum1617 = 0 to no quarterlyreporting in Zeitraum1617 = 1, only in the opposite direction. While you might want to assume that this change would just be the same value with opposite sign, that would be a pure act of faith, not data analysis.
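
One way to check whether quarterlyreporting actually varies within firms is sketched below. It assumes the data are -xtset- with the panel variable i shown in the -xtreg- header; qr_varies and firm_tag are made-up names for illustration.

```stata
* a within-firm standard deviation of 0 would mean the variable
* never changes over time in any firm
xtsum quarterlyreporting

* flag firms whose quarterlyreporting takes more than one value
* (a missing value sorts last and would also set the flag)
bysort i (quarterlyreporting): generate byte qr_varies = quarterlyreporting[1] != quarterlyreporting[_N]

* count each firm once when tabulating the flag
egen byte firm_tag = tag(i)
tabulate qr_varies if firm_tag
```

If qr_varies is 1 for any firm, the variable is not constant within firms and the data do not match the design described above.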

As the interaction term is not significant in my xtreg output above, what does the P>|z| in the margins output tell me?

In a word, nothing. I wish StataCorp would eliminate those from the Adjusted predictions output (or, rather, suppress them unless explicitly requested by the user through an option). Those p-values test the hypotheses that the expected values of analyst following in those conditions are zero. That is usually a straw man hypothesis. In your situation it is obviously a straw man hypothesis given the nature of your outcome (analyst following). So you should just ignore those. Only the margin and standard error columns are meaningful here. Had you run the -margins, dydx()- commands shown in #2, the p-values shown there would be tests of null hypotheses that some might consider meaningful and that might be worth looking at. But not these.

By the way, you probably should run those -margins, dydx()- commands: they give you directly the differences between after Zeitraum1617 and before (-margins, dydx(Zeitraum1617)-), and the differences between the quarterly reporting and no quarterly reporting groups both before and after Zeitraum1617 (-margins, dydx(quarterlyreporting)-).
Pia Schmidt

Join Date: Jul 2018

Posts: 7
#6

04 Sep 2018, 04:51

My data are firm-year observations, so each firm has its own value of quarterlyreporting in each year. So quarterlyreporting is not constant over time within a firm.

And yes, I ran all of the margins commands and they are a great help for understanding the data. Again, thank you very much. You are such a great help.