Mediation analysis in panel dataset

Sebastian Eiblmeier

10 Sep 2025, 05:10

Hello,

This is a follow-up to an older thread: https://www.statalist.org/forums/for...sis-panel-data

Let me quickly describe my data. Since it's mostly confidential, I cannot provide any examples.

I have firm-level and industry-level data at annual frequency from 1999 to 2019. My hypothesis is that business sentiment determines firm debt through investment: when sentiment is more positive, firms invest more and therefore take on more debt. This gives the following setup:

Treatment variable: sentiment (industry level)
Mediator variable: investment (firm level)
Outcome variable: debt growth (firm level)
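For concreteness, here is the standard two-equation linear mediation system for this setup. It assumes linear models with no treatment-mediator interaction, which is a simplification and not necessarily what Stata's mediate fits by default. The notation is introduced only for illustration: i indexes firms, j(i) is the industry of firm i, t indexes years, S is sentiment, M is investment and Y is debt growth. Under these assumptions, the natural indirect effect (NIE) of a one-unit change in sentiment is the product ab:

```latex
\begin{aligned}
M_{it} &= \alpha_M + a\,S_{j(i)t} + \varepsilon_{it} \\
Y_{it} &= \alpha_Y + c'\,S_{j(i)t} + b\,M_{it} + u_{it} \\
\text{NIE} &= a\,b, \qquad \text{NDE} = c', \qquad \text{TE} = c' + a\,b
\end{aligned}
```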

The first possible problem is that the treatment is measured at the group (industry) level. How problematic is this? The R package mediation can take this into account explicitly; Stata's mediate cannot, unless I have overlooked something.

The second possible problem is that my firm panel is unbalanced: the industry-level data are available for every year, but the firm-level data are not. Will this be a problem for the mediation analysis?

The third and biggest problem is that it is panel data. Back in the old thread, Clyde Schechter wrote:

Originally posted by Clyde Schechter:

This problem becomes thornier still when mediation is considered. The within-panel effects of a variable can mediate both the within- and between-panel relationships of another explanatory variable and an outcome. And it may do that to different degrees, and even in opposite directions. The same is true of the between-panel effects of each candidate mediator. If you then throw in additional mediators, the number of potential causal paths among these variables grows even larger. In fact it grows combinatorially in the number of mediators. Even if you have a data set large enough to enable you to reasonably estimate coefficients along all these paths, just reading and understanding the output is a problem that scales poorly with the number of variables involved.

(highlighted by me)

So, in my specific case, does the highlighted part mean the following? If a firm increases its investment (within effect) and therefore its debt (within effect), this can be because it became more optimistic than in previous years (within effect) or because it is more optimistic than other firms (between effect). Is that right?
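For reference, the within/between split of the industry-level treatment can be written as follows. The notation is introduced here for illustration: S_{j(i)t} is the sentiment of firm i's industry in year t, and \mathcal{T}_i is the set of years in which firm i is observed, which differs across firms in an unbalanced panel:

```latex
S_{j(i)t}
= \underbrace{\left(S_{j(i)t} - \bar{S}_{i}\right)}_{\text{within firm}}
+ \underbrace{\bar{S}_{i}}_{\text{between firms}},
\qquad
\bar{S}_{i} = \frac{1}{|\mathcal{T}_i|} \sum_{t \in \mathcal{T}_i} S_{j(i)t}
```

Only the first term varies over time within a firm. The second term is constant for each firm and differs only across firms.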

Fortunately, cross-sectional effects are of no interest to me anyway. One could, of course, argue that a pure cross-sectional regression also reveals the true causal effect (if there is one): a firm that is more optimistic at a given point in time should invest more than its less optimistic peers. However, any positive relationship found this way could be biased by structural differences between firms and/or industries. So the prudent course of action seems to be to rely only on the longitudinal (within) variation for inference.

What I ultimately would like to know, again following Clyde Schechter's remarks, is:

Originally posted by Clyde Schechter:

Well, I think it is problematic. When you use a two-way fixed effects model, the effects you are estimating are a weighted average of the within- and between-panel effects. See https://journals.plos.org/plosone/ar...l.pone.0231349. This is fine if you have external evidence that within- and between- effects are one and the same thing and any observed differences are noise. But do you have any such evidence? If not, you are estimating a somewhat complicated set of estimands and even assuming you manage to figure out for yourself what it all means, I think it will defy explanation to others.

Now, one thing you can do is to create new variables from the ones shown in your diagram. One set of the new variables comes by de-meaning them all around the panel mean (these variables will represent the within-panel differences), and the other set of variables consists of the panel means of the variables themselves (representing between-panel differences). And since you seem to want to deal with time fixed effects you also have to make a set of variables that is demeaned within years and another set that consists of the means across years. You can then make a model out of these variables and run -gsem- (without the i.country and i.year effects), but you will find it is very unwieldy unless you can specify limited roles of the within- and between effects along the many different possible paths. (And neither here nor in #4 did I stray into the further complications that arise if we consider the possibility of interactions or lagged effects.)

If I simply subtract each variable's own panel mean (*) and then feed the demeaned variables into Stata's mediate command, is that an appropriate application?

(*) This is nothing other than doing "manually" what xtreg, fe does "behind the scenes". I have to set up my regression dataset this way and run reg rather than xtreg anyway because, as I said, some of my data are at industry level and some at firm level, and the firm panel is unbalanced. As a result, xtreg computes different means of the same (industry-level) variable for firms that are not observed over the same years.
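To illustrate the point in the footnote, here is a minimal, language-neutral sketch in Python using made-up numbers. The firm names f1 and f2, industry "A" and all values are hypothetical. The sketch contrasts two ways of handling the industry-level sentiment. The first demeans it within firm over the years each firm is observed, which is what a firm fixed-effects transformation does in an unbalanced panel. The second demeans it by the industry mean over all years in the industry data:

```python
from collections import defaultdict

# Hypothetical toy data: industry-year sentiment and an unbalanced firm panel.
# Both firms belong to industry "A"; f2 is only observed in 2002-2003.
sentiment = {("A", 2000): 1.0, ("A", 2001): 2.0, ("A", 2002): 3.0, ("A", 2003): 4.0}
firm_years = [
    ("f1", "A", 2000), ("f1", "A", 2001), ("f1", "A", 2002), ("f1", "A", 2003),
    ("f2", "A", 2002), ("f2", "A", 2003),
]


def mean(values):
    return sum(values) / len(values)


def demean_by_firm(rows, sent):
    """Subtract each firm's mean sentiment over the years that firm is observed
    (the within-firm transformation applied to an industry-level regressor)."""
    by_firm = defaultdict(list)
    for firm, ind, year in rows:
        by_firm[firm].append(sent[(ind, year)])
    firm_mean = {f: mean(v) for f, v in by_firm.items()}
    return {(f, y): sent[(i, y)] - firm_mean[f] for f, i, y in rows}


def demean_by_industry(rows, sent):
    """Subtract the industry's mean over all years in the industry data,
    regardless of which years a given firm appears in."""
    by_ind = defaultdict(list)
    for (ind, _year), value in sent.items():
        by_ind[ind].append(value)
    ind_mean = {i: mean(v) for i, v in by_ind.items()}
    return {(f, y): sent[(i, y)] - ind_mean[i] for f, i, y in rows}


within_firm = demean_by_firm(firm_years, sentiment)
within_ind = demean_by_industry(firm_years, sentiment)

# Same industry-year value (4.0 in 2003), compared across the two firms.
for key in [("f1", 2003), ("f2", 2003)]:
    print(key, within_firm[key], within_ind[key])
```

In 2003 both firms face the same industry sentiment (4.0). The firm-demeaned values are 1.5 for f1 and 0.5 for f2, because f2's mean is taken over 2002-2003 only. Demeaning by the industry mean gives 1.5 for both firms.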