I have two questions about a structural equation model (SEM) that I am using to conduct a mediation analysis in Stata.
I conduct the mediation analysis in two (I supposed) equivalent ways, but I do not get equivalent results.
I think that this is mostly my own interpretational problem.
Below, I first describe the two methods, then I ask the two questions.
Method 1: Series of regressions
Assume IV = independent variable, MV = mediating variable and DV = dependent variable. I conduct three separate regressions.
- DV = B0 + B1*MV + B2*IV + Controls
- MV = K0 + K1*IV + Controls
- DV = P0 + P1*IV + Controls
The notation for the parts of the mediation analysis follows this book:
Hayes, A. (2013). Introduction to mediation, moderation, and conditional process analysis: A regression-based approach. Guilford Press, New York, US.
B1 = a is the direct association between MV and DV
K1 = b is the direct association between IV and MV
B2 = c' is the direct association between IV and DV
B2+B1*K1 = c'+a*b is the total effect
To my understanding, a*b is a measure of omitted variable bias.
This is the script:
direct effect of IV on DV, while controlling for MV, c'
Code:
xi: ivreg2 DV MV Controls (IV = instrument)
gives -.001457
direct association between IV and MV, b
Code:
xi: ivreg2 MV Controls (IV = instrument)
gives -.0430059
direct effect of MV on DV, while controlling for IV, a
Code:
xi: ivreg2 DV MV Controls (IV = instrument)
gives -.0073635
indirect effect of IV on DV through MV, a*b
Code:
di -.0430059*-.0073635
gives .00031667
total effect a*b+c'
Code:
di (-.0430059*-.0073635)-.001457
gives -.00114033
[IV is endogenous and I have instrumented it, so IV is actually IVhat from a first stage where IV = instrument + Controls, but I do not think this is important]
The total effect should be the same as the coefficient on IV from:
Code:
xi: ivreg2 DV Controls (IV = instrument)
The coefficient on IV is -.0024259
Method 2: Stata built-in command
With Stata, I can conduct these analyses (and obtain the relevant standard errors) with a single command: either sem or gsem
https://www.stata.com/manuals13/sem.pdf#semexample42g
I use gsem and write what follows:
Code:
xi: gsem (MV <- IV Controls) (DV <- MV IV Controls)
[xi because I use a couple of vectors of dummies in the control variables]
[IV is the expected value from the first stage in ivreg2]
Then the coefficients I am interested in are the following ones (I use nlcom because I find it easier than looking at the table):
- nlcom _b[MV:IV]
- nlcom _b[DV:MV]
- nlcom _b[DV:IV]
- nlcom _b[MV:IV]*_b[DV:MV]+_b[DV:IV]
Code:
gsem, coeflegend
//this is just my preference: I want to look at the coefficients I am interested in, without looking at large tables: p410-411 https://www.stata.com/manuals13/sem.pdf
direct effect of IV on DV, while controlling for MV, c'
Code:
nlcom _b[DV:IV]
gives -.0006713, which is very different from c' from Method 1
direct effect of IV on MV, b
Code:
nlcom _b[MV:IV]
gives -.0426406, which is similar to b from Method 1
direct effect of MV on DV, while controlling for IV, a
Code:
nlcom _b[DV:MV]
gives -.0073408, which is similar to a from Method 1
indirect effect of IV on DV through MV, a*b
Code:
nlcom _b[DV:MV]*_b[MV:IV]
gives .000313, which is similar to a*b from Method 1
total effect of IV on DV, "c'+a*b", when the mediator is excluded
Code:
di -.0006713+.000313
gives -.0003583 which is very similar to c'+a*b from Method 1
Problem 1
In Methods 1 and 2, the total effect, c'+a*b, should be the same as the coefficient on IV from:
Code:
xi: ivreg2 DV Controls (IV = instrument)
But it is not the same. This regression gives an estimated coefficient on IV equal to -.0024259, versus -.00114033 in Method 1 and -.0003583 in Method 2.
What am I doing wrong?
Problem 2
The estimated coefficients for a and b across the two methods are similar, but c' is quite different.
What am I doing wrong?
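For reference, the expectation behind Problem 1 comes from substituting the MV equation into the DV equation: DV = (B0 + B1*K0) + (B2 + B1*K1)*IV + Controls, so the coefficient on IV with the mediator left out would be B2 + B1*K1 = c'+a*b. The following is only a minimal sketch of that check, not my actual model: it assumes Stata's bundled auto data with price, weight and length as hypothetical stand-ins for DV, MV and IV, and it has no controls and no instrument.

```stata
* Minimal sketch on Stata's bundled auto data, not the real data.
* Assumed stand-ins: price = DV, weight = MV, length = IV.
sysuse auto, clear

* Same two-equation structure as in Method 2
gsem (weight <- length) (price <- weight length)

* Indirect (a*b) and total (c'+a*b) effects in one call, with standard errors
nlcom (indirect: _b[weight:length]*_b[price:weight]) (total: _b[price:length] + _b[weight:length]*_b[price:weight])

* Total effect estimated directly, with the mediator left out
regress price length
```

In this linear, uninstrumented case on a single estimation sample, I would expect the "total" line from nlcom and the coefficient on length from regress to agree. The sketch does not address whether the same holds once IV is replaced by its first-stage prediction.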