  • mediation analysis with structural equation model - step-by-step versus gsem

    I have two questions about a structural equation model (SEM) that I am using for a mediation analysis in Stata.

    I conducted the mediation analysis in two ways that I assumed were equivalent, but the results do not match.
    I suspect this is mostly a problem with my own interpretation.

    Below, I first write down the two methods, then I write the two questions.


    Method 1: Series of regressions

    Let IV = independent variable, MV = mediator variable, and DV = dependent variable. I run three separate regressions.
    • DV = B0 + B1*MV + B2*IV + Controls
    B1 is the direct association between MV and DV
    B2 is the direct association between IV and DV.
    • MV = K0 + K1*IV + Controls
    K1 is the direct association between IV and MV.
    • DV = P0 + P1*IV + Controls
    P1 is the total association between IV and DV (the mediator is excluded).

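    The components of the mediation analysis can be labeled in the style of this book (note that I attach a to the MV-to-DV path and b to the IV-to-MV path, the reverse of Hayes's labeling; the product a*b is unaffected):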
    Hayes, A. (2013). Introduction to mediation, moderation, and conditional process analysis: A regression-based approach. Guilford Press, New York, US.

    B1 = a is the direct association between MV and DV
    K1 = b is the direct association between IV and MV
    B2 = c' is the direct association between IV and DV
    B2+B1*K1 = c'+a*b is the total effect
    To my understanding, a*b is a measure of omitted variable bias.

    This is the script:

    direct effect of IV on DV, while controlling for MV, c'
    Code:
    xi: ivreg2 DV MV Controls (IV = instrument)
    gives -.001457

    direct association between IV and MV, b
    Code:
    xi: ivreg2 MV Controls (IV=instrument)
    gives -.0430059

    direct effect of MV on DV, while controlling for IV, a
    Code:
    xi: ivreg2 DV MV Controls (IV = instrument)
    gives -.0073635

    indirect effect of IV on DV through MV, a*b
    Code:
    di -.0430059*-.0073635
    gives .00031667

    total effect a*b+c'
    Code:
    di (-.0430059*-.0073635)-.001457
    gives -.00114033

    [IV is endogenous and I have instrumented it, so IV is actually IVhat from a first stage where IV = instrument + Controls but I do not think it is important]

    the total effect should be the same as the coefficient on IV from:
    Code:
    xi: ivreg2 DV Controls (IV = instrument)
    the coefficient on IV is -.0024259
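A quick way to see what this identity requires: for plain OLS fitted on one common sample, the IV coefficient from the regression without the mediator equals c' + a*b exactly. The sketch below checks this on Stata's shipped auto dataset. The variables price, weight, mpg, and length are stand-ins (assumptions) for DV, IV, MV, and a control; they are not the data from this thread, and the instrumenting step is left out.

```stata
sysuse auto, clear
* Stand-ins (assumptions): price = DV, weight = IV, mpg = MV, length = control

* DV on MV and IV: gives a (MV path) and c' (direct IV path), in the thread's labels
quietly regress price mpg weight length
scalar a_path  = _b[mpg]
scalar c_prime = _b[weight]

* MV on IV: gives b
quietly regress mpg weight length
scalar b_path = _b[weight]

* DV on IV without the mediator: gives the total effect
quietly regress price weight length
scalar c_total = _b[weight]

display "c_prime + a*b = " %12.0g (scalar(c_prime) + scalar(a_path)*scalar(b_path))
display "total         = " %12.0g scalar(c_total)
```

If the three regressions were run on different observations (for example because one variable has missing values), the two displayed numbers would no longer have to agree.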


    Method 2: Stata built-in command

    In Stata, I can run these analyses (and obtain the relevant standard errors) with a single command, either sem or gsem:
    https://www.stata.com/manuals13/sem.pdf#semexample42g

    I use gsem and write what follows:
    Code:
    xi: gsem (MV <- IV Controls) (DV <- MV IV Controls)
    [xi because the control variables include a couple of sets of dummies]
    [IV is the expected value from the first stage in ivreg2]
    Then the coefficients I am interested in are the following ones (I use nlcom because I find it easier than to look at the table):
    1. Code:
      nlcom _b[MV:IV]
      is b
    2. Code:
      nlcom _b[DV:MV]
      is a
    3. Code:
      nlcom _b[DV:IV]
      is c'
    4. Code:
      nlcom _b[MV:IV]*_b[DV:MV]+_b[DV:IV]
      is b*a+c'
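The nlcom expressions listed above can also be named and requested in one call, which returns delta-method standard errors for the indirect and total effects. This is a minimal sketch on Stata's shipped auto dataset, not on the data from this thread: mpg, weight, and length are stand-ins (assumptions) for MV, IV, and a control, and price_k is a hypothetical rescaled DV created only for the illustration.

```stata
sysuse auto, clear
* Stand-ins (assumptions): price_k = DV, mpg = MV, weight = IV, length = control
generate double price_k = price/1000   // rescaled only to keep the likelihood well scaled

gsem (mpg <- weight length) (price_k <- mpg weight length)

* Indirect and total effects with delta-method standard errors
nlcom (indirect: _b[mpg:weight]*_b[price_k:mpg]) ///
      (total:    _b[mpg:weight]*_b[price_k:mpg] + _b[price_k:weight])
matrix eff = r(b)

* On the same sample, the total should match the regression without the mediator
quietly regress price_k weight length
display "gsem total = " %12.0g eff[1,2] "   regress = " %12.0g _b[weight]
```

The `///` line continuation assumes the block is run from a do-file rather than typed interactively.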
    This is the script:

    Code:
    xi: gsem (MV <- IV controls) (DV <- MV IV Controls)
    Code:
    gsem, coeflegend
    //this is just my preference: I want to look at the coefficients I am interested in, without looking at large tables: p410-411 https://www.stata.com/manuals13/sem.pdf

    direct effect of IV on DV, while controlling for MV, c'
    Code:
    nlcom _b[DV:IV]
    gives -.0006713 which is very different from c' from Method 1

    direct effect of IV on MV, b
    Code:
    nlcom _b[MV:IV]
    gives -.0426406 which is similar to b from method 1

    direct effect of MV on DV, while controlling for IV, a
    Code:
    nlcom _b[DV:MV]
    gives -.0073408 which is similar to a from Method 1

    indirect effect of IV on DV through MV, a*b
    Code:
    nlcom _b[DV:MV]*_b[MV:IV]
    gives .000313 which is similar to a*b from Method 1

    total effect of IV on DV, "c'+a*b", when the mediator is excluded
    Code:
    di -.0006713+.000313
    gives -.0003583, which differs from c'+a*b in Method 1 (-.00114033)


    Problem 1

    In Method 1 and 2, the total effect, c'+a*b, should be the same as the coefficient on IV from:
    Code:
    xi: ivreg2 DV Controls (IV = instrument)
    But it is not the same. This regression gives an estimated coefficient on IV of -.0024259, versus -.00114033 in Method 1 and -.0003583 in Method 2.
    What am I doing wrong?


    Problem 2

    The estimated coefficients for a and b across the two methods are similar, but c' is quite different.
    What am I doing wrong?


  • #2
    I have found the solution to this issue.

    The independent variable is endogenous. I mentioned this in square brackets, thinking it was irrelevant, but it turned out to matter.
    I had run the first stage and saved the predicted values of this variable.
    I then used the predicted IV in both methods (the series of regressions and gsem). However, the first stage was estimated on a slightly larger sample than the later steps.
    The samples differ because the mediator variable (MV) has some missing values.

    My solution was first to run a throwaway OLS regression of DV on IV, MV, and Controls, solely to create a variable flagging the estimation sample.
    Then I ran the first stage on this flagged sample to obtain the predicted values of IV.
    Finally, I used these predicted values in both the regressions and gsem, again on the flagged sample.
    Now the results are identical across the two methods.

    The lesson learned is: make sure that the samples used across methods and different stages are the same, if this is relevant.
    It is a banal observation, but so banal that one might neglect it.
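The flagged-sample workflow described in this post can be sketched as follows. This is an illustration on Stata's shipped auto dataset, not the original script: rep78 stands in for the MV because it has missing values, weight for the endogenous IV, length for the instrument, and foreign for a control. All of these choices are assumptions, and so is adding the instrument to the throwaway regression so that the flag also covers it.

```stata
sysuse auto, clear
* Stand-ins (assumptions): price = DV, rep78 = MV (has missing values),
* weight = endogenous IV, length = instrument, foreign = control

* Throwaway regression: its only job is to define the common estimation sample
quietly regress price weight rep78 length foreign
generate byte touse = e(sample)

* First stage on the flagged sample only, then keep the fitted values
regress weight length foreign if touse
predict double weight_hat if touse, xb

* Every later step is restricted to the same observations
regress rep78 weight_hat foreign if touse
regress price rep78 weight_hat foreign if touse
count if touse
```

Plugging fitted values in by hand reproduces the 2SLS point estimates but not the 2SLS standard errors, so the standard errors reported by these regress calls should not be used for inference.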


    • #3
      Thank you for posting back with the solution you found. I'm sure this will be helpful to others in the future.

      May I point out that if, in #1, you had posted the complete output of the regressions you did, it is likely that somebody would have noticed the discrepancy in sample sizes and responded quickly. Instead, your post languished for two days with no responses, because the information provided, detailed though it was in many respects, was insufficient. In particular, the parts of the output that actually revealed the underlying problem were not there.

      Moral of the story: when asking for help troubleshooting code, always show the exact and complete code that you are having problems with along with the exact and complete output you got from Stata, including any messages. It is also usually a good idea to use -dataex- to post example data that reproduces the problem.


      • #4
        I thought mine was a complete post, but obviously it was not, so, as you pointed out, this is a second important lesson learnt.


        • #5
          Hi FLuca and Clyde Schechter. Thank you for this thread, which is very relevant to the issue I am facing in my research. I am completely new to econometrics and Stata. I also need to do a mediation analysis, and I chose to use SEM (though I may test again with your first method). I have a question about how to interpret the Direct, Indirect, and Total effects that are reported. I understand the four steps in the Baron and Kenny (1986) method, but I have trouble applying them to the SEM output from Stata. Would you be kind enough to shed some light on this for me? For example, does the MV have a mediating effect between X and Y when a*b (the indirect effect) is not 0? Do we draw the conclusion about mediation only from the Indirect effect, from the Total effect, or from both? I apologize if the question seems elementary, but I could not find an answer after searching for a while.

          Thank you so much in advance for your help!


          • #6
            To give a clearer picture of what I'm trying to figure out, I've included the code that I used and the output from Stata (I don't know how to include the output using dataex; I apologize for that).

            I used the following code for SEM (I generated this using the SEM builder)
            Code:
             sem (l_CEODUAL -> l_SUSCOMMITTEE, ) (l_CEODUAL -> ESGSCORE, ) (l_BLER -> l_SUSCOMMITTEE, ) (l_BLER -> ESGSCORE, ) (l_TIER -> l_SUSCOMMITTEE, ) (l_TIER -> ESGSCORE, ) (l_SUSCOMMITTEE -> ESGSCORE, ) (l_SIZE -> l_SUSCOMMITTEE, ) (l_SIZE -> ESGSCORE, ) (COUNTRY_e -> l_SUSCOMMITTEE, ) (COUNTRY_e -> ESGSCORE, ) (INDUSTRY_e -> l_SUSCOMMITTEE, ) (INDUSTRY_e -> ESGSCORE, ) (l_BIND -> l_SUSCOMMITTEE, ) (l_BIND -> ESGSCORE, ) (l_BGENDIV -> l_SUSCOMMITTEE, ) (l_BGENDIV -> ESGSCORE, ) (BSIZE -> l_SUSCOMMITTEE, ) (BSIZE -> ESGSCORE, ) (l_ROA -> l_SUSCOMMITTEE, ) (l_ROA -> ESGSCORE, ), vce(robust) cov( l_CEODUAL*l_BLER l_CEODUAL*l_TIER l_BLER*l_TIER l_SIZE*l_CEODUAL l_SIZE*l_BLER l_SIZE*l_TIER l_SIZE*INDUSTRY_e l_SIZE*l_BIND COUNTRY_e*l_CEODUAL COUNTRY_e*l_BLER COUNTRY_e*l_TIER COUNTRY_e*l_SIZE COUNTRY_e*INDUSTRY_e COUNTRY_e*l_BIND INDUSTRY_e*l_CEODUAL INDUSTRY_e*l_BLER INDUSTRY_e*l_TIER l_BIND*l_CEODUAL l_BIND*l_BLER l_BIND*l_TIER l_BIND*INDUSTRY_e l_BGENDIV*l_CEODUAL l_BGENDIV*l_BLER l_BGENDIV*l_TIER l_BGENDIV*l_SIZE l_BGENDIV*COUNTRY_e l_BGENDIV*INDUSTRY_e l_BGENDIV*l_BIND l_BGENDIV*BSIZE BSIZE*l_CEODUAL BSIZE*l_BLER BSIZE*l_TIER BSIZE*l_SIZE BSIZE*COUNTRY_e BSIZE*INDUSTRY_e BSIZE*l_BIND l_ROA*l_CEODUAL l_ROA*l_BLER l_ROA*l_TIER l_ROA*l_SIZE l_ROA*COUNTRY_e l_ROA*INDUSTRY_e l_ROA*l_BIND l_ROA*l_BGENDIV l_ROA*BSIZE) nocapslatent
            Here is the output
            [Image attachment: S5.PNG]

            I then used the command to generate effect tables:
            Code:
            estat teffects
            The output is as follows:
            [Image attachments: S2.PNG, S3.PNG, S4.PNG]

            My MV is SUSCOMMITTEE and my DV is ESGSCORE; the IVs are CEODUAL, BLER, and TIER, and the rest are controls. Basically, I'm trying to find out whether SUSCOMMITTEE has a mediating effect between the IVs and ESGSCORE.

            Thank you so much for your help!

            Best,

            Thao



            Last edited by ThanhThao Nguyen; 09 May 2023, 08:46.


            • #7
              The key part of the output for your question is the second half of the Indirect Effects table (under the heading ESGSCORE), where the coefficients for the paths from your IVs through your MV to your DV are shown.

              As for what to make of them, there are various approaches people use. The notion that there either is or isn't a mediation effect is really rather nonsensical. In the world of human and organizational behavior, everything is related to everything else to some extent; actual zero effects are pretty much non-existent. So relying just on whether an indirect effect estimate is statistically significant is testing a straw-man hypothesis. In any case, the statistical significance test is contaminated by the effects of sample size and is not really a measure of strength of effect.

              The real question is whether the indirect path effect is large enough to matter from a practical point of view. This requires applying your knowledge of the domain you are working in and also requires subjective judgment. Different analysts also disagree on whether what matters most is the indirect effect itself or the ratio of the indirect effect to the total effect. Your investigation seems to be in the area of finance or econometrics, and I have no expertise in this area, so I cannot advise you more specifically than that. You really should seek advice from somebody in your field about these issues. (There are several such people on this Forum and perhaps one of them will chime in.)
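Both quantities mentioned above, the indirect effect itself and its ratio to the total effect, can be obtained with confidence intervals after sem. This is a minimal sketch on Stata's shipped auto dataset rather than on the ESG data in this thread: mpg, weight, and length are stand-ins (assumptions) for the MV, one IV, and a control, and price_k is a hypothetical rescaled DV.

```stata
sysuse auto, clear
* Stand-ins (assumptions): price_k = DV, mpg = MV, weight = IV, length = control
generate double price_k = price/1000

sem (mpg <- weight length) (price_k <- mpg weight length)

* Direct, indirect, and total effects in table form
estat teffects

* Indirect effect and its share of the total, with delta-method intervals
nlcom (indirect: _b[mpg:weight]*_b[price_k:mpg]) ///
      (share: (_b[mpg:weight]*_b[price_k:mpg]) / ///
              (_b[mpg:weight]*_b[price_k:mpg] + _b[price_k:weight]))
matrix eff = r(b)
```

The share is unstable when the total effect is close to zero, and it can fall outside the 0 to 1 range when the direct and indirect effects have opposite signs, so it is best read alongside the indirect effect rather than in place of it.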
