Mediation Analysis with one binary IV, several MV and one DV

Robin von Weitersburg

Join Date: Mar 2018

Posts: 6
#1


29 Mar 2018, 14:30

Dear fellow Stata users,

My model consists of one IV, several MVs, and one DV, and is overall pretty basic (see the image below for illustration). I also want to add CVs later.

To give you a better understanding of my model, here is a short description of the variables. Please excuse me if my explanations are not very accurate. I am not very good at statistics, but I promise to try my best.

IV (or "FF" in the image): This is a binary variable coded with 0 or 1.

MVs (e.g. "WorkLifeBalance" in the image): The data for the MVs is based on employer ratings from an employer rating website similar to Glassdoor.com. On this website, employees can rate their employer on several factors, such as work-life balance, salary, image, and career development, using a 5-point Likert scale. Imagine you have watched a movie and afterwards have to rate it on the factors "action" and "fun" using a 5-point Likert scale. The data for my MVs is basically a collection of the ratings for "action" and "fun" for many movies, made by even more movie watchers. I think in statistics this would be called a categorical variable, but please correct me if I'm wrong.

DV ("Empfehlung" in the image): This data is based on recommendation values for several companies. After rating their employer, employees can state whether they would recommend their employer to other people. For example, if 5 people recommend company A and another 5 people don't, company A has a recommendation rate of 50% (5 yes, 5 no).

To give you further illustration I have added an excerpt of my data for one company:

The IV, DV and MVs are in yellow. First comes the binary IV (1), then the DV (0.65), followed by the MVs.

That's where my problem, or let's say my challenges, with Stata start.
I have already watched several clips on YouTube, googled around, and checked the FAQ and several help files, but I'm still not sure how to perform my mediation analysis correctly.

First I tried to build my model with the SEM Builder, but when I check for the indirect effect via "estat teffects", the results show no paths (see screenshot below).
How is this possible, and given my data, is the SEM Builder even the right tool?

Then I found the following instructions on how to do mediation analysis with the sem command (https://stats.idre.ucla.edu/stata/fa...e-sem-command/).
Unfortunately, even when I use the command for a simple mediation model with my data ["sem (MV <- IV)(DV <- MV IV)"], Stata returns:
"model not identified; no paths from latent variable Empfehlung to observed variables" r(503);

So far I have understood that I am not using latent variables in my model, so I guess this is the problem with sem but please correct me if I'm wrong.

Then I found the instructions on how to analyze multiple mediators in Stata using the sureg command (https://stats.idre.ucla.edu/stata/fa...tors-in-stata/).
I was eventually able to follow these instructions, and after the bootstrapping process I obtained some reasonable-looking values, but I have no clue whether this is the right approach to mediation analysis with my data.

Recently I also tried out the instructions on how to perform a Sobel-Goodman mediation test (https://stats.idre.ucla.edu/stata/fa...ests-in-stata/). I was able to follow these instructions as well, but again I'm not sure whether the Sobel-Goodman approach is the right one for me. The Sobel-Goodman test also gives me very weird results.

I would very much appreciate it if you could tell me what I'm doing wrong. If the sem command or the SEM Builder is not the correct way, what is? Is it a problem that my IV is binary but the rest of the variables are not? Is there something else I might have overlooked?

If you need any further information to assist me, just let me know and please excuse me if I am a bit of a dummy regarding statistics.

Overall I have the feeling that my model is not very complex. Therefore, I'm sure that any recommendations from you guys are also helpful for other Stata beginners who visit the forum.

Thank you very much and best regards,

Robin

Edit: I've tried to resize the images and to make them appear smaller but somehow this hasn't worked so far.
Last edited by Robin von Weitersburg; 29 Mar 2018, 14:34.
Roman Mostazir

Join Date: Apr 2014

Posts: 876
#2

29 Mar 2018, 17:54

Read the SEM help file before investing time in modeling: type help sem. It is worth the time. You are misreading the output. The indirect-effects output you posted correctly shows the total indirect effect from FF to Emp through the three mediators. The zeros labeled 'no path' mean that FF has no indirect path to the mediators, which is correct because FF has only direct paths to them. An indirect effect is the product of two or more direct effects. Look at the rest of the output produced by 'estat teffects', where the direct path coefficients are reported (you omitted that part), and calculate the indirect effect as the sum of the products of the paths through each mediator. That sum is what the image you provided reports as the indirect effect FF --> Emp.

On a side note, you have not assumed a correlation among the errors of your endogenous and outcome variables. The model should look like the one below (though this is not the final one), assuming the mediators are continuous, as they are not categorical in the strict sense:

/* Variable names are truncated for ease and written in lower case. By default, sem reserves names that begin with an uppercase letter for latent variables. */

Code:

sem (arb <- ff) (work <- ff) (image <- ff) (empfh <- arb work image ff), ///
    cov(e.arb*e.work e.arb*e.image e.arb*e.empfh e.work*e.image e.work*e.empfh e.image*e.empfh)
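
The calculation of the indirect effects described above could be sketched roughly as follows. This is only an illustration, not the final model: it assumes the truncated variable names arb, work, image, empfh and ff, it assumes (as a simplification) that only the mediator errors are allowed to covary, and the labels ind_arb, ind_work and ind_image are made up for readability. The /// line continuation works in a do-file, not when typed at the command line.

```stata
* Parallel mediation: ff -> (arb, work, image) -> empfh, plus a direct path ff -> empfh.
* If capitalized variable names such as Empfehlung were kept instead, the
* nocapslatent option of sem would stop them from being treated as latent.
sem (arb <- ff) (work <- ff) (image <- ff) (empfh <- arb work image ff), ///
    cov(e.arb*e.work e.arb*e.image e.work*e.image)

* Direct, indirect, and total effects as decomposed by Stata
estat teffects

* Specific indirect effects: product of the two direct paths through each mediator
nlcom (ind_arb:   _b[arb:ff]*_b[empfh:arb])     ///
      (ind_work:  _b[work:ff]*_b[empfh:work])   ///
      (ind_image: _b[image:ff]*_b[empfh:image])

* Sum of the three products, to compare with the total indirect effect
* of ff on empfh reported by estat teffects
nlcom _b[arb:ff]*_b[empfh:arb] + _b[work:ff]*_b[empfh:work] + _b[image:ff]*_b[empfh:image]
```

nlcom uses delta-method standard errors; bootstrapping the same products is a common alternative when the sampling distribution of a product is likely to be skewed.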

Please also read the FAQ section to learn how to post, including the rules for providing a data example using -dataex- and for showing the exact commands used and the output they produced. That way you increase your chances of a good reply.
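
A minimal sketch of producing such a data example, assuming the truncated variable names from the code above; the range of 20 observations is an arbitrary choice:

```stata
* Print the first 20 observations of the model variables in a form that
* others can paste straight into Stata. On older Stata versions dataex
* may first need to be installed with: ssc install dataex
dataex ff empfh arb work image in 1/20
```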

Roman
Robin von Weitersburg

Join Date: Mar 2018

Posts: 6
#3

30 Mar 2018, 09:05

Dear Roman,

Thanks a lot for your help! I've already reviewed some parts of help sem and have started to understand what I did wrong in the beginning. I also have a better understanding of mediation analysis in general now, but of course there is still a lot to learn.

Regarding your side note. What is the difference between my endogenous and outcome variables? Do you mean the dependent variable (empfh)? Or are the endogenous variables the mediation variables and the outcome variable the dependent variable or vice versa? I have looked up both terms on Google but still I do not really understand the difference.

I would be really happy if you could also explain to me why it is important to assume a correlation among my endogenous and outcome variables. I have seen something similar to your code in the help files, but as far as I can tell they do not explain why assuming a correlation could be important.
I'm very sorry that my statistics knowledge is not the best.

Originally posted by Roman Mostazir

On a side note, you have not assumed a correlation among the errors of your endogenous and outcome variables. The model should look like the one below (though this is not the final one), assuming the mediators are continuous, as they are not categorical in the strict sense:

/* Variable names are truncated for ease and written in lower case. By default, sem reserves names that begin with an uppercase letter for latent variables. */

Code:

sem (arb <- ff) (work <- ff) (image <- ff) (empfh <- arb work image ff), ///
    cov(e.arb*e.work e.arb*e.image e.arb*e.empfh e.work*e.image e.work*e.empfh e.image*e.empfh)
Roman Mostazir

Join Date: Apr 2014

Posts: 876
#4

30 Mar 2018, 09:29

In econometric parlance, an endogenous variable here is a variable that is affected by an independent variable and also has an effect on the main outcome variable. In your case, the three mediators are all endogenous variables, ff is an exogenous variable, and empfh is the main outcome variable. Because the mediators are themselves outcome variables, each regression of an endogenous variable on the exogenous variable leaves some error (residual) that the regression equation does not explain. The residuals of one endogenous variable might be correlated with the residuals of the other outcome variables, because we are never sure where the errors come from; they may reflect unmeasured confounders. Allowing for a correlation among them therefore helps adjust for possible confounders that might otherwise bias the estimates. Ignoring these correlations may lead to wrong parameter estimates.

You can always inspect the covariance estimates that Stata reports for each covariance term in the model. If the estimated covariance for any pair is negligible, you can remove that term from your model and re-estimate it.
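
One way to check whether covariance terms can be dropped is to compare the model with and without them. The sketch below is only an illustration: it assumes the truncated variable names used earlier in the thread, it restricts attention to the covariances among the three mediator errors (an assumption made for this example), and the stored-estimate names with_cov and no_cov are made up.

```stata
* Model A: errors of the three mediators are free to covary
sem (arb <- ff) (work <- ff) (image <- ff) (empfh <- arb work image ff), ///
    cov(e.arb*e.work e.arb*e.image e.work*e.image)
estimates store with_cov

* Model B: same paths without cov(), so sem leaves the error
* covariances constrained to zero
sem (arb <- ff) (work <- ff) (image <- ff) (empfh <- arb work image ff)
estimates store no_cov

* Likelihood-ratio test of the three covariance constraints jointly
lrtest with_cov no_cov

* Modification indices for Model B: flags constrained covariances
* (and omitted paths) whose release would improve fit
estat mindices
```

A small p-value from lrtest suggests the covariances should stay in the model; the individual covariance estimates and their standard errors appear in the Model A output.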

Roman