Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Gsem: Multiple mediation model with categorical mediators

    Dear Statalist members

    I would like to specify a multiple mediation model using "gsem" with one contintuous outcome variable (y: income), two binary mediators (m1 and m2: fertilizers and pesticides) and one continuous treatment variable (x: certification). The aim is to analyze the relative importance of both mediators by comparing their standardized indirect effect.

    Since both mediators are highly correlated, I would like to allow the two mediators to covary in the mediation model. However, it does not seem possible to covary two binary mediators in "gsem":

    Code:
    *************************** Generate example dataset *****************************
    cap log close
    clear
    clear matrix
    set more off
    
    set obs 5000
    
    *generate certification variable
    gen x = 5 * uniform()
    
    *generate idiosyncratic error term
    gen e1 = invnorm(uniform())
    sum e1
    replace e1 = (e1 - r(mean)) / r(sd) //standard normal distribution
    gen e2 = invnorm(uniform())
    sum e2
    replace e2 = (e2 - r(mean)) / r(sd) //standard normal distribution
    gen e3 = invnorm(uniform())
    sum e3
    replace e3 = e3 - r(mean) //normal distribution
    
    *generate mediator 1 (fertilizers)
    gen m1 = -2 + 2*x + e1
    replace m1 = m1 > 0
    tab m1
    
    *generate mediator 2 (pesticides)
    gen m2 = -2 + 3*x + e2 
    replace m2 = m2 > 0
    tab m2
    
    *generate income variable
    gen y = 1*x + 2*m1 + 3*m2 + e3 
    
    *"true" average marginal effect 
    gen trx1 = normalden(-2 + 2*x) * 2
    sum trx1, d //0,20
    gen trx2 = normalden(-2 + 3*x) * 3
    sum trx2, d //0,19
    
    ***************************** Mediation analysis *********************************
    *estimation via "sem"
    sem (y <- x m1 m2) ///
        (m1 <- x) ///
        (m2 <- x), cov((e.m1)*(e.m2)) method(ml) covstructure(_Ex,diag) 
    
    *estimation via "gsem"
    gsem (y <- x m1 m2) ///
        (m1 <- x, family(bernoulli) link(probit)) ///
        (m2 <- x, family(bernoulli) link(probit)), cov((e.m1)*(e.m2)) method(ml)
    margins, dydx(*) nose
    The "sem" command allows correlation between the mediators, but is invalid since we need nonlinear models for the paths between the treatment and the binary mediators. The related "gsem" command gets the error: "invalid covariance specification; e.m1 does not identify an error for an uncensored gaussian response with the identity link". As suggested in this post, this is probably because the nonlinear models for the two binary mediators inherently have no error term (logit(p) = XB whereas for OLS: y = XB + e)).

    I have two questions:

    1) Does anyone have a suggestion on how to allow for correlation between binary mediators in "gsem" in Stata? I have tried different solutions. First, I tried to make the mediators covary by adding common latent variables, as suggested in this post in another context. The approach seems to work with a simple dataset like this, but in my own case (N>50,000) it took several days and did not converge.
    Code:
    ************************** Latent variable approach ******************************
    sem (y <- x m1 m2) ///
        (m1 <- x) ///
        (m2 <- x), cov((e.m1)*(e.m2)) method(ml) covstructure(_Ex,diag) 
    sem (y <- x m1 m2) ///
        (m1 <- x A@1) ///
        (m2 <- x A@1), method(ml) covstructure(_Ex,diag)
    Second, I was thinking to use a bayesmh estimation in Stata, but I can't figure out how to specify this for a fixed effects panel data estimator in my own case of N>50,000. Any suggestions for this?
    One last solution I tried was to use the "lavaan" package in R which is able to let the binary mediators covary, although the warnings indicate that the model is probably not identified. Anyone have an idea how "lavaan" allows the mediators to covariate when this is not possible in Stata?

    2) If you could estimate the multiple mediation model, the effects of the treatment variable on the mediators are on another scale (namely beta/sigma) than the effect of the mediators on the outcome variable (beta). Is it even possible to calculate an indirect effect when both effects are on a different scale? If so, how do you interpret this?

    Thanks in advance for any suggestions

    Kind regards
    Eva


  • #2
    If someone would be interested in the answer, I contacted Stata's technical support and they provided me with the following solution:

    Stata's -gsem- command does not allow the error terms for the generalized distributed endogenous variables to covary. You could, however, consider using the community-contributed command -cmp- (ssc install cmp and ssc install ghk2 in order to run cmp). Here is a quick illustrative example:

    Code:
    cscript
     
    set obs 5000
    set seed 16
    gen x = 5*uniform()
    gen e1 = invnorm(uniform())
    summarize e1
    replace e1 = (e1 - r(mean)) / r(sd)
    gen e2 = invnorm(uniform())
    summarize e2
    replace e2 = (e2 - r(mean))
    gen e3 = invnorm(uniform())
    summarize e3
    replace e3 = e3 - r(mean)
    gen m1 = -2 + 2*x + e1
    replace m1 = m1 > 0
    gen m2 = -2 + 3*x + e2
    replace m2 = m2 > 0
    gen y = 1*x + 2*m1 + 3*m2 + e3
    gen trx1 = normalden(-2 + 2*x) * 2
    sum trx1, d
    gen trx2 = normalden(-2 + 3*x) * 3
    sum trx2, d
     
    sem (y <- x m1 m2) ///
         (m1 <- x) ///
         (m2 <- x), cov((e.m1)*(e.m2)) method(ml) covstructure(_Ex,diag)
                    
    gsem (y <- x m1 m2) ///
         (m1 <- x, family(bernoulli) link(probit)) ///
         (m2 <- x, family(bernoulli) link(probit)),  ///
                    method(ml)
                    
    cmp setup          
     
    // matching -gsem- results with independent covariance structure
     
    cmp (y = x m1 m2) (m1 = x) (m2 = x) ///
                   , indicator($cmp_cont $cmp_probit $cmp_probit) ///
                   covariance(independent)
                  
    cmp (y = x m1 m2) (m1 = x) (m2 = x) ///
                   , indicator($cmp_cont $cmp_probit $cmp_probit) ///
                   covariance(unstructured)
    The first -cmp- command reproduces the results from running -gsem-, while the second -cmp- command changes the -covariance()- option to allow the covariances of the error terms to covary. You should read more details about other supported covariance structure to use in -covariance()- in the help file of the command -cmp-.

    Kind regards
    Eva

    Comment

    Working...
    X