  • gsem

    Dear all,

    I am confronting some issues with multi-level gsem.

    I have a two-level (country-individual) dataset with 100,000 individuals nested in 50 countries.
    I am interested in two dichotomous DVs, DV1 and DV2, as predicted by a key independent variable of interest, INDEP.
    Moreover, I am interested in whether the effect of INDEP on DV1 differs from its effect on DV2.

    Because I cannot test this with the standard multilevel syntax (meglm), as far as I know, I am trying to do it in the context of a gsem.

    My attempt goes as follows:

    Code:
    gsem (DV1 <- $controls INDEP L1[countries]) ///
         (DV2 <- $controls INDEP L2[countries]), ///
         latent(L1 L2) family(binomial) link(logit) nocapslatent ///
         intmethod(mcaghermite) intpoints(20) ///
         var(L1[countries]@v1 L2[countries]@v1)
    gsem, coeflegend
    test _b[DV1:INDEP] = _b[DV2:INDEP]
    • Is this a suitable approach?
    Moreover, while this model works well with a few very basic controls, it quickly stops converging once I start adding more. This is particularly severe when I add fixed effects.
    However, I do need to add about 100 fixed effects for my model to be well-specified.
    • If the model does not converge because of the number of fixed effects, what could I do to increase the chances of convergence?
    Thank you so much in advance!

    Best
    Johannes





  • #2
    once I start adding more controls it quickly stops converging.
    Can you be more specific? Do you get an endless iteration log with some sort of error message (e.g., the log likelihood keeps being flagged as not concave and doesn't perceptibly change), or does the model simply never move on to the next iteration?

    If it's the latter, I would ask if you really, really need 20 quadrature points. The default is 7. I have no real insight into why the default is 7, but I have to assume it's a generally useful default. The issue with quadrature points is that, per the Stata manual,

    The more integration points, the more accurate the approximation to the log likelihood. However, computation time increases as a function of the number of quadrature points raised to a power equaling the dimension of the random-effects specification. In crossed random-effects models and in models with many levels or many random coefficients, this increase can be substantial.
    You have two random effects, so I assume that means the computation time goes up as a squared function. With 20 points, I can only imagine that each iteration takes a very long time.
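
    If per-iteration computation time is the bottleneck, a first step might be to rerun the model with the default number of integration points. A sketch along the lines of the original specification (variable names taken from post #1):

    Code:
    gsem (DV1 <- $controls INDEP L1[countries]) ///
         (DV2 <- $controls INDEP L2[countries]), ///
         latent(L1 L2) family(binomial) link(logit) nocapslatent ///
         intmethod(mcaghermite) intpoints(7)

    If the estimates barely change between 7 and 20 points, the extra points are buying accuracy you don't need at a steep computational price.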

    If the issue is that the likelihood seems to be converging to an asymptote but the maximizer keeps reporting that it is not concave, that is harder to diagnose. It can mean lack of identification. You can note the iteration number where the log likelihood seems to hit the asymptote and ask for that many iterations, e.g.

    Code:
    gsem ... , iterate(100)
    and then inspect the coefficients and see if any have missing standard errors or absurd values. If you have logit coefficients going to a very large positive or negative number, that can cause non-convergence. It's hard to say what to do in this case, but at least you might get a sense of what's wrong.

    As a side note, does your output table contain a covariance between the random effects? I forget what the default is, but if you're not estimating a covariance between the random effects, you're saying that the country-level random effects are completely independent. It does seem like it might be substantively justified to incorporate a covariance between them.
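
    If you do want to estimate that covariance, gsem's cov() option allows it. A hedged sketch, again reusing the names from post #1:

    Code:
    gsem (DV1 <- $controls INDEP L1[countries]) ///
         (DV2 <- $controls INDEP L2[countries]), ///
         latent(L1 L2) family(binomial) link(logit) nocapslatent ///
         cov(L1[countries]*L2[countries])
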
    Please use the code delimiters to show code and results - use the # button on the formatting toolbar, between the " (double quote) and <> buttons.

    Please use the command -dataex- to show a representative sample of data; it is installed already if you have Stata 14.2 or 15.1, else you can install it by typing

    Code:
    ssc install dataex



    • #3
      Originally posted by Johannes Muller View Post
      I have a two-level country-individual-level dataset with 100.000 individuals nested in 50 countries.
      "Two-level" means that you have a single observation per individual. Maybe two dichotomous characteristics in each observation, but only a single two-dimensional observation per individual.

      Is this a suitable approach?
      If your description of your dataset is accurate, then no. You have two latent variables (random effects) for a single observation per individual. The code snippet below shows how I would answer your research question with a two-level model (individual automobiles nested within repair-record category) and two binomial attributes measured on each individual.

      I however do need to add 100 fixed effects for my model to be well-specified. . . .what could I do to increase the chances of convergence?
      I recommend going back to theory and prior knowledge to see whether you can devise a more parsimonious statistical model. With a hundred covariates to adjust for, you are not going to be able to say much, if anything at all, about INDEP anyway.

      Code:
      version 15.1
      
      clear *
      
      set seed `=strreverse("1483123")'
      
      quietly sysuse auto
      summarize rep78, meanonly
      quietly replace rep78 = runiformint(`r(min)', `r(max)') if mi(rep78)
      
      summarize mpg, meanonly
      quietly replace mpg = mpg > r(mean)
      
      gsem ///
          (foreign <- c.price M[rep78]) ///
          (mpg <- c.price M[rep78]), ///
              family(binomial) link(logit) ///
                  nocnsreport nodvheader nolog
      
      estimates store Unconstrained
      
      constraint define 1 _b[foreign:price] = _b[mpg:price]
      quietly gsem (foreign <- c.price M[rep78]) (mpg <- c.price M[rep78]), ///
              family(binomial) link(logit) constraints(1)
      
      lrtest Unconstrained
      
      exit
