  • Is the optimizer in sem constrained to produce admissible solutions?

    Hi

    I am teaching an introductory class on statistical research methods. I have an example written in R, which I would like to show in Stata as well. I am fitting a CFA model to a covariance matrix, and the example should produce a Heywood case. This works well with the Lavaan package for R. However, when I run the same example in Stata, the model fails to converge, and all variance estimates in the non-convergent model are positive. The non-convergence is related to including the factor that should produce the Heywood case.

    Is the optimizer in sem constrained to produce only admissible estimates? If so, is there a way to get it to produce Heywood cases?

    Running Stata 13 on Mac.

    I am attaching the R code, the estimates produced by Lavaan in R, and the Stata do-file for the example.

    Mikko

  • #2
    R is expecting a covariance matrix that is subsequently rescaled:

    Numeric matrix. A sample variance-covariance matrix. The rownames and/or colnames must contain the observed variable names. For a multiple group analysis, a list with a variance-covariance matrix for each group. Note that if maximum likelihood estimation is used and likelihood="normal", the user provided covariance matrix is internally rescaled by multiplying it with a factor (N-1)/N, to ensure that the covariance matrix has been divided by N. This can be turned off by setting the sample.cov.rescale argument to FALSE.

    In Stata, you are defining the same matrix as the correlation matrix. Perhaps that is where you are seeing the difference in behavior?
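    For reference, the rescaling described above is just multiplication by (N-1)/N. Here is a minimal sketch of the same operation done by hand in Stata, assuming a hypothetical 3x3 sample covariance matrix S and sample size N; the values are illustrative placeholders, not the matrix from the attached files:

    Code:
    * hypothetical sample covariance matrix (N-1 denominator) and sample size
    matrix S = (2.5, .25, 3.25 \ .25, 2.5, 2.75 \ 3.25, 2.75, 6.7)
    local N = 5
    
    * rescale to the N denominator, which is what lavaan does internally
    * when likelihood="normal" and sample.cov.rescale is at its default
    matrix S_ml = S * ((`N' - 1) / `N')
    matrix list S_ml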



    • #3
      In the Stata SEM manual, section [SEM] intro 4, the subsection headed "What happens when models are unidentified" suggests that the behavior you are seeing in Stata, specifically the continuing "(not concave)" annotation on each iteration and the failure to converge, is expected behavior when the model is unidentified.

      These comments are based on Stata 13.1 and its associated documentation.



      • #4
        Thank you for your quick responses.

        Originally posted by wbuchanan View Post
        R is expecting a covariance matrix that is subsequently rescaled:

        Numeric matrix. A sample variance-covariance matrix. The rownames and/or colnames must contain the observed variable names. For a multiple group analysis, a list with a variance-covariance matrix for each group. Note that if maximum likelihood estimation is used and likelihood="normal", the user provided covariance matrix is internally rescaled by multiplying it with a factor (N-1)/N, to ensure that the covariance matrix has been divided by N. This can be turned off by setting the sample.cov.rescale argument to FALSE.

        In Stata, you are defining the same matrix as the correlation matrix. Perhaps that is where you are seeing the difference in behavior?
        Removing the rescaling in R with sample.cov.rescale = FALSE did not make a difference. I will test entering the data with ssd set correlations soon, but it seems unlikely that this is the problem.
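
        In case it is useful, here is a minimal sketch of that ssd route, following the syntax in [SEM] ssd; the sample size and correlation values below are hypothetical placeholders, not the matrix from my attached files:

        Code:
        * declare summary statistics data for three observed variables
        ssd init x1 x2 x3
        ssd set observations 100
        ssd set sd 1 1 1
        
        * lower triangle of the correlation matrix, rows separated by \
        ssd set correlations 1 \ .5 1 \ .4 .6 1
        
        * a one-factor CFA can then be fit directly to these statistics
        sem (F -> x1 x2 x3)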

        I successfully fitted the null model in both packages, and the results were nearly identical.

        Originally posted by William Lisowski View Post
        In the Stata SEM manual, section [SEM] intro 4, the subsection headed "What happens when models are unidentified" suggests that the behavior you are seeing in Stata, specifically the continuing "(not concave)" annotation on each iteration and the failure to converge, is expected behavior when the model is unidentified.

        These comments are based on Stata 13.1 and its associated documentation.
        This could also happen if the optimizer were constrained. With this correlation matrix, the best-fitting solution has three negative error variances. If we constrain the optimizer, that could create a plateau, i.e. a "(not concave)" region, in the likelihood function.

        Also, I have checked model identification, and the model is not empirically underidentified: I can estimate the same model with Lavaan without problems.




        • #5
          Hi,

          I am returning to this question with a new example in the hope of getting an answer:

          Code:
          * small dataset constructed to produce a Heywood case
          clear
          input x1 x2 x3
          1 4 5
          2 2 4
          3 1 4
          4 5 9
          5 3 9
          end
          
          * inspect the correlations
          corr
          
          * ML exploratory factor analysis: the Heywood case shows up
          * here, with the x3 uniqueness at its boundary
          factor x1 x2 x3, ml
          
          * the equivalent one-factor CFA fails to converge
          sem (F -> x1 x2 x3)
          The example generates a small dataset that produces a Heywood case in ML factor analysis. The uniquenesses seem to be constrained to be non-negative.

          The sem analysis does not converge because the optimizer seems to be constrained to produce only positive variance estimates. However, in this case the best-fitting model has a large negative error variance for x3. Is there any way to get Stata to converge to a negative variance estimate?
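
          For anyone trying this, one way to stop the endless iteration log is to cap the optimizer with the iterate() maximize option, which sem accepts; the cap of 50 below is arbitrary, and Stata then stops with a note that convergence was not achieved:

          Code:
          * refit the model from the example above with a hard cap on
          * the number of iterations instead of letting it run forever
          sem (F -> x1 x2 x3), iterate(50)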

          Mikko



          • #6
            I was asked about this issue privately, so I thought the likely cause of this behavior might be interesting for others as well. I believe that the variances are constrained to be positive because the estimation metric is not the variance itself but the log of the standard deviation. The documentation for sem does not explain the estimation metric, but this is how Stata's mixed command specifies variances. So instead of adjusting the variance directly, the optimizer adjusts a parameter X that can take any positive or negative value, and the variance used in the likelihood calculations is defined as exp(X)^2, which is always positive. If the maximum likelihood estimate of the variance were negative, X would become an arbitrarily large negative number, and the model would never converge, because it would always be possible to get a larger likelihood by making X more negative.
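
            A minimal numeric sketch of this argument, assuming the log-standard-deviation parameterization described above (the name lnsd is only illustrative, not sem's documented parameter name):

            Code:
            * under a log-sd parameterization the implied variance exp(lnsd)^2
            * shrinks toward zero as lnsd decreases, but can never become
            * negative, so no finite lnsd yields a negative variance estimate
            forvalues i = 0/5 {
                local lnsd = -4 * `i'
                display "lnsd = " %4.0f `lnsd' "  ->  variance = " %12.4e exp(`lnsd')^2
            }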

            This makes a lot of sense for mixed models, because you can integrate over the random effects only if the random-effect variances are non-negative. The same applies to gsem, which uses integration. I think the sem command parameterizes variances in the same way; that would make the commands more consistent with one another, even if it is not ideal when fitting models to covariance matrices.

            A final caveat: I do not know whether this is the correct explanation, but it would make sense. If it is true, this design decision means that there is no way to get a negative variance estimate.

            Mikko



            • #7
              Mikko,

              I honestly don't fully understand what you wrote. However, while investigating how other programs used in latent class analysis handle convergence, I've come to believe that Stata's likelihood-maximization criteria may also differ from those of other software, particularly MPlus and the Penn State latent class plugin. All of the packages appear to check the change in the log likelihood and the first derivative of the likelihood function. Stata also checks the second derivative of the likelihood function, and I am not sure whether the other packages do the same. This may have some bearing on what you wrote, assuming I am correct.
              Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

              When presenting code or results, please use code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.

