SEM builder- estimation failed error

Soumya Upadhyay

Join Date: Oct 2016

Posts: 43
#1

03 Feb 2017, 16:52

When I try to estimate my model in Stata's SEM builder, I get the error "estimation failed". Would that be because I have missing values? I also chose the model in which I could check "maximum likelihood with missing values". But it still fails to estimate. Please advise. Thank you!
Roman Mostazir

Join Date: Apr 2014

Posts: 877
#2

03 Feb 2017, 17:23

Show us the command you have used and the output you received so that others can understand the problem domain and can help you. This is also suggested in the FAQ section, please read it if you have not.

In most cases, missing values are unlikely to cause convergence problems. Changing the optimization technique often helps: for example, instead of the default Newton-Raphson method, try the Berndt-Hall-Hall-Hausman method. You specify this in the technique() option, i.e. technique(bhhh). The help file for SEM has a section on dealing with convergence problems, which should help.

Roman

Soumya Upadhyay

Join Date: Oct 2016
Posts: 43

04 Feb 2017, 19:25

Following is my dataset:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input long MCRNUM float(RN_staffing hsptl_staffing) double(pos_teamac pos_handoff pos_overall)
30014  8.549974         .  .632498947829478  .4395222128289342 .6236115758388308
30038  7.003634         . .6065812421593322 .45978321729249944 .6049491002900073
30038  7.003634         . .6065812421593322 .45978321729249944 .6049491002900073
30087  7.188053         . .6615754459352088  .5236581646443227 .6614742692441345
30087  7.188053         . .6615754459352088  .5236581646443227 .6614742692441345
30092 11.925195         . .6112852431427054  .4523331634859975 .6025510664834064
30123 10.389868         . .7224183722637787  .5694558538702998 .7163663401606037
30123 10.389868         . .7224183722637787  .5694558538702998 .7163663401606037
43300   13.2215 .05022413 .5635656327228361 .33254446579265423  .570111087839279
50026  10.04374  .2267941 .6174986288789501 .48950446108345086 .6929756214657257
end

I am using the SEM builder, with the measurement component window. My first latent variable is structure (RN_staffing, hsptl_staffing), the second is process (pos_teamac and pos_handoff), and the third is outcome (pos_overall). I have an arrow from structure to process and one from process to outcome. I also have a direct arrow from structure to outcome. The idea is that process plays a mediating role between structure and outcome. When I estimate my model, the iterations never stop; I have to click on Break, and after that I get the error 'estimation failed'. So, please help!
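
For readers who prefer the command line, the Builder model described above probably corresponds to something like the sem call below. This is only a sketch: the latent names Structure, Process, and Outcome are assumed labels, and iterate(50) is an arbitrary cap added so that a model that does not converge stops on its own instead of needing Break. It is meant to be run from a do-file, since it uses /// line continuation.

```stata
* Sketch of the Builder model in command syntax (latent names are assumed).
* RN_staffing starts with a capital letter, so the latent variables are
* declared explicitly instead of being inferred from capitalization.
sem (Structure -> RN_staffing hsptl_staffing)      ///
    (Process   -> pos_teamac pos_handoff)          ///
    (Outcome   -> pos_overall)                     ///
    (Structure -> Process Outcome)                 ///
    (Process   -> Outcome),                        ///
    latent(Structure Process Outcome) nocapslatent ///
    method(mlmv) iterate(50)
```

Posting the equivalent command together with the iteration log makes it much easier for others to see where estimation stalls.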


Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#4

04 Feb 2017, 20:11

By unstoppable iterations, do you mean that the program keeps iterating with no perceptible change in the log likelihood?

I suspect your model is not identified. You have 4 indicator variables, which you are loading onto 2 latent constructs. You have too few pieces of information; it's like solving 2 simultaneous equations for 3 unknowns. Moreover, you are treating your outcome variable as latent as well, and you only have one indicator for that.

Also, it's worth thinking about why you need latent variables at all in this case. You might as well treat the staffing factors as known. Adequacy of hospital staffing is such a complex issue that I very much doubt it can be adequately represented by one measure of nurse staffing (nurses per patient day?) and one other. I have no idea what your process measures are, but they look continuous, and it sounds like they represent some sort of measure of how well handoffs were done, and something else. Those were probably measured from some sort of questionnaire with Likert items. If you seriously want some sort of latent variable to represent, say, the quality of discharge planning, then you are better off feeding the individual Likert items into the SEM. You would also want to ask whether this construct has been theoretically validated, whether the test shows adequate internal reliability, what its factor structure is, and what its other psychometric properties are.

Lastly, structure should usually have a pretty small impact on outcomes (unless there's some sort of gross mismatch, e.g. very high-risk women with ectopic pregnancy plus multiple comorbidities getting treated at a small birthing center). Usually, I would think of structure being the mediator, not process. Process is much more proximal to the outcome.

Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#5

05 Feb 2017, 10:38

Originally posted by Weiwen Ng

...

Lastly, structure should usually have a pretty small impact on outcomes (unless there's some sort of gross mismatch, e.g. very high-risk women with ectopic pregnancy plus multiple comorbidities getting treated at a small birthing center). Usually, I would think of structure being the mediator, not process. Process is much more proximal to the outcome.

I may have got ahead of myself here in assuming something about your causal model. Let me clarify a bit. The original post refers to structure, process, and outcome. These are the 3 big categories of healthcare quality measures in Donabedian's famous conceptual framework. Structure means large-scale things like how well a hospital is staffed (e.g. nurse hours or MD hours per patient-day or resident-day, perhaps by clinical specialty area like maternal-fetal medicine specialists or critical care or hospitalist or whatever). Process means measures of what the clinicians do to the patient (e.g. a binary variable for medication reconciliation at discharge, a binary variable for whether or not a hospital patient received a primary care follow-up within 30d of discharge, a Likert scale of how well the patient felt her discharge was planned). Outcomes are self-explanatory (e.g. did the patient die or get readmitted to hospital, health-related quality of life).

We usually find that structure has a small impact on ultimate outcomes. One might be surprised at that, but it is what it is. For example, go Google the literature on hospital volume; the theory is that high-volume hospitals and their staff might have better practice at a certain procedure, and/or that hospitals good at a procedure might get more referrals as word gets out. You will, in fact, find consistent support for volume having a positive impact on outcomes. You will also find that the impacts are small. Actually, you will also tend to find that process measures also have pretty small impacts on outcomes. A lot depends on the patient's initial health state.

Those are general guides that may or may not apply to Soumya's question. But, back to the actual question: the model is all but certain to be under-identified as specified. I am pretty sure that the outcome can't possibly be treated as a latent variable with only one indicator loading on it. When you say you are treating structure as a latent variable here, you have only 2 indicators of structure, and those are 2 staffing measures. The same goes for process. I'm not sure that makes sense theoretically, and it would be just as good to treat the whole thing in a regular regression format. You'll be showing the impact of nurse and other staffing on the outcome, as mediated by your measures of handoff and team somethingorother.

The UCLA site has a good primer on how to do mediation analysis in SEM.

http://www.ats.ucla.edu/stat/stata/f..._mediation.htm

Soumya Upadhyay

Join Date: Oct 2016

Posts: 43
#6

06 Feb 2017, 10:49

Thank you for your response and insights. Yes, I am using Donabedian's SPO model as my theoretical underpinning. Yes, you are right that structure is not latent because it is RN staffing and hospitalist staffing. Process is teamwork across units and handoffs. Outcome is safety culture perceptions. If you think SEM doesn't conceptually fit here, should I just use the Baron and Kenny method and simply use regressions?
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#7

06 Feb 2017, 13:24

I'm not familiar with fitting mediation models. SEM example 42g gives a clear example of fitting them with only manifest variables. It should be possible to fit the model that way.

http://www.stata.com/manuals13/semexample42g.pdf
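
As a rough illustration of what a manifest-variable version could look like with the variables from the data example, here is a minimal sketch. It assumes both process scores act as mediators and both staffing measures as predictors, which is my reading of the thread and not something the linked example prescribes.

```stata
* Path model with observed variables only: staffing -> process -> outcome.
* nocapslatent keeps RN_staffing from being read as a latent variable name.
sem (pos_teamac pos_handoff <- RN_staffing hsptl_staffing)               ///
    (pos_overall <- pos_teamac pos_handoff RN_staffing hsptl_staffing),  ///
    nocapslatent

* Direct, indirect, and total effects
estat teffects
```

If the two process scores are expected to share unexplained variance, their errors can be allowed to correlate by adding covariance(e.pos_teamac*e.pos_handoff) to the options.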

If your theory really says that you have to treat structure, process, and outcome as latent variables, there appear to be things you can do to make the model identifiable.

http://davidakenny.net/cm/identify_formal.htm

I am not as familiar with identification as I'd like to be. But, it does appear that if you want to treat your outcome variable as latent, then you have to constrain the error variance of the indicator (e.g. you have theoretical work on the subject already and you know the error variance is some fixed value), or find an instrument.

There are also things you can do to identify the structure and process constructs. That said, the most likely solution is to set the loadings of both indicators on each construct to be equal. That doesn't seem justified, unless you have literature demonstrating this.
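
To make the two kinds of constraint mentioned above concrete, here is a sketch of how they might be written in sem syntax. Everything numeric is hypothetical: the reliability of 0.8 for pos_overall is a made-up value, not something from the thread or the literature, and the latent names are assumed. Whether these constraints are defensible, or even sufficient for identification, is a separate question.

```stata
* Illustration only: 0.8 is a made-up reliability for the single indicator.
sem (Structure -> RN_staffing@1 hsptl_staffing@1)  /// both loadings fixed at 1
    (Process   -> pos_teamac@1 pos_handoff@1)      /// both loadings fixed at 1
    (Outcome   -> pos_overall)                     ///
    (Structure -> Process Outcome)                 ///
    (Process   -> Outcome),                        ///
    latent(Structure Process Outcome) nocapslatent ///
    reliability(pos_overall 0.8) iterate(50)
```

The reliability() option fixes the error variance of pos_overall from the stated reliability, and the @1 constraints force the two indicators of each construct to load equally.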

You are also estimating the loading of process on structure as part of the mediation analysis. I have no clue what theoretical problems you may have in terms of identification, but I have a feeling you will face at least some.

I really don't see the point of treating any of the staffing variables as latent. Actually, for the other variables you mentioned, those can (and maybe should) be treated as latent if you have all the indicators that were measured. But, in the data example, it looks like you only have summary scores. So, I think you have no choice but to treat them as manifest. But, someone please correct me if I'm wrong.

Last, in your data example, hospital staffing has a number of missing values. Do you know what percent of observations have missing values? You can invoke the maximum likelihood with missing values option, which will assume that the missingness is conditional on (I believe) all the observed covariates in the model. But if you have a very high percent missing (say over 30%), that is a problem in and of itself. If it's less than 5% missing, there's probably not going to be a huge difference between casewise deletion* and MLMV, although I'd run MLMV anyway to check. If it's between 5 and 30% missing, that's probably the sweet spot if the data can be assumed to be missing at random or close to that (conditional on observed variables). If that assumption is badly wrong, then your inference is badly wrong.
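
A quick way to answer the percent-missing question, sketched with the variable names from the data example (n_missing is a new helper variable introduced here for illustration):

```stata
* Missing-value counts per variable
misstable summarize RN_staffing hsptl_staffing pos_teamac pos_handoff pos_overall

* Share of observations with at least one missing model variable
egen byte n_missing = rowmiss(RN_staffing hsptl_staffing pos_teamac pos_handoff pos_overall)
count if n_missing > 0
display "Percent of observations with any missing value: " %4.1f 100 * r(N) / _N
```

misstable patterns, run on the same variable list, additionally shows which combinations of variables tend to be missing together.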

* The default setting for the linear SEM command is to discard all observations that have one or more missing values. The MLMV option uses all the available data (assuming, I think, that there is at least one measured variable in there). It handles the missing data through the likelihood function, and it does require that all observed and latent variables be distributed close enough to multivariate normal. I am pretty sure that MLMV is also known as full information maximum likelihood. If you want to see some of the math behind this, I am linking a page on the SAS website for the equivalent command, but I confess to not understanding any of the math.

http://support.sas.com/documentation...details156.htm
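
To see how much the choice matters in practice, one option is to fit the same model both ways and compare. The sketch below assumes an observed-variable path model built from the variables in the data example; the stored-estimate names listwise and mlmv are arbitrary.

```stata
* Default ML: observations with any missing value are dropped
sem (pos_teamac pos_handoff <- RN_staffing hsptl_staffing)               ///
    (pos_overall <- pos_teamac pos_handoff RN_staffing hsptl_staffing),  ///
    nocapslatent
estimates store listwise

* MLMV: uses all available data, assuming joint normality and MAR
sem (pos_teamac pos_handoff <- RN_staffing hsptl_staffing)               ///
    (pos_overall <- pos_teamac pos_handoff RN_staffing hsptl_staffing),  ///
    nocapslatent method(mlmv)
estimates store mlmv

* Side-by-side coefficients, standard errors, and sample sizes
estimates table listwise mlmv, b(%9.4f) se stats(N)
```

A large gap in N between the two columns is a sign that casewise deletion is throwing away a lot of data.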

Last edited by Weiwen Ng; 06 Feb 2017, 13:42.
