  • Convergence not achieved. How to achieve that?

    Hi!
    I am trying to run the following regression

    logit bigr_misstatement nas_office nas_client big_4 expert log_clients aud_comp importance assets log_age cratio roa leverage zscore num_segments sales_growth_decile ma disc_ops foreign icw busy auditor_change i.msa i.sic i.fiscal_year, cluster(company_fkey)

    but before ending up with the full result it shows an error:

    Note: 3 failures and 2 successes completely determined.
    convergence not achieved
    r(430);


    A portion of the dataset is attached below. FYI: SIC CODE FKEY was recoded to FF12 before running the regression. How can I get the model to converge?

    [Attached image: Screenshot 2024-08-07 120516.png]

  • #2
    I think the problem may be that some of your clusters defined by the cluster(company_fkey) option may be always 0 or 1 on the outcome. There could also be some other kind of perfect collinearity among the predictors unrelated to the company_fkey clusters. Please see this FAQ entry for details.

    By the way, data examples that are images aren't very useful because I can't take your data example and try to reproduce your issue on my end. Please provide data examples using the -dataex- command instead.
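
    A minimal sketch of what that could look like, assuming a recent Stata in which -dataex- is built in (on older versions it may need to be installed from SSC). The variable list and the 50-observation range are only illustrative choices:

    ```stata
    * Assumes the dataset from #1 is in memory.
    * Post a small, copy-pasteable extract of the key variables.
    dataex bigr_misstatement nas_office nas_client big_4 company_fkey msa sic fiscal_year in 1/50
    ```

    The output can be pasted straight into a post, and anyone can run it to recreate the extract.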
    Last edited by Daniel Schaefer; 07 Aug 2024, 11:50.

    • #3
      The two messages you received, one about some outcomes completely determined, and the other about non-convergence, have nothing to do with each other.

      The note about 3 failures and 2 successes completely determined means that five observations have fitted probabilities of essentially 0 or 1. As Daniel Schaefer points out in #2, the likely source is some small group of observations in which every outcome is 0, or every outcome is 1. Strictly, the grouping that matters is one defined by the predictors (for example, a sparse level of i.msa or i.sic), because cluster(company_fkey) only affects the standard errors. This is usually not a real problem. It just tells you that those observations are uninformative about the relationships between the predictor variables and the outcome. Assuming that the data set is large enough that 5 such observations are harmless, you can ignore this particular message unless you think that the non-varying outcomes represent data errors that you must fix.
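
      One way to look for groups with a constant outcome, as a rough sketch. It assumes the dataset from #1 is in memory and that msa, sic and fiscal_year are the variables behind the i. terms in the command. It scans all observations rather than the estimation sample, so treat the counts as a hint only:

      ```stata
      * Assumes the dataset from #1 is in memory.
      foreach v of varlist company_fkey msa sic fiscal_year {
          tempvar lo hi
          bysort `v': egen `lo' = min(bigr_misstatement)
          bysort `v': egen `hi' = max(bigr_misstatement)
          * Count observations in groups where the outcome never varies
          quietly count if `lo' == `hi' & !missing(`lo')
          display as text "`v': " r(N) " observations in groups with a constant outcome"
      }
      ```

      Small counts for a grouping variable point to a few sparse levels that are worth inspecting by hand.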

      The non-convergence is a different matter altogether. On the one hand, it is a common problem in models with a large number of predictor variables, and on the other hand it is not commonly seen with -logit-. Solving convergence problems is difficult. The first step is to try to identify specific variables that are causing difficulty. To do that, go back and look at the iteration log that Stata showed you. Typically for some time before Stata quit, there will have been a series of iterations in which the log likelihood was either not changing at all, or going around in circles. Identify the number of the iteration just before that happened. Then re-run the analysis, adding the -iterate(#)- option, with # replaced by that iteration number. This will cause Stata to stop just before the trouble arises, and you will see interim results. These results are incorrect and cannot be used as valid estimates of the logistic model. But they may contain clues to the sources of non-convergence. Review that regression output table looking for standard errors that are absurdly large, or for coefficients that are of absurdly large magnitude. If you find any, it is likely that these variables are the ones that are causing the model to fail to converge.
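
      As a sketch of that step, assuming the dataset from #1 is in memory and the code is run from a do-file (the /// line continuations do not work when typed interactively). The value 10 is a placeholder, not a recommendation:

      ```stata
      * The predictor list is copied from the command in #1.
      local xvars nas_office nas_client big_4 expert log_clients aud_comp importance ///
          assets log_age cratio roa leverage zscore num_segments sales_growth_decile ///
          ma disc_ops foreign icw busy auditor_change i.msa i.sic i.fiscal_year

      * 10 is a placeholder: use the last iteration before the log likelihood stalled.
      logit bigr_misstatement `xvars', cluster(company_fkey) iterate(10)

      * e(converged) is 0 if the run stopped without converging.
      display "converged = " e(converged) ",  iterations = " e(ic)
      ```

      The coefficient table from this run is only for diagnosis: scan it for huge coefficients or standard errors.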

      If you find such variables, first verify that they have correct data. If you find data errors, fix them and try again. If the data on those variables appears to be correct, then omit those variables from the model and it will probably converge.
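
      A quick way to eyeball suspect variables, as a sketch. Here leverage, zscore and icw merely stand in for whichever variables looked suspicious in the interim output:

      ```stata
      * Assumes the dataset from #1 is in memory.
      * Continuous variables: look at the extremes and percentiles.
      summarize leverage zscore, detail

      * Indicator variables: cross-tabulate against the outcome, showing missing values.
      tabulate icw bigr_misstatement, missing
      ```

      Implausible extremes or nearly empty cells in the cross-tabulation are the kinds of things to check against the source data.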

      Life is harder if no such variables are found. In that case you will need to start over with a simple model. Begin with the simplest one, including just your single most important predictor variable, and run it. Assuming that is successful, add in one more variable and re-run. Keep adding variables one at a time for as long as you continue to get convergence. At some point you will hit one or more variables that cause the convergence to fail. Those problematic variables should be reviewed for data errors, and if there are none, then they need to be omitted from the model.
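
      The one-at-a-time approach can be automated roughly as follows. This is a sketch under assumptions: the dataset from #1 is in memory, the code runs from a do-file, and the order of the candidate list (taken here from the command in #1) reflects your own ranking of importance. The macro names candidates and kept are just illustrative:

      ```stata
      * Candidates in the order you want to add them (most important first).
      local candidates nas_office nas_client big_4 expert log_clients aud_comp ///
          importance assets log_age cratio roa leverage zscore num_segments ///
          sales_growth_decile ma disc_ops foreign icw busy auditor_change ///
          i.msa i.sic i.fiscal_year
      local kept

      foreach v of local candidates {
          * capture keeps the loop alive even if logit exits with an error
          capture logit bigr_misstatement `kept' `v', cluster(company_fkey)
          local rc = _rc
          if `rc' == 0 & e(converged) == 1 {
              local kept `kept' `v'
          }
          else {
              display as error "trouble after adding `v' (return code `rc')"
              continue, break
          }
      }
      display as text "converging model so far: `kept'"
      ```

      The loop stops at the first variable that breaks convergence, so that variable can be reviewed before deciding whether to drop it and continue.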

      I should add that sometimes convergence problems arise because the scales of the predictor variables are widely different. So take a look at the distributions of all the predictors. If you find some with very narrow ranges and others with very wide ranges, consider re-scaling some of those variables so that the variables are more similar in scale. Sometimes this change will cause a non-converging model to converge.
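
      A sketch of that check, assuming the dataset from #1 is in memory. Whether assets is actually the wide-ranging variable is a guess, and z_assets is a hypothetical new variable name:

      ```stata
      * Compare the ranges of the continuous predictors.
      summarize assets log_clients log_age cratio roa leverage zscore num_segments

      * Illustrative only: standardize a wide-ranging variable to mean 0, sd 1.
      egen double z_assets = std(assets)
      summarize assets z_assets
      ```

      Rescaling changes the units of that variable's coefficient, not the model's fit, so the original variable can simply be swapped for the rescaled one in the -logit- command.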
