  • Forcing Stata to Post a Singular Variance-Covariance Matrix ("variance matrix is nonsymmetric or highly singular")

    Hello all,


    I am trying to get around the "variance matrix is nonsymmetric or highly singular" error and force Stata to report standard errors anyway. Is this possible in Stata? I have looked at other threads, but none of them shows a way to force Stata to report the standard errors.

    Context:
    In the project I am working on with my colleague, he is running a regression with CBSA (core-based statistical area) fixed effects, which means there are a very large number of regressors, most of them sparse indicator variables. When non-clustered standard errors are requested, there is no problem. When clustered standard errors are requested, we get the "variance matrix is nonsymmetric or highly singular" error. This is expected: when the number of regressors exceeds the number of clusters (as it does in our project), the var-cov matrix is rank-deficient, but valid statistical inference on a limited number of coefficients (though not jointly on all of them at once) can still be conducted. So we would like to get around the error and have Stata report the var-cov matrix and standard errors anyway, despite the matrix being singular.
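A minimal NumPy sketch of why the clustered variance becomes singular. It uses simulated OLS data, not the project's model, and all sizes and names are illustrative assumptions. The "meat" of the sandwich is a sum of one outer product per cluster, so its rank is at most the number of clusters C. For OLS it is in fact at most C - 1, because the per-cluster scores X_g'u_g sum to X'u = 0 by the normal equations. With more regressors than that, V is singular even though every diagonal entry, and hence every individual standard error, is still positive.

```python
import numpy as np

rng = np.random.default_rng(0)

n_clusters = 4          # C: number of clusters (illustrative)
n_per = 50              # observations per cluster
k = 6                   # regressors, deliberately more than C - 1
n = n_clusters * n_per

cluster = np.repeat(np.arange(n_clusters), n_per)
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ rng.normal(size=k) + rng.normal(size=n)

# OLS fit and residuals
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
u = y - X @ beta

# Bread: (X'X)^{-1}
bread = np.linalg.inv(X.T @ X)

# Meat: sum over clusters of s_g s_g', with cluster score s_g = X_g' u_g
meat = np.zeros((k, k))
for g in range(n_clusters):
    s_g = X[cluster == g].T @ u[cluster == g]
    meat += np.outer(s_g, s_g)

# Cluster-robust sandwich (no small-sample scaling applied)
V = bread @ meat @ bread

# Numerical rank with a relative tolerance, plus per-coefficient SEs
rank_V = np.linalg.matrix_rank(V, tol=1e-9 * np.abs(V).max())
se = np.sqrt(np.diag(V))

print(f"k = {k}, clusters = {n_clusters}, rank(V) = {rank_V}")
print("standard errors:", se.round(4))
```

A scalar small-sample correction would rescale V but cannot change its rank.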

  • #2
    I don't really understand the problem, but it seems like what you want isn't mathematically on the menu. Beyond this, what question are you all trying to answer? We need the research design, as well as data and code, to comment properly. Could you go into a bit more detail about the design and the overall point of the paper, please?



    • #3
      Unfortunately, I don’t have the code for the specific regression my colleague is running, but I can give a general overview.

      We are instrumenting a cloglog model using the control function approach. This means we regress an endogenous covariate on the instruments and controls with -regress-, then use the residual from that first-stage regression as an additional control in a second-stage regression with -cloglog-.
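The two steps described above can be sketched in Python with NumPy. This is an illustration under stated assumptions, not the colleague's code: the data are simulated with one instrument z and one control c, and the outcome's latent error is drawn from a Gumbel distribution so that, conditional on the first-stage error v, the outcome follows an exact cloglog model. The cloglog fit uses Fisher scoring as a stand-in for Stata's -cloglog-, and `cloglog_fit` is a hypothetical helper.

```python
import numpy as np


def cloglog_fit(W, y, max_iter=100, tol=1e-10):
    """Cloglog MLE, Pr(y=1 | W) = 1 - exp(-exp(W @ b)), via Fisher scoring."""
    b = np.zeros(W.shape[1])
    for _ in range(max_iter):
        mu = np.exp(W @ b)
        q = np.exp(-mu)              # Pr(y = 0)
        p = -np.expm1(-mu)           # Pr(y = 1), computed without cancellation
        score = W.T @ ((y - p) * mu / p)
        info = W.T @ (W * (mu**2 * q / p)[:, None])   # expected information
        step = np.linalg.solve(info, score)
        b = b + step
        if np.max(np.abs(step)) < tol:
            break
    return b


rng = np.random.default_rng(42)
n = 20_000
z = rng.normal(size=n)      # excluded instrument
c = rng.normal(size=n)      # exogenous control
v = rng.normal(size=n)      # first-stage error; source of the endogeneity
x = z + 0.5 * c + v         # endogenous covariate

# Gumbel latent error => Pr(y=1 | x, c, v) is exactly cloglog in (x, c, v)
eta = -0.5 + 0.5 * x + 0.5 * c + 0.8 * v
y = (eta + rng.gumbel(size=n) > 0).astype(float)

# Stage 1: OLS of x on a constant, the instrument, and the control
Z = np.column_stack([np.ones(n), z, c])
pi_hat, *_ = np.linalg.lstsq(Z, x, rcond=None)
v_hat = x - Z @ pi_hat

# Stage 2: cloglog with the first-stage residual as an extra control
b_cf = cloglog_fit(np.column_stack([np.ones(n), x, c, v_hat]), y)
b_naive = cloglog_fit(np.column_stack([np.ones(n), x, c]), y)
print("control function:", b_cf.round(3))
print("naive cloglog:   ", b_naive.round(3))
```

In this simulated design the control-function coefficient on x should land close to the true 0.5, while the naive fit that omits the residual typically does not. The second-stage standard errors from such a fit ignore the first-stage estimation, which is the issue the -gmm- correction is meant to address.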

      (The reason for the cloglog model is that the Prentice and Gloeckler [1978] grouped-time proportional-hazards model can be expressed as a cloglog model, but this is beside the point.)
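For readers unfamiliar with the equivalence, a short derivation, assuming a continuous-time proportional-hazards model with hazard \lambda(t \mid x) = \lambda_0(t)\,e^{x'\beta} whose event times are only observed in intervals [a_{j-1}, a_j):

$$
\Pr(T \ge a_j \mid T \ge a_{j-1}, x)
= \exp\!\Big(-e^{x'\beta}\int_{a_{j-1}}^{a_j}\lambda_0(s)\,ds\Big)
= \exp\!\big(-e^{\gamma_j + x'\beta}\big),
\qquad
\gamma_j = \log\int_{a_{j-1}}^{a_j}\lambda_0(s)\,ds,
$$

$$
h_j(x) = 1 - \exp\!\big(-e^{\gamma_j + x'\beta}\big)
\quad\Longleftrightarrow\quad
\log\!\big[-\log\big(1 - h_j(x)\big)\big] = \gamma_j + x'\beta .
$$

The discrete-time hazard h_j(x) is therefore a cloglog model with interval-specific intercepts \gamma_j and the same \beta as the continuous-time model.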

      Since this is a two-stage procedure, we correct the standard errors for the first-stage estimation using -gmm-, an approach Enrique Pinzon showed in a Stata presentation. (Link: https://www.stata.com/meeting/uk20/s...K20_Pinzon.pdf) We feed in the appropriate moment conditions for -regress- and -cloglog- and use the standard errors that -gmm- reports. We use -gmm- here only as a quick, convenient (if hacky) way to implement the sandwich formula: we set the number of iterations to 0 and start -gmm- at the point estimates we already have from -regress- and -cloglog-.
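For reference, the stacked, exactly identified moment conditions behind this approach, and the standard cluster-robust GMM sandwich, can be written as follows. The notation is an assumption for illustration: z_i holds the instruments and controls, c_i the second-stage controls, w_i(\pi) = (x_i, c_i', x_i - z_i'\pi)' the second-stage regressors, F(t) = 1 - \exp(-e^{t}) the cloglog CDF with density f(t) = e^{t - e^{t}}, and clusters are indexed by g = 1, \dots, G. Any small-sample scaling -gmm- may apply is left out.

$$
m_i(\theta) =
\begin{pmatrix}
z_i\,(x_i - z_i'\pi) \\[4pt]
w_i(\pi)\,\dfrac{\big[y_i - F\big(w_i(\pi)'\beta\big)\big]\, f\big(w_i(\pi)'\beta\big)}{F\big(w_i(\pi)'\beta\big)\,\big[1 - F\big(w_i(\pi)'\beta\big)\big]}
\end{pmatrix},
\qquad \theta = (\pi', \beta')',
$$

$$
\hat D = \frac{1}{N}\sum_{i=1}^{N}\frac{\partial m_i(\hat\theta)}{\partial \theta'},
\qquad
\hat S = \frac{1}{N}\sum_{g=1}^{G}\Big(\sum_{i \in g} m_i(\hat\theta)\Big)\Big(\sum_{i \in g} m_i(\hat\theta)\Big)',
\qquad
\widehat{\operatorname{Var}}(\hat\theta) = \frac{1}{N}\,\hat D^{-1}\hat S\,\big(\hat D^{-1}\big)'.
$$

Because the system is exactly identified, the weight matrix drops out and the two-step estimates already solve \sum_i m_i(\hat\theta) = 0, which is why zero iterations suffice. \hat D is block lower triangular (the first-stage moments do not involve \beta), and its off-diagonal block is where the first-stage correction enters. \hat S is a sum of G rank-one terms whose cluster sums add up to zero at \hat\theta, so \operatorname{rank}(\hat S) \le G - 1; with about 600 regressors and 500 clusters, \dim(\theta) exceeds G - 1 and the variance matrix is singular.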

      We have around 1M observations, 500 clusters, and 600 regressors (most of which are CBSA fixed effects). We are clustering on CBSA code, which we are also using as a fixed effect. As Cameron and Miller (2015) point out in “A Practitioner’s Guide to Cluster-Robust Inference”, this results in a singular variance-covariance matrix for the coefficients that can still be used for valid inference, as long as the number of restrictions tested does not exceed the rank of that matrix.
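The point about restrictions can be illustrated with a small NumPy sketch. A synthetic rank-3 matrix stands in for the clustered variance, the coefficient vector is random, and `wald` is a hypothetical helper. For H0: Rb = 0 with q restrictions, the statistic (Rb)'(RVR')^{-1}(Rb) requires RVR' to be invertible. Having q no larger than rank(V) is necessary for that but not sufficient, since which coefficients are tested also matters, so the sketch checks the rank of RVR' directly.

```python
import numpy as np

rng = np.random.default_rng(1)
k, r = 8, 3                      # 8 coefficients, variance of rank 3 (illustrative)
A = rng.normal(size=(k, r))
V = A @ A.T / 100                # symmetric PSD with rank r < k, i.e. singular
b = rng.normal(size=k) / 10      # hypothetical coefficient estimates


def wald(b, V, idx, rel_tol=1e-10):
    """Wald statistic and df for H0: b[idx] = 0, or None if R V R' is singular."""
    R = np.eye(len(b))[idx]      # rows select the tested coefficients
    RVR = R @ V @ R.T
    q = len(idx)
    if np.linalg.matrix_rank(RVR, tol=rel_tol * np.abs(RVR).max()) < q:
        return None              # these restrictions cannot be tested with this V
    Rb = R @ b
    return float(Rb @ np.linalg.solve(RVR, Rb)), q


print(wald(b, V, [1]))           # one coefficient: the squared t-statistic
print(wald(b, V, [0, 2, 5]))     # 3 restrictions, rank(V) = 3
print(wald(b, V, [0, 1, 2, 3]))  # 4 restrictions > rank(V): returns None
```

For a single coefficient the statistic reduces to (b_j / se_j)^2, which is why individual standard errors taken from a singular V remain usable.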

      This is our motivation for forcing Stata to give us the variance-covariance matrix for the coefficients, despite it being singular.
