  • Addressing "Matrix Not Positive Definite" Error in -qreg2- with Clustered Standard Errors

    Hello!

    I am currently running a median (quantile) regression on wages. My data has 6 subgroups, each with ~7,000 observations. The control variables include gender, year, education, tenure, a manager factor variable, and a manager-year interaction. There are ~450 controls in each subgroup model, mostly due to small-cell manager-year interactions.

    Originally, I ran an OLS model using log wages with standard errors clustered at the employee level. I ran this regression one at a time for each subgroup.
    Code:
    reg ln_wage gender i.manager##i.year ... tenure, cluster(employee_id)
    Because there are large male outliers, I also ran a median regression with clustered standard errors per Parente and Silva (2014). I ran this regression one at a time for each subgroup.
    Code:
    xi: qreg2 ln_wage gender i.manager##i.year ... tenure, cluster(employee_id)
    This ran normally, and gave results that I was expecting. I used log wages in order to directly compare the models. However, given that quantile regressions are non-parametric with regard to the relationship between X and Y, I decided to run the same regression using level wages as a robustness check. As before, I ran this regression one at a time for each of the 6 subgroups:
    Code:
    xi: qreg2 wage gender i.manager##i.year ... tenure, cluster(employee_id)
    When I ran this code, 4 of the subgroups ran normally, but 2 of the subgroups did not run and returned the error message "Matrix Not Positive Definite." In order to confirm that this issue did not lie with the level wages variable, I also ran these two models for all 6 groups, each of which ran without an error message:
    Code:
    reg wage gender i.manager##i.year ... tenure, cluster(employee_id)
    qreg wage gender i.manager##i.year ... tenure
    Is there anything specific to -qreg2- that would cause problems in the presence of large outliers, but not once they are compressed by taking natural logs? I am struggling to see how the log model and the level, non-cluster-robust models could run normally while the level, cluster-robust model does not.

    Also, to clarify, despite having panel data, I am not running a fixed effect or first difference model, as we are interested in the overall effect of gender on wages, not the effect of gender on changes in wage.

    Any assistance or tips on diagnostic tools would be greatly appreciated.

    Thanks!
    Andy Hammond

  • #2
    Dear Andy Hammond,

    First of all, let me clarify that it is not true that quantile regression is non-parametric with regard to the relationship between X and Y.

    From what you describe, the problem is in the computation of the clustered standard errors. If you can share with me (feel free to contact me by email) a dataset where the problem occurs, I'll be happy to investigate the issue.

    Best wishes,

    Joao


    • #3
      Hi Joao Santos Silva,

      Thank you for the prompt answer. Unfortunately, I am under an NDA for the data, so I cannot share a sample. I could create a sample with artificial data, but I imagine that the same issue would likely not arise. I understand that this makes it harder for you to figure out what could be causing this, especially if the issue has not arisen for you or others in the past. However, I am happy to pass on any results or information that could be helpful, short of providing the raw data.
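
      As an illustration of the kind of summary information that could be passed on without sharing raw data, here is a minimal Stata sketch. It assumes a variable named subgroup identifies the 6 subgroups (a hypothetical name, not from the original code), and it only reports counts and summary statistics. Whether few clusters, tiny manager-year cells, or extreme level wages are actually behind the error is a guess, not something established in this thread.

      ```stata
      * Sketch only: subgroup is a hypothetical variable identifying the 6 subgroups
      preserve
      keep if subgroup == 1

      * Number of distinct clusters (employees) in this subgroup
      egen byte emp_tag = tag(employee_id)
      count if emp_tag
      display "clusters: " r(N)

      * Size of each manager-year cell, and how many observations sit in tiny cells
      bysort manager year: generate long cell_n = _N
      tabulate cell_n if cell_n <= 5

      * Spread of level wages versus log wages
      summarize wage ln_wage, detail
      restore
      ```

      Comparing this output between a subgroup that runs and one that fails might show whether the two differ in cluster count, cell sizes, or the tails of the wage distribution.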

      On your point about the relationship between X and Y, it was my understanding that one of the benefits of quantile regression is that you do not need to assume a linear (or in this case log-linear) relationship between the dependent and independent variables. Based on other analyses and a literature review, we believe that there is a log-linear relationship between wages and the control variables. Given this, would you say that the appropriate quantile regression needs to use log wages? If the level version of the -qreg2- regression is inappropriate, then perhaps the simplest solution to this error is to not run the level-wage regression.

      Thanks,
      Andy


      • #4
        Dear Andy Hammond,

        Indeed, if you think the relation is log-linear, then the regression in level wages is not interesting. If you or anyone else finds a similar problem in a data set that can be shared, I'll be happy to investigate.
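
        A short derivation may make this point concrete. It relies on the equivariance of quantiles to monotone increasing transformations. The symbols w (wage), x (regressors) and \beta(\tau) (coefficients at quantile \tau) are notation introduced here, not taken from the thread.

        ```latex
        Q_\tau(\ln w \mid x) = x'\beta(\tau)
        \quad\Longrightarrow\quad
        Q_\tau(w \mid x) = \exp\{x'\beta(\tau)\}
        ```

        So if the conditional median of log wages is linear in the regressors, the conditional median of level wages is exponential in them rather than linear, and a linear -qreg2- fit to wage estimates a different specification.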

        Best wishes,

        Joao
