  • Nondeterministic output from -areg- with clustered standard errors

    I noticed that the output of a certain regression sometimes reported all standard errors as missing, and other times only reported one particular regressor's standard errors as missing. Although the original regression contained many regressors (a triple interaction between categorical factor variables) and hundreds of observations, I was able to reproduce the phenomenon with the following reduced example. (I also replaced the dependent variable, y, with uniformly random noise between 0 and 1.)

    Code:
    version 14.1
    
    clear
    
    input   a    b    c    fe   cl   y
            0    0    0    1    1    0.00780267
            1    1    1    1    1    0.14052081
            0    0    0    2    2    0.51635212
            1    1    1    2    2    0.86433381
            0    0    0    3    3    0.65547532
            0    0    1    3    3    0.30712718
            0    0    0    4    3    0.98459065
            0    1    0    4    3    0.55534583
            0    0    0    5    4    0.44104260
            0    1    0    5    4    0.85928327
            0    0    0    6    5    0.74272376
            0    0    0    7    6    0.67516220
            0    0    0    8    7    0.71862328
    end
    
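    * Same regression ten times: the reported se[a] varies between runs
    forvalues i = 1/10 {
         cap areg y  a b c, a(fe) vce(cl cl)
         di "b[a] = "_b[a] ", se[a] = " _se[a]
    }
    
    * Same model with the regressors in reverse order: se[a] is stable
    forvalues i = 1/10 {
         cap areg y  c b a, a(fe) vce(cl cl)
         di "b[a] = "_b[a] ", se[a] = " _se[a]
    }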
    Changing the order of the regressors from "a b c" to "c b a" (along with various other modifications one can think up, like adding 1 to "y") causes the output to become deterministic again.
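
    A minimal sketch of one way to tally the distinct values of se[a] across runs, rather than reading them off the -di- output. It assumes the example data above are in memory; the names memhold, runs, run, and se_a are placeholders introduced here for illustration, not part of the original testcase.

    Code:
    * Keep the example data intact while the results file is inspected
    preserve
    
    tempname memhold
    tempfile runs
    postfile `memhold' run double se_a using `runs'
    
    * Repeat the regression and store the reported standard error each time
    forvalues i = 1/10 {
         capture areg y a b c, absorb(fe) vce(cluster cl)
         post `memhold' (`i') (_se[a])
    }
    postclose `memhold'
    
    * More than one row in this table means the output differed between runs
    use `runs', clear
    tabulate se_a, missing
    
    restore
    If the output were deterministic, the table would contain a single row, whether a finite value or missing.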

    Yes, the fixed effect is nearly collinear with the cluster variable. This is not the case with the full dataset, only with the reduced testcase.

    I am unsure whether the expected output has all-missing standard errors or some finite standard errors, and would appreciate feedback on that point. FWIW, -areg- (part of the time), -reg-, and -xtreg, dfadj- all report identical standard errors for regressors "a" and "b", and report the standard errors of regressor "c" as missing, or zero, or near-zero.

    I reported another case of nondeterministic outputs under different conditions here. It turned out to be due to an internal unstable sort used by -xtreg-. However, to my surprise, StataCorp decided that the nondeterministic output is a feature, not a bug. It is possible that they will have the same reaction to this phenomenon.

  • #2
    I did some more bisection, and was able to remove areg from the testcase.

    Code:
    version 14.1
    clear
    
    input   byte fe    byte cl    double y    double a    double b    double c
    1    1    +1.0429df8d3b13bX-001    -1.6276276276276X-002    -1.89d89d89d89d8X-003    -1.13b13b13b13b1X-002
    1    1    +1.481d81653b13bX-001    +1.4ec4ec4ec4ec5X-001    +1.9d89d89d89d8aX-001    +1.7627627627628X-001
    2    2    +1.9a1cb8f276276X-002    -1.6276276276276X-002    -1.89d89d89d89d8X-003    -1.13b13b13b13b1X-002
    2    2    +1.7f3904793b13bX-001    +1.4ec4ec4ec4ec5X-001    +1.9d89d89d89d8aX-001    +1.7627627627628X-001
    3    3    +1.7f5108793b13bX-001    +1.3b13b13b13b14X-003    +1.3b13b13b13b14X-002    -1.13b13b13b13b1X-002
    3    3    +1.99ecb0f276276X-002    +1.3b13b13b13b14X-003    +1.3b13b13b13b14X-002    +1.7627627627628X-001
    4    3    +1.9406ad793b13bX-001    +1.3b13b13b13b14X-003    -1.89d89d89d89d8X-003    +1.d89d89d89d89eX-003
    4    3    +1.708166f276276X-002    +1.3b13b13b13b14X-003    +1.9d89d89d89d8aX-001    +1.d89d89d89d89eX-003
    5    4    +1.7623bcf276276X-002    +1.3b13b13b13b14X-003    -1.89d89d89d89d8X-003    +1.d89d89d89d89eX-003
    5    4    +1.913582793b13bX-001    +1.3b13b13b13b14X-003    +1.9d89d89d89d8aX-001    +1.d89d89d89d89eX-003
    6    5    +1.2623b0793b13bX-001    +1.3b13b13b13b14X-003    +1.3b13b13b13b14X-002    +1.d89d89d89d89eX-003
    7    6    +1.2623b0793b13bX-001    +1.3b13b13b13b14X-003    +1.3b13b13b13b14X-002    +1.d89d89d89d89eX-003
    8    7    +1.2623b0793b13bX-001    +1.3b13b13b13b14X-003    +1.3b13b13b13b14X-002    +1.d89d89d89d89eX-003
    end
    
    qui _regress y a b c, mse1
    matrix b = get(_b)
    matrix V = get(VCE)
    predict double res, res
    
    forvalues j = 0/9 {
        matrix b`j' = b
        matrix V`j' = V
        _robust2 res, minus(11) cluster(cl) variance(V`j')
        *mat list V`j'
        cap eret post b`j' V`j'
        di "Loop `j': _se[a] = " _se[a]
    }
    I think this shows that _robust2 is nondeterministic. Based on some further exploration, I think the Mata function _robust2() in robust.mata is itself nondeterministic, but I'm not confident about that.

    I don't know if _robust2 is the only source of nondeterminism in the original fixed-effects clustered regression.
    Last edited by Nils Enevoldsen; 27 Apr 2016, 20:46.


  • #3
    Tech Support response: "Thank you for taking the time to prepare these examples to illustrate the differences in those results. I passed the report to the developers and they will evaluate the problem."
