Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Why does mvnormal() occasionally produce missing values?

    Dear all,

    I am using Stata SE 17.0, update level 13 Feb 2024. I found that the mvnormal-family commands, e.g., mvnormal(), mvnormalqp(), occasionally produce missing values. Here is an example:
    Code:
    m: U =0, 5, 5, -10 //upper limits
    m: R =1, 0, 0, 0,  1, .5, .5,  1, .5,  1 //vectorized correlation matrices
    
    m: mvnormal(U, R) //returns a missing value
    .
    
    m: mvnormalqp(U, R, 5000) //returns a missing value, even if the maximum number of quadrature points is specified
    .

    It seems that this issue occurs not because the upper limits are too extreme. As can be seen from the examples below, both MoreExtremeU and LessExtremeU work just fine. mvnormalqp() returns non-missing results.
    Code:
    m: R =1, 0, 0, 0,  1, .5, .5,  1, .5,  1
    
    m: MoreExtremeU =0, 5, 5, -10-1
    m: mvnormalqp(MoreExtremeU, R, 5000)
    6.89514e-60
    
    m: LessExtremeU =0, 5, 5, -10+1
    m: mvnormalqp(LessExtremeU, R, 5000)
    3.20780e-20

    So far I have only found this issue when the theoretical result is close to 0, but I am not sure if the same issue would occur when the result is close to 1, so I am still hesitating over whether I should replace all missing values with zero or not.

    I use mvnormalqp() to write my maximum likelihood estimation model (a lf-type evaluator programmed in Mata and used in Stata command "ml model"). I also use mvnormalqp() to write my post-estimation predict programme, which is then automatically used in the official command margins. Therefore, my questions are:

    1) Is there any way to avoid mvnormalqp() generating missing values?

    2) Does the missing-value issue of mvnormalqp() matter to the estimation of ml model and margins? I know that ml model with a lf-type evaluator has some protection against missing values in the log-likelihood function, but does margins do similar protection too?



    Thank you.

    Chi-lin
    Last edited by Chi-lin Tsai; 26 Apr 2024, 10:57. Reason: Add tags

  • #2
    After some experimentations, I found two additional points:

    First, the missing-value issue is even more severe, when the lower limits are explicitly specified. When mindouble() is specified as the lower limit (i.e., the smallest value that can be stored in storage type double), mvnormalqp() does return a non-missing value, but it is negative, which is nonsense, as the probability can't be negative:
    Code:
    m: U =0, 5, 5, -10 //upper limits
    m: R =1, 0, 0, 0,  1, .5, .5,  1, .5,  1 //vectorized correlation matrices
    
    m: L =J(1,cols(U),mindouble()) //the lower limits
    m: M =J(1,cols(U),0) //means
    
    m: mvnormalqp(     U,    R, 5000), ///
       mvnormalqp(  L, U,    R, 5000), /// //specify lower limits
       mvnormalcvqp(L, U, M, R, 5000)      //specify means as well
    
                    1              2              3
      +----------------------------------------------+
    1 |             .   -1.28075e-25   -1.28075e-25  |
      +----------------------------------------------+

    Second, by comparing the results produced by mvnormalqp() and ghk(), it is almost sure that the missing-value issue occurs, only when the probability is close to 0.
    Code:
    mata:
    rseed(123)
    N =100
    U =runiform(N, 4, -20, 20) //randomly generate upper limits
    L =J(1,cols(U),mindouble()) //lower limits
    M =J(1,cols(U),0) //means
    i =1
    while (i>0) { //randomly generate a positive definite correlation matrix
        R =1, runiform(1, 3, -.5, .5),
           1, runiform(1, 2, -.5, .5),
           1, runiform(1, 1, -.5, .5), 1
        eigensystem(invvech(R'), Evec=., Eval=.)
        i =Eval[1]<0 + Eval[2]<0 + Eval[3]<0 + Eval[4]<0
    }
    S =ghkfast_init(N, 5000, 4, "hammersley")
    P =mvnormalqp(     U,    R, 5000),
       mvnormalqp(  L, U,    R, 5000), //specify lower limits
       mvnormalcvqp(L, U, M, R, 5000), //specify means as well
       ghkfast(S, U, invvech(R'))      //ghk algorithm
    end
    
    m: colmissing(P) //display num of missing-value cases
                     1              2              3              4
       +-------------------------------------------------------------+
     1 |            31              0              0              0  |
       +-------------------------------------------------------------+
    
    m: P[selectindex(P[,1]:==.),] //display missing-value cases
    
                     1              2              3              4
       +-------------------------------------------------------------+
     1 |             .   -9.04304e-13   -9.04304e-13    1.28863e-12  |
     2 |             .   -4.73616e-61   -4.73616e-61    2.64472e-87  |
     3 |             .   -1.02131e-10   -1.02131e-10    9.42588e-53  |
     4 |             .   -1.18889e-58   -1.18889e-58    6.12344e-89  |
     5 |             .   -2.39505e-36   -2.39505e-36    2.28163e-38  |
     6 |             .   -3.32988e-18   -3.32988e-18    1.87886e-64  |
     7 |             .   -1.30611e-21   -1.30611e-21    2.48153e-38  |
     8 |             .   -3.65511e-40   -3.65511e-40    4.26120e-76  |
     9 |             .   -7.31193e-25   -7.31193e-25    2.77654e-33  |
    10 |             .   -1.69346e-17   -1.69346e-17    4.25672e-78  |
    11 |             .   -9.05412e-27   -9.05412e-27    1.14299e-38  |
    12 |             .   -9.32544e-11   -9.32544e-11    2.54511e-22  |
    13 |             .   -1.08026e-15   -1.08026e-15    1.97692e-34  |
    14 |             .   -4.91895e-45   -4.91895e-45    4.16277e-57  |
    15 |             .   -1.17356e-11   -1.17356e-11    1.42053e-50  |
    16 |             .   -2.35085e-12   -2.35085e-12    5.89600e-55  |
    17 |             .   -2.14321e-16   -2.14321e-16    2.86704e-39  |
    18 |             .   -6.93790e-38   -6.93790e-38    1.49411e-58  |
    19 |             .   -6.0776e-180   -6.0776e-180    5.9466e-130  |
    20 |             .   -1.10515e-12   -1.10515e-12    8.93542e-60  |
    21 |             .   -1.80679e-41   -1.80679e-41    4.6324e-105  |
    22 |             .   -1.36669e-43   -1.36669e-43    6.75590e-59  |
    23 |             .   -4.13118e-12   -4.13118e-12    9.77657e-12  |
    24 |             .   -1.25710e-24   -1.25710e-24    6.06826e-29  |
    25 |             .   -6.76242e-38   -6.76242e-38    8.55283e-52  |
    26 |             .   -7.39241e-57   -7.39241e-57    2.31159e-57  |
    27 |             .   -2.60058e-15   -2.60058e-15    1.80489e-45  |
    28 |             .   -9.36494e-99   -9.36494e-99    6.5647e-106  |
    29 |             .   -8.80583e-61   -8.80583e-61    1.16315e-65  |
    30 |             .   -7.18445e-22   -7.18445e-22    5.72767e-31  |
    31 |             .   -1.66582e-83   -1.66582e-83    2.50751e-76  |
       +-------------------------------------------------------------+

    Desipte these findings, I still have not found a solution to the missing-value issue of mvnormal-family commands. Instead, these findings raise two new questions:

    1) What algorithm does mvnormal use to calculate the probability? I thought it was using the ghk algorithm, but apparently it is not (because they produce different results.)

    2) Which one -- mvnormal or ghk -- is better in terms of accuracy and speed?


    Comments and remarks are welcome. Thank you very much again.
    Last edited by Chi-lin Tsai; 05 May 2024, 04:25.

    Comment


    • #3
      Chi-lin: I was able to replicate the first missing result in #1 using v18.0 on a Macbook running Sonoma 14.4.1.

      Since no one has yet offered an explanation or solution on Statalist I would suggest contacting Stata Technical Support.

      Comment

      Working...
      X