How to use Mata for calculations within a Stata ML evaluator program?

Stephen Jenkins

Join Date: Apr 2014
Posts: 1438

How to use Mata for calculations within a Stata ML evaluator program?

02 Dec 2019, 00:03

[This is picking up on the thread I began here: https://www.statalist.org/forums/for...normal-density]

I'm trying to fit by Maximum Likelihood measurement error models using the specifications of Kapteyn and Ypma (Journal of Labor Economics, 2007: https://www.jstor.org/stable/10.1086...o_tab_contents). They are essentially mixture models involving evaluation of multivariate normal distributions: see their Appendix B for likelihood functions for their general model. I am trying, at least initially, to fit their 'basic model'.

My likelihood evaluator function for the basic model is currently as follows (in a do-file called basic_ll6.do):

Code:

capture program drop basic_ll6

prog def basic_ll6

    version 8.2
    
    args lnf mu_xi atrho lsig_xi lsig_eta
    
    tempname m v11 v12 v22 M V X r d
    
    scalar `m' = `mu_xi'
    
    scalar `r' = (exp(2*`atrho') - 1) / (exp(2*`atrho') + 1) // rho
    
    matrix `M' = ( `m', `m' )
    
    scalar `v11' =  (exp(`lsig_xi'))^2
    
    scalar `v12' = ((1+`r')*(exp(`lsig_xi'))^2)  ///
        /( exp(`lsig_xi')*sqrt((1+`r')^2)*((exp(`lsig_xi'))^2) + ((exp(`lsig_eta'))^2) )
    
    scalar `v22' = ((1+`r')^2)*((exp(`lsig_xi'))^2) + (exp(`lsig_eta'))^2
    
    matrix `V' = (  `v11' , `v12' \ `v12', `v22'   )
    
    capture {
                // check that `V' is pos semi definite
        scalar `d' = det(`V')
        if scalar(`d') < 0 error 506  
    }
    
    putmata X1 = ($R_i $S_i)
    mata:      
        M1 = st_matrix("`M'")
        V1 = st_matrix("`V'")
        myeval = __lnmvnormalden(M1,V1,X1)    
    end
    
    getmata `result' = myeval
    
    qui replace `lnf' = `result' if $L_i == 0
    
    qui replace `lnf' = lnnormalden($R_i, `mu_xi', exp(`lsig_xi')) ///
            if $L_i == 1
    
end

Observe the call to Mata to evaluate a bivariate normal density. (Evaluation can only be done in Mata; not in Stata -- see the Statalist thread cited at the beginning of this post.)

My immediate problem is that if I run basic_ll6.do, Stata throws an error. Here is an extract from the log-file after a set trace on:

Code:

   --------------------------------------------------------- begin getmata ---
    - version 11
    - syntax [anything(name=getlist id="getlist" equalok)] [, DOUBLE FORCE ID(s
> tring) REPLACE UPdate]
    - mata: get()
= invalid vector or matrix name
    ----------------------------------------------------------- end getmata ---
r(198);

Thus Mata treats the matrix with temporary name `M' as undefined in st_matrix("`M'"). The same error is thrown if I place the run command within a do-file surrounded by code reading in the data and setting up the ml command.

I have not found anything on Statalist or in the manuals that directly addresses my problem; perhaps the closest is this thread: https://www.statalist.org/forums/for...e_lf-not-found (with useful contributions by @Matthew J. Baker and @Jeff Pitblado).

I'm guessing from that thread that I need to write a Mata function to do the calculation, perhaps also without putmata and getmata calls -- along the lines suggested by Matt or by Jeff.

I'd be grateful for comments on my conjectures and suggestions about how to proceed. Thank you. (I am a Mata newbie.)

Once I've managed to fit the Basic Model, I wish to fit the other, more complex, specifications in the Kapteyn-Ypma paper. Their data and my data are not able to be shared, but you can simulate data from their full model with the following code:

Code:

* simulate data from Kapteyn-Ypma so that can develop code 
* away from secure server
*
* Model: page 527. Parameter values: Table C2
*************************************************
set seed 123456789

set obs 10000

scalar mu_xi = 12.283
scalar sig_xi = 0.717
scalar mu_zeta = 9.187    
scalar sig_zeta = 1.807
scalar mu_eta = -0.048    
scalar sig_eta = 0.99
scalar mu_omega = -0.304
scalar sig_omega = 1.239
scalar rho = -0.013
scalar pi_r = 0.959
scalar pi_s = 0.152
scalar pi_omega = 0.156

scalar list

/*
rnormal(m, s)
       Description:  normal(m,s) (Gaussian) random variates, where m is the
                     mean and s is the standard deviation
*/

ge xi_i = rnormal(mu_xi, sig_xi)
ge zeta_i = rnormal(mu_zeta, sig_zeta)
ge eta_i = rnormal(mu_eta, sig_eta)
ge omega_i = rnormal(mu_omega, sig_omega)
ge r_i = pi_r * xi_i ///
        + (1 - pi_r) * zeta_i
ge s_i = pi_s * xi_i            ///
        + (1 - pi_s)*(1 - pi_omega)*( xi_i + rho*(xi_i - mu_xi) + eta_i )   ///
        + (1 - pi_s)*pi_omega*( xi_i + rho*(xi_i - mu_xi) + eta_i + omega_i)
ge m_i = s_i - r_i
        
lab var r_i "register log earnings"
lab var s_i "survey log earnings"
lab var m_i "difference: s_i - r_i"

* Compare K-Y Table 4, page 524
su r_i, de
local Vr = r(Var)
su s_i, de
su m_i, de
local Vm = r(Var)
corr m_i r_i
di "Reliability = "  `Vr' / (`Vr' + `Vm')

* Compare K-Y Table C2 (sample counterparts to parameters)
su xi_i zeta_i eta_i omega_i, de
de

save simdata01.dta, replace

Tags: None

FernandoRios

Join Date: Apr 2014
Posts: 2479

02 Dec 2019, 09:52

Hi Stephen
Not I was trying to see if there was a small and quick solution for your problem. there were a few details in your program that were not completely clear.
For example, where do you define $L_i? $R_i? or $S_i.
In any case, I think the main problem came from the Multivariate Normal density evaluation. So I created a small program that may help with what can be done.
I hope this helps.
Fernando
Here is the code:

Code:

capture program drop llm
prog def llm
    args lnf mu1 mu2 lns1 lns2 arho
  qui {  
    local muu1=`mu1'
    local muu2=`mu2'
    local v1=exp(`lns1')^2
    local v2=exp(`lns2')^2
    local v12=exp(`lns1')*exp(`lns2')*tanh(`arho')
    matrix V = [ `v1' , `v12' \ `v12', `v2'   ]
    matrix M = [ `muu1' , `muu2'   ]
     tempvar result
    mata:lmm("r_i s_i", "M", "V","`result'")
    qui replace `lnf' = `result'
    }
  
end
 

ml model lf llm /mu1 /mu2 /lns1 /lns2 /arho, technique(nr bhhh) maximize robust
ml display
nlcom (exp(_b[/lns1])^2) (exp(_b[/lns2])^2) (tanh(_b[/arho]))

mata:
 
void lmm(string scalar R, string scalar M, string scalar V,string scalar denx) {
        X1 = st_data(.,R)
        M1 = st_matrix(M)
        V1 = st_matrix(V)
        myeval=J(rows(X1),1,0)
        for(i=1;i<=rows(X1);i++){
        myeval[i] = lnmvnormalden(M1,V1,X1[i,.])    
        }
         st_addvar("double",denx)
        st_store(.,denx,myeval)
        }
        end

PS. I couldnt find the program __lnmvnormaden, nor couldnt make it work for the full data. That is why im evaluating it here observation by observation in Mata.

Comment

Stephen Jenkins

Join Date: Apr 2014

Posts: 1438
#3

02 Dec 2019, 14:05

Many thanks, Fernando! I’ll look at your helpful response later today. Meanwhile: __lnmvnormalden is an undocumented Mata function. (There’s reference to it and illustration of its use in the link at the top of my message.)
Comment

Stephen Jenkins

Join Date: Apr 2014
Posts: 1438

02 Dec 2019, 19:16

Interim report: I can derive the same estimates as Fernando's very helpful code does if I use __lnmvnormalden instead of his looping construct. Here's what I ran:

Code:

mata:

mata drop lmm2()

void lmm2(string scalar R, string scalar M, string scalar V,string scalar denx) {
        X1 = st_data(.,R)
        M1 = st_matrix(M)
        V1 = st_matrix(V)
        myeval= __lnmvnormalden(M1,V1,X1)
        st_addvar("double",denx)
        st_store(.,denx,myeval)
        }
end


capture program drop llm
prog def llm
    args lnf mu1 mu2 lns1 lns2 arho
  qui {  
    local muu1=`mu1'
    local muu2=`mu2'
    local v1=exp(`lns1')^2
    local v2=exp(`lns2')^2
    local v12=exp(`lns1')*exp(`lns2')*tanh(`arho')
    matrix V = [ `v1' , `v12' \ `v12', `v2'   ]
    matrix M = [ `muu1' , `muu2'   ]
     tempvar result
    mata:lmm2("r_i s_i", "M", "V","`result'")
    qui replace `lnf' = `result'
    }
  
end
 

ml model lf llm /mu1 /mu2 /lns1 /lns2 /arho, technique(nr bhhh) maximize robust
ml display
nlcom (exp(_b[/lns1])^2) (exp(_b[/lns2])^2) (tanh(_b[/arho]))

and the results were:

Code:

. ml display

                                                Number of obs     =     10,000
                                                Wald chi2(5)      = 3133295.61
Log pseudolikelihood = -23221.555               Prob > chi2       =     0.0000

------------------------------------------------------------------------------
             |               Robust
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        /mu1 |   12.15767    .006882  1766.60   0.000     12.14418    12.17116
        /mu2 |   12.20238   .0111834  1091.12   0.000     12.18046     12.2243
       /lns1 |  -.3737319   .0071197   -52.49   0.000    -.3876863   -.3597776
       /lns2 |   .1117927    .007055    15.85   0.000     .0979651    .1256203
       /arho |    .742917   .0100236    74.12   0.000     .7232712    .7625628
------------------------------------------------------------------------------

. nlcom (exp(_b[/lns1])^2) (exp(_b[/lns2])^2) (tanh(_b[/arho]))

       _nl_1:  exp(_b[/lns1])^2
       _nl_2:  exp(_b[/lns2])^2
       _nl_3:  tanh(_b[/arho])

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _nl_1 |   .4735661   .0067433    70.23   0.000     .4603495    .4867827
       _nl_2 |   1.250552   .0176454    70.87   0.000     1.215968    1.285137
       _nl_3 |   .6309043   .0060338   104.56   0.000     .6190783    .6427303
------------------------------------------------------------------------------

Comment

FernandoRios

Join Date: Apr 2014

Posts: 2479
#5

02 Dec 2019, 21:54

Im glad you found the program useful.
I realize that __lnmvnormalden(M1,V1,X1) does not work for me under stata 15 or 14, but it does work on stata 16. So my small program may be useful for other people trying to replicate the code in earlier stata versions.
Best
Fernando
Comment

Announcement

How to use Mata for calculations within a Stata ML evaluator program?

Comment

Comment

Comment

Comment