Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to use Mata for calculations within a Stata ML evaluator program?

    [This is picking up on the thread I began here: https://www.statalist.org/forums/for...normal-density]

    I'm trying to fit by Maximum Likelihood measurement error models using the specifications of Kapteyn and Ypma (Journal of Labor Economics, 2007: https://www.jstor.org/stable/10.1086...o_tab_contents). They are essentially mixture models involving evaluation of multivariate normal distributions: see their Appendix B for likelihood functions for their general model. I am trying, at least initially, to fit their 'basic model'.

    My likelihood evaluator function for the basic model is currently as follows (in a do-file called basic_ll6.do):

    Code:
    capture program drop basic_ll6
    
    prog def basic_ll6
    
        version 8.2
        
        args lnf mu_xi atrho lsig_xi lsig_eta
        
        tempname m v11 v12 v22 M V X r d
        
        scalar `m' = `mu_xi'
        
        scalar `r' = (exp(2*`atrho') - 1) / (exp(2*`atrho') + 1) // rho
        
        matrix `M' = ( `m', `m' )
        
        scalar `v11' =  (exp(`lsig_xi'))^2
        
        scalar `v12' = ((1+`r')*(exp(`lsig_xi'))^2)  ///
            /( exp(`lsig_xi')*sqrt((1+`r')^2)*((exp(`lsig_xi'))^2) + ((exp(`lsig_eta'))^2) )
        
        scalar `v22' = ((1+`r')^2)*((exp(`lsig_xi'))^2) + (exp(`lsig_eta'))^2
        
        matrix `V' = (  `v11' , `v12' \ `v12', `v22'   )
        
        capture {
                    // check that `V' is pos semi definite
            scalar `d' = det(`V')
            if scalar(`d') < 0 error 506  
        }
        
        putmata X1 = ($R_i $S_i)
        mata:      
            M1 = st_matrix("`M'")
            V1 = st_matrix("`V'")
            myeval = __lnmvnormalden(M1,V1,X1)    
        end
        
        getmata `result' = myeval
        
        qui replace `lnf' = `result' if $L_i == 0
        
        qui replace `lnf' = lnnormalden($R_i, `mu_xi', exp(`lsig_xi')) ///
                if $L_i == 1
        
    end
    Observe the call to Mata to evaluate a bivariate normal density. (Evaluation can only be done in Mata; not in Stata -- see the Statalist thread cited at the beginning of this post.)

    My immediate problem is that if I run basic_ll6.do, Stata throws an error. Here is an extract from the log-file after a set trace on:

    Code:
       --------------------------------------------------------- begin getmata ---
        - version 11
        - syntax [anything(name=getlist id="getlist" equalok)] [, DOUBLE FORCE ID(s
    > tring) REPLACE UPdate]
        - mata: get()
    = invalid vector or matrix name
        ----------------------------------------------------------- end getmata ---
    r(198);
    Thus Mata treats the matrix with temporary name `M' as undefined in st_matrix("`M'"). The same error is thrown if I place the run command within a do-file surrounded by code reading in the data and setting up the ml command.

    I have not found anything on Statalist or in the manuals that directly addresses my problem; perhaps the closest is this thread: https://www.statalist.org/forums/for...e_lf-not-found (with useful contributions by @Matthew J. Baker and @Jeff Pitblado).

    I'm guessing from that thread that I need to write a Mata function to do the calculation, perhaps also without putmata and getmata calls -- along the lines suggested by Matt or by Jeff.

    I'd be grateful for comments on my conjectures and suggestions about how to proceed. Thank you. (I am a Mata newbie.)

    Once I've managed to fit the Basic Model, I wish to fit the other, more complex, specifications in the Kapteyn-Ypma paper. Their data and my data are not able to be shared, but you can simulate data from their full model with the following code:

    Code:
    * simulate data from Kapteyn-Ypma so that can develop code 
    * away from secure server
    *
    * Model: page 527. Parameter values: Table C2
    *************************************************
    set seed 123456789
    
    set obs 10000
    
    scalar mu_xi = 12.283
    scalar sig_xi = 0.717
    scalar mu_zeta = 9.187    
    scalar sig_zeta = 1.807
    scalar mu_eta = -0.048    
    scalar sig_eta = 0.99
    scalar mu_omega = -0.304
    scalar sig_omega = 1.239
    scalar rho = -0.013
    scalar pi_r = 0.959
    scalar pi_s = 0.152
    scalar pi_omega = 0.156
    
    scalar list
    
    /*
    rnormal(m, s)
           Description:  normal(m,s) (Gaussian) random variates, where m is the
                         mean and s is the standard deviation
    */
    
    ge xi_i = rnormal(mu_xi, sig_xi)
    ge zeta_i = rnormal(mu_zeta, sig_zeta)
    ge eta_i = rnormal(mu_eta, sig_eta)
    ge omega_i = rnormal(mu_omega, sig_omega)
    ge r_i = pi_r * xi_i ///
            + (1 - pi_r) * zeta_i
    ge s_i = pi_s * xi_i            ///
            + (1 - pi_s)*(1 - pi_omega)*( xi_i + rho*(xi_i - mu_xi) + eta_i )   ///
            + (1 - pi_s)*pi_omega*( xi_i + rho*(xi_i - mu_xi) + eta_i + omega_i)
    ge m_i = s_i - r_i
            
    lab var r_i "register log earnings"
    lab var s_i "survey log earnings"
    lab var m_i "difference: s_i - r_i"
    
    * Compare K-Y Table 4, page 524
    su r_i, de
    local Vr = r(Var)
    su s_i, de
    su m_i, de
    local Vm = r(Var)
    corr m_i r_i
    di "Reliability = "  `Vr' / (`Vr' + `Vm')
    
    * Compare K-Y Table C2 (sample counterparts to parameters)
    su xi_i zeta_i eta_i omega_i, de
    de
    
    save simdata01.dta, replace




  • #2
    Hi Stephen
    Not I was trying to see if there was a small and quick solution for your problem. there were a few details in your program that were not completely clear.
    For example, where do you define $L_i? $R_i? or $S_i.
    In any case, I think the main problem came from the Multivariate Normal density evaluation. So I created a small program that may help with what can be done.
    I hope this helps.
    Fernando
    Here is the code:
    Code:
    capture program drop llm
    prog def llm
        args lnf mu1 mu2 lns1 lns2 arho
      qui {  
        local muu1=`mu1'
        local muu2=`mu2'
        local v1=exp(`lns1')^2
        local v2=exp(`lns2')^2
        local v12=exp(`lns1')*exp(`lns2')*tanh(`arho')
        matrix V = [ `v1' , `v12' \ `v12', `v2'   ]
        matrix M = [ `muu1' , `muu2'   ]
         tempvar result
        mata:lmm("r_i s_i", "M", "V","`result'")
        qui replace `lnf' = `result'
        }
      
    end
     
    
    ml model lf llm /mu1 /mu2 /lns1 /lns2 /arho, technique(nr bhhh) maximize robust
    ml display
    nlcom (exp(_b[/lns1])^2) (exp(_b[/lns2])^2) (tanh(_b[/arho]))
    
    mata:
     
    void lmm(string scalar R, string scalar M, string scalar V,string scalar denx) {
            X1 = st_data(.,R)
            M1 = st_matrix(M)
            V1 = st_matrix(V)
            myeval=J(rows(X1),1,0)
            for(i=1;i<=rows(X1);i++){
            myeval[i] = lnmvnormalden(M1,V1,X1[i,.])    
            }
             st_addvar("double",denx)
            st_store(.,denx,myeval)
            }
            end
    PS. I couldnt find the program __lnmvnormaden, nor couldnt make it work for the full data. That is why im evaluating it here observation by observation in Mata.

    Comment


    • #3
      Many thanks, Fernando! I’ll look at your helpful response later today. Meanwhile: __lnmvnormalden is an undocumented Mata function. (There’s reference to it and illustration of its use in the link at the top of my message.)

      Comment


      • #4
        Interim report: I can derive the same estimates as Fernando's very helpful code does if I use __lnmvnormalden instead of his looping construct. Here's what I ran:

        Code:
        mata:
        
        mata drop lmm2()
        
        void lmm2(string scalar R, string scalar M, string scalar V,string scalar denx) {
                X1 = st_data(.,R)
                M1 = st_matrix(M)
                V1 = st_matrix(V)
                myeval= __lnmvnormalden(M1,V1,X1)
                st_addvar("double",denx)
                st_store(.,denx,myeval)
                }
        end
        
        
        capture program drop llm
        prog def llm
            args lnf mu1 mu2 lns1 lns2 arho
          qui {  
            local muu1=`mu1'
            local muu2=`mu2'
            local v1=exp(`lns1')^2
            local v2=exp(`lns2')^2
            local v12=exp(`lns1')*exp(`lns2')*tanh(`arho')
            matrix V = [ `v1' , `v12' \ `v12', `v2'   ]
            matrix M = [ `muu1' , `muu2'   ]
             tempvar result
            mata:lmm2("r_i s_i", "M", "V","`result'")
            qui replace `lnf' = `result'
            }
          
        end
         
        
        ml model lf llm /mu1 /mu2 /lns1 /lns2 /arho, technique(nr bhhh) maximize robust
        ml display
        nlcom (exp(_b[/lns1])^2) (exp(_b[/lns2])^2) (tanh(_b[/arho]))
        and the results were:

        Code:
        . ml display
        
                                                        Number of obs     =     10,000
                                                        Wald chi2(5)      = 3133295.61
        Log pseudolikelihood = -23221.555               Prob > chi2       =     0.0000
        
        ------------------------------------------------------------------------------
                     |               Robust
                     |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
                /mu1 |   12.15767    .006882  1766.60   0.000     12.14418    12.17116
                /mu2 |   12.20238   .0111834  1091.12   0.000     12.18046     12.2243
               /lns1 |  -.3737319   .0071197   -52.49   0.000    -.3876863   -.3597776
               /lns2 |   .1117927    .007055    15.85   0.000     .0979651    .1256203
               /arho |    .742917   .0100236    74.12   0.000     .7232712    .7625628
        ------------------------------------------------------------------------------
        
        . nlcom (exp(_b[/lns1])^2) (exp(_b[/lns2])^2) (tanh(_b[/arho]))
        
               _nl_1:  exp(_b[/lns1])^2
               _nl_2:  exp(_b[/lns2])^2
               _nl_3:  tanh(_b[/arho])
        
        ------------------------------------------------------------------------------
                     |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
               _nl_1 |   .4735661   .0067433    70.23   0.000     .4603495    .4867827
               _nl_2 |   1.250552   .0176454    70.87   0.000     1.215968    1.285137
               _nl_3 |   .6309043   .0060338   104.56   0.000     .6190783    .6427303
        ------------------------------------------------------------------------------


        Comment


        • #5
          Im glad you found the program useful.
          I realize that __lnmvnormalden(M1,V1,X1) does not work for me under stata 15 or 14, but it does work on stata 16. So my small program may be useful for other people trying to replicate the code in earlier stata versions.
          Best
          Fernando

          Comment

          Working...
          X