Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to replicate the standard errors (SE) and variance-covariance matrix (VCE) produced by random effects probit model (xtprobit, re)?

    Dear Statalist Members,

    This might be a little bit more technical question, so I hope that it fits into the boundaries of Statalist forum. I would contact the technical support but I believe that the issue is of methodological nature, which is why I am kindly asking for your help.

    With the goal to perform the adjustment for two-step estimation detailed in the book ‘Econometric Analysis of Cross Section and Panel Data’ (Wooldridge 2010, pp. 418-420), I am struggling to replicate the variance-covariance matrix (VCE) of random effects (RE) probit estimator computed by Stata. According to the ‘Stata 14 Base Reference Manual’, the default VCE is ‘conventional’ which “uses the conventionally derived variance estimator for generalized least-squares regression” (StataCorp 2015, XT p. 365). I am aware that this statement is no longer featured in the latest version of the manual, ‘Stata 15 Base Reference Manual’ (StataCorp 2017, XT p. 382), however, it was the best lead that I found on how Stata actually computes the VCE for RE probit. Following the approach for computing the VCE of weighted multivariate nonlinear least squares estimator described by Wooldridge (2010, pp. 444-449), I have obtained a VCE which is in the vicinity of Stata output for RE probit estimator. However, its precision is nowhere near the one I obtained when I had computed the VCE for linear RE model following the approach described in ‘Stata 14 Base Reference Manual’ (StataCorp 2015, XT pp. 424-425) and compared it to the output of “xtreg, re” command.

    To illustrate the problem I prepared a code featured below, which contains an example of the approach applied by using the web-based ‘nlswork’ database, computes the conventional (observed information matrix) and cluster-robust RE probit VCE, and compares them with their counterparts produced by Stata. In order to save you time running the code, I pasted Stata output (version 14.2) of last five lines of the code towards the end of the post. Although the difference is small, I believe that it may be caused by Stata treating the log of panel-level variance (lnsig2u) as an estimated parameter, which is subsequently featured in the VCE computed by Stata and is missing in my VCE. Therefore, I would like to kindly ask if you could please point me to the right direction of how the VCE is computed by Stata in case of RE probit and how to incorporate the log of panel-level variance incorporate into the VCE? Any reference to literature or methodological sources in this regard which I might have missed would be greatly appreciated. I hope that I do not ask about something which is considered confidential. If yeas I deeply apologize for bringing this question up and in any case thank you very much for reading this long post.

    Kind regards,

    Filip Ostrihon
    Stata/SE 14.2

    References:
    StataCorp. (2015): Stata 14 Base Reference Manual. College Station, TX: Stata Press. ISBN 978-1-59718-171-6
    StataCorp. (2017): Stata 15 Base Reference Manual. College Station, TX: Stata Press. ISBN 978-1-59718-256-0
    Wooldridge, J. M. (2010): Econometric Analysis of Cross Section and Panel Data - 2nd ed. Cambridge: The MIT Press. ISBN 978-0-262-23258-6

    Code:
    set more off
    webuse nlswork, clear
    xtset idcode year
    generate has_work=0
    replace has_work=1 if wks_work!=0
    generate constant=1
    generate panelv = 0
    bysort idcode: replace panelv = 1 if _n == 1
    local depv="has_work"
    local specwage="grade ttl_exp"
    xtprobit `depv' `specwage', re vce(robust)
    estimates store eqwage
    unab rhsvarswage : `specwage' constant
    scalar rhore=e(rho)
    generate smpl=e(sample)
    generate robsmpl=0
    replace robsmpl=1 if smpl==1 & panelv==1
    predict xb if smpl==1, xb
    generate reshat=`depv'-normal(xb) if smpl==1
    generate phis=normalden(xb) if smpl==1
    generate phil=normal(xb) if smpl==1
    generate denominator=sqrt(phil*(1-phil)) if smpl==1
    foreach var in `rhsvarswage' {
        generate `var's=`var'*phis/denominator if smpl==1
        by idcode: egen `var'm=mean(`var's) if smpl==1
        by idcode: egen `var't=count(`var's) if smpl==1
        generate `var'sm = ///
        sqrt(1/(1-rhore))*`var's-(sqrt(1/(1-rhore))-sqrt(1/(1+(`var't-1)*rhore)))*`var'm ///
        if smpl==1
    }
    generate res=reshat/denominator if smpl==1
    by idcode: egen resm=mean(res) if smpl==1
    by idcode: egen rest=count(res) if smpl==1
    generate ressm = ///
    sqrt(1/(1-rhore))*res-(sqrt(1/(1-rhore))-sqrt(1/(1+(rest-1)*rhore)))*`var'm ///
    if smpl==1
    local scorerob=""
    local transvar=""
    foreach var in `rhsvarswage' {
        generate score_`var'=`var'sm*ressm if smpl==1
        by idcode: egen score_sum_`var'=sum(score_`var') if smpl==1  
        local scorerob="`scorerob'  score_sum_`var'"
        local transvar="`transvar'  `var'sm"
    }
    mata
        st_view(tvar=., ., tokens("`transvar'"), "smpl")
        st_view(srob=., ., tokens("`scorerob'"), "robsmpl")
        st_view(res=., ., tokens("res"), "smpl")
        invA=cholinv(cross(tvar,tvar))
        B=cross(srob,srob)
        n=rows(tvar)
        k=cols(tvar)
        ncl=rows(srob)
        s2=(res'res)/(n-k)
        st_matrix("invA",invA)
        st_matrix("B",B)
        st_matrix("s2",s2)
        st_matrix("n",n)
        st_matrix("ncl",ncl)
        st_matrix("k",k)
    end
    scalar sig_e2=s2[1,1]
    scalar Nobs=n[1,1]
    scalar Nclust=ncl[1,1]
    scalar Kvar=k[1,1]
    mat V_comp=invA
    mat Vrob_comp=invA*B*invA*((Nclust/(Nclust-1))*((Nobs-1)/(Nobs-Kvar)))
    xtprobit `depv' `specwage', re
    mat list e(V)
    mat list V_comp
    estimates restore eqwage
    mat list e(V)
    mat list Vrob_comp
    Output of last five lines of code (Stata/SE 14.2)
    Code:
    . mat list e(V)
    
    symmetric e(V)[4,4]
                        has_work:   has_work:   has_work:    lnsig2u:
                           grade     ttl_exp       _cons       _cons
      has_work:grade   .00007097
    has_work:ttl_exp     -.00001   .00003009
      has_work:_cons  -.00081693  -6.243e-06   .01026574
       lnsig2u:_cons   7.904e-06   .00001659   .00081449   220.88474
    
    . mat list V_comp
    
    symmetric V_comp[3,3]
                c1          c2          c3
    r1   .00007175
    r2  -.00001034   .00002998
    r3  -.00082485  -1.678e-06   .01034183
    
    . estimates restore eqwage
    (results eqwage are active now)
    
    . mat list e(V)
    
    symmetric e(V)[4,4]
                        has_work:   has_work:   has_work:    lnsig2u:
                           grade     ttl_exp       _cons       _cons
      has_work:grade   .00007894
    has_work:ttl_exp  -.00001217   .00002886
      has_work:_cons  -.00085791   .00002902   .01132769
       lnsig2u:_cons   11.523521   .16033801   142.30662    61293428
    
    . mat list Vrob_comp
    
    symmetric Vrob_comp[3,3]
                c1          c2          c3
    r1   .00007994
    r2  -.00001359   .00002898
    r3  -.00091537   .00004411   .01128811

  • #2
    You didn't get a quick answer. You'll increase your chances of a useful answer by following the FAQ on asking questions - provide Stata code in code delimiters, readable Stata output, and sample data using dataex. You will also increase your chances of a response by offering a shorter, more focused posting.

    I often find Statalist is more likely to respond to problems directly related to research - that you want to replicate what Stata has already coded may seem less deserving of our efforts to some observers.

    Comment


    • #3
      Dear Prof. Phil Bromiley,

      Thank you for your reply, advices, and the explanation why the question was not quickly answered.

      Kind regards,

      Filip Ostrihon

      Comment

      Working...
      X