Dear Statalist Members,
This might be a little bit more technical question, so I hope that it fits into the boundaries of Statalist forum. I would contact the technical support but I believe that the issue is of methodological nature, which is why I am kindly asking for your help.
With the goal to perform the adjustment for two-step estimation detailed in the book ‘Econometric Analysis of Cross Section and Panel Data’ (Wooldridge 2010, pp. 418-420), I am struggling to replicate the variance-covariance matrix (VCE) of random effects (RE) probit estimator computed by Stata. According to the ‘Stata 14 Base Reference Manual’, the default VCE is ‘conventional’ which “uses the conventionally derived variance estimator for generalized least-squares regression” (StataCorp 2015, XT p. 365). I am aware that this statement is no longer featured in the latest version of the manual, ‘Stata 15 Base Reference Manual’ (StataCorp 2017, XT p. 382), however, it was the best lead that I found on how Stata actually computes the VCE for RE probit. Following the approach for computing the VCE of weighted multivariate nonlinear least squares estimator described by Wooldridge (2010, pp. 444-449), I have obtained a VCE which is in the vicinity of Stata output for RE probit estimator. However, its precision is nowhere near the one I obtained when I had computed the VCE for linear RE model following the approach described in ‘Stata 14 Base Reference Manual’ (StataCorp 2015, XT pp. 424-425) and compared it to the output of “xtreg, re” command.
To illustrate the problem I prepared a code featured below, which contains an example of the approach applied by using the web-based ‘nlswork’ database, computes the conventional (observed information matrix) and cluster-robust RE probit VCE, and compares them with their counterparts produced by Stata. In order to save you time running the code, I pasted Stata output (version 14.2) of last five lines of the code towards the end of the post. Although the difference is small, I believe that it may be caused by Stata treating the log of panel-level variance (lnsig2u) as an estimated parameter, which is subsequently featured in the VCE computed by Stata and is missing in my VCE. Therefore, I would like to kindly ask if you could please point me to the right direction of how the VCE is computed by Stata in case of RE probit and how to incorporate the log of panel-level variance incorporate into the VCE? Any reference to literature or methodological sources in this regard which I might have missed would be greatly appreciated. I hope that I do not ask about something which is considered confidential. If yeas I deeply apologize for bringing this question up and in any case thank you very much for reading this long post.
Kind regards,
Filip Ostrihon
Stata/SE 14.2
References:
StataCorp. (2015): Stata 14 Base Reference Manual. College Station, TX: Stata Press. ISBN 978-1-59718-171-6
StataCorp. (2017): Stata 15 Base Reference Manual. College Station, TX: Stata Press. ISBN 978-1-59718-256-0
Wooldridge, J. M. (2010): Econometric Analysis of Cross Section and Panel Data - 2nd ed. Cambridge: The MIT Press. ISBN 978-0-262-23258-6
Output of last five lines of code (Stata/SE 14.2)
This might be a little bit more technical question, so I hope that it fits into the boundaries of Statalist forum. I would contact the technical support but I believe that the issue is of methodological nature, which is why I am kindly asking for your help.
With the goal to perform the adjustment for two-step estimation detailed in the book ‘Econometric Analysis of Cross Section and Panel Data’ (Wooldridge 2010, pp. 418-420), I am struggling to replicate the variance-covariance matrix (VCE) of random effects (RE) probit estimator computed by Stata. According to the ‘Stata 14 Base Reference Manual’, the default VCE is ‘conventional’ which “uses the conventionally derived variance estimator for generalized least-squares regression” (StataCorp 2015, XT p. 365). I am aware that this statement is no longer featured in the latest version of the manual, ‘Stata 15 Base Reference Manual’ (StataCorp 2017, XT p. 382), however, it was the best lead that I found on how Stata actually computes the VCE for RE probit. Following the approach for computing the VCE of weighted multivariate nonlinear least squares estimator described by Wooldridge (2010, pp. 444-449), I have obtained a VCE which is in the vicinity of Stata output for RE probit estimator. However, its precision is nowhere near the one I obtained when I had computed the VCE for linear RE model following the approach described in ‘Stata 14 Base Reference Manual’ (StataCorp 2015, XT pp. 424-425) and compared it to the output of “xtreg, re” command.
To illustrate the problem I prepared a code featured below, which contains an example of the approach applied by using the web-based ‘nlswork’ database, computes the conventional (observed information matrix) and cluster-robust RE probit VCE, and compares them with their counterparts produced by Stata. In order to save you time running the code, I pasted Stata output (version 14.2) of last five lines of the code towards the end of the post. Although the difference is small, I believe that it may be caused by Stata treating the log of panel-level variance (lnsig2u) as an estimated parameter, which is subsequently featured in the VCE computed by Stata and is missing in my VCE. Therefore, I would like to kindly ask if you could please point me to the right direction of how the VCE is computed by Stata in case of RE probit and how to incorporate the log of panel-level variance incorporate into the VCE? Any reference to literature or methodological sources in this regard which I might have missed would be greatly appreciated. I hope that I do not ask about something which is considered confidential. If yeas I deeply apologize for bringing this question up and in any case thank you very much for reading this long post.
Kind regards,
Filip Ostrihon
Stata/SE 14.2
References:
StataCorp. (2015): Stata 14 Base Reference Manual. College Station, TX: Stata Press. ISBN 978-1-59718-171-6
StataCorp. (2017): Stata 15 Base Reference Manual. College Station, TX: Stata Press. ISBN 978-1-59718-256-0
Wooldridge, J. M. (2010): Econometric Analysis of Cross Section and Panel Data - 2nd ed. Cambridge: The MIT Press. ISBN 978-0-262-23258-6
Code:
set more off webuse nlswork, clear xtset idcode year generate has_work=0 replace has_work=1 if wks_work!=0 generate constant=1 generate panelv = 0 bysort idcode: replace panelv = 1 if _n == 1 local depv="has_work" local specwage="grade ttl_exp" xtprobit `depv' `specwage', re vce(robust) estimates store eqwage unab rhsvarswage : `specwage' constant scalar rhore=e(rho) generate smpl=e(sample) generate robsmpl=0 replace robsmpl=1 if smpl==1 & panelv==1 predict xb if smpl==1, xb generate reshat=`depv'-normal(xb) if smpl==1 generate phis=normalden(xb) if smpl==1 generate phil=normal(xb) if smpl==1 generate denominator=sqrt(phil*(1-phil)) if smpl==1 foreach var in `rhsvarswage' { generate `var's=`var'*phis/denominator if smpl==1 by idcode: egen `var'm=mean(`var's) if smpl==1 by idcode: egen `var't=count(`var's) if smpl==1 generate `var'sm = /// sqrt(1/(1-rhore))*`var's-(sqrt(1/(1-rhore))-sqrt(1/(1+(`var't-1)*rhore)))*`var'm /// if smpl==1 } generate res=reshat/denominator if smpl==1 by idcode: egen resm=mean(res) if smpl==1 by idcode: egen rest=count(res) if smpl==1 generate ressm = /// sqrt(1/(1-rhore))*res-(sqrt(1/(1-rhore))-sqrt(1/(1+(rest-1)*rhore)))*`var'm /// if smpl==1 local scorerob="" local transvar="" foreach var in `rhsvarswage' { generate score_`var'=`var'sm*ressm if smpl==1 by idcode: egen score_sum_`var'=sum(score_`var') if smpl==1 local scorerob="`scorerob' score_sum_`var'" local transvar="`transvar' `var'sm" } mata st_view(tvar=., ., tokens("`transvar'"), "smpl") st_view(srob=., ., tokens("`scorerob'"), "robsmpl") st_view(res=., ., tokens("res"), "smpl") invA=cholinv(cross(tvar,tvar)) B=cross(srob,srob) n=rows(tvar) k=cols(tvar) ncl=rows(srob) s2=(res'res)/(n-k) st_matrix("invA",invA) st_matrix("B",B) st_matrix("s2",s2) st_matrix("n",n) st_matrix("ncl",ncl) st_matrix("k",k) end scalar sig_e2=s2[1,1] scalar Nobs=n[1,1] scalar Nclust=ncl[1,1] scalar Kvar=k[1,1] mat V_comp=invA mat Vrob_comp=invA*B*invA*((Nclust/(Nclust-1))*((Nobs-1)/(Nobs-Kvar))) xtprobit `depv' `specwage', re mat list e(V) mat list V_comp estimates restore eqwage mat list e(V) mat list Vrob_comp
Code:
. mat list e(V) symmetric e(V)[4,4] has_work: has_work: has_work: lnsig2u: grade ttl_exp _cons _cons has_work:grade .00007097 has_work:ttl_exp -.00001 .00003009 has_work:_cons -.00081693 -6.243e-06 .01026574 lnsig2u:_cons 7.904e-06 .00001659 .00081449 220.88474 . mat list V_comp symmetric V_comp[3,3] c1 c2 c3 r1 .00007175 r2 -.00001034 .00002998 r3 -.00082485 -1.678e-06 .01034183 . estimates restore eqwage (results eqwage are active now) . mat list e(V) symmetric e(V)[4,4] has_work: has_work: has_work: lnsig2u: grade ttl_exp _cons _cons has_work:grade .00007894 has_work:ttl_exp -.00001217 .00002886 has_work:_cons -.00085791 .00002902 .01132769 lnsig2u:_cons 11.523521 .16033801 142.30662 61293428 . mat list Vrob_comp symmetric Vrob_comp[3,3] c1 c2 c3 r1 .00007994 r2 -.00001359 .00002898 r3 -.00091537 .00004411 .01128811
Comment