How to bootstrap?

River Huang

20 Aug 2017, 03:10

Dear All, following the suggestion of Malikov and Kumbhakar (Malikov, E., Kumbhakar, S.C., 2014. A generalized panel data switching regression model. Econom. Lett. 124 (3), 353–357), I wrote a Stata program to estimate current account dynamics (`ca' in the second-stage outcome equation below) across alternative exchange rate regimes (err=1 if fixed, err=2 if intermediate, and err=3 if flexible). The illustrative data are:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str3 code float(year err lgdp openness pcgdpg resimp ca ca0)
"ATG" 2009 1 20.934256 104.13732 -13.072355 1.9915867 -13.880438 -26.414835
"AUS" 2009 3  27.74471  44.91765  -.2649566 1.8265275  -5.273952  -4.938875
"AUS" 2010 3  27.76457  39.83852   .4314655 1.4770035  -3.912429  -5.273952
"BLR" 2009 1  24.69533 112.31034   .4236559 2.1676178 -12.462441  -8.162176
"BLR" 2010 2  24.77038 115.91798    7.97749   1.57038  -14.46763 -12.462441
"CMR" 2009 1   23.8533 37.065178   -.824791   6.44656  -4.784514 -1.9283733
"CMR" 2010 1 23.885466    40.361   .4868559  6.521112  -3.624956  -4.784514
"COL" 2009 2  26.34386 34.280003   .4996768  6.073253 -1.9882672  -2.647993
"COL" 2010 3  26.38281 33.700848   2.835318  5.554166  -3.018355 -1.9882672
"COM" 2009 1 20.067556  63.20974  -.4763498  7.018032  -7.483857  -13.13504
"COM" 2010 1  20.08932  68.16081 -.23611353  6.364758  -7.407689  -7.483857
"CPV" 2009 2 21.218115  88.03723  -2.311435  4.155446 -14.417443 -11.483542
"CPV" 2010 2  21.23268  94.43835   .3719076  3.800331 -13.388605 -14.417443
"CRI" 2009 1 24.293087  70.17782 -2.2669241 4.0754414   -1.83422  -8.429599
"CRI" 2010 1  24.34142 68.218575  3.6353245 3.8593636 -3.2570934   -1.83422
"CYP" 2009 1  23.95129 102.80183  -4.374274  .5983004   -7.65719  -15.16893
"CYP" 2010 1  23.96438 107.69087 -1.2976004  .4897927 -11.368127   -7.65719
"CZE" 2009 2 26.033373 113.74112  -5.382388  3.785846 -2.3670702  -1.874091
"CZE" 2010 2 26.056065 129.25456  1.9974748  3.401115  -3.551025 -2.3670702
"DEU" 2009 1  28.81982  70.66505  -5.379411  1.646514   5.818374   5.620281
"DEU" 2010 1  28.85981  79.30308  4.2395043 1.7712165   5.649066   5.818374
"DMA" 2009 1  20.01099  84.32565 -1.3848196   3.18094 -22.712105  -28.34747
"DMA" 2010 1  20.01769  89.13377   .3752036 3.2608254 -16.237488 -22.712105
"DNK" 2009 1 26.479265  89.75504  -5.413992  5.862818   3.351693   2.830052
"DNK" 2010 1   26.4978  94.09998   1.419488  5.765757    5.64686   3.351693
"DOM" 2009 2 24.631516  50.61228  -.4101409  3.071752 -4.7603636 -9.3596945
"DOM" 2010 2  24.71141  55.90717   6.891949 2.3004398  -7.457199 -4.7603636
"DZA" 2009 2  25.77026 71.324326 -.10159976 33.791435   .3145995   19.85624
"DZA" 2010 2 25.805956  69.86666   1.763682  36.78194    7.58047   .3145995
"ECU" 2009 1  24.93074  52.10485 -1.1018021  2.478199  .49095315   2.856761
"ECU" 2010 1  24.96539  60.30324   1.837978 1.3248158 -2.2804258  .49095315
"EGY" 2009 2 26.061657  56.55344  2.7557354  7.358274 -1.7722816  -.8688219
"EGY" 2010 2  26.11183  47.93635   3.091669  6.701213 -2.0575788 -1.7722816
"FRA" 2009 1  28.58492  49.56785  -3.439412 1.8636028  -.8188631  -.9640422
"FRA" 2010 1  28.60439  53.96844   1.463151 2.2021236  -.8324761  -.8188631
"GBR" 2009 3  28.49981  54.72441  -5.048664  .8425719  -2.903771 -3.5240715
"GBR" 2010 3  28.51878  59.22182  1.1193836  .9911548 -2.7510476  -2.903771
"GHA" 2009 2 24.118416  71.59474  2.1912985  3.783322  -7.303012 -11.664184
"GHA" 2010 2  24.19445 75.377815   5.222186  4.251202  -8.538801  -7.303012
"GMB" 2009 2  20.61132  64.61083   3.139261  7.425123   6.994719  1.1238829
"GMB" 2010 2 20.674526 66.455666   3.214035  7.088469   5.907137   6.994719
"GRC" 2009 1 26.481266  47.74385 -4.5521173  .6386505  -10.88277 -14.476298
"GRC" 2010 1  26.42492   52.8291  -5.600778  .7923775 -10.113232  -10.88277
"GTM" 2009 2 24.416767  57.10598 -1.6499767  4.415632   .7230578 -3.6126866
"GTM" 2010 2  24.44506   62.1149   .6606255 4.2811093 -1.3625484   .7230578
"HRV" 2009 2  24.82918 72.761795  -7.270236  6.506817  -5.036501  -8.816459
"HRV" 2010 2  24.81202  75.89763 -1.4498836  6.560429 -1.4990332  -5.036501
"HTI" 2009 3  22.67029  58.28431  1.5364974  4.478287  -1.855614  -3.127642
"HTI" 2010 3 22.613745  80.09118  -6.884637  5.285829 -1.5375408  -1.855614
"ISL" 2009 3 23.343874  90.38019  -7.260946  8.774795   -5.19189 -23.522587
"ISL" 2010 3 23.307627  97.13546  -3.420985 13.028338 -2.3269165   -5.19189
"ISR" 2009 2 26.123837  63.78856 -1.1231704  9.786021   3.827099  1.4677502
"ISR" 2010 2  26.17754  67.98946   3.606987  9.622063  3.3601456   3.827099
"JAM" 2009 2  23.31769  86.88398  -4.809099  3.431532  -9.365416  -20.42076
"JAM" 2010 2 23.302895  80.92348  -1.928138 4.1733613  -7.079966  -9.365416
"JPN" 2009 3 29.330437   24.4909  -5.405301 18.338163  2.7846885   2.820931
"JPN" 2010 3 29.371504  28.61301  4.1735773 15.685442   3.875161  2.7846885
"KGZ" 2009 2 22.295433 133.37915  1.6516515  5.052202 -4.3142366  -13.87637
"KGZ" 2010 2 22.290705 133.23285  -1.651753  5.020518   -9.34542 -4.3142366
"KOR" 2009 3 27.658373  90.41264  .19051726  7.796637   3.724581   .3182638
"KOR" 2010 3  27.72132  95.65408   5.967519  6.545405   2.635945   3.724581
"LBN" 2009 1  24.28452  92.74914   8.402636  14.81081 -19.183285  -14.23059
"LBN" 2010 1 24.361115  98.11693  4.1274276 16.585743 -19.868616 -19.183285
"LKA" 2009 2 24.684385  49.14914  2.7559404  5.215487  -.5103858  -9.543199
"LKA" 2010 2 24.761494  46.36389   7.205263  5.343544  -1.895136  -.5103858
"LSO" 2009 1 21.532957  148.3949  1.1759651  6.620369    2.96052  18.447823
"LSO" 2010 1  21.59627 140.08711   5.422052  5.141607  -6.613143    2.96052
"LTU" 2009 2  24.32117 105.55858 -13.863035  3.912167   2.266767 -13.721574
"LTU" 2010 2 24.337435 132.56178  3.7936525   2.95729  -.3212217   2.266767
"LVA" 2009 2  23.92979  86.82642 -12.906108   7.24893   7.904533 -12.593387
"LVA" 2010 1  23.89116 108.78899 -1.7662354  6.437962  2.0812159   7.904533
"MAR" 2009 1 25.220745   67.9151     2.9643  7.572931  -5.351425  -4.895267
"MAR" 2010 1 25.258194 75.247635   2.471154  7.350544 -4.2107983  -5.351425
"MDG" 2009 3 22.887396  73.99667  -6.686151  2.898483  -21.09993 -18.946804
"MDG" 2010 3 22.890024   68.0227  -2.498149 3.3948865 -10.168513  -21.09993
"MEX" 2009 3 27.631046  56.03479  -6.221285  4.252017  -.9737812  -1.850457
"MEX" 2010 3 27.680885  60.94653   3.485229 4.1144123  -.5010219  -.9737812
"MKD" 2009 3   22.9317  87.17699  -.4412051  5.105462  -6.483544 -12.470758
"MKD" 2010 3 22.964737  97.88107   3.276602  4.719605  -2.107913  -6.483544
"MLT" 2009 1 22.856483 296.97488 -3.1948564 .26835653  -6.538419  -.8654626
"MLT" 2010 1 22.891296  307.4218   3.035346 .28414986  -4.804269  -6.538419
"MUS" 2009 2  22.98338 104.42973  3.0411005  5.039511   -7.17475  -9.767047
"MUS" 2010 2  23.02622 113.45708   4.129199  2.761873 -10.054045   -7.17475
"NAM" 2009 1  23.08785 125.47756 -1.1432047  4.563721 -1.4762572  -.1118428
"NAM" 2010 1  23.14649  108.4135  4.2762957  3.079522  -3.461317 -1.4762572
"NGA" 2009 2 26.558756  61.80285   4.126187 8.4259205   8.182299    14.0102
"NGA" 2010 2  26.63423  42.65139   4.999833 4.7170215   3.552578   8.182299
"NIC" 2009 2  22.85015  86.99361 -4.5227156 4.0435667  -8.501336 -17.046085
"NIC" 2010 2 22.893305 100.36406   3.115519 3.9219115  -8.908936  -8.501336
"NLD" 2009 1  27.43843 118.98047 -4.2612214  .6370274   5.830153  4.1609254
"NLD" 2010 1  27.45236 135.54501   .8838761  .6746554   7.391339   5.830153
"NOR" 2009 3  26.77762 67.131226  -2.855408 4.5188375  11.690187  15.784298
"NOR" 2010 3  26.78362  68.40958  -.6435047  4.216766   11.72811  11.690187
"NPL" 2009 1  23.44898  47.07945   3.496219  6.565362   .1665825   5.845486
"NPL" 2010 1  23.49602  45.98491   3.722471  6.010521   -.797466   .1665825
"OMN" 2009 1 24.747795  85.28215  1.5510355  5.809792 -1.0355015   8.240634
"OMN" 2010 1  24.79471 106.86321  -.6589162  5.466475   8.329008 -1.0355015
"PAK" 2009 2  25.88577  32.07185   .7356375  4.146899  -2.374882  -9.204316
"PAK" 2010 2  25.90171 32.868927  -.4846558  4.718503  -.7632174  -2.374882
"PAN" 2009 1  24.03163 134.09517 -.19317342 1.7765557  -.7979394 -10.769425
end

My code (hopefully correct) is:

Code:

egen id = group(code)
xtset id year

// first stage
statsby _b, by(year) saving("mlogit.dta", replace): mlogit err lgdp openness pcgdpg resimp, b(1)
merge m:1 year using "mlogit.dta" 
sort code year

gen del2 = _eq2_b_lgdp*lgdp + _eq2_b_openness*openness + _eq2_b_pcgdpg*pcgdpg + _eq2_b_resimp*resimp + _eq2_b_cons
gen del3 = _eq3_b_lgdp*lgdp + _eq3_b_openness*openness + _eq3_b_pcgdpg*pcgdpg + _eq3_b_resimp*resimp + _eq3_b_cons

gen F1 = 1/(1+exp(del2)+exp(del3))
gen F2 = exp(del2)/(1+exp(del2)+exp(del3))
gen F3 = exp(del3)/(1+exp(del2)+exp(del3))

gen J1 = -invnormal(F1)
gen J2 = -invnormal(F2)
gen J3 = -invnormal(F3)

gen imr1 = -normalden(J1)/F1
gen imr2 = -normalden(J2)/F2
gen imr3 = -normalden(J3)/F3

// second stage
reg ca ca0 imr1 if err == 1, robust
reg ca ca0 imr2 if err == 2, robust
reg ca ca0 imr3 if err == 3, robust

However, the imr* variables (inverse Mills ratios) used in the second stage are generated from first-stage estimates, so the standard errors of the second-stage coefficients reported by regress will be incorrect. I would therefore like to use a bootstrap procedure and wonder whether anyone can give some suggestions.
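For reference, the quantities computed in the code above can be written as follows, with x_it = (lgdp, openness, pcgdpg, resimp)' and gamma_{j,t} the year-t mlogit coefficients for regime j (regime 1 is the base, so its index is normalized to zero). This only restates what the code does; it does not check the sign conventions against Malikov and Kumbhakar (2014). Because the standard normal density is symmetric, the minus sign inside J_j does not change phi(J_j).

```latex
\begin{aligned}
\delta_{j,it} &= \gamma_{j,t}' x_{it} + \gamma_{0j,t}, \qquad j = 2, 3,\\
F_{1,it} &= \frac{1}{1 + e^{\delta_{2,it}} + e^{\delta_{3,it}}}, \qquad
F_{j,it} = \frac{e^{\delta_{j,it}}}{1 + e^{\delta_{2,it}} + e^{\delta_{3,it}}}, \quad j = 2, 3,\\
J_{j,it} &= -\Phi^{-1}(F_{j,it}), \qquad j = 1, 2, 3,\\
\mathrm{imr}_{j,it} &= -\frac{\phi(J_{j,it})}{F_{j,it}}
  = -\frac{\phi\{\Phi^{-1}(F_{j,it})\}}{F_{j,it}},\\
\mathrm{ca}_{it} &= \beta_{0j} + \beta_{1j}\,\mathrm{ca0}_{it} + \beta_{2j}\,\mathrm{imr}_{j,it} + \varepsilon_{it}
  \quad \text{if } \mathrm{err}_{it} = j.
\end{aligned}
```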

Ho-Chuan (River) Huang
Stata 19.0, MP(4)

Steve Samuels

#2

21 Aug 2017, 15:59

You'll need to bootstrap the entire process. Here's an example in which the results of a logistic regression are fed into a regression for a mean. I'm not an expert in bootstrap theory, so I point you to Stas Kolenikov's comment at https://www.stata.com/statalist/arch.../msg00053.html
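The linked example is not reproduced in this thread. As a rough stand-in (a sketch, not Steve's code), the general pattern is shown below on Stata's bundled auto data rather than River's data: both stages sit inside one rclass program, so every bootstrap replication re-estimates the first stage before the generated regressor is built. The program name twostep_demo, the returned names b_mpg and b_lambda, and the logit of foreign on weight are all hypothetical choices for illustration.

```stata
* Hypothetical illustration on the auto data, not River's model:
* a logit first stage feeds a generated regressor into a second-stage
* regression, and bootstrap re-runs both stages in every replication.
sysuse auto, clear
capture program drop twostep_demo
program define twostep_demo, rclass
    // temporary variables are dropped automatically when the program
    // ends, so nothing is left over for the next run
    tempvar phat lambda
    // first stage: probability of being selected (foreign == 1)
    logit foreign weight
    predict double `phat', pr
    // generated regressor: inverse Mills ratio phi(invPhi(p))/p
    generate double `lambda' = normalden(invnormal(`phat'))/`phat'
    // second stage on the selected subsample
    regress price mpg `lambda' if foreign == 1
    return scalar b_mpg    = _b[mpg]
    return scalar b_lambda = _b[`lambda']
end

set seed 2017
bootstrap b_mpg = r(b_mpg) b_lambda = r(b_lambda), reps(100) nodots: twostep_demo
```

Creating the generated variables with tempvar is one design choice: Stata drops temporary variables when the program ends, which avoids having to drop them by hand between runs.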

Steve Samuels
Statistical Consulting

Stata 14.2

River Huang

#3

21 Aug 2017, 18:25

Dear Steve: Thanks for your reply. In fact, I did not expect to get many replies to this particular question, but I gave it a try. I understand the ideas and main steps of bootstrapping but do not know how to implement it in Stata.

Steve Samuels

#4

24 Aug 2017, 19:48

Here's working code for your problem, River. It follows the same logic as the code I linked to: 1) define a program to contain the code; 2) check it; 3) have bootstrap run it. It's necessary to drop the variables created in the program; otherwise, when the program runs again on data that already contain them, the generate statements fail because the variables are already present. I used preserve and restore to do this in the linked code, but I'd guess that drop is more efficient.

Steve

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str3 code float(year err lgdp openness pcgdpg resimp ca ca0)
"ATG" 2009 1 20.934256 104.13732 -13.072355 1.9915867 -13.880438 -26.414835
"AUS" 2009 3  27.74471  44.91765  -.2649566 1.8265275  -5.273952  -4.938875
"AUS" 2010 3  27.76457  39.83852   .4314655 1.4770035  -3.912429  -5.273952
"BLR" 2009 1  24.69533 112.31034   .4236559 2.1676178 -12.462441  -8.162176
"BLR" 2010 2  24.77038 115.91798    7.97749   1.57038  -14.46763 -12.462441
"CMR" 2009 1   23.8533 37.065178   -.824791   6.44656  -4.784514 -1.9283733
"CMR" 2010 1 23.885466    40.361   .4868559  6.521112  -3.624956  -4.784514
"COL" 2009 2  26.34386 34.280003   .4996768  6.073253 -1.9882672  -2.647993
"COL" 2010 3  26.38281 33.700848   2.835318  5.554166  -3.018355 -1.9882672
"COM" 2009 1 20.067556  63.20974  -.4763498  7.018032  -7.483857  -13.13504
"COM" 2010 1  20.08932  68.16081 -.23611353  6.364758  -7.407689  -7.483857
"CPV" 2009 2 21.218115  88.03723  -2.311435  4.155446 -14.417443 -11.483542
"CPV" 2010 2  21.23268  94.43835   .3719076  3.800331 -13.388605 -14.417443
"CRI" 2009 1 24.293087  70.17782 -2.2669241 4.0754414   -1.83422  -8.429599
"CRI" 2010 1  24.34142 68.218575  3.6353245 3.8593636 -3.2570934   -1.83422
"CYP" 2009 1  23.95129 102.80183  -4.374274  .5983004   -7.65719  -15.16893
"CYP" 2010 1  23.96438 107.69087 -1.2976004  .4897927 -11.368127   -7.65719
"CZE" 2009 2 26.033373 113.74112  -5.382388  3.785846 -2.3670702  -1.874091
"CZE" 2010 2 26.056065 129.25456  1.9974748  3.401115  -3.551025 -2.3670702
"DEU" 2009 1  28.81982  70.66505  -5.379411  1.646514   5.818374   5.620281
"DEU" 2010 1  28.85981  79.30308  4.2395043 1.7712165   5.649066   5.818374
"DMA" 2009 1  20.01099  84.32565 -1.3848196   3.18094 -22.712105  -28.34747
"DMA" 2010 1  20.01769  89.13377   .3752036 3.2608254 -16.237488 -22.712105
"DNK" 2009 1 26.479265  89.75504  -5.413992  5.862818   3.351693   2.830052
"DNK" 2010 1   26.4978  94.09998   1.419488  5.765757    5.64686   3.351693
"DOM" 2009 2 24.631516  50.61228  -.4101409  3.071752 -4.7603636 -9.3596945
"DOM" 2010 2  24.71141  55.90717   6.891949 2.3004398  -7.457199 -4.7603636
"DZA" 2009 2  25.77026 71.324326 -.10159976 33.791435   .3145995   19.85624
"DZA" 2010 2 25.805956  69.86666   1.763682  36.78194    7.58047   .3145995
"ECU" 2009 1  24.93074  52.10485 -1.1018021  2.478199  .49095315   2.856761
"ECU" 2010 1  24.96539  60.30324   1.837978 1.3248158 -2.2804258  .49095315
"EGY" 2009 2 26.061657  56.55344  2.7557354  7.358274 -1.7722816  -.8688219
"EGY" 2010 2  26.11183  47.93635   3.091669  6.701213 -2.0575788 -1.7722816
"FRA" 2009 1  28.58492  49.56785  -3.439412 1.8636028  -.8188631  -.9640422
"FRA" 2010 1  28.60439  53.96844   1.463151 2.2021236  -.8324761  -.8188631
"GBR" 2009 3  28.49981  54.72441  -5.048664  .8425719  -2.903771 -3.5240715
"GBR" 2010 3  28.51878  59.22182  1.1193836  .9911548 -2.7510476  -2.903771
"GHA" 2009 2 24.118416  71.59474  2.1912985  3.783322  -7.303012 -11.664184
"GHA" 2010 2  24.19445 75.377815   5.222186  4.251202  -8.538801  -7.303012
"GMB" 2009 2  20.61132  64.61083   3.139261  7.425123   6.994719  1.1238829
"GMB" 2010 2 20.674526 66.455666   3.214035  7.088469   5.907137   6.994719
"GRC" 2009 1 26.481266  47.74385 -4.5521173  .6386505  -10.88277 -14.476298
"GRC" 2010 1  26.42492   52.8291  -5.600778  .7923775 -10.113232  -10.88277
"GTM" 2009 2 24.416767  57.10598 -1.6499767  4.415632   .7230578 -3.6126866
"GTM" 2010 2  24.44506   62.1149   .6606255 4.2811093 -1.3625484   .7230578
"HRV" 2009 2  24.82918 72.761795  -7.270236  6.506817  -5.036501  -8.816459
"HRV" 2010 2  24.81202  75.89763 -1.4498836  6.560429 -1.4990332  -5.036501
"HTI" 2009 3  22.67029  58.28431  1.5364974  4.478287  -1.855614  -3.127642
"HTI" 2010 3 22.613745  80.09118  -6.884637  5.285829 -1.5375408  -1.855614
"ISL" 2009 3 23.343874  90.38019  -7.260946  8.774795   -5.19189 -23.522587
"ISL" 2010 3 23.307627  97.13546  -3.420985 13.028338 -2.3269165   -5.19189
"ISR" 2009 2 26.123837  63.78856 -1.1231704  9.786021   3.827099  1.4677502
"ISR" 2010 2  26.17754  67.98946   3.606987  9.622063  3.3601456   3.827099
"JAM" 2009 2  23.31769  86.88398  -4.809099  3.431532  -9.365416  -20.42076
"JAM" 2010 2 23.302895  80.92348  -1.928138 4.1733613  -7.079966  -9.365416
"JPN" 2009 3 29.330437   24.4909  -5.405301 18.338163  2.7846885   2.820931
"JPN" 2010 3 29.371504  28.61301  4.1735773 15.685442   3.875161  2.7846885
"KGZ" 2009 2 22.295433 133.37915  1.6516515  5.052202 -4.3142366  -13.87637
"KGZ" 2010 2 22.290705 133.23285  -1.651753  5.020518   -9.34542 -4.3142366
"KOR" 2009 3 27.658373  90.41264  .19051726  7.796637   3.724581   .3182638
"KOR" 2010 3  27.72132  95.65408   5.967519  6.545405   2.635945   3.724581
"LBN" 2009 1  24.28452  92.74914   8.402636  14.81081 -19.183285  -14.23059
"LBN" 2010 1 24.361115  98.11693  4.1274276 16.585743 -19.868616 -19.183285
"LKA" 2009 2 24.684385  49.14914  2.7559404  5.215487  -.5103858  -9.543199
"LKA" 2010 2 24.761494  46.36389   7.205263  5.343544  -1.895136  -.5103858
"LSO" 2009 1 21.532957  148.3949  1.1759651  6.620369    2.96052  18.447823
"LSO" 2010 1  21.59627 140.08711   5.422052  5.141607  -6.613143    2.96052
"LTU" 2009 2  24.32117 105.55858 -13.863035  3.912167   2.266767 -13.721574
"LTU" 2010 2 24.337435 132.56178  3.7936525   2.95729  -.3212217   2.266767
"LVA" 2009 2  23.92979  86.82642 -12.906108   7.24893   7.904533 -12.593387
"LVA" 2010 1  23.89116 108.78899 -1.7662354  6.437962  2.0812159   7.904533
"MAR" 2009 1 25.220745   67.9151     2.9643  7.572931  -5.351425  -4.895267
"MAR" 2010 1 25.258194 75.247635   2.471154  7.350544 -4.2107983  -5.351425
"MDG" 2009 3 22.887396  73.99667  -6.686151  2.898483  -21.09993 -18.946804
"MDG" 2010 3 22.890024   68.0227  -2.498149 3.3948865 -10.168513  -21.09993
"MEX" 2009 3 27.631046  56.03479  -6.221285  4.252017  -.9737812  -1.850457
"MEX" 2010 3 27.680885  60.94653   3.485229 4.1144123  -.5010219  -.9737812
"MKD" 2009 3   22.9317  87.17699  -.4412051  5.105462  -6.483544 -12.470758
"MKD" 2010 3 22.964737  97.88107   3.276602  4.719605  -2.107913  -6.483544
"MLT" 2009 1 22.856483 296.97488 -3.1948564 .26835653  -6.538419  -.8654626
"MLT" 2010 1 22.891296  307.4218   3.035346 .28414986  -4.804269  -6.538419
"MUS" 2009 2  22.98338 104.42973  3.0411005  5.039511   -7.17475  -9.767047
"MUS" 2010 2  23.02622 113.45708   4.129199  2.761873 -10.054045   -7.17475
"NAM" 2009 1  23.08785 125.47756 -1.1432047  4.563721 -1.4762572  -.1118428
"NAM" 2010 1  23.14649  108.4135  4.2762957  3.079522  -3.461317 -1.4762572
"NGA" 2009 2 26.558756  61.80285   4.126187 8.4259205   8.182299    14.0102
"NGA" 2010 2  26.63423  42.65139   4.999833 4.7170215   3.552578   8.182299
"NIC" 2009 2  22.85015  86.99361 -4.5227156 4.0435667  -8.501336 -17.046085
"NIC" 2010 2 22.893305 100.36406   3.115519 3.9219115  -8.908936  -8.501336
"NLD" 2009 1  27.43843 118.98047 -4.2612214  .6370274   5.830153  4.1609254
"NLD" 2010 1  27.45236 135.54501   .8838761  .6746554   7.391339   5.830153
"NOR" 2009 3  26.77762 67.131226  -2.855408 4.5188375  11.690187  15.784298
"NOR" 2010 3  26.78362  68.40958  -.6435047  4.216766   11.72811  11.690187
"NPL" 2009 1  23.44898  47.07945   3.496219  6.565362   .1665825   5.845486
"NPL" 2010 1  23.49602  45.98491   3.722471  6.010521   -.797466   .1665825
"OMN" 2009 1 24.747795  85.28215  1.5510355  5.809792 -1.0355015   8.240634
"OMN" 2010 1  24.79471 106.86321  -.6589162  5.466475   8.329008 -1.0355015
"PAK" 2009 2  25.88577  32.07185   .7356375  4.146899  -2.374882  -9.204316
"PAK" 2010 2  25.90171 32.868927  -.4846558  4.718503  -.7632174  -2.374882
"PAN" 2009 1  24.03163 134.09517 -.19317342 1.7765557  -.7979394 -10.769425
end


egen id = group(code)
xtset id year
save d1, replace

cap program drop _all
scalar drop _all
macro drop _all

/* Convert CODE to PROGRAM  */
program define twostep, rclass
    tempfile mlogit
    // first stage
    statsby _b, by(year) saving(`mlogit', replace): mlogit err lgdp openness pcgdpg resimp, b(1)
    merge m:1 year using `mlogit'
    sort code year

    gen del2 = _eq2_b_lgdp*lgdp + _eq2_b_openness*openness + _eq2_b_pcgdpg*pcgdpg + _eq2_b_resimp*resimp + _eq2_b_cons
    gen del3 = _eq3_b_lgdp*lgdp + _eq3_b_openness*openness + _eq3_b_pcgdpg*pcgdpg + _eq3_b_resimp*resimp + _eq3_b_cons

    gen F1 = 1/(1+exp(del2)+exp(del3))
    gen F2 = exp(del2)/(1+exp(del2)+exp(del3))
    gen F3 = exp(del3)/(1+exp(del2)+exp(del3))

    gen J1 = -invnormal(F1)
    gen J2 = -invnormal(F2)
    gen J3 = -invnormal(F3)

    gen imr1 = -normalden(J1)/F1
    gen imr2 = -normalden(J2)/F2
    gen imr3 = -normalden(J3)/F3
    // second stage
    tempfile t1
    save `t1'
    forvalues i = 1/3{
        use `t1', clear
        regress ca ca0 imr1 if err==`i'
        local xca0 =  _b[ca0]
        return scalar ca0_`i' = `xca0'
        local ximr1 =  _b[imr1]
        return scalar imr1_`i' = `ximr1'
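        }
    /* Now must drop variables created in the program,
       or generate will fail the next time the program runs
       because the variables already exist. Drop the merged
       _eq* coefficients too; otherwise the next merge keeps
       their old (master) values instead of the new estimates */
    drop _merge _eq* del2 del3 F1 F2 F3 J1 J2 J3 imr1 imr2 imr3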
end

use d1, clear
/* Check Program twostep */
twostep
return list  

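/* Bootstrap program twostep. Set a seed so the results can be
   reproduced; reps(10) is only a quick check, so use many more
   replications for actual inference */
set seed 213
bootstrap ca0_1 = r(ca0_1) imr1_1 = r(imr1_1)  ///
          ca0_2 = r(ca0_2) imr1_2 = r(imr1_2)  ///
          ca0_3 = r(ca0_3) imr1_3 = r(imr1_3), ///
          nodrop reps(10): twostep

estat bootstrap, all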

Last edited by Steve Samuels; 24 Aug 2017, 19:56.

River Huang

#5

24 Aug 2017, 19:57

Dear Steve, Wonderful! I cannot thank you enough for your time, effort, and expertise. I have tried it and found that it works well.

Steve Samuels

#6

25 Aug 2017, 13:11

You're very welcome, River. I see that I missed imr2 and imr3 in your regression analyses for err=2 and err=3. Easily fixed as I'm sure you saw.

River Huang

#7

25 Aug 2017, 18:50

Originally posted by Steve Samuels

You're very welcome, River. I see that I missed imr2 and imr3 in your regression analyses for err=2 and err=3. Easily fixed as I'm sure you saw.

Do you mean the following?

Code:

// second stage
    tempfile t1
    save `t1'
    forvalues i = 1/3 {
        use `t1', clear
        regress ca ca0 imr`i' if err==`i'
        local xca0 =  _b[ca0]
        return scalar ca0_`i' = `xca0'
        local ximr =  _b[imr`i']
        return scalar imr_`i' = `ximr'
        local xcons =  _b[_cons]
        return scalar cons_`i' = `xcons'
        }
    /* Now must drop variables created in the program,
       or generate will fail the next time the program runs
       because the variables already exist; drop the merged _eq*
       coefficients too, or the next merge keeps their old values */
    drop _merge _eq* del2 del3 F1 F2 F3 J1 J2 J3 imr1 imr2 imr3

Steve Samuels

#8

27 Aug 2017, 07:39

My regress statement used imr1 and your change is correct.

River Huang

#9

27 Aug 2017, 17:35

Originally posted by Steve Samuels

My regress statement used imr1 and your change is correct.

Thanks again, Steve.

Robert Picard

#10

28 Aug 2017, 13:21

River asked in another thread if there was a way to speed up this bootstrapping exercise using rangerun. I have never used bootstrap and most of this is over my head, but here is what I managed to figure out. First, here is a more concise version of the code Steve proposed in #4, with my guess at how to address the issue raised in #6 and #7:

Code:

clear all

program define twostep, rclass
    tempfile mlogit
    statsby _b, by(year) saving(`mlogit', replace): mlogit err lgdp openness pcgdpg resimp, b(1)
    merge m:1 year using `mlogit'
    sort code year

    gen del2 = _eq2_b_lgdp*lgdp + _eq2_b_openness*openness + _eq2_b_pcgdpg*pcgdpg + _eq2_b_resimp*resimp + _eq2_b_cons
    gen del3 = _eq3_b_lgdp*lgdp + _eq3_b_openness*openness + _eq3_b_pcgdpg*pcgdpg + _eq3_b_resimp*resimp + _eq3_b_cons

    gen F1 = 1/(1+exp(del2)+exp(del3))
    gen F2 = exp(del2)/(1+exp(del2)+exp(del3))
    gen F3 = exp(del3)/(1+exp(del2)+exp(del3))

    gen J1 = -invnormal(F1)
    gen J2 = -invnormal(F2)
    gen J3 = -invnormal(F3)

    gen imr1 = -normalden(J1)/F1
    gen imr2 = -normalden(J2)/F2
    gen imr3 = -normalden(J3)/F3

    forvalues i = 1/3 {
        regress ca ca0 imr`i' if err==`i'
        return scalar ca0_`i' = _b[ca0]
        return scalar imr_`i' = _b[imr`i']
    }
    // also drop the merged _eq* coefficients so a later merge cannot keep stale values
    drop _merge _eq* del2 del3 F1 F2 F3 J1 J2 J3 imr1 imr2 imr3
end

set seed 213
use d1, clear
bootstrap ca0_1 = r(ca0_1) imr_1 = r(imr_1)  ///
          ca0_2 = r(ca0_2) imr_2 = r(imr_2)  ///
          ca0_3 = r(ca0_3) imr_3 = r(imr_3), ///
          nodrop reps(10): twostep

And here are the results when it is run on the dataset defined in #4:

Code:

------------------------------------------------------------------------------
             |   Observed   Bootstrap                         Normal-based
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       ca0_1 |    .635022   .1003064     6.33   0.000      .438425    .8316191
       imr_1 |   1.140804   2.076781     0.55   0.583    -2.929612     5.21122
       ca0_2 |   .4603459   .0740344     6.22   0.000     .3152412    .6054506
       imr_2 |    -.82331   3.327433    -0.25   0.805    -7.344959    5.698339
       ca0_3 |   .6470341   .2107912     3.07   0.002     .2338908    1.060177
       imr_3 |  -.5733437   1.284544    -0.45   0.655    -3.091003    1.944316
------------------------------------------------------------------------------

When you call rangerun, a virtual copy of the data in memory is stored. rangerun then loops over each observation and replaces the data in memory with the observations that fall within the interval bounds defined for the current observation. The user's program is then called, and results are collected from the new variables it has created, using the values from the last observation in memory. If the interval is invalid, the program is not called and no results are stored for that observation. For each repetition, you want to replace the data in memory with the full set of observations, so I created a variable called all that is set to 1 for all observations. For the first 11 observations, I generate a valid interval; for the remaining observations, the upper bound is set to -1. This means that rangerun will call the both_stages program 11 times. For the first observation, the program runs on the actual data. For the subsequent observations, bsample is called to replace the data with a bootstrap sample.

I think that you can replicate most of the functionality using rangerun as follows:

Code:

clear all

* define a program to perform the first stage by year
program first_stage
    mlogit err lgdp openness pcgdpg resimp, b(1)
    foreach eqno in 2 3 {
        foreach v of varlist lgdp openness pcgdpg resimp {
            gen double _eq`eqno'_b_`v' = [`eqno']_b[`v']
        }
        gen double _eq`eqno'_b_cons = [`eqno']_b[_cons]
    }
    keep _eq*
end

* define a program that handles both stages
program both_stages
    
    if rr_obs > 1 bsample
    
    // first stage, use only last obs per year
    bysort year (id): gen high = cond(_n == _N, year, -1)
    rangerun first_stage, interval(year year high)
    
    by year: gen del2 = _eq2_b_lgdp[_N]*lgdp + _eq2_b_openness[_N]*openness + _eq2_b_pcgdpg[_N]*pcgdpg + _eq2_b_resimp[_N]*resimp + _eq2_b_cons[_N]
    by year: gen del3 = _eq3_b_lgdp[_N]*lgdp + _eq3_b_openness[_N]*openness + _eq3_b_pcgdpg[_N]*pcgdpg + _eq3_b_resimp[_N]*resimp + _eq3_b_cons[_N]

    gen F1 = 1/(1+exp(del2)+exp(del3))
    gen F2 = exp(del2)/(1+exp(del2)+exp(del3))
    gen F3 = exp(del3)/(1+exp(del2)+exp(del3))

    gen J1 = -invnormal(F1)
    gen J2 = -invnormal(F2)
    gen J3 = -invnormal(F3)

    gen imr1 = -normalden(J1)/F1
    gen imr2 = -normalden(J2)/F2
    gen imr3 = -normalden(J3)/F3
    
    // second stage
    forvalues i = 1/3 {
        regress ca ca0 imr`i' if err==`i'
        gen ca0_`i' = _b[ca0]
        gen imr_`i' = _b[imr`i'] 
    }
    keep ca0_* imr_*
end

* set the seed and load the data
set seed 213
use d1
gen long obs = _n

* include all observations 
gen byte all = 1
gen high2use = cond(_n <= 11, 1, -1)

rangerun both_stages, interval(all all high2use) sprefix(rr_)

list ca0_* imr_*  in 1
sum ca0_* imr_* if _n > 1

and the results:

Code:

. list ca0_* imr_*  in 1

     +-----------------------------------------------------------------+
     |   ca0_1      ca0_2      ca0_3      imr_1      imr_2       imr_3 |
     |-----------------------------------------------------------------|
  1. | .635022   .4603459   .6470341   1.140804   -.823312   -.5733436 |
     +-----------------------------------------------------------------+

. sum ca0_* imr_* if _n > 1

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
       ca0_1 |         10    .7032151    .1027347   .5520792   .8427294
       ca0_2 |         10     .461249    .0879653   .3385713   .6135879
       ca0_3 |         10    .7074627    .2095903   .4477511   1.038585
       imr_1 |         10   -1.410428     2.54176  -6.540113   2.274867
       imr_2 |         10     2.29251    5.010887  -4.772506   10.55478
-------------+---------------------------------------------------------
       imr_3 |         10   -.7564784    1.547175  -3.083577   1.576656

So my code matches the "Observed Coef." reported by bootstrap. I do not know how bootstrap generates the rest of what it reports, but when I summarize the replicate results, the standard deviations are close to bootstrap's standard errors but definitely not the same.
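The question of where bootstrap's standard errors come from can be checked directly. According to the Methods and formulas section of the bootstrap manual entry, the default Std. Err. is the sample standard deviation (divisor k - 1) of the k replicate estimates, which is the same formula summarize uses; if so, the two approaches should differ only because they draw different bootstrap samples. The sketch below, on the auto data with a hypothetical statistic named mean_mpg, saves the replicates with the saving() option and compares the two numbers.

```stata
* Check that a bootstrap Std. Err. is the SD of the replicate estimates
sysuse auto, clear
set seed 12345
tempfile bsreps
// double: keep the replicates (and the variance computation) in double precision
bootstrap mean_mpg = r(mean), reps(200) nodots ///
    saving(`bsreps', double replace): summarize mpg
// bootstrap standard error of the statistic
scalar se_bs = _se[mean_mpg]
// standard deviation of the saved replicates
use `bsreps', clear
summarize mean_mpg
display "bootstrap SE = " se_bs "   SD of replicates = " r(sd)
```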

That's as far as I can push this. rangerun should be much faster than bootstrap because of how efficiently it stores results, particularly when there are many repetitions. Note that with the technique as coded, you cannot do more repetitions than there are observations in the data. This can easily be fixed by adding empty observations at the end.
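For readers who prefer not to depend on rangerun, the same resample-and-rerun idea can be written as a plain loop with bsample and postfile. This is a sketch, not how bootstrap or rangerun work internally; the auto data, the regression of price on mpg (standing in for the two-stage program), and the names b_mpg and results are placeholders. Because the replications are driven by a loop counter, their number is not limited by the number of observations, and restore discards any variables a replicate created.

```stata
* Hand-rolled bootstrap loop: bsample + postfile, any number of replications
sysuse auto, clear
set seed 2017
local reps 500
tempname h
tempfile results
postfile `h' double b_mpg using `results', replace
forvalues r = 1/`reps' {
    preserve
    bsample                        // replace the data with a bootstrap sample
    quietly regress price mpg      // placeholder for the two-stage program
    post `h' (_b[mpg])
    restore                        // bring back the original data
}
postclose `h'
use `results', clear
summarize b_mpg                    // Std. Dev. = bootstrap standard error
```

With River's data, the regress line would be replaced by a call to the two-stage program and the post would store its r() results.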

River Huang

#11

28 Aug 2017, 19:01

Dear Robert, Thank you so much. I am going to run some experiments to see how the new procedure performs.

Robert Picard

#12

28 Aug 2017, 21:24

Let me reiterate that I'm not sure how bootstrap does its thing, so just because my code runs does not mean the results make sense. Perhaps a less radical solution is to replace only the statsby call with equivalent functionality using rangerun. The results are the same on the original sample (the Observed Coef. are the same), but the rest is different. My guess is that this is because doing the first stage differently changes the sequence of random numbers generated, which means that repeated bsample calls generate different samples.

Code:

clear all

* define a program to perform the first stage by year
program first_stage
    mlogit err lgdp openness pcgdpg resimp, b(1)
    foreach eqno in 2 3 {
        foreach v of varlist lgdp openness pcgdpg resimp {
            gen double _eq`eqno'_b_`v' = [`eqno']_b[`v']
        }
        gen double _eq`eqno'_b_cons = [`eqno']_b[_cons]
    }
    keep _eq*
end

* define a program that handles both stages
program both_stages, rclass

    // first stage, use only last obs per year
    bysort year: gen high = cond(_n == _N, year, -1)
    rangerun first_stage, interval(year year high)

    by year: gen del2 = _eq2_b_lgdp[_N]*lgdp + _eq2_b_openness[_N]*openness + _eq2_b_pcgdpg[_N]*pcgdpg + _eq2_b_resimp[_N]*resimp + _eq2_b_cons[_N]
    by year: gen del3 = _eq3_b_lgdp[_N]*lgdp + _eq3_b_openness[_N]*openness + _eq3_b_pcgdpg[_N]*pcgdpg + _eq3_b_resimp[_N]*resimp + _eq3_b_cons[_N]

    drop high _eq*
    sort code year

    gen F1 = 1/(1+exp(del2)+exp(del3))
    gen F2 = exp(del2)/(1+exp(del2)+exp(del3))
    gen F3 = exp(del3)/(1+exp(del2)+exp(del3))

    gen J1 = -invnormal(F1)
    gen J2 = -invnormal(F2)
    gen J3 = -invnormal(F3)

    gen imr1 = -normalden(J1)/F1
    gen imr2 = -normalden(J2)/F2
    gen imr3 = -normalden(J3)/F3
    
    // second stage
    forvalues i = 1/3 {
        regress ca ca0 imr`i' if err==`i'
        return scalar ca0_`i' = _b[ca0]
        return scalar imr_`i' = _b[imr`i'] 
    }
    
    drop del2 del3 F1 F2 F3 J1 J2 J3 imr1 imr2 imr3
end

* set the seed and load the data
set seed 213
use d1, clear
bootstrap ca0_1 = r(ca0_1) imr_1 = r(imr_1)  ///
          ca0_2 = r(ca0_2) imr_2 = r(imr_2)  ///
          ca0_3 = r(ca0_3) imr_3 = r(imr_3), ///
          nodrop reps(10): both_stages

River Huang

#13

29 Aug 2017, 01:30

Dear Robert, Thanks again. I will give it a try to see what happens.

River Huang

#14

29 Aug 2017, 18:47

Dear Robert, may I ask another question? Following the procedure above (with three models/regimes, i=1,2,3), I want to run some tests such as:

Code:

// second stage
    forvalues i = 1/3 {
        regress ca ca0 imr`i' if err==`i'
        est store m`i'
        return scalar ca0_`i' = _b[ca0]
        return scalar imr_`i' = _b[imr`i'] 
        return scalar cons_`i' = _b[_cons]
    }
    
    suest m1 m2 m3
    test [m1_mean]imr1 = [m2_mean]imr2 = [m3_mean]imr3 = 0
    return scalar chi2_imr = r(chi2)
    test [m1_mean]ca0 = [m2_mean]ca0 = [m3_mean]ca0 
    return scalar chi2_123 = r(chi2)
    test [m1_mean]ca0 = [m2_mean]ca0 
    return scalar chi2_12 = r(chi2)
    test [m1_mean]ca0 = [m3_mean]ca0
    return scalar chi2_13 = r(chi2) 
    test [m2_mean]ca0 = [m3_mean]ca0
    return scalar chi2_23 = r(chi2)

Can I just put them right below the second stage?

Robert Picard

#15

30 Aug 2017, 09:30

I don't know, I just do data management.
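The thread leaves the question in #14 open. One possibility, offered only as an untested sketch and not something proposed in the thread: rather than returning suest chi2 statistics from inside the bootstrapped program, bootstrap just the coefficients and run test or lincom afterwards. Those commands use the e(V) posted by bootstrap, which includes the bootstrap covariances between the statistics because all of them come from the same replicate. The auto data, the program bygroup, and the names b_dom and b_for are hypothetical; with River's program the analogous call would be test _b[ca0_1] = _b[ca0_2] = _b[ca0_3]. Whether this is the right test for the switching-regression setting is not established here.

```stata
* Hypothetical sketch on the auto data: bootstrap two subsample
* coefficients together, then test them with the bootstrap e(V).
sysuse auto, clear
capture program drop bygroup
program define bygroup, rclass
    regress price mpg if foreign == 0
    return scalar b_dom = _b[mpg]
    regress price mpg if foreign == 1
    return scalar b_for = _b[mpg]
end

set seed 2017
bootstrap b_dom = r(b_dom) b_for = r(b_for), reps(200) nodots: bygroup
// difference and Wald test, both based on the bootstrap covariance matrix
lincom _b[b_dom] - _b[b_for]
test _b[b_dom] = _b[b_for]
```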