  • How to bootstrap?

    Dear All, Following Malikov, E., Kumbhakar, S.C., 2014. A generalized panel data switching regression model. Econom. Lett. 124 (3), 353–357, I have written a Stata program to estimate current account dynamics (`ca' in the second-stage outcome equation below) across alternative exchange rate regimes (err=1 if fixed; err=2 if intermediate; err=3 if flexible). The illustrative data are:
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str3 code float(year err lgdp openness pcgdpg resimp ca ca0)
    "ATG" 2009 1 20.934256 104.13732 -13.072355 1.9915867 -13.880438 -26.414835
    "AUS" 2009 3  27.74471  44.91765  -.2649566 1.8265275  -5.273952  -4.938875
    "AUS" 2010 3  27.76457  39.83852   .4314655 1.4770035  -3.912429  -5.273952
    "BLR" 2009 1  24.69533 112.31034   .4236559 2.1676178 -12.462441  -8.162176
    "BLR" 2010 2  24.77038 115.91798    7.97749   1.57038  -14.46763 -12.462441
    "CMR" 2009 1   23.8533 37.065178   -.824791   6.44656  -4.784514 -1.9283733
    "CMR" 2010 1 23.885466    40.361   .4868559  6.521112  -3.624956  -4.784514
    "COL" 2009 2  26.34386 34.280003   .4996768  6.073253 -1.9882672  -2.647993
    "COL" 2010 3  26.38281 33.700848   2.835318  5.554166  -3.018355 -1.9882672
    "COM" 2009 1 20.067556  63.20974  -.4763498  7.018032  -7.483857  -13.13504
    "COM" 2010 1  20.08932  68.16081 -.23611353  6.364758  -7.407689  -7.483857
    "CPV" 2009 2 21.218115  88.03723  -2.311435  4.155446 -14.417443 -11.483542
    "CPV" 2010 2  21.23268  94.43835   .3719076  3.800331 -13.388605 -14.417443
    "CRI" 2009 1 24.293087  70.17782 -2.2669241 4.0754414   -1.83422  -8.429599
    "CRI" 2010 1  24.34142 68.218575  3.6353245 3.8593636 -3.2570934   -1.83422
    "CYP" 2009 1  23.95129 102.80183  -4.374274  .5983004   -7.65719  -15.16893
    "CYP" 2010 1  23.96438 107.69087 -1.2976004  .4897927 -11.368127   -7.65719
    "CZE" 2009 2 26.033373 113.74112  -5.382388  3.785846 -2.3670702  -1.874091
    "CZE" 2010 2 26.056065 129.25456  1.9974748  3.401115  -3.551025 -2.3670702
    "DEU" 2009 1  28.81982  70.66505  -5.379411  1.646514   5.818374   5.620281
    "DEU" 2010 1  28.85981  79.30308  4.2395043 1.7712165   5.649066   5.818374
    "DMA" 2009 1  20.01099  84.32565 -1.3848196   3.18094 -22.712105  -28.34747
    "DMA" 2010 1  20.01769  89.13377   .3752036 3.2608254 -16.237488 -22.712105
    "DNK" 2009 1 26.479265  89.75504  -5.413992  5.862818   3.351693   2.830052
    "DNK" 2010 1   26.4978  94.09998   1.419488  5.765757    5.64686   3.351693
    "DOM" 2009 2 24.631516  50.61228  -.4101409  3.071752 -4.7603636 -9.3596945
    "DOM" 2010 2  24.71141  55.90717   6.891949 2.3004398  -7.457199 -4.7603636
    "DZA" 2009 2  25.77026 71.324326 -.10159976 33.791435   .3145995   19.85624
    "DZA" 2010 2 25.805956  69.86666   1.763682  36.78194    7.58047   .3145995
    "ECU" 2009 1  24.93074  52.10485 -1.1018021  2.478199  .49095315   2.856761
    "ECU" 2010 1  24.96539  60.30324   1.837978 1.3248158 -2.2804258  .49095315
    "EGY" 2009 2 26.061657  56.55344  2.7557354  7.358274 -1.7722816  -.8688219
    "EGY" 2010 2  26.11183  47.93635   3.091669  6.701213 -2.0575788 -1.7722816
    "FRA" 2009 1  28.58492  49.56785  -3.439412 1.8636028  -.8188631  -.9640422
    "FRA" 2010 1  28.60439  53.96844   1.463151 2.2021236  -.8324761  -.8188631
    "GBR" 2009 3  28.49981  54.72441  -5.048664  .8425719  -2.903771 -3.5240715
    "GBR" 2010 3  28.51878  59.22182  1.1193836  .9911548 -2.7510476  -2.903771
    "GHA" 2009 2 24.118416  71.59474  2.1912985  3.783322  -7.303012 -11.664184
    "GHA" 2010 2  24.19445 75.377815   5.222186  4.251202  -8.538801  -7.303012
    "GMB" 2009 2  20.61132  64.61083   3.139261  7.425123   6.994719  1.1238829
    "GMB" 2010 2 20.674526 66.455666   3.214035  7.088469   5.907137   6.994719
    "GRC" 2009 1 26.481266  47.74385 -4.5521173  .6386505  -10.88277 -14.476298
    "GRC" 2010 1  26.42492   52.8291  -5.600778  .7923775 -10.113232  -10.88277
    "GTM" 2009 2 24.416767  57.10598 -1.6499767  4.415632   .7230578 -3.6126866
    "GTM" 2010 2  24.44506   62.1149   .6606255 4.2811093 -1.3625484   .7230578
    "HRV" 2009 2  24.82918 72.761795  -7.270236  6.506817  -5.036501  -8.816459
    "HRV" 2010 2  24.81202  75.89763 -1.4498836  6.560429 -1.4990332  -5.036501
    "HTI" 2009 3  22.67029  58.28431  1.5364974  4.478287  -1.855614  -3.127642
    "HTI" 2010 3 22.613745  80.09118  -6.884637  5.285829 -1.5375408  -1.855614
    "ISL" 2009 3 23.343874  90.38019  -7.260946  8.774795   -5.19189 -23.522587
    "ISL" 2010 3 23.307627  97.13546  -3.420985 13.028338 -2.3269165   -5.19189
    "ISR" 2009 2 26.123837  63.78856 -1.1231704  9.786021   3.827099  1.4677502
    "ISR" 2010 2  26.17754  67.98946   3.606987  9.622063  3.3601456   3.827099
    "JAM" 2009 2  23.31769  86.88398  -4.809099  3.431532  -9.365416  -20.42076
    "JAM" 2010 2 23.302895  80.92348  -1.928138 4.1733613  -7.079966  -9.365416
    "JPN" 2009 3 29.330437   24.4909  -5.405301 18.338163  2.7846885   2.820931
    "JPN" 2010 3 29.371504  28.61301  4.1735773 15.685442   3.875161  2.7846885
    "KGZ" 2009 2 22.295433 133.37915  1.6516515  5.052202 -4.3142366  -13.87637
    "KGZ" 2010 2 22.290705 133.23285  -1.651753  5.020518   -9.34542 -4.3142366
    "KOR" 2009 3 27.658373  90.41264  .19051726  7.796637   3.724581   .3182638
    "KOR" 2010 3  27.72132  95.65408   5.967519  6.545405   2.635945   3.724581
    "LBN" 2009 1  24.28452  92.74914   8.402636  14.81081 -19.183285  -14.23059
    "LBN" 2010 1 24.361115  98.11693  4.1274276 16.585743 -19.868616 -19.183285
    "LKA" 2009 2 24.684385  49.14914  2.7559404  5.215487  -.5103858  -9.543199
    "LKA" 2010 2 24.761494  46.36389   7.205263  5.343544  -1.895136  -.5103858
    "LSO" 2009 1 21.532957  148.3949  1.1759651  6.620369    2.96052  18.447823
    "LSO" 2010 1  21.59627 140.08711   5.422052  5.141607  -6.613143    2.96052
    "LTU" 2009 2  24.32117 105.55858 -13.863035  3.912167   2.266767 -13.721574
    "LTU" 2010 2 24.337435 132.56178  3.7936525   2.95729  -.3212217   2.266767
    "LVA" 2009 2  23.92979  86.82642 -12.906108   7.24893   7.904533 -12.593387
    "LVA" 2010 1  23.89116 108.78899 -1.7662354  6.437962  2.0812159   7.904533
    "MAR" 2009 1 25.220745   67.9151     2.9643  7.572931  -5.351425  -4.895267
    "MAR" 2010 1 25.258194 75.247635   2.471154  7.350544 -4.2107983  -5.351425
    "MDG" 2009 3 22.887396  73.99667  -6.686151  2.898483  -21.09993 -18.946804
    "MDG" 2010 3 22.890024   68.0227  -2.498149 3.3948865 -10.168513  -21.09993
    "MEX" 2009 3 27.631046  56.03479  -6.221285  4.252017  -.9737812  -1.850457
    "MEX" 2010 3 27.680885  60.94653   3.485229 4.1144123  -.5010219  -.9737812
    "MKD" 2009 3   22.9317  87.17699  -.4412051  5.105462  -6.483544 -12.470758
    "MKD" 2010 3 22.964737  97.88107   3.276602  4.719605  -2.107913  -6.483544
    "MLT" 2009 1 22.856483 296.97488 -3.1948564 .26835653  -6.538419  -.8654626
    "MLT" 2010 1 22.891296  307.4218   3.035346 .28414986  -4.804269  -6.538419
    "MUS" 2009 2  22.98338 104.42973  3.0411005  5.039511   -7.17475  -9.767047
    "MUS" 2010 2  23.02622 113.45708   4.129199  2.761873 -10.054045   -7.17475
    "NAM" 2009 1  23.08785 125.47756 -1.1432047  4.563721 -1.4762572  -.1118428
    "NAM" 2010 1  23.14649  108.4135  4.2762957  3.079522  -3.461317 -1.4762572
    "NGA" 2009 2 26.558756  61.80285   4.126187 8.4259205   8.182299    14.0102
    "NGA" 2010 2  26.63423  42.65139   4.999833 4.7170215   3.552578   8.182299
    "NIC" 2009 2  22.85015  86.99361 -4.5227156 4.0435667  -8.501336 -17.046085
    "NIC" 2010 2 22.893305 100.36406   3.115519 3.9219115  -8.908936  -8.501336
    "NLD" 2009 1  27.43843 118.98047 -4.2612214  .6370274   5.830153  4.1609254
    "NLD" 2010 1  27.45236 135.54501   .8838761  .6746554   7.391339   5.830153
    "NOR" 2009 3  26.77762 67.131226  -2.855408 4.5188375  11.690187  15.784298
    "NOR" 2010 3  26.78362  68.40958  -.6435047  4.216766   11.72811  11.690187
    "NPL" 2009 1  23.44898  47.07945   3.496219  6.565362   .1665825   5.845486
    "NPL" 2010 1  23.49602  45.98491   3.722471  6.010521   -.797466   .1665825
    "OMN" 2009 1 24.747795  85.28215  1.5510355  5.809792 -1.0355015   8.240634
    "OMN" 2010 1  24.79471 106.86321  -.6589162  5.466475   8.329008 -1.0355015
    "PAK" 2009 2  25.88577  32.07185   .7356375  4.146899  -2.374882  -9.204316
    "PAK" 2010 2  25.90171 32.868927  -.4846558  4.718503  -.7632174  -2.374882
    "PAN" 2009 1  24.03163 134.09517 -.19317342 1.7765557  -.7979394 -10.769425
    end
    My code (hopefully correct) is:
    Code:
    egen id = group(code)
    xtset id year
    
    // first stage
    statsby _b, by(year) saving("mlogit.dta", replace): mlogit err lgdp openness pcgdpg resimp, b(1)
    merge m:1 year using "mlogit.dta" 
    sort code year
    
    gen del2 = _eq2_b_lgdp*lgdp + _eq2_b_openness*openness + _eq2_b_pcgdpg*pcgdpg + _eq2_b_resimp*resimp + _eq2_b_cons
    gen del3 = _eq3_b_lgdp*lgdp + _eq3_b_openness*openness + _eq3_b_pcgdpg*pcgdpg + _eq3_b_resimp*resimp + _eq3_b_cons
    
    gen F1 = 1/(1+exp(del2)+exp(del3))
    gen F2 = exp(del2)/(1+exp(del2)+exp(del3))
    gen F3 = exp(del3)/(1+exp(del2)+exp(del3))
    
    gen J1 = -invnormal(F1)
    gen J2 = -invnormal(F2)
    gen J3 = -invnormal(F3)
    
    gen imr1 = -normalden(J1)/F1
    gen imr2 = -normalden(J2)/F2
    gen imr3 = -normalden(J3)/F3
    
    // second stage
    reg ca ca0 imr1 if err == 1, robust
    reg ca ca0 imr2 if err == 2, robust
    reg ca ca0 imr3 if err == 3, robust
    However, imr* (the inverse Mills ratio) in the second stage is itself estimated from the first stage, so the conventional standard errors of the second-stage coefficients are incorrect. I would therefore like to bootstrap the procedure. Can anyone suggest how to do this?
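    For reference, the first-stage quantities generated in the code above can be written compactly. This is only a transcription of the posted code (regime 1 is the mlogit base outcome, so its index is zero; coefficients are estimated separately for each year t), not a statement of the estimator in Malikov and Kumbhakar (2014). Here \Phi^{-1} corresponds to invnormal() and \phi to normalden().

    ```latex
    \delta_{1,it} = 0, \qquad
    \delta_{j,it} = x_{it}'\hat{\beta}_{j,t} + \hat{\alpha}_{j,t}, \quad j = 2, 3

    F_{j,it} = \frac{\exp(\delta_{j,it})}{1 + \exp(\delta_{2,it}) + \exp(\delta_{3,it})}, \quad j = 1, 2, 3

    J_{j,it} = -\Phi^{-1}(F_{j,it}), \qquad
    \mathrm{imr}_{j,it} = -\frac{\phi(J_{j,it})}{F_{j,it}}
    ```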
    Ho-Chuan (River) Huang
    Stata 19.0, MP(4)

  • #2
    You'll need to bootstrap the entire process. Here's an example where results from a logistic regression are fed into a mean regression. I'm not an expert in bootstrap theory, so I point you to Stas Kolenikov's comment at https://www.stata.com/statalist/arch.../msg00053.html
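
    The general pattern can be sketched on Stata's shipped auto dataset. This is a minimal illustration, not the linked example: the program name twostage_demo, the choice of variables, the seed, and the number of replications are all assumptions made for the sketch. Both stages sit inside one rclass program, so every bootstrap replicate re-estimates the first stage before running the second.

    ```stata
    sysuse auto, clear

    capture program drop twostage_demo
    program define twostage_demo, rclass
        // stage 1: logit, then a generated regressor (predicted probability)
        quietly logit foreign weight length
        tempvar phat
        quietly predict double `phat', pr
        // stage 2: regression that uses the generated regressor
        quietly regress price mpg `phat'
        return scalar b_mpg  = _b[mpg]
        return scalar b_phat = _b[`phat']
    end

    set seed 12345   // arbitrary seed, only for reproducibility
    bootstrap b_mpg = r(b_mpg) b_phat = r(b_phat), reps(50): twostage_demo
    ```

    Because the generated regressor is a tempvar, it disappears when the program exits, so nothing has to be dropped between replicates.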

    Steve Samuels
    Statistical Consulting

    Stata 14.2


    • #3
      Dear Steve: Thanks for your reply. In fact, I did not expect to get many replies to this particular question, but I gave it a try. I understand the idea and main steps of bootstrapping but do not know how to implement them in Stata.
      Ho-Chuan (River) Huang
      Stata 19.0, MP(4)


      • #4
        Here's working code for your problem, River. It follows the same logic as the code I linked to: 1) define a program to contain the code; 2) check it; 3) have bootstrap run it. It's necessary to drop the variables created in the program; otherwise, after the first pass in bootstrap, the generate statements will silently fail because the variables are already present. I used preserve & restore to do this in the linked code, but I'd guess that drop is more efficient.
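
        The reason for the drop can be seen in isolation. The following is a minimal sketch on Stata's shipped auto dataset; the variable mpg2, the scalar rc_dup, and the program sq_demo are illustrative names, not part of River's code. Run interactively, a second generate of an existing variable stops with return code 110. As I understand it, bootstrap traps such errors within replicates and records the replicate as failed, which would make the problem easy to miss. A tempvar is shown as an alternative to an explicit drop.

        ```stata
        sysuse auto, clear
        generate double mpg2 = mpg^2

        * A second -generate- of the same name fails with return code 110
        capture generate double mpg2 = mpg^2
        scalar rc_dup = _rc
        display "return code from repeated generate: " rc_dup

        * Option 1: drop before regenerating (the approach used in this post)
        capture drop mpg2
        generate double mpg2 = mpg^2

        * Option 2: a temporary variable is removed automatically at program exit
        capture program drop sq_demo
        program define sq_demo, rclass
            tempvar sq
            generate double `sq' = mpg^2
            summarize `sq', meanonly
            return scalar mean_sq = r(mean)
        end

        sq_demo
        sq_demo          // second call runs without error: nothing is left behind
        display r(mean_sq)
        ```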

        Steve

        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input str3 code float(year err lgdp openness pcgdpg resimp ca ca0)
        "ATG" 2009 1 20.934256 104.13732 -13.072355 1.9915867 -13.880438 -26.414835
        "AUS" 2009 3  27.74471  44.91765  -.2649566 1.8265275  -5.273952  -4.938875
        "AUS" 2010 3  27.76457  39.83852   .4314655 1.4770035  -3.912429  -5.273952
        "BLR" 2009 1  24.69533 112.31034   .4236559 2.1676178 -12.462441  -8.162176
        "BLR" 2010 2  24.77038 115.91798    7.97749   1.57038  -14.46763 -12.462441
        "CMR" 2009 1   23.8533 37.065178   -.824791   6.44656  -4.784514 -1.9283733
        "CMR" 2010 1 23.885466    40.361   .4868559  6.521112  -3.624956  -4.784514
        "COL" 2009 2  26.34386 34.280003   .4996768  6.073253 -1.9882672  -2.647993
        "COL" 2010 3  26.38281 33.700848   2.835318  5.554166  -3.018355 -1.9882672
        "COM" 2009 1 20.067556  63.20974  -.4763498  7.018032  -7.483857  -13.13504
        "COM" 2010 1  20.08932  68.16081 -.23611353  6.364758  -7.407689  -7.483857
        "CPV" 2009 2 21.218115  88.03723  -2.311435  4.155446 -14.417443 -11.483542
        "CPV" 2010 2  21.23268  94.43835   .3719076  3.800331 -13.388605 -14.417443
        "CRI" 2009 1 24.293087  70.17782 -2.2669241 4.0754414   -1.83422  -8.429599
        "CRI" 2010 1  24.34142 68.218575  3.6353245 3.8593636 -3.2570934   -1.83422
        "CYP" 2009 1  23.95129 102.80183  -4.374274  .5983004   -7.65719  -15.16893
        "CYP" 2010 1  23.96438 107.69087 -1.2976004  .4897927 -11.368127   -7.65719
        "CZE" 2009 2 26.033373 113.74112  -5.382388  3.785846 -2.3670702  -1.874091
        "CZE" 2010 2 26.056065 129.25456  1.9974748  3.401115  -3.551025 -2.3670702
        "DEU" 2009 1  28.81982  70.66505  -5.379411  1.646514   5.818374   5.620281
        "DEU" 2010 1  28.85981  79.30308  4.2395043 1.7712165   5.649066   5.818374
        "DMA" 2009 1  20.01099  84.32565 -1.3848196   3.18094 -22.712105  -28.34747
        "DMA" 2010 1  20.01769  89.13377   .3752036 3.2608254 -16.237488 -22.712105
        "DNK" 2009 1 26.479265  89.75504  -5.413992  5.862818   3.351693   2.830052
        "DNK" 2010 1   26.4978  94.09998   1.419488  5.765757    5.64686   3.351693
        "DOM" 2009 2 24.631516  50.61228  -.4101409  3.071752 -4.7603636 -9.3596945
        "DOM" 2010 2  24.71141  55.90717   6.891949 2.3004398  -7.457199 -4.7603636
        "DZA" 2009 2  25.77026 71.324326 -.10159976 33.791435   .3145995   19.85624
        "DZA" 2010 2 25.805956  69.86666   1.763682  36.78194    7.58047   .3145995
        "ECU" 2009 1  24.93074  52.10485 -1.1018021  2.478199  .49095315   2.856761
        "ECU" 2010 1  24.96539  60.30324   1.837978 1.3248158 -2.2804258  .49095315
        "EGY" 2009 2 26.061657  56.55344  2.7557354  7.358274 -1.7722816  -.8688219
        "EGY" 2010 2  26.11183  47.93635   3.091669  6.701213 -2.0575788 -1.7722816
        "FRA" 2009 1  28.58492  49.56785  -3.439412 1.8636028  -.8188631  -.9640422
        "FRA" 2010 1  28.60439  53.96844   1.463151 2.2021236  -.8324761  -.8188631
        "GBR" 2009 3  28.49981  54.72441  -5.048664  .8425719  -2.903771 -3.5240715
        "GBR" 2010 3  28.51878  59.22182  1.1193836  .9911548 -2.7510476  -2.903771
        "GHA" 2009 2 24.118416  71.59474  2.1912985  3.783322  -7.303012 -11.664184
        "GHA" 2010 2  24.19445 75.377815   5.222186  4.251202  -8.538801  -7.303012
        "GMB" 2009 2  20.61132  64.61083   3.139261  7.425123   6.994719  1.1238829
        "GMB" 2010 2 20.674526 66.455666   3.214035  7.088469   5.907137   6.994719
        "GRC" 2009 1 26.481266  47.74385 -4.5521173  .6386505  -10.88277 -14.476298
        "GRC" 2010 1  26.42492   52.8291  -5.600778  .7923775 -10.113232  -10.88277
        "GTM" 2009 2 24.416767  57.10598 -1.6499767  4.415632   .7230578 -3.6126866
        "GTM" 2010 2  24.44506   62.1149   .6606255 4.2811093 -1.3625484   .7230578
        "HRV" 2009 2  24.82918 72.761795  -7.270236  6.506817  -5.036501  -8.816459
        "HRV" 2010 2  24.81202  75.89763 -1.4498836  6.560429 -1.4990332  -5.036501
        "HTI" 2009 3  22.67029  58.28431  1.5364974  4.478287  -1.855614  -3.127642
        "HTI" 2010 3 22.613745  80.09118  -6.884637  5.285829 -1.5375408  -1.855614
        "ISL" 2009 3 23.343874  90.38019  -7.260946  8.774795   -5.19189 -23.522587
        "ISL" 2010 3 23.307627  97.13546  -3.420985 13.028338 -2.3269165   -5.19189
        "ISR" 2009 2 26.123837  63.78856 -1.1231704  9.786021   3.827099  1.4677502
        "ISR" 2010 2  26.17754  67.98946   3.606987  9.622063  3.3601456   3.827099
        "JAM" 2009 2  23.31769  86.88398  -4.809099  3.431532  -9.365416  -20.42076
        "JAM" 2010 2 23.302895  80.92348  -1.928138 4.1733613  -7.079966  -9.365416
        "JPN" 2009 3 29.330437   24.4909  -5.405301 18.338163  2.7846885   2.820931
        "JPN" 2010 3 29.371504  28.61301  4.1735773 15.685442   3.875161  2.7846885
        "KGZ" 2009 2 22.295433 133.37915  1.6516515  5.052202 -4.3142366  -13.87637
        "KGZ" 2010 2 22.290705 133.23285  -1.651753  5.020518   -9.34542 -4.3142366
        "KOR" 2009 3 27.658373  90.41264  .19051726  7.796637   3.724581   .3182638
        "KOR" 2010 3  27.72132  95.65408   5.967519  6.545405   2.635945   3.724581
        "LBN" 2009 1  24.28452  92.74914   8.402636  14.81081 -19.183285  -14.23059
        "LBN" 2010 1 24.361115  98.11693  4.1274276 16.585743 -19.868616 -19.183285
        "LKA" 2009 2 24.684385  49.14914  2.7559404  5.215487  -.5103858  -9.543199
        "LKA" 2010 2 24.761494  46.36389   7.205263  5.343544  -1.895136  -.5103858
        "LSO" 2009 1 21.532957  148.3949  1.1759651  6.620369    2.96052  18.447823
        "LSO" 2010 1  21.59627 140.08711   5.422052  5.141607  -6.613143    2.96052
        "LTU" 2009 2  24.32117 105.55858 -13.863035  3.912167   2.266767 -13.721574
        "LTU" 2010 2 24.337435 132.56178  3.7936525   2.95729  -.3212217   2.266767
        "LVA" 2009 2  23.92979  86.82642 -12.906108   7.24893   7.904533 -12.593387
        "LVA" 2010 1  23.89116 108.78899 -1.7662354  6.437962  2.0812159   7.904533
        "MAR" 2009 1 25.220745   67.9151     2.9643  7.572931  -5.351425  -4.895267
        "MAR" 2010 1 25.258194 75.247635   2.471154  7.350544 -4.2107983  -5.351425
        "MDG" 2009 3 22.887396  73.99667  -6.686151  2.898483  -21.09993 -18.946804
        "MDG" 2010 3 22.890024   68.0227  -2.498149 3.3948865 -10.168513  -21.09993
        "MEX" 2009 3 27.631046  56.03479  -6.221285  4.252017  -.9737812  -1.850457
        "MEX" 2010 3 27.680885  60.94653   3.485229 4.1144123  -.5010219  -.9737812
        "MKD" 2009 3   22.9317  87.17699  -.4412051  5.105462  -6.483544 -12.470758
        "MKD" 2010 3 22.964737  97.88107   3.276602  4.719605  -2.107913  -6.483544
        "MLT" 2009 1 22.856483 296.97488 -3.1948564 .26835653  -6.538419  -.8654626
        "MLT" 2010 1 22.891296  307.4218   3.035346 .28414986  -4.804269  -6.538419
        "MUS" 2009 2  22.98338 104.42973  3.0411005  5.039511   -7.17475  -9.767047
        "MUS" 2010 2  23.02622 113.45708   4.129199  2.761873 -10.054045   -7.17475
        "NAM" 2009 1  23.08785 125.47756 -1.1432047  4.563721 -1.4762572  -.1118428
        "NAM" 2010 1  23.14649  108.4135  4.2762957  3.079522  -3.461317 -1.4762572
        "NGA" 2009 2 26.558756  61.80285   4.126187 8.4259205   8.182299    14.0102
        "NGA" 2010 2  26.63423  42.65139   4.999833 4.7170215   3.552578   8.182299
        "NIC" 2009 2  22.85015  86.99361 -4.5227156 4.0435667  -8.501336 -17.046085
        "NIC" 2010 2 22.893305 100.36406   3.115519 3.9219115  -8.908936  -8.501336
        "NLD" 2009 1  27.43843 118.98047 -4.2612214  .6370274   5.830153  4.1609254
        "NLD" 2010 1  27.45236 135.54501   .8838761  .6746554   7.391339   5.830153
        "NOR" 2009 3  26.77762 67.131226  -2.855408 4.5188375  11.690187  15.784298
        "NOR" 2010 3  26.78362  68.40958  -.6435047  4.216766   11.72811  11.690187
        "NPL" 2009 1  23.44898  47.07945   3.496219  6.565362   .1665825   5.845486
        "NPL" 2010 1  23.49602  45.98491   3.722471  6.010521   -.797466   .1665825
        "OMN" 2009 1 24.747795  85.28215  1.5510355  5.809792 -1.0355015   8.240634
        "OMN" 2010 1  24.79471 106.86321  -.6589162  5.466475   8.329008 -1.0355015
        "PAK" 2009 2  25.88577  32.07185   .7356375  4.146899  -2.374882  -9.204316
        "PAK" 2010 2  25.90171 32.868927  -.4846558  4.718503  -.7632174  -2.374882
        "PAN" 2009 1  24.03163 134.09517 -.19317342 1.7765557  -.7979394 -10.769425
        end
        
        
        egen id = group(code)
        xtset id year
        save d1, replace
        
        cap program drop _all
        scalar drop _all
        macro drop _all
        
        /* Convert CODE to PROGRAM  */
        program define twostep, rclass
            tempfile mlogit
            // first stage
            statsby _b, by(year) saving(`mlogit', replace): mlogit err lgdp openness pcgdpg resimp, b(1)
            merge m:1 year using `mlogit'
            sort code year
        
            gen del2 = _eq2_b_lgdp*lgdp + _eq2_b_openness*openness + _eq2_b_pcgdpg*pcgdpg + _eq2_b_resimp*resimp + _eq2_b_cons
            gen del3 = _eq3_b_lgdp*lgdp + _eq3_b_openness*openness + _eq3_b_pcgdpg*pcgdpg + _eq3_b_resimp*resimp + _eq3_b_cons
        
            gen F1 = 1/(1+exp(del2)+exp(del3))
            gen F2 = exp(del2)/(1+exp(del2)+exp(del3))
            gen F3 = exp(del3)/(1+exp(del2)+exp(del3))
        
            gen J1 = -invnormal(F1)
            gen J2 = -invnormal(F2)
            gen J3 = -invnormal(F3)
        
            gen imr1 = -normalden(J1)/F1
            gen imr2 = -normalden(J2)/F2
            gen imr3 = -normalden(J3)/F3
            // second stage
            tempfile t1
            save `t1'
            forvalues i = 1/3{
                use `t1', clear
                regress ca ca0 imr1 if err==`i'
                local xca0 =  _b[ca0]
                return scalar ca0_`i' = `xca0'
                local ximr1 =  _b[imr1]
                return scalar imr1_`i' = `ximr1'
                /* Now must drop variables created in the program
                   or generate will fail silently in replicates
                   because variables already exist */
                }
             drop _merge del2 del3 F1 F2 F3 J1 J2 J3 imr1 imr2 imr3
        end
        
        use d1, clear
        /* Check Program twostep */
        twostep
        return list  
        
        /* Bootstrap program twostep */
        bootstrap ca0_1 = r(ca0_1) imr1_1 = r(imr1_1)  ///
                  ca0_2 = r(ca0_2) imr1_2 = r(imr1_2)  ///
                  ca0_3 = r(ca0_3) imr1_3 = r(imr1_3), ///
                  nodrop reps(10): twostep
        
        estat bootstrap, all
        capture log close
        Last edited by Steve Samuels; 24 Aug 2017, 19:56.
        Steve Samuels
        Statistical Consulting

        Stata 14.2


        • #5
          Dear Steve, Wonderful! I cannot thank you enough for your time, effort, and expertise. I have tried it and found that it works well.

          Ho-Chuan (River) Huang
          Stata 19.0, MP(4)


          • #6
            You're very welcome, River. I see that I missed imr2 and imr3 in your regression analyses for err=2 and err=3. Easily fixed as I'm sure you saw.
            Steve Samuels
            Statistical Consulting

            Stata 14.2


            • #7
              Originally posted by Steve Samuels:
              You're very welcome, River. I see that I missed imr2 and imr3 in your regression analyses for err=2 and err=3. Easily fixed as I'm sure you saw.
              Do you mean the following?
              Code:
              // second stage
                  tempfile t1
                  save `t1'
                  forvalues i = 1/3 {
                      use `t1', clear
                      regress ca ca0 imr`i' if err==`i'
                      local xca0 =  _b[ca0]
                      return scalar ca0_`i' = `xca0'
                      local ximr =  _b[imr`i']
                      return scalar imr_`i' = `ximr'
                      local xcons =  _b[_cons]
                      return scalar cons_`i' = `xcons'
                      /* Now must drop variables created in the program
                         or generate will fail silently in replicates
                         because variables already exist */
                      }
                   drop _merge del2 del3 F1 F2 F3 J1 J2 J3 imr1 imr2 imr3
              Ho-Chuan (River) Huang
              Stata 19.0, MP(4)


              • #8
                My regress statement used imr1 for all three regimes; your change is correct.
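
                With the loop from #7, the returned scalars are named ca0_`i', imr_`i' and cons_`i', so the bootstrap call in #4 (which asks for r(imr1_1) and so on) would need to be updated to match. A sketch, assuming the program twostep from #4 has been redefined with the #7 loop and that d1.dta has been saved as in #4; it is not runnable without those two pieces, and the seed value is arbitrary.

                ```stata
                * Assumes -twostep- (with the #7 second-stage loop) is defined
                * and that d1.dta exists in the working directory (see #4)
                use d1, clear
                set seed 12345   // arbitrary; fixes the replicates for reproducibility

                bootstrap ca0_1 = r(ca0_1) imr_1 = r(imr_1) cons_1 = r(cons_1)  ///
                          ca0_2 = r(ca0_2) imr_2 = r(imr_2) cons_2 = r(cons_2)  ///
                          ca0_3 = r(ca0_3) imr_3 = r(imr_3) cons_3 = r(cons_3), ///
                          nodrop reps(10): twostep

                estat bootstrap, all
                ```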
                Steve Samuels
                Statistical Consulting

                Stata 14.2


                • #9
                  Originally posted by Steve Samuels:
                  My regress statement used imr1 and your change is correct.
                  Thanks again, Steve.
                  Ho-Chuan (River) Huang
                  Stata 19.0, MP(4)


                  • #10
                    River asked in another thread whether there is a way to speed up this bootstrapping exercise using rangerun. I have never used bootstrap and most of this is over my head, but here is what I managed to figure out. First, this is a more concise version of the code proposed by Steve in #4, with my guess at how to address the issue in #6 and #7:

                    Code:
                    clear all
                    
                    program define twostep, rclass
                        tempfile mlogit
                        statsby _b, by(year) saving(`mlogit', replace): mlogit err lgdp openness pcgdpg resimp, b(1)
                        merge m:1 year using `mlogit'
                        sort code year
                    
                        gen del2 = _eq2_b_lgdp*lgdp + _eq2_b_openness*openness + _eq2_b_pcgdpg*pcgdpg + _eq2_b_resimp*resimp + _eq2_b_cons
                        gen del3 = _eq3_b_lgdp*lgdp + _eq3_b_openness*openness + _eq3_b_pcgdpg*pcgdpg + _eq3_b_resimp*resimp + _eq3_b_cons
                    
                        gen F1 = 1/(1+exp(del2)+exp(del3))
                        gen F2 = exp(del2)/(1+exp(del2)+exp(del3))
                        gen F3 = exp(del3)/(1+exp(del2)+exp(del3))
                    
                        gen J1 = -invnormal(F1)
                        gen J2 = -invnormal(F2)
                        gen J3 = -invnormal(F3)
                    
                        gen imr1 = -normalden(J1)/F1
                        gen imr2 = -normalden(J2)/F2
                        gen imr3 = -normalden(J3)/F3
                    
                        forvalues i = 1/3 {
                            regress ca ca0 imr`i' if err==`i'
                            return scalar ca0_`i' = _b[ca0]
                            return scalar imr_`i' = _b[imr`i']
                        }
                         drop _merge del2 del3 F1 F2 F3 J1 J2 J3 imr1 imr2 imr3
                    end
                    
                    set seed 213
                    use d1, clear
                    bootstrap ca0_1 = r(ca0_1) imr_1 = r(imr_1)  ///
                              ca0_2 = r(ca0_2) imr_2 = r(imr_2)  ///
                              ca0_3 = r(ca0_3) imr_3 = r(imr_3), ///
                              nodrop reps(10): twostep
                    and the results when run using the dataset defined in #4:
                    Code:
                    ------------------------------------------------------------------------------
                                 |   Observed   Bootstrap                         Normal-based
                                 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                    -------------+----------------------------------------------------------------
                           ca0_1 |    .635022   .1003064     6.33   0.000      .438425    .8316191
                           imr_1 |   1.140804   2.076781     0.55   0.583    -2.929612     5.21122
                           ca0_2 |   .4603459   .0740344     6.22   0.000     .3152412    .6054506
                           imr_2 |    -.82331   3.327433    -0.25   0.805    -7.344959    5.698339
                           ca0_3 |   .6470341   .2107912     3.07   0.002     .2338908    1.060177
                           imr_3 |  -.5733437   1.284544    -0.45   0.655    -3.091003    1.944316
                    ------------------------------------------------------------------------------
                    When you call rangerun, a virtual copy of the data in memory is stored. rangerun then loops over each observation and replaces the data in memory with the observations that fall within the interval bounds defined for the current observation. The user's program is then called, and results are collected from any new variables it has created, using the values from the last observation in memory. If the interval is invalid, the program is not called and no results are stored for that observation. For each repetition, you will want to replace the data in memory with the full set of observations, so I created a variable called all that is set to 1 for all observations. For the first 11 observations, I generate a valid interval; for the remaining observations, the upper bound is set to -1, which makes the interval invalid. This means that rangerun will call the both_stages program 11 times. For the first observation, the program runs on the actual data. For the subsequent observations, bsample is called to replace the data with a bootstrap sample.

                    I think that you can replicate most of the functionality using rangerun as follows:
                    Code:
                    clear all
                    
                    * define a program to perform the first stage by year
                    program first_stage
                        mlogit err lgdp openness pcgdpg resimp, b(1)
                        foreach eqno in 2 3 {
                            foreach v of varlist lgdp openness pcgdpg resimp {
                                gen double _eq`eqno'_b_`v' = [`eqno']_b[`v']
                            }
                            gen double _eq`eqno'_b_cons = [`eqno']_b[_cons]
                        }
                        keep _eq*
                    end
                    
                    * define a program that handles both stages
                    program both_stages
                        
                        if rr_obs > 1 bsample
                        
                        // first stage, use only last obs per year
                        bysort year (id): gen high = cond(_n == _N, year, -1)
                        rangerun first_stage, interval(year year high)
                        
                        by year: gen del2 = _eq2_b_lgdp[_N]*lgdp + _eq2_b_openness[_N]*openness + _eq2_b_pcgdpg[_N]*pcgdpg + _eq2_b_resimp[_N]*resimp + _eq2_b_cons[_N]
                        by year: gen del3 = _eq3_b_lgdp[_N]*lgdp + _eq3_b_openness[_N]*openness + _eq3_b_pcgdpg[_N]*pcgdpg + _eq3_b_resimp[_N]*resimp + _eq3_b_cons[_N]
                    
                        gen F1 = 1/(1+exp(del2)+exp(del3))
                        gen F2 = exp(del2)/(1+exp(del2)+exp(del3))
                        gen F3 = exp(del3)/(1+exp(del2)+exp(del3))
                    
                        gen J1 = -invnormal(F1)
                        gen J2 = -invnormal(F2)
                        gen J3 = -invnormal(F3)
                    
                        gen imr1 = -normalden(J1)/F1
                        gen imr2 = -normalden(J2)/F2
                        gen imr3 = -normalden(J3)/F3
                        
                        // second stage
                        forvalues i = 1/3 {
                            regress ca ca0 imr`i' if err==`i'
                            gen ca0_`i' = _b[ca0]
                            gen imr_`i' = _b[imr`i'] 
                        }
                        keep ca0_* imr_*
                    end
                    
                    * set the seed and load the data
                    set seed 213
                    use d1
                    gen long obs = _n
                    
                    * include all observations 
                    gen byte all = 1
                    gen high2use = cond(_n <= 11, 1, -1)
                    
                    rangerun both_stages, interval(all all high2use) sprefix(rr_)
                    
                    list ca0_* imr_*  in 1
                    sum ca0_* imr_* if _n > 1
                    and the results:
                    Code:
                    . list ca0_* imr_*  in 1
                    
                         +-----------------------------------------------------------------+
                         |   ca0_1      ca0_2      ca0_3      imr_1      imr_2       imr_3 |
                         |-----------------------------------------------------------------|
                      1. | .635022   .4603459   .6470341   1.140804   -.823312   -.5733436 |
                         +-----------------------------------------------------------------+
                    
                    . sum ca0_* imr_* if _n > 1
                    
                        Variable |        Obs        Mean    Std. Dev.       Min        Max
                    -------------+---------------------------------------------------------
                           ca0_1 |         10    .7032151    .1027347   .5520792   .8427294
                           ca0_2 |         10     .461249    .0879653   .3385713   .6135879
                           ca0_3 |         10    .7074627    .2095903   .4477511   1.038585
                           imr_1 |         10   -1.410428     2.54176  -6.540113   2.274867
                           imr_2 |         10     2.29251    5.010887  -4.772506   10.55478
                    -------------+---------------------------------------------------------
                           imr_3 |         10   -.7564784    1.547175  -3.083577   1.576656
                    So my code matches the "Observed Coef." generated by bootstrap. I do not know how bootstrap generates the rest of what it reports, but when I summarize the results generated, the standard deviations are close to bootstrap's standard errors but definitely not the same.
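On how bootstrap gets its standard errors: as far as I can tell from the Methods and formulas section of [R] bootstrap, the reported standard error is simply the sample standard deviation (k-1 denominator) of the k replicates, unless the mse option is used. A minimal sketch to check this, run on the shipped auto data rather than the d1 data from this thread; the names bsreps, mean_mpg, se_boot and sd_reps are just illustrative:

```stata
* Sketch on a shipped dataset, not the thread's d1 data.
sysuse auto, clear
set seed 213
tempfile bsreps

* keep the replicates (stored as doubles) so they can be summarized afterwards
bootstrap mean_mpg = r(mean), reps(50) saving("`bsreps'", double): summarize mpg

* standard error as reported by bootstrap
matrix V = e(V)
scalar se_boot = sqrt(V[1,1])

* standard deviation of the saved replicates
use "`bsreps'", clear
summarize mean_mpg
scalar sd_reps = r(sd)

display "bootstrap SE = " se_boot "   SD of replicates = " sd_reps
```

If that holds, any gap between the summarize output above and bootstrap's table would come from the two runs drawing different bootstrap samples, not from a different formula.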

                    That's as far as I can push this. rangerun should be much faster than bootstrap because of how efficiently it stores results, particularly if there are lots of repetitions. Note that with the technique as coded, each run is tied to one observation (the first is the original sample), so you cannot do more repetitions than there are observations in the data. This can easily be fixed by adding missing observations at the end.



                    • #11
                      Dear Robert, Thank you so much. I am going to do some experiments to see the performance of the new procedure.

                      Ho-Chuan (River) Huang
                      Stata 19.0, MP(4)



                      • #12
                        Let me reiterate that I'm not sure how bootstrap does its thing, so the fact that my code runs does not mean the results necessarily make sense. Perhaps a less radical solution is to just replace the statsby call with the equivalent functionality using rangerun. The results are the same on the original sample (the Observed Coef. are the same) but the rest is different. My guess is that doing the first stage differently changes the sequence of random numbers generated, which means that repeated bsample calls generate different samples.

                        Code:
                        clear all
                        
                        * define a program to perform the first stage by year
                        program first_stage
                            mlogit err lgdp openness pcgdpg resimp, b(1)
                            foreach eqno in 2 3 {
                                foreach v of varlist lgdp openness pcgdpg resimp {
                                    gen double _eq`eqno'_b_`v' = [`eqno']_b[`v']
                                }
                                gen double _eq`eqno'_b_cons = [`eqno']_b[_cons]
                            }
                            keep _eq*
                        end
                        
                        * define a program that handles both stages
                        program both_stages, rclass
                        
                            // first stage, use only last obs per year
                            bysort year: gen high = cond(_n == _N, year, -1)
                            rangerun first_stage, interval(year year high)
                        
                            by year: gen del2 = _eq2_b_lgdp[_N]*lgdp + _eq2_b_openness[_N]*openness + _eq2_b_pcgdpg[_N]*pcgdpg + _eq2_b_resimp[_N]*resimp + _eq2_b_cons[_N]
                            by year: gen del3 = _eq3_b_lgdp[_N]*lgdp + _eq3_b_openness[_N]*openness + _eq3_b_pcgdpg[_N]*pcgdpg + _eq3_b_resimp[_N]*resimp + _eq3_b_cons[_N]
                        
                            drop high _eq*
                            sort code year
                        
                            gen F1 = 1/(1+exp(del2)+exp(del3))
                            gen F2 = exp(del2)/(1+exp(del2)+exp(del3))
                            gen F3 = exp(del3)/(1+exp(del2)+exp(del3))
                        
                            gen J1 = -invnormal(F1)
                            gen J2 = -invnormal(F2)
                            gen J3 = -invnormal(F3)
                        
                            gen imr1 = -normalden(J1)/F1
                            gen imr2 = -normalden(J2)/F2
                            gen imr3 = -normalden(J3)/F3
                            
                            // second stage
                            forvalues i = 1/3 {
                                regress ca ca0 imr`i' if err==`i'
                                return scalar ca0_`i' = _b[ca0]
                                return scalar imr_`i' = _b[imr`i'] 
                            }
                            
                            drop del2 del3 F1 F2 F3 J1 J2 J3 imr1 imr2 imr3
                        end
                        
                        * set the seed and load the data
                        set seed 213
                        use d1, clear
                        bootstrap ca0_1 = r(ca0_1) imr_1 = r(imr_1)  ///
                                  ca0_2 = r(ca0_2) imr_2 = r(imr_2)  ///
                                  ca0_3 = r(ca0_3) imr_3 = r(imr_3), ///
                                  nodrop reps(10): both_stages
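
The random-number guess above can at least be illustrated in isolation. The sketch below, again on the shipped auto data and with made-up scalar names (m_a, m_b, m_c, burn), shows that bsample reproduces a sample only when the generator is in the same state when it is called. It does not confirm that the statsby and rangerun versions of the first stage actually consume random numbers differently; that part remains an assumption.

```stata
* Sketch on a shipped dataset, not the thread's d1 data.
sysuse auto, clear
set seed 213
bsample
summarize mpg, meanonly
scalar m_a = r(mean)

* same seed, nothing in between: the identical sample is drawn again
sysuse auto, clear
set seed 213
bsample
summarize mpg, meanonly
scalar m_b = r(mean)

* same seed, but one random number is consumed before bsample
sysuse auto, clear
set seed 213
scalar burn = runiform()
bsample
summarize mpg, meanonly
scalar m_c = r(mean)

display "m_a = " m_a "   m_b = " m_b "   m_c = " m_c
```

The first two means are identical by construction; the third will typically differ because the extra runiform() call shifts the stream that bsample draws from.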



                        • #13
                          Dear Robert, Thanks again. I will give it a try to see what happens.

                          Ho-Chuan (River) Huang
                          Stata 19.0, MP(4)



                          • #14
                            Dear Robert, May I ask another question? Following the above procedure (given three models/regimes, i=1,2,3), I want to do some tests such as
                            Code:
                            // second stage
                                forvalues i = 1/3 {
                                    regress ca ca0 imr`i' if err==`i'
                                    est store m`i'
                                    return scalar ca0_`i' = _b[ca0]
                                    return scalar imr_`i' = _b[imr`i'] 
                                    return scalar cons_`i' = _b[_cons]
                                }
                                
                                suest m1 m2 m3
                                test [m1_mean]imr1 = [m2_mean]imr2 = [m3_mean]imr3 = 0
                                return scalar chi2_imr = r(chi2)
                                test [m1_mean]ca0 = [m2_mean]ca0 = [m3_mean]ca0 
                                return scalar chi2_123 = r(chi2)
                                test [m1_mean]ca0 = [m2_mean]ca0 
                                return scalar chi2_12 = r(chi2)
                                test [m1_mean]ca0 = [m3_mean]ca0
                                return scalar chi2_13 = r(chi2) 
                                test [m2_mean]ca0 = [m3_mean]ca0
                                return scalar chi2_23 = r(chi2)
                            Can I just put them right below the second stage?

                            Ho-Chuan (River) Huang
                            Stata 19.0, MP(4)



                            • #15
                              I don't know, I just do data management.
