  • Panel bootstrap after Wooldridge (1995) two-step estimator

    Dear Statalisters,

    I have a panel data set that suffers from sample selection bias, and I am following the approach of Wooldridge (1995) and Semykina and Wooldridge (2010). The correction for sample selection bias proceeds as follows:
    1. Estimate T different probits of the selection equation in order to retrieve T inverse Mills ratios

    select=x1 z1

    I do this using a loop:
    HTML Code:
    forvalues i = 2005(1)2016  {
    disp `i'
    probit select `x1' `z1' if yy==`i', vce(robust)
    predict xb`i' if yy==`i', xb
    qui replace IMR=normalden(xb`i')/normal(xb`i') if `y2'==1 & yy==`i'
    }
    2. For all observations in the selected sample (i.e., those with select=1), use pooled OLS to estimate the main equation:

    reg food2 `x1' IMR i.yy if select==1, vce(cluster mc)

    3. Estimate the asymptotic variance

    Now, I am struggling with step 3. In their 2010 paper, Semykina and Wooldridge write on p. 378: "Instead of using analytical formulae for the asymptotic variance, one can apply "panel bootstrap". This involves resampling cross-sectional units (and all time periods for each unit sampled) and using the bootstrap sample to approximate the distribution of the parameter vector."
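
To make the quoted resampling scheme concrete, here is a minimal sketch of a single panel-bootstrap draw in Stata. It assumes a small simulated panel with a unit identifier id and a year variable yy (both names and all data are invented for illustration). The point is only that whole units are drawn with replacement and keep all of their time periods.

```stata
* Sketch under stated assumptions: simulated panel, 50 units, 3 years each
clear
set seed 12345
set obs 50
generate id = _n
expand 3
bysort id: generate yy = 2004 + _n      // years 2005-2007
generate y = rnormal()

preserve
* Draw units (not single observations) with replacement;
* newid gives every drawn copy of a unit its own identifier
bsample, cluster(id) idcluster(newid)
xtset newid yy                          // the resampled data form a panel in newid
summarize y                             // the estimator would be re-run here
restore
```

Repeating such a draw many times and re-running both estimation steps on each draw is what the bootstrap prefix automates when it is given cluster() and idcluster().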

    It is my impression that with a two-step estimator, one should bootstrap over both (in this case) the probit(s) and the second stage main regression. However, I have not found any clues on Statalist or elsewhere on how to bootstrap standard errors, accounting for two separate regressions, when the first step involves estimating T different probits. I assume it would be something similar in style to the "program" approach in https://www.statalist.org/forums/for...ction-on-stata , but taking into account the T different probits in the first stage.
    I.e.,

    HTML Code:
    * Bootstrap SE
    program heck2, rclass
    
    forvalues i = 2005(1)2016  {
    disp `i'
    probit select `x1' `z1' if yy==`i', vce(robust)
    predict xb`i' if yy==`i', xb
    qui replace IMR=normalden(xb`i')/normal(xb`i') if select==1 & yy==`i'
    }
    
    xtset mc yy
    reg food2 `x1' IMR i.yy, vce(cluster mc)
    
    return scalar beta = _b[L1_goal]
    
    end
    
    bootstrap r(beta), reps(100) seed(1234) nodrop:heck2
    estat bootstrap
    This produces the following error message:

    HTML Code:
    . bootstrap r(beta), reps(100) seed(1234) nodrop:heck2
    (running heck2 on estimation sample)
    varlist required
    an error occurred when bootstrap executed heck2
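
One possible reading of this error, offered as an assumption rather than a confirmed diagnosis: local macros defined outside a program are not visible inside it. If the actual code also holds the dependent variable in a local, a command inside heck2 would then see an empty variable list, which Stata reports as "varlist required". The sketch below shows the scoping behaviour with hypothetical variable names (age, income) and one workaround using a global macro.

```stata
* Sketch: local macros do not cross into a program's scope
local x1 "age income"                    // hypothetical regressor list

capture program drop showlocal
program define showlocal
    * x1 is looked up in the program's own scope, where it is undefined
    display "inside program, local x1 = [`x1']"
end
showlocal

* One workaround (assumption: a global is acceptable here)
global X1 "age income"
capture program drop showglobal
program define showglobal
    display "inside program, global X1 = [${X1}]"
end
showglobal
```

Defining the locals inside the program, or passing the lists in as arguments, would be alternatives to the global.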

    Has anyone used the "panel bootstrap" method that Semykina and Wooldridge (2010) refer to in this context? If so, I would be grateful to receive any suggestions on how it could be implemented in Stata.

    I am aware that my question does not necessarily pertain to a particular Stata command, so I completely understand if this is the wrong forum for it.

    /Hanna

    References:
    Wooldridge, Jeffrey M. "Selection corrections for panel data models under conditional mean independence assumptions." Journal of econometrics 68.1 (1995): 115-132.
    Semykina, Anastasia, and Jeffrey M. Wooldridge. "Estimating panel data models in the presence of endogeneity and selection." Journal of Econometrics 157.2 (2010): 375-380.
    Last edited by Hanna Lindstrom; 16 Sep 2019, 08:47.

  • #2
    Hanna: I might be able to help. But Anastasia has Stata code on her website at Florida State.

    • #3
      Prof Wooldridge, thank you very much for your reply.

      Yes, I have seen Anastasia Semykina's code on her web page. It was very helpful, but it mainly involved a solution using matrices, if I remember correctly.

      However, I think I was able to solve the issues I had with implementing "panel bootstrapped" standard errors. I used the -program- command in Stata to bootstrap over both the probit equations and the pooled OLS, and when running the -bootstrap- command that calls the program, I told Stata to resample cross-sectional units together with all time periods for each unit sampled (see code below).
      At least, this is the understanding I had of the solution based on https://www.stata.com/support/faqs/s...th-panel-data/

      HTML Code:
      tsset id yy
      
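      program myprogram, rclass 
      
      * NOTE: the locals x1 and z1 must be defined inside this program
      * (or replaced by globals); locals defined outside are not visible here
      
      * STEP 1) Probit equations, one per year; IMR collects the inverse Mills ratios
      gen IMR=.
      
      forvalues i = 2005(1)2016  {
          disp `i'
          probit select `x1' `z1' if yy==`i', vce(robust)
          predict xb`i' if yy==`i', xb 
          qui replace IMR=normalden(xb`i')/normal(xb`i') if report==1 & yy==`i'
      }
      
      * STEP 2) Pooled OLS with year dummies
      
      reg food2 `x1' IMR i.yy
      
      * Return the coefficient of interest to -bootstrap-
      return scalar beta = _b[var1] 
      
      * Clean up so the program can run again on the next bootstrap sample
      drop xb* IMR
      end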
      
      generate newid=id
      tsset newid yy
      
      bootstrap r(beta), reps(100) seed(1234) cluster(id) idcluster(newid) nodrop:myprogram 
      estat bootstrap
      If this is completely wrong, I would be very thankful for comments and suggestions.

      Kind regards,
      Hanna

      Sources: https://www.stata.com/support/faqs/s...th-panel-data/
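
For readers who want to try the approach end to end, here is a self-contained sketch on simulated data. Everything about the data-generating process (variable names x, z, y, the coefficients, three years, 200 units) is an invented assumption and is not taken from the thread. Only the structure follows the code above: year-by-year probits, pooled OLS with the inverse Mills ratio, and a bootstrap that resamples whole units.

```stata
* Sketch under stated assumptions: simulated panel with selection
clear all
set seed 2019
set obs 200
generate id = _n
generate a = rnormal()                   // unit effect
expand 3
bysort id: generate yy = 2004 + _n       // years 2005-2007
generate x = rnormal()
generate z = rnormal()                   // excluded from the main equation
generate u = rnormal()
generate select = (0.5*x + z + 0.5*u + rnormal() > 0)
generate y = 1 + x + a + u if select == 1

capture program drop twostep
program define twostep, rclass
    tempvar xb imr
    quietly generate double `imr' = .
    * Step 1: one probit per year, inverse Mills ratio for selected observations
    forvalues t = 2005/2007 {
        quietly probit select x z if yy == `t'
        quietly predict double `xb' if yy == `t', xb
        quietly replace `imr' = normalden(`xb')/normal(`xb') if select == 1 & yy == `t'
        drop `xb'
    }
    * Step 2: pooled OLS on the selected sample with year dummies
    quietly regress y x `imr' i.yy if select == 1
    return scalar beta = _b[x]
end

generate newid = id
xtset newid yy
* Resample units (id) with all their years; nodrop keeps unselected rows for the probits
bootstrap beta = r(beta), reps(50) seed(1234) cluster(id) idcluster(newid) nodrop: twostep
```

Using temporary variables inside the program avoids the explicit drop at the end, and 50 replications are chosen only to keep the sketch fast; more would normally be used for reported standard errors.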

      • #4
        Hanna: That looks roughly like my code when I've used the panel bootstrap. I haven't used it for this particular problem, but I have for similar problems with panel data. In particular, it should handle the two-step estimation and allow for serial correlation and heteroskedasticity of unknown form.

        • #5
          Prof Wooldridge,
          Thank you very much indeed for your reply!

          Kind regards,
          Hanna
