Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Bootstrapping a user-written program

    Hello,

    Using Stata 13.1, I am running a Heckman two-step regression. I am bootstrapping the difference of two fitted values. (The original data has synthetic set to 1. I then collapse the data by sex and set synthetic equal to 3. I am finding the difference between the fitted values with an associated synthetic value of 3). When I run the bootstrap command I receive the following error message: "insufficient observations to compute bootstrap standard errors." The program works when I run just the OLS (or second stage regression) and crashes once I add the probit. Additionally, the program itself works and can be run multiple times without generating errors.

    Because I wanted to make the code more accessible, I made a version using auto.dta. Surprisingly, I am able to bootstrap the program; yet I am not able to distinguish the differences in the code that may be driving the errors in my initial program.


    Code:
    use MR_work/mr_datasets/temp_all.dta, clear
    * regressors*
    global regs1 "widowed divorced separated never_married midwest south west hsd08 hsd911 hsg cg ad pot_exp1 pot_exp2 pot_exp3 pot_exp4 hsd08_pot_exp1 hsd08_pot_exp2 hsd08_pot_exp3 hsd08_pot_exp4 hsd911_pot_exp1 hsd911_pot_exp2 hsd911_pot_exp3 hsd911_pot_exp4 hsg_pot_exp1 hsg_pot_exp2 hsg_pot_exp3 hsg_pot_exp4 cg_pot_exp1 cg_pot_exp2 cg_pot_exp3 cg_pot_exp4 ad_pot_exp1 ad_pot_exp2 ad_pot_exp3 ad_pot_exp4"
    global probit_regs " __child06 child06_widowed child06_divorced child06_separated child06_never_married"
    global regs2 "sex*"
    
    **************************
    *Define bootstrap program*
    **************************
    capture prog drop bias70s90s
    prog define bias70s90s, rclass
    set more off
    
    append using MR_work/mr_datasets/tempV2.dta //add 2 observations with synthetic==3
    probit ftfy $regs1 $probit_regs if sex==1 & timepd==0 & synthetic==1 [pweight=wgt]
    tempvar lambda_input lambda_female
    predict `lambda_input' if sex==1, xb
    gen `lambda_female' = normalden(-`lambda_input') / (1 - normal(-`lambda_input')) if timepd==0
    replace `lambda_female'=0 if sex==0
    replace `lambda_female'=0 if synthetic>1
    
    reg log_hourly_wage $regs1 $regs2 `lambda_female' if timepd==0 & wagesmpl==1 [pw=wgt], robust
    tempvar xb
    predict `xb'
    sum `xb' if sex==1 & synthetic==3
    local m1=r(mean)
    sum `xb' if sex==0 & synthetic==3
    local m2=r(mean)
    return scalar ols0_fix=`m1'-`m2'
    
    
    drop if synthetic>1
    end
    
    ** end bootstrap loop
    
    
    *Run bootstrap command*
    bootstrap r(ols0_fix), reps(10) seed(123) strata(timepd): bias70s90s

    Code:
    sysuse auto, clear
    gen synthetic=1
    global regs1 "mpg"
    global regs2 "weight"
    capture prog drop bias70s90s
    
    **************************
    *Define bootstrap program*
    **************************
    prog define bias70s90s, rclass
    set more off
    
    append using "C:\Users\Briana\Desktop\Temporary\auto_temp.dta" //created by collapse, by(foreign)
    probit foreign $regs2 if synthetic==1
    tempvar lambda_input lambda_female
    predict `lambda_input', xb
    gen `lambda_female' = normalden(-`lambda_input') / (1 - normal(-`lambda_input'))
    replace `lambda_female'=0 if synthetic==2
    
    reg displacement $regs1 `lambda_female' if synthetic==1, robust
    tempvar xb
    predict `xb'
    
    sum `xb' if foreign==1 & synthetic==2
    local m1=r(mean)
    sum `xb' if foreign==0 & synthetic==2
    local m2=r(mean)
    return scalar ols0_fix=`m1'-`m2'
    
    drop if synthetic>1
    end
    
    ** end bootstrap loop
    
    
    *Run bootstrap command*
    bootstrap r(ols0_fix), reps(10) seed(123): bias70s90s
    Thank you!
    Last edited by Briana Sullivan; 08 Mar 2016, 00:21.

  • #2
    Hi Briana, I have almost the same problem, but still not fixed. Really hope that someone can help you and see if that allows me to take advantage and fix mine too. In my code I think the problems comes from the subsample in the main regression, I have tryied without the subsample and then it works. But I need the main equation and also the lambdas for the subsample, so.... Lets see if someone help us on this. Only for if you want to take a look: http://www.statalist.org/forums/foru...eckman-process

    Comment


    • #3
      I wrote my own bootstrap, and I am able to successfully run the code. I used: http://www.ats.ucla.edu/stat/stata/faq/ownboot.htm as a template.
      Hope it helps!

      Comment


      • #4
        Thanks Briana, I have been trying with that page you say me, but it seems that I´m not too smart with Stata because I´m unable to extrapolate that into my program. Can you put an example of your code now that you say is working? That would help a lot.
        Thanks in advance.

        Comment


        • #5
          Hi Damian,

          Below is a version of my code. I've tried to add some explanations within the code itself.

          Code:
          use MR_work/mr_datasets/temp_all.dta, clear
          
          
          *STEP 1 - Run the regressions and save the results (this seems to be a necessary step)
          quietly{
          append using MR_work/mr_datasets/tempV2.dta
          
          forval i=0/1{
          probit ftfy $regs1 $probit_regs if sex==1 & timepd==`i' & synthetic==1 [pweight=wgt] //My regressors are globals
          cap drop lambda_input lambda_female
          predict lambda_input if sex==1, xb
          gen lambda_female = normalden(-lambda_input) / (1 - normal(-lambda_input)) if timepd==`i'
          replace lambda_female=0 if sex==0
          replace lambda_female=0 if synthetic>1
          
          reg log_hourly_wage $regs1_nomarst $regs2_nomarst lambda_female if timepd==`i' & wagesmpl==1 [pw=wgt], robust
          cap drop xb
          predict xb
          
          sum xb if sex==1 & synthetic==3
          scalar m1=r(mean)
          sum xb if sex==0 & synthetic==3
          scalar m2=r(mean)
          scalar nomarst`i'_heck=m1-m2 //wage difference calculated using fixed synthetics
          }
          }
          matrix diff=( nomarst0_heck, nomarst1_heck)
          matrix list diff
          drop if synthetic>1
          
          
          *STEP 2 - Create the program
          cap program drop myboot
          program define myboot, rclass
              preserve //preserve the data
              bsample, strata(timepd) //samples the data in memory with replacement
          *Repeat the regressions from above, using tempvars and return scalar in the program
          append using MR_work/mr_datasets/tempV2.dta
          
          
          forval i=0/1{
          probit ftfy $regs1 $probit_regs if sex==1 & timepd==`i' & synthetic==1 [pweight=wgt]
          tempvar lambda_input lambda_female
          predict `lambda_input' if sex==1, xb
          gen `lambda_female' = normalden(-`lambda_input') / (1 - normal(-`lambda_input')) if timepd==`i'
          replace `lambda_female'=0 if sex==0
          replace `lambda_female'=0 if synthetic>1
          
          
          reg log_hourly_wage $regs1_nomarst $regs2_nomarst `lambda_female' if timepd==`i' & wagesmpl==1 [pw=wgt], robust
          tempvar xb
          predict `xb'
          
          *I gave the scalars different names compared to step 1
          sum `xb' if sex==1 & synthetic==3
          local m1=r(mean)
          sum `xb' if sex==0 & synthetic==3
          local m2=r(mean)
          local nomarst0`i'_heck=`m1'-`m2'
          return scalar nomarst0`i'_heck=`m1'-`m2' //wage difference calculated using fixed synthetics
          }
          
          drop if synthetic>1
              restore //return data to original state
          end
          
          *STEP 3 - use the scalars that are created by the program
          simulate r(nomarst00_heck) r(nomarst01_heck), reps(1000) seed(123): myboot
          
          
          *STEP 4 - use results stored in Step 1. If you have more than one statistic that you're bootstrapping you'll need to make a matrix of scalars in step one (this is how I created diff)
          bstat, stat(diff)

          Comment


          • #6
            Hi Briana, thanks a lot for your help, I will start right now with this. So, first we need to generate the estimations outside the program and store the coefficients (in my case) in a matrix, and then do the same inside the program and return the coefficients in scalar. After that I simulate the scalars (my coefficients) in the program, and finally use the results store in the matrix created in the first step. Ok, I will try right now.
            Again, thanks a lot for your help.

            Comment


            • #7
              Dear Statalisters,
              I am using fractional logit (in the first stage regression) in Stata 14 to estimate the predicted values of (X1) and the second stage would be OLS. Anyone who can suggest me how to write my bootstrap program for standard errors.
              Thanks

              Comment

              Working...
              X