
  • bootstrap 3 equations together or separately?

    Dear all,
    I’m estimating a three-step model involving:
    1. Selection Equation: A probit model for working
    2. Outcome Equations: Two outcome equations for income and child human capital.
    3. Structural working Equation: The working decision is modeled as a function of predicted values from the outcome equations and their squared terms.

    Code:
    program define full3step_allcoefs, rclass
    
      * Step 1: Selection equation ("controls" and "IV" stand in for the actual variable lists)
        probit W controls IV, vce(cluster coun)
        
        * Generate IMR
        predict xb, xb
    gen imr = normalden(xb) / normal(xb) if W == 1
    replace imr = -normalden(xb) / (1 - normal(xb)) if W == 0
    
    * Step 2: Outcome equations (include the IMR as the selection-correction term)
    reg hh_inc controls imr if W == 1
    predict hh_inc1
    
    reg hh_inc controls imr if W == 0
    predict hh_inc0
    
        gen change_hhInc = hh_inc1 - hh_inc0
        gen change_hhIncSQ = change_hhInc * change_hhInc
    
    * Child human capital equations (again with the IMR)
    reg child_hc controls imr if W == 1
    predict child_hc_1
    
    reg child_hc controls imr if W == 0
    predict child_hc_0
    
        gen change_child_hc = child_hc_1 - child_hc_0
        gen change_child_hcSQ = change_child_hc * change_child_hc
    
        * Step 3: Structural Equation
        probit W change_hhInc change_hhIncSQ change_child_hc change_child_hcSQ controls, vce(cluster coun)
        matrix b_struct = e(b)
    
    
    end
    
    * Run the bootstrap
    bootstrap, reps(1000) seed(12345) cluster(coun): full3step_allcoefs
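As a side check, the inverse Mills ratio formulas in Step 1 can be verified numerically outside Stata. Here is a small Python sketch (not part of the original post) using only the standard normal pdf and cdf:

```python
import math

def norm_pdf(x):
    # Standard normal density
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_cdf(x):
    # Standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def imr(xb, w):
    # Inverse Mills ratio, matching the Stata lines:
    #   gen imr = normalden(xb)/normal(xb)          if W == 1
    #   replace imr = -normalden(xb)/(1-normal(xb)) if W == 0
    if w == 1:
        return norm_pdf(xb) / norm_cdf(xb)
    return -norm_pdf(xb) / (1.0 - norm_cdf(xb))

print(imr(0.0, 1))   # phi(0)/Phi(0) = 0.3989/0.5 ~ 0.798
print(imr(0.0, 0))   # same magnitude, opposite sign at xb = 0
```

The W = 1 term is positive and the W = 0 term negative, as the Stata code intends.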
    Problem:


    When I bootstrap all equations together (selection and outcome equations), the standard errors for the structural equation (step 3) are very large, and all p-values are 1. This issue does not occur when I bootstrap each equation separately, or when I run the equations without bootstrapping.
    • Why could bootstrapping all equations together lead to these issues, while bootstrapping each equation separately works fine?
    • Would it be appropriate to bootstrap the selection and outcome equations together, but bootstrap the structural working equation (which uses predicted values from the outcome equations) separately? Or would this bias the standard errors?

    Any suggestions or advice on how to handle this would be greatly appreciated!

    Best regards,

  • #2
    First, to answer your question directly: Why do I get huge standard errors and p-values = 1 when bootstrapping all three steps together?

    This happens because in your full bootstrap, all equations are re-estimated within each replicate, meaning the predicted values used in the structural equation are re-generated with sampling variation. Since the structural equation uses differences in predicted values (e.g., change_hhInc, change_child_hc), even minor noise in each regression inflates the variance nonlinearly through these transformations — especially with squared terms. This compounding noise causes the final bootstrapped estimates to vary wildly, leading to large SEs and often meaningless p-values.
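    The amplification can be seen in a toy simulation (Python rather than Stata, and all numbers are made up): add independent per-replicate noise to two fitted values, difference them, and square the difference, then compare relative spreads.

```python
import random

random.seed(12345)

n_reps = 2000
noise_sd = 0.2          # per-equation prediction noise within a replicate

diffs, sq_diffs = [], []
for _ in range(n_reps):
    # Each replicate re-estimates both outcome equations, so each
    # predicted value carries its own sampling noise.
    pred1 = 1.5 + random.gauss(0, noise_sd)
    pred0 = 1.0 + random.gauss(0, noise_sd)
    d = pred1 - pred0          # like change_hhInc
    diffs.append(d)
    sq_diffs.append(d * d)     # like change_hhIncSQ

def sd(xs):
    m = sum(xs) / len(xs)
    return (sum((x - m) ** 2 for x in xs) / (len(xs) - 1)) ** 0.5

# Differencing accumulates the noise of both equations...
print(sd(diffs))
# ...and squaring stretches the spread further relative to the mean level.
print(sd(sq_diffs) / abs(sum(sq_diffs) / len(sq_diffs)))
```

    The relative spread of the squared difference is noticeably larger than that of the difference itself, which is exactly the compounding described above.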

    In contrast, bootstrapping each step separately avoids this cumulative error propagation — but at the cost of not capturing joint estimation uncertainty, which can understate variance if your steps are interdependent.

    Should you bootstrap some parts and not others?
    In general, no. If the structural equation uses predicted values from the outcome regressions, then bootstrapping it without accounting for the estimation uncertainty of those predicted values would understate standard errors, biasing inference (usually downward). This violates the logic of resampling: all randomness should be captured in the replicate process.
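    A small Python simulation (illustrative numbers only, not your model) shows the pattern: a second stage run on first-stage fitted values has a naive OLS standard error that is smaller than the standard error from a bootstrap that redoes both stages.

```python
import random

random.seed(42)

n = 300
# Stage 1: x depends on an instrument z with substantial noise
z = [random.gauss(0, 1) for _ in range(n)]
x = [1.0 + 2.0 * zi + random.gauss(0, 3.0) for zi in z]
# Outcome depends on the systematic part of x only (a stylized setup)
y = [0.5 + 1.5 * (1.0 + 2.0 * zi) + random.gauss(0, 1.0) for zi in z]

def ols(xs, ys):
    # Simple bivariate OLS: returns (intercept, slope, se_slope)
    k = len(xs)
    mx = sum(xs) / k
    my = sum(ys) / k
    sxx = sum((a - mx) ** 2 for a in xs)
    b = sum((a - mx) * (c - my) for a, c in zip(xs, ys)) / sxx
    a0 = my - b * mx
    resid = [c - a0 - b * a for a, c in zip(xs, ys)]
    s2 = sum(e * e for e in resid) / (k - 2)
    return a0, b, (s2 / sxx) ** 0.5

def two_step(zs, xs, ys):
    # Stage 1: regress x on z and form fitted values;
    # Stage 2: regress y on those fitted values
    a1, b1, _ = ols(zs, xs)
    xhat = [a1 + b1 * zi for zi in zs]
    return ols(xhat, ys)

_, b2, naive_se = two_step(z, x, y)

# Bootstrap that re-runs BOTH stages in each replicate
boot = []
for _ in range(400):
    idx = [random.randrange(n) for _ in range(n)]
    zb = [z[i] for i in idx]
    xb = [x[i] for i in idx]
    yb = [y[i] for i in idx]
    boot.append(two_step(zb, xb, yb)[1])
mb = sum(boot) / len(boot)
boot_se = (sum((b - mb) ** 2 for b in boot) / (len(boot) - 1)) ** 0.5

print(naive_se, boot_se)  # the bootstrap SE is the larger of the two
```

    The gap between the two standard errors is the generated-regressor uncertainty that a partial bootstrap would silently drop.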

    What are your options?

    Here are three options, each with tradeoffs:

    1. Full Bootstrap with Smoothed Predictions

    If you want valid SEs from the full 3-step model:
    bootstrap, reps(1000) seed(12345) cluster(coun): full3step_allcoefs
    But smooth out the noise by:
    • Avoiding predict and instead using stored coefficients to compute predicted values manually (more stable across reps)
    • Replacing squared differences with a linear approximation
    • Possibly increasing reps (1000 → 2000+)
    This is computationally costly but most valid.

    2. Analytical SEs with Two-Step Correction (Murphy-Topel)

    You can avoid bootstrapping by applying the Murphy-Topel variance correction, which adjusts the SEs of the structural equation for first-stage estimation error. This is technically elegant but harder to code and not natively supported in Stata.

    3. Double Bootstrap or Nested Bootstrap

    A more advanced but robust solution is a nested bootstrap, where:
    • Outer bootstrap resamples units
    • Inner bootstrap is used to compute predicted values from outcome equations
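    The outer/inner scheme can be sketched in a toy Python simulation (the cluster sizes and the "prediction" step are made-up stand-ins for the real equations):

```python
import random

random.seed(7)

# Toy clustered data: 20 clusters, each with a few (w, outcome) observations
data = {c: [(random.randint(0, 1), random.gauss(1.0, 0.5))
            for _ in range(random.randint(5, 10))]
        for c in range(20)}

def predicted_gap(obs):
    # Stand-in for the outcome equations: mean outcome gap between w=1 and w=0
    g1 = [y for w, y in obs if w == 1]
    g0 = [y for w, y in obs if w == 0]
    if not g1 or not g0:
        return None  # degenerate draw: one group missing
    return sum(g1) / len(g1) - sum(g0) / len(g0)

outer_stats = []
clusters = list(data)
for _ in range(100):                    # outer bootstrap: resample clusters
    draw = [random.choice(clusters) for _ in clusters]
    obs = [rec for c in draw for rec in data[c]]
    inner = []
    for _ in range(25):                 # inner bootstrap: uncertainty of the
        sub = [random.choice(obs) for _ in obs]  # predicted values themselves
        g = predicted_gap(sub)
        if g is not None:
            inner.append(g)
    if inner:
        outer_stats.append(sum(inner) / len(inner))

m = sum(outer_stats) / len(outer_stats)
se = (sum((s - m) ** 2 for s in outer_stats) / (len(outer_stats) - 1)) ** 0.5
print(len(outer_stats), se)   # outer replicates kept, and the nested SE
```

    With 1000 outer and even modest inner replications this multiplies the full three-step estimation many thousands of times, which is why it is slow.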
    This preserves the full uncertainty path but is very slow.

    Recommendations

    If your focus is valid inference for the structural equation, then:
    • Full bootstrap all steps together, despite the noisiness
    • Check for outliers or degenerate predictions in each replicate (maybe some change_* values explode)
    • Run estat bootstrap, all after estimation to inspect the distribution of coefficients
    If the noise is too high, consider:
    Code:
    capture drop *_pred
    capture drop change_*
    gen change_hhInc = predict_hhinc1 - predict_hhinc0
    gen change_hhIncSQ = change_hhInc^2
    * Replace predict with _b[]*X to stabilize
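    The "_b[]*X instead of predict" idea is simply: save the coefficient vector once, then rebuild the linear prediction by hand, so later estimation steps cannot change which estimates the prediction uses. A Python analog with toy data:

```python
import random

random.seed(1)

n = 100
x = [random.gauss(0, 1) for _ in range(n)]
y = [2.0 + 0.5 * xi + random.gauss(0, 0.1) for xi in x]

def ols(xs, ys):
    # Bivariate OLS returning (intercept, slope)
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    b = sum((a - mx) * (c - my) for a, c in zip(xs, ys)) / \
        sum((a - mx) ** 2 for a in xs)
    return my - b * mx, b

# Store coefficients explicitly (the analog of matrix b = e(b) in Stata)...
a_hat, b_hat = ols(x, y)
# ...then any later, unrelated estimation cannot disturb them:
_ = ols(y, x)
xb = [a_hat + b_hat * xi for xi in x]   # manual prediction, _b[]*X style
print(xb[0])
```

    In Stata the same logic means saving e(b) after each regression and computing the linear index from those saved values rather than calling predict after a later model.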



    • #3
      Josh Zweig, thank you so much for your detailed answer.
      I have a follow up question:

      The sample size for workers (W=1) is 400 and that of non workers is 1900.

      Because clusters are resampled with replacement, some bootstrap draws may include very few workers. The structural equation depends on the average difference in predicted outcomes between workers and non-workers, so I am wondering whether this is contributing to the problem of high p-values.


      My questions:
      Is there a way to tell bootstrap to draw a minimum number of observations from each group?
      Or is there a way to know how many observations of each group are used in each bootstrap draw, to check whether this is the problem?

      Best,




      • #4
        Hi Maria,

        That's a really important point.

        You’re absolutely right: when clusters are resampled with replacement, it’s entirely possible that some bootstrap replicates end up with very few (or even zero) observations in the W = 1 group. Since the structural equation relies on the difference in predicted outcomes across W = 1 and W = 0, this can definitely destabilize estimates — especially when calculating squared differences, which amplify noise. That could well be driving the inflated SEs and p-values you're seeing.

        1. Can we force the bootstrap to draw a minimum number of each group?
        Unfortunately, no — Stata's built-in bootstrap command doesn’t offer a constraint for minimum group counts within each replicate, especially when clustering. This is a known limitation.

        2. Can we monitor the number of W = 1 and W = 0 cases in each replicate?
        Yes — we can modify your bootstrap program to record those counts during each replicate. Here's a way to do that:
        Code:
        program define full3step_allcoefs, rclass
            // Count W==1 and W==0
            count if W == 1
            return scalar count_W1 = r(N)
            count if W == 0
            return scalar count_W0 = r(N)
        
            // [rest of your code...]
        end
        
        // Then store these as bootstrap statistics:
        bootstrap count_W1=r(count_W1) count_W0=r(count_W0), reps(1000) seed(12345) cluster(coun): full3step_allcoefs
        This will give you a distribution of how many W = 1 and W = 0 cases each replicate used. You can plot or tabulate these after the bootstrap to identify any replicates with very low worker counts, which may explain instability.
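        To see why this monitoring matters, here is a quick Python simulation (made-up cluster sizes, roughly matching the 400 vs 1900 split described above) of what cluster resampling with replacement does to the per-replicate W = 1 count when workers are a minority in every cluster:

```python
import random

random.seed(2024)

# Hypothetical setup: 50 clusters, with few workers (W = 1) per cluster
clusters = []
for _ in range(50):
    size = random.randint(30, 60)
    n1 = random.randint(0, max(1, size // 4))   # workers in this cluster
    clusters.append((n1, size - n1))

counts_w1 = []
for _ in range(1000):      # one count per bootstrap replicate
    draw = [random.choice(clusters) for _ in clusters]
    counts_w1.append(sum(n1 for n1, n0 in draw))

print(min(counts_w1), max(counts_w1))  # W = 1 count varies widely across draws
```

        Replicates at the low end of that range estimate the W = 1 outcome equation on far fewer observations, which is exactly where unstable predicted differences would come from.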

        If you do find many such replicates, one workaround is to post-process your bootstrap results to exclude the most extreme ones — but of course that raises questions of inference validity.

        Best,
        Josh - Estima
