Bootstrapping using multiple samples

Christian Rhind

Join Date: Oct 2016
Posts: 6

08 Dec 2022, 19:54

I have two samples from different sources. The primary sample is a panel, but it does not contain the independent variable I am interested in. To work around this, I use an auxiliary sample that does contain that variable and regress it on two variables common to both samples. I then use the estimated coefficients to impute values of the missing variable in my primary sample. The problem is that this introduces generated regressor bias. To get around this, my plan is to bootstrap the standard errors of my estimates, but for the life of me I cannot get this to work. Example code:

Code:

clear all
sysuse auto

/**************** generate datasets *************/

gen id = _n
save temp.dta, replace

// panel that does not contain weight variable
use temp.dta, clear
keep if id <= 10
keep id price mpg headroom

gen t = 1
save sample1_temp.dta, replace

use temp.dta, clear
keep if id <= 10
keep id price mpg headroom

gen t = 2
replace price = price + runiform(-100,100)
replace mpg = mpg + runiform(-5,5)
replace headroom = headroom + runiform(-1,1)

append using sample1_temp.dta

sort id t
save sample1.dta, replace

// sample that does contain weight variable
use temp.dta, clear
keep if id > 10
keep weight mpg headroom
save sample2.dta, replace

/******************** bootstrap ******************/

use sample2.dta, clear
capture program drop example
program define example

    quietly regress weight mpg headroom

    use sample1.dta, clear
    capture drop weight_hat
    predict weight_hat

    xtset id t
    xtreg price weight_hat
    
    exit
end
bootstrap, reps(50): example

When I run this I get the error message:

variable __000000 not found

I have tried simplifying the program:

Code:

use sample1.dta, clear
capture program drop example2
program define example2

    xtset id t
    xtreg price mpg
    
    exit
end
bootstrap, reps(50): example2

I get a different error:

insufficient observations to compute bootstrap standard errors
no results will be saved

However, if I replace xtreg with reg, it works fine:

Code:

use sample1.dta, clear
capture program drop example3
program define example3

    reg price mpg
    
    exit
end
bootstrap, reps(50): example3

Apologies for long post but I am very confused. Any help would be greatly appreciated.

Christian Rhind

09 Dec 2022, 01:27

I have found the answer to the second part of my question. Since the sample in the second stage is a panel, I need to account for clustering in the bootstrap procedure:

Code:

use sample1.dta, clear
capture program drop example2
program define example2

    xtset newid t
    xtreg price mpg
    
    exit
end
xtset, clear
bootstrap, reps(10) seed(1) cluster(id) idcluster(newid): example2

So now combining this with the first stage:

Code:

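use sample2.dta, clear
capture program drop example4
program define example4, eclass

    // first stage on the auxiliary sample currently in memory
    quietly regress weight mpg headroom

    // second stage on the panel, using the imputed regressor
    use sample1.dta, clear
    capture drop weight_hat
    predict weight_hat
    
    xtset newid t
    xtreg price weight_hat, fe
    
    exit
end
xtset, clear
bootstrap, reps(10) seed(1) cluster(id) idcluster(newid): example4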

And I get the error:

variable id not found
(error in option cluster())

Presumably because id is not a variable that exists in the first stage sample.

Any ideas on how to rectify this?
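
One possible way around this, offered as an untested sketch rather than a confirmed fix, is to stop asking bootstrap to resample a single dataset and instead do the resampling inside the program, once per sample, with bsample, and then collect the replicates with simulate. The program name twostage and the returned scalar b are hypothetical names chosen for illustration. The sketch assumes sample1.dta and sample2.dta are the files created above and that the two samples can be resampled independently of each other.

Code:

```stata
capture program drop twostage
program define twostage, rclass

    // Stage 1: resample observations from the auxiliary sample
    use sample2.dta, clear
    bsample
    quietly regress weight mpg headroom

    // Stage 2: resample whole panels (clusters) from the primary sample;
    // newid gives each resampled panel a unique identifier
    use sample1.dta, clear
    bsample, cluster(id) idcluster(newid)

    // impute weight from the stage 1 coefficients still held in e()
    predict double weight_hat, xb

    xtset newid t
    quietly xtreg price weight_hat, fe

    // hand the coefficient of interest back to simulate
    return scalar b = _b[weight_hat]
end

// each replication redraws both samples and re-runs both stages
simulate b = r(b), reps(50) seed(1): twostage
summarize b
```

Under these assumptions, the standard deviation of b reported by summarize would serve as the bootstrap standard error, while the point estimate would still come from running the two stages once on the original, unresampled data.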
