
  • Bootstrapping using multiple samples

    I have two samples from different sources. The primary sample is a panel, but it does not contain the independent variable I am interested in. To work around this, I use an auxiliary sample that does contain that variable and regress it on two variables common to both samples. I then use the estimated coefficients to impute values of the missing variable in my primary sample. The problem is that this introduces generated regressor bias. To get around this, my plan is to bootstrap the standard errors of my estimates, but for the life of me I cannot get this to work. Example code:

    Code:
    clear all
    sysuse auto
    
    /**************** generate datasets *************/
    
    gen id = _n
    save temp.dta, replace
    
    // panel that does not contain weight variable
    use temp.dta, replace
    keep if id <= 10
    keep id price mpg headroom
    
    gen t = 1
    save sample1_temp.dta, replace
    
    use temp.dta, replace
    keep if id <= 10
    keep id price mpg headroom
    
    gen t = 2
    replace price = price + runiform(-100,100)
    replace mpg = mpg + runiform(-5,5)
    replace headroom = headroom + runiform(-1,1)
    
    append using sample1_temp.dta
    
    sort id t
    save sample1.dta, replace
    
    // sample that does contain weight variable
    use temp.dta, replace
    keep if id > 10
    keep weight mpg headroom
    save sample2.dta, replace
    
    /******************** bootstrap ******************/
    
    use sample2.dta, replace
    capture program drop example
    program define example
    
        quietly regress weight mpg headroom
    
        use sample1.dta, replace
        capture drop weight_hat
        predict weight_hat
    
        xtset id t
        xtreg price weight_hat
        
        exit
    end
    bootstrap, reps(50): example
    When I run this I get the error message:

    variable __000000 not found
    I have tried simplifying the program:

    Code:
    use sample1.dta, replace
    capture program drop example2
    program define example2
    
        xtset id t
        xtreg price mpg
        
        exit
    end
    bootstrap, reps(50): example2
    I get a different error:

    insufficient observations to compute bootstrap standard errors
    no results will be saved
    However, if I replace xtreg with reg, it works fine:

    Code:
    use sample1.dta, replace
    capture program drop example3
    program define example3
    
        reg price mpg
        
        exit
    end
    bootstrap, reps(50): example3
    Apologies for the long post, but I am very confused. Any help would be greatly appreciated.



  • #2
    I have found the answer to the second part of my question. Since the sample in the second stage is a panel, I need to account for clustering in the bootstrap procedure:

    Code:
    use sample1.dta, replace
    capture program drop example2
    program define example2
    
        xtset newid t
        xtreg price mpg
        
        exit
    end
    xtset, clear
    bootstrap, reps(10) seed(1) cluster(id) idcluster(newid): example2
    So now combining this with the first stage:

    Code:
    use sample2.dta, replace
    capture program drop example4
    program define example4, eclass
    
        use sample1.dta, replace
        capture drop weight_hat
        predict weight_hat
        
        xtset newid t
        xtreg price weight_hat, fe
        
        exit
    end
    xtset, clear
    bootstrap, reps(10) seed(1) cluster(id) idcluster(newid): example4
    And I get the error:

    variable id not found
    (error in option cluster())
    Presumably this is because id does not exist in the first-stage sample (sample2.dta), which is the dataset in memory when bootstrap is called.

    Any ideas on how to rectify this?
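
    One possible way around this, offered as a minimal sketch rather than a tested solution: do the resampling inside the program with bsample, so that each replication draws a fresh sample from both files, and let simulate collect the coefficient. This assumes sample1.dta and sample2.dta are the files created in the first post. The program name twostage and the returned scalar b are made up for illustration, and drawing the two samples independently is itself an assumption about how the data were collected.

    ```stata
    capture program drop twostage
    program define twostage, rclass

        // Stage 1: resample the auxiliary sample and fit the imputation model
        use sample2.dta, clear
        bsample
        quietly regress weight mpg headroom

        // Stage 2: resample whole panels from the primary sample;
        // newid gives each drawn panel a unique identifier
        use sample1.dta, clear
        bsample, cluster(id) idcluster(newid)

        // Impute weight using the stage 1 coefficients still held in e()
        predict double weight_hat, xb

        xtset newid t
        quietly xtreg price weight_hat, fe

        // Hand the coefficient of interest back to simulate
        return scalar b = _b[weight_hat]
    end

    clear
    set seed 1
    simulate b = r(b), reps(50): twostage

    // The standard deviation of b is the bootstrap standard error
    summarize b
    ```

    The standard deviation reported by summarize would then serve as the bootstrap standard error. The point estimate itself would still come from running the two stages once on the data without resampling.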
