
  • Reproduction of Papke and Wooldridge (2008): issues with bootstrap (insufficient observations)

    I'm trying to estimate a fractional regression model with unobserved heterogeneity.
    I chose the method developed by Papke and Wooldridge (2008), "Panel data methods for fractional response variables with an application to test pass rates".
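    For reference, here is a sketch of the model as I understand it from the paper (the notation is mine). Covariates are strictly exogenous conditional on the unobserved effect c_i, and c_i is modelled with the Chamberlain-Mundlak device, i.e. as a function of the time averages of the covariates. I am assuming, from the variable names only, that the "a"-prefixed variables in the code (alavgrexp, alunch, alenroll) are these district time averages. Both the pooled fractional probit (glm, fa(bin) link(probit)) and the GEE step (xtgee ... corr(exch)) then estimate the scaled parameters with subscript a.

    ```latex
    \begin{aligned}
    E(y_{it} \mid \mathbf{x}_{i1},\dots,\mathbf{x}_{iT}, c_i) &= \Phi(\mathbf{x}_{it}\boldsymbol{\beta} + c_i),
    \qquad c_i \mid \mathbf{x}_i \sim \mathcal{N}\!\left(\psi + \bar{\mathbf{x}}_i\boldsymbol{\xi},\ \sigma_a^2\right) \\
    \Longrightarrow\quad E(y_{it} \mid \mathbf{x}_i) &= \Phi\!\left(\psi_a + \mathbf{x}_{it}\boldsymbol{\beta}_a + \bar{\mathbf{x}}_i\boldsymbol{\xi}_a\right),
    \qquad (\psi_a, \boldsymbol{\beta}_a, \boldsymbol{\xi}_a) = \frac{(\psi, \boldsymbol{\beta}, \boldsymbol{\xi})}{\sqrt{1 + \sigma_a^2}}
    \end{aligned}
    ```

    The second line follows because, writing c_i = \psi + \bar{\mathbf{x}}_i\boldsymbol{\xi} + a_i with a_i ~ N(0, \sigma_a^2) independent of the covariates, E[\Phi(m + a_i)] = \Phi(m / \sqrt{1 + \sigma_a^2}) for any index m.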

    Before applying the code to my data, I tried to replicate their results using the code provided on Papke's website: http://econ.msu.edu/faculty/papke/Pa...statafiles.zip

    The code I'm interested in is the panel case with exogenous covariates ("math4_boot_exog_gee.do").
    Here is a copy of the code from the website:

    Code:
    capture program drop math4_boot
    
    program math4_boot, rclass
    
    glm math4 lavgrexp alavgrexp lunch alunch lenroll alenroll y96-y01 if year>1994, fa(bin) link(probit) cluster(distid)
    mat b = e(b)
    xtgee math4 lavgrexp lunch lenroll alavgrexp alunch alenroll y96-y01, fa(bi) link(probit) corr(exch) robust from(b,skip)
    
    return scalar b1 = _b[lavgrexp]
    return scalar b2 = _b[lunch]
    return scalar b3 = _b[lenroll]
    
    predict x1b1hat, xb
    gen scale=normden(x1b1hat)
    gen pe1=scale*_b[lavgrexp]
    summarize pe1
    return scalar ape1=r(mean)
    gen pe2=scale*_b[lunch]
    summarize pe2
    return scalar ape2=r(mean)
    gen pe3=scale*_b[lenroll]
    summarize pe3
    return scalar ape3=r(mean)
    
    drop x1b1hat scale pe1 pe2 pe3
    end
    
    
    *Bootstrapped SE within districts
    bootstrap r(b1) r(b2) r(b3) r(ape1) r(ape2) r(ape3), reps(20) seed(123) cluster(distid) idcluster(newid): math4_boot
    
    program drop math4_boot
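    For reference, the APE lines in math4_boot (predict with the xb option, then normden(), which is normalden() in current Stata, then summarize) appear to compute the sample average below. Here \phi is the standard normal density, the index is the xtgee linear prediction (including the constant, year dummies and time averages), and \hat\beta_{aj} is the xtgee coefficient on lavgrexp, lunch or lenroll. This is my reading of the code, not a formula quoted from the paper, and the average runs over whichever observations summarize sees.

    ```latex
    \widehat{\mathrm{APE}}_j
      = \hat\beta_{aj} \cdot \frac{1}{NT} \sum_{i=1}^{N} \sum_{t=1}^{T}
        \phi\!\left(\hat\psi_a + \mathbf{x}_{it}\hat{\boldsymbol{\beta}}_a + \bar{\mathbf{x}}_i\hat{\boldsymbol{\xi}}_a\right)
    ```

    Because the program returns these averages as r(ape1) to r(ape3), the bootstrap also resamples them, which is how their standard errors are obtained.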
    I had a few issues with this code. First, the normden() function has been deprecated, so I replaced it with normalden().

    Then I got this error message:
    Code:
    Bootstrap replications (20)
    ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
    repeated time values within panel
    the most likely cause for this error is misspecifying the cluster(), idcluster(), or group() option
    r(451);
    So I added an "xtset, clear" statement before the bootstrap command, and "xtset distid year" inside the program, before the estimation of the fractional regression model.


    Here is my full code at this point:
    Code:
    capture program drop math4_boot
    
    program math4_boot, rclass
    
    xtset distid year
    
    glm math4 lavgrexp alavgrexp lunch alunch lenroll alenroll y96-y01 if year>1994, fa(bin) link(probit) cluster(distid)
    mat b = e(b)
    xtgee math4 lavgrexp lunch lenroll alavgrexp alunch alenroll y96-y01, fa(bi) link(probit) corr(exch) robust from(b,skip)
    
    
    return scalar b1 = _b[lavgrexp]
    return scalar b2 = _b[lunch]
    return scalar b3 = _b[lenroll]
    
    
    predict x1b1hat, xb
    gen scale=normalden(x1b1hat)
    gen pe1=scale*_b[lavgrexp]
    summarize pe1
    return scalar ape1=r(mean)
    gen pe2=scale*_b[lunch]
    summarize pe2
    return scalar ape2=r(mean)
    gen pe3=scale*_b[lenroll]
    summarize pe3
    return scalar ape3=r(mean)
    
    drop x1b1hat scale pe1 pe2 pe3
    end
    
    
    
    *Bootstrapped SE within districts
    xtset, clear
    bootstrap r(b1) r(b2) r(b3) r(ape1) r(ape2) r(ape3), reps(20) seed(123) cluster(distid) idcluster(newid): math4_boot
    
    program drop math4_boot
    But I still get an error message:
    Code:
    Bootstrap replications (20)
    ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
    xxxxxxxxxxxxxxxxxxxx
    insufficient observations to compute bootstrap standard errors
    no results will be saved
    I tried adding the "nodrop" option to the bootstrap command, but it did not change anything.


    Does anyone have an idea what could cause this issue?
    Last edited by Yoann Morin; 10 Jul 2019, 12:01.

  • #2
    I did not understand everything that was going on, but I managed to get the program to work by leaving the time variable out of the xtset statement. Also, the panel identifier is now newid, the cluster ID created by the idcluster() option of the bootstrap command, rather than the original distid that I used in the previous version.

    Here is the functional code:
    Code:
    capture program drop math4_boot
    
    program math4_boot, rclass
    
    xtset newid
    
    glm math4 lavgrexp alavgrexp lunch alunch lenroll alenroll y96-y01 if year>1994, fa(bin) link(probit) cluster(distid)
    mat b = e(b)
    xtgee math4 lavgrexp lunch lenroll alavgrexp alunch alenroll y96-y01, fa(bi) link(probit) corr(exch) robust from(b,skip) 
    
    
    return scalar b1 = _b[lavgrexp]
    return scalar b2 = _b[lunch]
    return scalar b3 = _b[lenroll]
    
    
    predict x1b1hat, xb
    gen scale=normalden(x1b1hat)
    
    gen pe1=scale*_b[lavgrexp]
    summarize pe1
    return scalar ape1=r(mean)
    
    gen pe2=scale*_b[lunch]
    summarize pe2
    return scalar ape2=r(mean)
    
    gen pe3=scale*_b[lenroll]
    summarize pe3
    return scalar ape3=r(mean)
    
    drop x1b1hat scale pe1 pe2 pe3
    end
    
    
    
    *Bootstrapped SE within districts
    bootstrap r(b1) r(b2) r(b3) r(ape1) r(ape2) r(ape3), reps(20) seed(123) cluster(distid) idcluster(newid): math4_boot
    
    program drop math4_boot
    I still have a minor issue: even though the seed is set to the same value as in their paper, the bootstrapped standard errors are slightly different.

    Here is the comparison of the bootstrapped standard errors for each variable and its average partial effect (APE):

    Variable            In the paper   Reproduction   Absolute difference
    log(arexppp)        0.206          0.2059         0.0001
    log(arexppp) APE    0.070          0.0691         0.0009
    lunch               0.209          0.2148         0.0058
    lunch APE           0.067          0.0729         0.0059
    log(enroll)         0.139          0.1412         0.0022
    log(enroll) APE     0.045          0.0477         0.0027

    Do you know why there are such differences? Could they be explained by the different Stata versions used? (This was estimated in Stata 13; I don't know which version the authors used, but the paper is from 2008.)
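    For reference, Stata's bootstrap standard error is the sample standard deviation of the B replicate estimates, with B = 20 here (reps(20)). The second line is a standard large-sample approximation that assumes roughly normal replicates. It is a sketch of how noisy a 20-replication standard error can be if the resampled clusters differ in any way between two runs, not a diagnosis of why they would differ.

    ```latex
    \begin{aligned}
    \widehat{\mathrm{se}}_{\mathrm{boot}}(\hat\theta)
      &= \sqrt{\frac{1}{B-1}\sum_{b=1}^{B}\left(\hat\theta^{*}_{b} - \bar\theta^{*}\right)^{2}},
      \qquad \bar\theta^{*} = \frac{1}{B}\sum_{b=1}^{B}\hat\theta^{*}_{b} \\
    \frac{\operatorname{sd}\!\left(\widehat{\mathrm{se}}_{\mathrm{boot}}\right)}{\mathrm{se}}
      &\approx \frac{1}{\sqrt{2(B-1)}} = \frac{1}{\sqrt{38}} \approx 0.16 \quad \text{for } B = 20
    \end{aligned}
    ```

    Under that approximation, relative differences of a few percent, as in the table above, are small compared with the Monte Carlo noise of a 20-replication standard error.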



    • #3
      Originally posted by Yoann Morin
      Hi Morin,

      Have you succeeded in replicating this paper? I'm trying to do the same and would like to hear from you and compare notes.
