  • Bootstrapping failures when including regional fixed effects

    Dear Stata users
    I am having issues with bootstrapping standard errors after including regional fixed effects in my regressions. I know this has been discussed in some previous posts and it seems to be a recurrent issue, but I have not yet been able to solve my problem with the info posted there.

    If I run my command without regional fixed effects as

    Code:
    bootstrap, reps(1000) seed(8976): probit y x z, cluster(hh)
    Stata has no problem and all 1,000 replications are successful. However, if I run instead:

    Code:
    bootstrap, reps(1000) seed(8976): probit y x z i.region, cluster(hh)
    many replications fail. Using the noisily option delivers the usual message:
    collinearity in replicate sample is not the same as the full sample, posting missing values
    insufficient observations to compute bootstrap standard errors
    no results will be saved


    My bet is that when a new sample is drawn, observations for a particular region can be missing and hence cause the problem (given that some regions have a small number of observations, this is quite possible). My sample size is quite big and the number of regions is low, so that I am confident this is not a problem of insufficient degrees of freedom.
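
    One way to check this guess before bootstrapping is to look at how thin the smallest regions are. The lines below are only a sketch; hh_tag is a hypothetical variable name that does not exist in my data.

    ```stata
    * Observations per region: very small cells are the ones most likely
    * to be absent from a given bootstrap resample
    tabulate region

    * Distinct households per region (hh_tag is a hypothetical new variable
    * that flags one observation per region-household pair)
    egen byte hh_tag = tag(region hh)
    tabulate region if hh_tag
    ```

    If some regions turn out to have only a handful of observations, that would be consistent with replications failing because a region drops out of the resample.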

    I have read about using the nodrop option, but this does not work. If what is happening is the problem I just mentioned (some regions not appearing in some replications), then I was thinking about specifying more than 1,000 replications and keeping the first 1,000 successful ones. This may look like a bit of a dirty practice, but to the extent that it is only due to some regions being minor in terms of observations in the dataset, it could be a solution. Does anyone know how to keep the first 1,000 successful replications out of (say) 3,000?

    Many thanks in advance for your help,

    Basile
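
    For readers with the same question, one possible way to keep the first 1,000 successful replications is sketched below. It is an untested outline, not a recommendation: it assumes the replicate-level results are written to disk with the saving() option, that failed replications are stored as missing values, and that the first saved statistic is the coefficient on x. The file name bsreps and the local macro names are hypothetical.

    ```stata
    * Sketch only; run from a do-file (/// is a line continuation).
    * Draw extra replications and save every replicate to a dataset.
    * "bsreps" is a hypothetical file name.
    bootstrap _b, reps(3000) seed(8976) saving(bsreps, replace): ///
        probit y x z i.region, cluster(hh)

    * Load the replicate-level results. This replaces the data in memory,
    * so save or preserve the estimation sample first.
    use bsreps, clear

    * Assumption: failed replications hold missing values, and the first
    * saved statistic is the coefficient on x.
    unab stats : _all
    local first : word 1 of `stats'
    drop if missing(`first')

    * Keep the first 1,000 successful replications
    * (this step errors out if fewer than 1,000 remain)
    keep in 1/1000

    * Recompute the bootstrap summary from the retained replications
    bstat
    ```

    Whether discarding failed replications in this way gives valid standard errors is a separate question, which the code does not settle.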

  • #2
    Basile:
    welcome to the list.
    Just out of curiosity: can't you switch to clustered standard errors if -bootstrap- is bothering you?
    That said, if you want to go along the way that you have described, you can use -runiform()- to act as a filter to delete the -bootstrap- replications in excess.
    The following toy-example considers discarding 50 out of 100 faked bootstrap replications:
    Code:
    . set obs 100
    number of observations (_N) was 0, now 100
    
    . g faked_boot_repl=23*runiform()
    
    . g counter=runiform()
    
    . sort counter
    
    . drop if _n>50
    (50 observations deleted)
    Kind regards,
    Carlo
    (Stata 19.0)

    • #3
      Dear Carlo
      Many thanks for your answer. I am sorry I forgot to add that the reason I am bootstrapping in the first place is that one of the regressors is a generated regressor from a previous estimation. Sorry about this. So the code would read:

      Code:
      bootstrap, reps(1000) seed(8976): probit y x zhat i.region, cluster(hh)
      where zhat is the generated regressor.

      Thanks a lot

      • #4
        Your diagnosis of the problem may well be correct. I think there is a better solution. Specify the -cluster(region)- and -idcluster(pick_a_new_variable_name)- options in the -bootstrap- prefix and the re-sampling will respect the region structure of your data.

        Also, while Stata still accepts the older syntax -cluster(hh)- to specify cluster-robust covariance estimation in the -probit- command, the current and preferred syntax for this is -vce(cluster hh)-. Although it doesn't affect how anything runs, in the context where you need to specify a -cluster()- option in the bootstrap prefix, using -vce(cluster hh)- will be less confusing to read.

        All of that said, it looks like you are trying to get around the non-existence of an -xtprobit, fe-. Without trying to counsel you on the wisdom of that, I should point out that if -xtprobit, fe- did exist, you would not be able to specify region as the panel variable and then obtain robust VCE clustered on hh: in fixed-effect estimators the robust VCE clusters have to have panels nested in them, not the other way around. My theoretical knowledge in these areas is limited, so I can't give you a full and robust (no pun intended) explanation of my concerns, but I sense you are skating on thin statistical ice here, even if you can get your code to run and produce output.

        • #5
          Dear Clyde. Thanks for your answer. I have tried this option but it does not work either. In fact, using -cluster(region)- and -idcluster(new_var)- produces a 100 per cent failure rate, so even worse than before... For example:

          Code:
          bootstrap id, reps(1000) seed(9876) cluster(region) idcluster(id): probit y x zhat i.region, vce(cluster hh)
          produces full failure...

          Regarding the 'big picture' problem I do agree this can be seen as thin ice, and I will give it more thought!

          • #6
            Hi again,
            I have found that using the jackknife option instead of the bootstrap option works perfectly fine, even after inclusion of regional fixed effects. It seems that jackknife is a 'special case' of bootstrap, so I'm wondering whether it is ok to use it? In economics research one often comes across bootstrap but not so much jackknife.

            I am thinking that if bootstrapping fails because some regions are not included after resampling, then jackknife could be a solution since once the observation has been 'used', it would be dropped in the next resampling procedure (which has N-1 observations), so that it could not cause any failure in the future (as in bootstrap). Does this reasoning make sense or am I on the wrong track?

            Many thanks again!

            • #7
              Code:
              bootstrap id, reps(1000) seed(9876) cluster(region) idcluster(id): probit y x zhat i.region, vce(cluster hh)
              is legal syntax, but it is probably not what you intend, and it may be why you get failure on every replication here. The first id, the one immediately after -bootstrap-, does not belong there. (The one in -idcluster(id)- is fine, assuming id does not already exist as a variable in your data.) It is telling Stata to save "the value of id" and bootstrap it. Given that id does not even exist until after the resampling is done (because it is a new variable created by the -idcluster()- option), and given that it is a variable, not a scalar, I'm not sure what Stata makes of it. At the very least, I'm sure that Stata is unable to find anything called id in the returned results of -probit- and therefore is, at best, posting a missing value each time.

              If I were you, I would try re-running this with that id removed.
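
              For concreteness, the re-run would look something like the sketch below. The only change from the command in #5 is that the stray id right after -bootstrap- is gone, so -bootstrap- falls back to its default of collecting the coefficients in _b. Whether that alone stops the replications from failing is not something this sketch confirms.

              ```stata
              * Same command as in #5, minus the stray "id" after -bootstrap-
              * (run from a do-file; /// is a line continuation)
              bootstrap, reps(1000) seed(9876) cluster(region) idcluster(id): ///
                  probit y x zhat i.region, vce(cluster hh)
              ```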

              • #8
                Because you are clustering your bootstraps at the regional level and include regional fixed effects, not all the regions are sampled in some bootstrap runs. This means you cannot get an estimate for every fixed-effect parameter, which causes an issue with the bootstrap. I had a similar issue. One way I managed to get around it (assuming you are not interested in the fixed-effect coefficients) is by estimating the probit model quietly inside a small program, e.g.:

                Code:
                 
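                capture program drop spec
                program spec, eclass
                    // fit the model quietly; -probit- leaves its results in e()
                    quietly probit y x z, vce(cluster hh)
                end

                // note the space: -bootstrap _b- collects the coefficient vector
                bootstrap _b, reps(1000) seed(9876) cluster(region): spec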
                Then extract the coefficients of interest using -esttab- or -estout-.

                Note that I also ran an additional -margins- command after quietly running the model, which may be the reason why quietly works in my case.
