
  • Regression analysis: bootstrap red x

    Hi everyone, my classmates and I are having this problem and can't find a solution! When I use bootstrap to test the significance of the coefficients of my regression model, I get little red x's. Is it because I have dummy variables (UWrepu, Adrepu, year, industry) among my control variables? (I find that when I remove the time and industry fixed effects, the red x's disappear.)
    Please help me answer this! Thank you!

    Code:
    bootstrap, reps(1000): reg Underprice SSEr size roa lev Issuerate IPOduration UWrepu Adrepu rd IH F M i.year i.ind, noheader

  • #2
    Without example data it is not possible to answer this question with certainty. In the future (including any response on this thread if my suggestion below does not resolve the problem), always show example data when asking for help troubleshooting code, and always use the -dataex- command to do that. If you are running Stata 17, 16, or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it.

    -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
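
    For example (variable names taken from the command in #1; -in 1/20- just limits the excerpt to the first 20 observations):

    Code:
    ssc install dataex                                  // only needed on Stata older than 15.1
    dataex Underprice UWrepu Adrepu year ind in 1/20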

    Meanwhile, add the -noisily- option to your -bootstrap:- prefix, and change the reps(1000) option to a small number so you don't get buried in output. That way, Stata will show you the regressions themselves and also the error messages explaining why the regressions do not run. From there, hopefully, you will be able to fix your problem, whatever it may be.
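
    Applied to your command from #1, that would look something like this (reps(5) is just an arbitrary small number for diagnosis; restore reps(1000) once the problem is fixed):

    Code:
    bootstrap, reps(5) noisily: reg Underprice SSEr size roa lev Issuerate IPOduration UWrepu Adrepu rd IH F M i.year i.ind, noheader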

    If you are not able to solve the problem after seeing that, then when you post back, show the failing regression output and error messages in addition to the example data.



    • #3
      Hi, I use bootstrapping because I estimate a regressor in a first stage and then use it in a second-stage regression. I found out that the more controls I use in the second-stage regression, the more red crosses I get. I added the -noisily- option, as suggested by Clyde. The error message that leads to the red crosses is apparently: "collinearity in replicate sample is not the same as the full sample, posting missing values". Furthermore, I can see that some of my controls are omitted for collinearity.
      So, I understand that some of my controls are collinear in subsamples of my full sample.

      So, do you think it is a warranted way forward to just ignore the red crosses if there are only a few, or to increase the number of reps so that I end up with approximately the number of successful replications I wanted (e.g., if I want 1,000 replications but half of the reps have red crosses, I just increase the number of reps to 2,000)?



      • #4
        When working with indicator ("dummy") variables in -bootstrap-, it can happen that some samples end up producing a collinearity among them that is not present in the original data. Typically this only happens when one or more of the indicators marks a rare event (or its opposite, an event that is almost always present). Working with such variables can be treacherous even in simple analyses, and they usually don't contribute much to the final model, so you might consider removing the rare variable(s) from the model. If your variables UWrepu and Adrepu represent events that are neither rare nor nearly universal, then perhaps there is an error in your data: you should look for that and fix it if found.
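
        As a self-contained sketch of the mechanism (using Stata's shipped auto dataset, not your data): rep78 == 1 occurs in only 2 of 74 observations, so an appreciable share of bootstrap samples contain no 1s at all. In those replicates the indicator is constant, -regress- omits it, and -bootstrap- posts missing values for that replicate, which shows up as a red x.

        Code:
        sysuse auto, clear
        generate rare = (rep78 == 1) if !missing(rep78)    // rare indicator: only 2 ones
        bootstrap, reps(20) noisily: regress price mpg rare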

        As for your way forward: if there is only a small number of red crosses, I would ignore them. But if you have so many that you are tempted to double the number of reps to compensate for this problem, then I think you should instead remove the offending variable(s) from the model. After all, if the distributions of those variables are so problematic that half of the bootstrap samples are unanalyzable, then it is more or less a coincidence that your original sample doesn't have the same problem. When the analysis is that brittle, I wouldn't rely on it.
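
        A quick way to check whether that is what is going on (assuming, as #1 suggests, that UWrepu and Adrepu are 0/1 indicators): the mean of a 0/1 variable is the share of 1s, so means near 0 or near 1 are the warning sign.

        Code:
        summarize UWrepu Adrepu    // means near 0 or 1 flag rare / near-universal indicators
        tab1 UWrepu Adrepu         // one-way tables of each indicator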

