
  • insufficient observations (AGAIN) to compute bootstrap standard errors

    Dear Statlist users,

    I tried to find a solution to this problem but haven't found anything that works. I only saw that many people have a similar problem and that it has something to do with missing observations.

    I tried to drop missing observations (of the mB variable; the other variables that I use in "inflect" are not missing), and I tried the nodrop option, but I still get the "insufficient observations to compute bootstrap standard errors" message.

    Any suggestions?

    Here is the code:

    Code:
    bootstrap inflect=(-1*(_b[mB]+_b[ttt_mB*1)/(2*(_b[mBsquared]+_b[ttt_mBsquared]*1))): sureg (Y1 x1 x2 x3 mB mBsquared ttt ttt_mB ttt_mBsquared i.year i.priority i.id) (Y2 x1 x2 x3 mB mBsquared ttt ttt_mB ttt_mBsquared i.year i.priority i.id)
    I also tried a simpler version (just in case)

    Code:
    bootstrap _b[mB] _b[ttt]: sureg (Y1 x1 x2 x3 mB mBsquared ttt ttt_mB ttt_mBsquared i.year i.priority i.id) (Y2 x1 x2 x3 mB mBsquared ttt ttt_mB ttt_mBsquared i.year i.priority i.id)
    Both give the same error.

    Thank you in advance.


  • #2
    In your case, what is missing is not values of the regression variables; it is the various _b[]'s you are trying to calculate with. When you use -sureg-, the coefficients in _b[] are not named the way you are referring to them. They have complex names reflecting the equations they come from. To see the correct names to use, just run a single instance of your -sureg- command, adding the -coeflegend- option. Stata will then give you a table showing the proper _b[] references to use.
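    For concreteness, here is a minimal sketch of that suggestion, using the variable names from the command in #1 (so only a guess at the real data set):

    Code:
    * Sketch only: rerun the model once with -coeflegend- to see the legal _b[] names.
    sureg (Y1 x1 x2 x3 mB mBsquared ttt ttt_mB ttt_mBsquared i.year i.priority i.id) ///
          (Y2 x1 x2 x3 mB mBsquared ttt ttt_mB ttt_mBsquared i.year i.priority i.id), coeflegend
    * The legend lists names such as _b[Y1:mB] and _b[Y2:ttt_mB], i.e. each
    * coefficient is prefixed by its equation (dependent-variable) name.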



    • #3
      Originally posted by Clyde Schechter View Post
      In your case, what is missing is not values of the regression variables; it is the various _b[]'s you are trying to calculate with. When you use -sureg-, the coefficients in _b[] are not named the way you are referring to them. They have complex names reflecting the equations they come from. To see the correct names to use, just run a single instance of your -sureg- command, adding the -coeflegend- option. Stata will then give you a table showing the proper _b[] references to use.
      Thank you, Clyde. That was indeed a mistake in the code; however, fixing it didn't change the outcome. I am still getting the same error message. Here is the new code:

      Code:
       
       bootstrap inflect_Y1=(-1*(_b[Y1:mB]+_b[Y1:ttt_mB*1)/(2*(_b[Y1:mBsquared]+_b[Y1:ttt_mBsquared]*1))) inflect_Y2=(-1*(_b[Y2:mB]+_b[Y2:ttt_mB*1)/(2*(_b[Y2:mBsquared]+_b[Y2:ttt_mBsquared]*1))): sureg (Y1 x1 x2 x3 mB mBsquared ttt ttt_mB ttt_mBsquared i.year i.priority i.id) (Y2 x1 x2 x3 mB mBsquared ttt ttt_mB ttt_mBsquared i.year i.priority i.id)



      • #4
        Well, I see two errors (actually, it looks like two copies of the same error) in your new -bootstrap- command: the reference _b[Y1:ttt_mB*1 lacks a closing square bracket. The analogous reference with Y2: has the same problem. That said, I'm surprised to see it give you the same error message as before: I would expect it to more directly complain about the syntax. So maybe there's still more to it.

        If fixing that doesn't resolve the problem, I suggest you post a sample of your data (please use -dataex- for that). Perhaps someone can try running it and see where it goes wrong.



        • #5
          Originally posted by Clyde Schechter View Post
          Well, I see two errors (actually, it looks like two copies of the same error) in your new -bootstrap- command: the reference _b[Y1:ttt_mB*1 lacks a closing square bracket. The analogous reference with Y2: has the same problem. That said, I'm surprised to see it give you the same error message as before: I would expect it to more directly complain about the syntax. So maybe there's still more to it.

          If fixing that doesn't resolve the problem, I suggest you post a sample of your data (please use -dataex- for that). Perhaps someone can try running it and see where it goes wrong.
          Sorry, that was a copy-pasting mistake. Of course the syntax was correct and the command runs. I fixed the code below:
          Code:
          bootstrap inflect_Y1=(-1*(_b[Y1:mB]+_b[Y1:ttt_mB]*1)/(2*(_b[Y1:mB2]+_b[Y1:ttt_mB2]*1))) inflect_Y2=(-1*(_b[Y2:mB]+_b[Y2:ttt_mB]*1)/(2*(_b[Y2:mB2]+_b[Y2:ttt_mB2]*1))): sureg (Y1 x1 x2 x3 x4 x5 x6 mB mB2 ttt ttt_mB2 ttt_mB2 i.f1 i.f2 i.f3 i.f4 i.f5 i.f6 ) (Y2 x1 x2 x3 x4 x5 x6 mB mB2 ttt ttt_mB2 ttt_mB2 i.f1 i.f2 i.f3 i.f4 i.f5 i.f6)
          Here is a sample of the data:

          1. Y1 and Y2 are the dependent variables for SUR equations 1 and 2, respectively
          2. all other variables are included in both equation 1 and equation 2
          3. the f_ variables are factors, used as i.f1 i.f2 ... in the SUR
          4. mB2 is the square of mB
          5. ttt_mB and ttt_mB2 are the products of ttt with mB and with mB2, respectively (see the sketch right after this list)
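
          A minimal sketch of how points 4 and 5 could be constructed, assuming only mB and ttt already exist (this is not the original construction code):

          Code:
          * Sketch of points 4 and 5 (not the original construction code):
          generate double mB2     = mB^2       // square of mB
          generate double ttt_mB  = ttt*mB     // product of ttt and mB
          generate double ttt_mB2 = ttt*mB2    // product of ttt and the square of mB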

          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input float(Y1 x1 x2 x3 x4 x5 x6) byte mB double mB2 float ttt double(ttt_mB ttt_mB2) int f1 long f2 byte(f3 f4 f5) int f6 float Y2
           -.6931472       540       13      18.5  2 1         0 0 0  1  0  0 2007 5 11 2 0  1 6.436151
           -.6931472        90        7      39.5  2 1         0 0 0  1  0  0 2008 1  2 1 0  2        .
            .6931472      1769       11     23.75  2 1 1.0986123 1 1  1  1  1 2009 2  8 3 0  3        .
            .4054651     219.5        5     11.25  4 2         0 0 0  2  0  0 2009 5  8 3 0  3        .
            3.839452     208.5       12      43.5  2 2         0 0 0 12  0  0 2008 2  6 3 1  4 6.313548
           2.6741486 242.33333       13 33.916668  1 3 1.0986123 1 1  5  5  5 2008 3  6 3 1  4 5.480639
          -1.3862944      1568       10        17 13 1         0 1 1  1  1  1 2009 1  8 2 0  5 8.271804
                   0       555        7      2.75 17 1  .6931472 2 4  1  2  4 2009 5  8 2 0  5 5.123964
             2.70805      11.4      8.4      22.7  3 5         0 0 0 15  0  0 2008 1 11 3 1  5 7.255591
             4.26268      10.2      5.2      20.5  3 5  .6931472 1 1 23 23 23 2008 1 11 3 1  5 8.133881
            3.433987 24.333334 5.666667  8.333333  4 3 2.3025851 1 1 11 11 11 2008 1 11 2 1  5 8.090709
           1.5040774      20.5        7       7.5  5 2  2.397895 1 1  3  3  3 2008 1 11 2 1  5 6.011267
            2.484907       256       16         7  2 1         0 0 0  3  0  0 2009 1  6 2 1  6 5.123964
            2.484907       269       16        21  2 1  .6931472 1 1  7  7  7 2009 3  6 2 1  6        .
           -.6931472      1250       11      5.75  5 1         0 0 0  1  0  0 2008 1  6 3 0  7 5.257495
             3.78419         5        6         4 13 1  .6931472 1 1  6  6  6 2009 5  6 3 1  7 6.473891
           1.3862944         6        6       3.5  8 1 1.0986123 1 1  1  1  1 2009 1  6 3 1  7 5.123964
           2.0794415         7        6         3 13 1 1.3862944 1 1  2  2  2 2009 1  6 3 1  7 4.969813
            2.772589         8        6         4 13 1  1.609438 1 1  2  2  2 2009 1  6 3 1  7 5.575949
            3.583519         9        6        13  8 1 1.7917595 1 1  7  7  7 2009 3  6 3 1  7 5.257495
            .6931472       205        9        10  3 1         0 0 0  1  0  0 2009 5  6 2 0  8 7.508787
           -.2876821     925.5       10    28.625 10 2         0 0 0  2  0  0 2009 5  6 2 0  9 4.564348
           -.2876821     926.5       10    28.875  8 2 1.0986123 1 1  2  2  2 2009 5  6 2 0  9 4.787492
           -.2876821     927.5       10    28.375 10 2  1.609438 1 1  2  2  2 2009 5  6 2 0  9 4.564348
           -.2876821     928.5       10    29.375 11 2   1.94591 1 1  2  2  2 2009 5  6 1 0  9 4.787492
            .4054651     123.5        8        11 18 2         0 1 1  2  2  2 2008 3 11 2 0 10 4.276666
           2.0794415         2        6         2 15 1 1.0986123 2 4  1  2  4 2008 3 11 3 1 10        .
            2.890372         0        2       3.5 16 1         0 0 0  4  0  0 2008 1 15 2 1 10  6.39693
           2.0794415         1        6       1.5  2 1         0 2 4  1  2  4 2009 1  1 1 1 10        .
           1.3862944         1        5      3.75 16 1  .6931472 2 4  1  2  4 2008 3 15 2 1 10 5.886104
           2.6390574        18        6         7 18 1  2.944439 3 9  2  6 18 2009 3 15 2 1 10 7.905442
           1.0986123         7        5         0 15 1 2.0794415 2 4  1  2  4 2008 1 15 2 1 10  6.39693
           2.3025851         9        5         1 14 1 2.1972246 2 4  3  6 12 2008 3 15 2 1 10 6.612041
           2.6390574         5        5         1 16 1 1.7917595 2 4  2  4  8 2008 3 15 2 1 10  6.39693
           1.3862944         3        5       .25 17 1 1.3862944 2 4  1  2  4 2008 1 15 1 1 10 6.122493
           1.3862944         2        5      3.25 17 1 1.0986123 2 4  1  2  4 2008 3 15 2 1 10 4.564348
           1.3862944         4        5       .25 18 1  1.609438 2 4  1  2  4 2008 1 15 2 1 10 3.871201
           1.3862944         6        5         1 16 1   1.94591 2 4  2  4  8 2008 1 15 2 1 10 4.276666
           1.0986123         8        5         1 17 1 2.3025851 2 4  1  2  4 2008 3 15 2 1 10 6.579251
           2.0794415        10        6       1.5 12 1  2.397895 2 4  1  2  4 2008 1 15 2 1 10 5.257495
            2.772589        11        6         4 13 1  2.484907 2 4  2  4  8 2009 1 15 2 1 10 5.123964
           2.0794415        12        6       4.5 13 1  2.564949 2 4  1  2  4 2009 1 15 1 1 10 4.787492
            2.772589        14        6         3 14 1   2.70805 3 9  2  6 18 2009 3 15 2 1 10 6.313548
           1.0986123        13        6         5 13 1 2.6390574 3 9  1  3  9 2009 1 15 2 1 10 3.871201
            2.995732        16        6         3 18 1  2.833213 3 9  3  9 27 2009 3 15 1 1 10 6.612041
            .6931472        15        6         0 13 1  2.772589 3 9  1  3  9 2009 1 15 1 1 10 4.787492
            2.772589        19        6         5 17 1  2.995732 3 9  2  6 18 2009 3 15 2 1 10 5.886104
           3.0910425        17        6         4 19 1  2.890372 3 9  4 12 36 2009 3 15 2 1 10 5.662961
            2.484907        20        6         3 18 1 3.0445225 3 9  2  6 18 2009 3 15 2 1 10 6.122493
            2.995732        21        6         5 19 1 3.0910425 3 9  4 12 36 2009 3 15 2 1 10 6.068426
           3.6888795        22        6        10 16 1  3.135494 3 9  7 21 63 2009 3 15 2 1 10 7.288928
           2.0794415        23        6       5.5 13 1  3.178054 3 9  1  3  9 2009 1 15 1 1 10 3.178054
           2.0794415        24        6      12.5 12 1  3.218876 3 9  1  3  9 2009 3 15 2 1 10 3.178054
            2.397895        25        6        13 14 1 3.2580965 3 9  3  9 27 2009 3 15 2 1 10  6.64379
           1.3862944        26        6        10 16 1  3.295837 3 9  1  3  9 2009 1 15 2 1 10 3.178054
           1.3862944        27        6       9.5 15 1 3.3322046 3 9  1  3  9 2009 1 15 2 1 10 3.178054
           1.0986123        28        6         7 15 1  3.367296 3 9  1  3  9 2009 1 15 1 1 10 3.178054
           1.0986123        29        6       4.5 13 1 3.4011974 3 9  1  3  9 2009 1 15 2 1 10 3.178054
           2.0794415        30        6        15 15 1  3.433987 3 9  2  6 18 2009 1 15 2 1 10  7.66669
            2.833213        31        6        24 15 1  3.465736 3 9  4 12 36 2009 3 15 2 1 10 6.222576
            2.397895        32        6        25 16 1  3.496508 3 9  3  9 27 2009 1 15 2 1 10 3.178054
           2.3025851        33        6        15 18 1 3.5263605 3 9  3  9 27 2009 3 15 2 1 10 6.436151
             2.70805        34        6        11 19 1  3.555348 3 9  3  9 27 2009 3 15 2 1 10 7.221105
           1.3862944        35        6      14.5 18 1  3.583519 3 9  1  3  9 2009 1 15 2 1 10 3.178054
           3.2580965        36        6        15 19 1  3.610918 3 9  3  9 27 2009 3 15 2 1 10 7.069874
           2.6390574        37        6         6 17 1  3.637586 3 9  3  9 27 2009 1 15 2 1 10 5.950643
            2.564949        38        6         8 16 1 3.6635616 3 9  4 12 36 2009 1 15 2 1 10 7.238497
           1.7917595        39        6         6 15 1 3.6888795 3 9  1  3  9 2009 3 15 2 1 10 3.178054
            2.772589        41        6         3  7 1   3.73767 3 9  2  6 18 2009 3 15 1 1 10 6.939254
             2.70805        40        6         6  8 1  3.713572 3 9  6 18 54 2009 3 15 1 1 10 6.612041
            1.252763     641.5        8      6.75  4 2         0 0 0  2  0  0 2008 1  6 2 1 11 5.257495
           1.0986123         3        5        .5  3 1 1.0986123 1 1  1  1  1 2008 3  6 2 1 11        .
            .4054651         4        5         0  2 1 1.3862944 1 1  1  1  1 2008 3  6 2 1 11 3.178054
            .6931472         0        5       .25  2 1         0 1 1  1  1  1 2008 3 11 3 1 11        .
           3.0910425         1        6         1  2 1  .6931472 2 4  3  6 12 2008 3 11 3 1 11 7.129298
           2.1972246         0        2         9  5 1  1.609438 2 4  3  6 12 2009 5  6 1 1 11 4.276666
            .6931472         1        3      9.75  6 1 1.7917595 2 4  1  2  4 2009 5  6 1 1 11 4.276666
            2.484907         0        0         1 11 1  .6931472 1 1  2  2  2 2008 1 17 1 1 12 4.564348
           1.3862944         1        1       .25 10 1 1.0986123 1 1  1  1  1 2008 1 17 2 1 12 5.375278
           1.3862944         2        1         0  9 1 1.3862944 1 1  1  1  1 2008 1 17 3 1 12 5.575949
           2.0794415         0        1         3  8 1         0 0 0  3  0  0 2008 1 17 2 1 12 5.575949
           1.0986123         3        1         0 11 1  1.609438 1 1  1  1  1 2008 1 17 1 1 12 5.662961
           4.3694477         8        2         4 17 1 2.3025851 1 1 16 16 16 2008 3 17 1 1 12 6.962244
            2.564949         7        1         3 19 1 2.0794415 1 1  2  2  2 2008 1 17 2 1 12 6.356108
           1.7917595         6        1         2 18 1 2.1972246 1 1  2  2  2 2008 1 17 1 1 12 6.068426
           2.0794415         5        1       .25 20 1   1.94591 1 1  1  1  1 2008 1 17 1 1 12  6.54535
           1.0986123         4        1         0 16 1 1.7917595 1 1  1  1  1 2008 1 17 1 1 12 6.962244
           1.0986123        22        2         7 14 1 3.3322046 1 1  1  1  1 2009 3 17 3 1 12 6.915723
             1.94591        14        2         4 15 1  2.772589 1 1  3  3  3 2008 1 17 1 1 12 6.674562
            .6931472        12        2         3 19 1   2.70805 1 1  1  1  1 2008 3 17 3 1 12 5.817111
           1.3862944         1        5         0 19 1  2.397895 1 1  1  1  1 2008 1 17 1 1 12 3.871201
                   0        13        2         2 19 1  2.833213 1 1  1  1  1 2008 3 17 2 1 12 5.662961
            1.252763         3        5         0 20 1  2.890372 1 1  1  1  1 2008 1 17 1 1 12 5.743003
           1.0986123         2        5         0 20 1 2.6390574 1 1  1  1  1 2008 1 17 2 1 12 4.276666
            .6931472        10        2         4 19 1  2.484907 1 1  2  2  2 2008 3 17 2 1 12 5.886104
           1.7917595        20        2       3.5 16 1 3.2580965 1 1  1  1  1 2009 3 17 2 1 12  6.81564
            2.833213        21        2         6 17 1  3.295837 1 1  5  5  5 2009 1 17 2 1 12 7.028202
           2.0794415        11        2         4 18 1  2.564949 1 1  2  2  2 2008 1 17 2 1 12 5.950643
            2.772589         6        6         4 20 1 3.5263605 1 1  2  2  2 2009 1 17 3 1 12 3.178054
            2.484907         4        5         1 16 1  2.944439 1 1  2  2  2 2008 3 17 1 1 12 3.871201
          end
          Last edited by Constantin Alba; 09 Mar 2017, 19:22.



          • #6
            Well, you have new typos in your command. You can't have _b[Y1:ttt_mB*1]. The *1 is not part of the variable name. You have the same thing with Y2 later on. Also, your -sureg- command refers to a variable ttt_mB2, but the closest thing to that in your data is tttt_mB2. And while it isn't illegal, you have the variable ttt_mB2 listed twice in each of the -sureg- equations. But, again, none of these should produce the result you're getting: you should be getting messages about bad syntax or things not found.

            I won't be able to help you get to the bottom of this until you post the actual command with a sample of the actual data that it was run on, and that actually produced the error message you're concerned with. Don't try re-typing the command here; it's too long and complicated to get it right more than a fraction of the time. Use copy/paste directly from Stata's Results window to show it.

            That said, I may see the problem arising from this data. The variable f6 has only one observation with value 1 or 2, and only two each with values 3 and 4. In most of your bootstrap samples, one or more of these values will not be represented at all. When that happens, Stata will choose to omit some higher-value indicator from the model to break the collinearity among the f6 indicators and the constant term. This then makes the pattern of included and omitted variables different among the bootstrap replications. -bootstrap- checks for that, and when it finds it, it skips that regression. I think you are ending up skipping all or most of your regressions because of this, and you are left with either none or just a few, which isn't enough for -bootstrap- to work with.
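
            Here is a rough sketch (not code from this thread) of how you could check whether f6 has such rare values in your full data set:

            Code:
            * Sketch only: how thin are the f6 cells?
            bysort f6: gen long f6_size = _N     // observations per level of f6
            egen byte f6_tag = tag(f6)           // marks one observation per level
            count if f6_tag & f6_size <= 2       // number of f6 levels with at most 2 observations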

            I don't see an obvious way to fix this problem (if that is what is actually happening; it may be that in the full, larger data set you don't have a "rare value" problem).



            • #7
              You are right again about the syntax. I wrote the code manually here, just trying to simplify it (the original variable names were long and not very descriptive).
              Regarding the problem you suspect with f6: I only posted part of the data here, not the full dataset, of course. The actual range of values for f6 is from 1 to 400.

              I did not understand the problem you see with f6; maybe you can explain it again? Anyway, dropping it (which I cannot do) does produce enough observations, and bootstrap finishes its run successfully (although with 20 out of 50 replications not estimated). Dropping another factor variable (f4) gets it to all 50 estimated...




              • #8
                Let me try to explain it a bit more. Forget the bootstrap for a minute, and just think about the -sureg-. With all of the variables in it you will have 399 indicator ("dummy") variables for f6. Before -bootstrap- begins its sampling process, it first runs the -sureg- command on the entire data set. And it remembers what variables were in the model, and which, if any, were omitted due to collinearity.

                You don't say how large your data set is, but even if it is pretty large, if the distribution of values is at all like the one in your example data, some of those values (let's say 1, for concreteness) are going to be pretty rare. So when you do your bootstrap samples, many of the samples you draw could turn out not to have any observations with f6 = 1. When it then tries to do -sureg- on that sample, because there are no values of f6 = 1, it has to select some other value of f6 to omit to avoid collinearity among the remaining 398 indicator variables. But when -bootstrap- sees that it is now being asked to estimate a different model, one that lacks some of the indicators that were used in the full-data-set run, it says: aha! This is a problem: I can't include this in the bootstrap because this isn't the same model we started with. So it skips that regression, doesn't record any results for that round, and moves on to the next. If this happens often enough, then it may be left with only a small handful of valid regression results (or maybe even with none at all) and so is unable to calculate bootstrap summary statistics.
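
                If you want to watch this happen, one possibility (just a sketch, adapted and shortened from your command in #5, not anything you must run) is to do a few replications verbosely and look at which f6 indicators get omitted in each resample:

                Code:
                * Sketch only: a shortened version of the model from #5, run verbosely
                * for a handful of replications.
                bootstrap inflect_Y1=(-1*(_b[Y1:mB]+_b[Y1:ttt_mB])/(2*(_b[Y1:mB2]+_b[Y1:ttt_mB2]))), ///
                        reps(5) noisily: ///
                        sureg (Y1 x1 x2 x3 mB mB2 ttt ttt_mB ttt_mB2 i.f6) ///
                              (Y2 x1 x2 x3 mB mB2 ttt ttt_mB ttt_mB2 i.f6)
                * Per the explanation above, replications in which a different f6
                * indicator is omitted are rejected, which is what can leave
                * -bootstrap- with too few valid results.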

                If you can't drop f6 (and maybe f4), I don't really know how to advise you. If you were to specify f6 in the -strata()- option to -bootstrap-, you would get around this problem and would get some results. But I don't think they would be the correct answers, because stratified bootstrap sampling does not, as far as I know, emulate sampling variation properly unless the original data was, itself, obtained from a sample stratified on that (those) variable(s). I could be wrong about this, and if somebody else knows that I am, or if somebody knows of another solution to Constantin's problem, I hope they will chime in.
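
                For what it's worth, that -strata()- variant would look roughly like this (again just a sketch of the shortened model, and with the caveat above that I doubt the resulting standard errors are right):

                Code:
                * Sketch of the stratified-bootstrap workaround; the validity caveat above applies.
                bootstrap inflect_Y1=(-1*(_b[Y1:mB]+_b[Y1:ttt_mB])/(2*(_b[Y1:mB2]+_b[Y1:ttt_mB2]))), ///
                        strata(f6): ///
                        sureg (Y1 x1 x2 x3 mB mB2 ttt ttt_mB ttt_mB2 i.f6) ///
                              (Y2 x1 x2 x3 mB mB2 ttt ttt_mB ttt_mB2 i.f6)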



                • #9
                  Thanks, Clyde. Your explanation makes sense.

                  Regarding the -strata()- option: f6 is the variable by which the data are actually clustered. So I either run regress/sureg with it as a fixed effect or use the -cluster- option in -regress-.
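
                  For concreteness, the two set-ups I mean look roughly like this (a single-equation sketch with the variable names from #5, not my exact commands):

                  Code:
                  * Sketch of the two alternatives (single-equation version, names from #5):
                  regress Y1 x1 x2 x3 mB mB2 ttt ttt_mB ttt_mB2 i.f6               // f6 as fixed effects
                  regress Y1 x1 x2 x3 mB mB2 ttt ttt_mB ttt_mB2, vce(cluster f6)   // f6 as the cluster variable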

                  Am I wrong to say that this is very similar to stratifying the sample based on f6? If so, then this resolves my problem...



                  • #10
                    Well, stratified sampling on f6 means that you first identified sampling frames associated with each possible value of f6, and then you created your sample by doing independent random sampling from each of those frames. This is different from simple random sampling. In a simple random sample, you could end up with a sample that has no representatives for certain values of f6 (like what is happening in your bootstrap!) In a stratified random sample that never happens because, by design, you separately sample each stratum. It's an important aspect of sampling when it occurs because it typically results in smaller standard errors, particularly if the stratum variable is itself predictive of the outcome. Was your sample assembled in that way?

                    I hasten to add that clustering is not the same as stratification. In fact, in a certain sense they are opposites of each other. (And the effect of clustering is to increase standard errors, the opposite of what stratification does.) In a clustered sample, the concern is that there is intra-cluster correlation of outcomes. That is not an issue with stratification. So when you tell me that your data are clustered on f6, it is most unlikely that they are also stratified on f6.



                    • #11
                      No, it is not... You are right. Hopefully someone else has a solution to this problem.

