  • Bootstrapping nonlinear combinations of coefficient estimates after NL command

    Hello,

    I am trying to figure out how to apply the bootstrap to nonlinear combinations of coefficient estimates after the -nl- command. Here is an example of the type of -nl- command I am using:

    nl (JOBSAT={b0}+{b1}*ALTHC+{b2}*ALTWC+{b3}*(ALTWC-{c0}-{c1}*ALTHC)*(ALTWC<{c0}+{c1}*ALTHC)), initial(b0 3.844 b1 0.588 b2 -0.222 b3 0.643 c0 0 c1 1) iter(1000)

    After obtaining the estimates, I want to test nonlinear combinations such as this:

    nlcom [b0]_cons+[b2]_cons*[c0]_cons

    However, because these are nonlinear combinations of coefficients, the confidence intervals reported by Stata (which are based on normal theory) are incorrect from a technical standpoint (i.e., even if the coefficients are normally distributed, products of the coefficients such as the ones I am computing will not be normally distributed). Therefore, I want to apply the bootstrap and obtain percentile-based confidence intervals (bias-corrected and perhaps bias-corrected and accelerated). Unfortunately, none of my attempts have been successful. For instance, I have tried prefixing the nlcom command with bootstrap:, and here is the result:

    . bootstrap: nlcom [b0]_cons+[b2]_cons*[c0]_cons
    (running nlcom on estimation sample)
    last estimates not found
    an error occurred when bootstrap executed nlcom
    r(301);

    I have also tried the vce(bootstrap) option. No dice.

    . nlcom [b0]_cons+[b2]_cons*[c0]_cons, vce(bootstrap)
    option vce() not allowed
    r(198);

    Among other things, I have tried to adapt the procedure shown here:

    http://www.ats.ucla.edu/stat/stata/faq/modmed.htm

    This procedure seems close to what I want, because it computes nonlinear combinations of coefficients and applies the bootstrap to them. But again, I have not been successful. I would appreciate any help you can provide.

    Thanks,

    Jeff Edwards

  • #2
    So, first there is a conceptual problem in your approach. Once you run your original -nl- estimation, the quantities b0, b2, c0, etc. are now constants. Even if you got the syntax for -bootstrap-ping them right, you would be bootstrapping a constant. I don't know if you would even get a result, but if you did, it wouldn't be useful for anything.

    So you have to include re-estimating the -nl- model within the scope of the bootstrap.

    The next problem you have is that when -bootstrap- is invoked without being told exactly what statistics to grab and bootstrap, it looks to _b and _se. But -nlcom- doesn't return anything there. It returns its results in a matrix called r(b). So you have to tell -bootstrap- to grab that.

    So, all in all, you have to write a short program that runs your -nl- estimation, follows it with -nlcom- and then returns the result in r(). Then you -bootstrap- that program. I don't have data that look like yours, and so here's just a toy example you can model:

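    Code:
    capture program drop my_program
    program define my_program, rclass
        // re-estimate the model on whatever sample -bootstrap- supplies
        nl (price = {b0} + {b1}*mpg + {b2}*log(headroom)*weight^2)
        // compute the nonlinear combination from the fresh estimates
        nlcom _b[b0:_cons] + _b[b1:_cons]*_b[b2:_cons]
        // -nlcom- leaves its point estimate in the 1x1 matrix r(b)
        tempname b
        matrix `b' = r(b)
        return scalar my_combination = `b'[1,1]
        exit
    end

    sysuse auto, clear
    bootstrap r(my_combination), reps(25) seed(1234): my_program
    estat bootstrap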
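
    For the bias-corrected (BC) and bias-corrected and accelerated (BCa) intervals asked about in the first post, something along the following lines may work. This is only a sketch on the same toy model: reps(1000) is an assumed value (percentile-type intervals generally need far more replications than the 25 used for illustration), and the bca option makes -bootstrap- run an extra jackknife pass to estimate the acceleration.

    ```stata
    capture program drop my_program
    program define my_program, rclass
        // toy model on the auto data, re-estimated in every replication
        nl (price = {b0} + {b1}*mpg + {b2}*log(headroom)*weight^2)
        nlcom _b[b0:_cons] + _b[b1:_cons]*_b[b2:_cons]
        tempname b
        matrix `b' = r(b)
        return scalar my_combination = `b'[1,1]
    end

    sysuse auto, clear
    // reps(1000) is an assumption; bca requests the acceleration estimate
    bootstrap r(my_combination), reps(1000) seed(1234) bca: my_program
    // list every interval type that was saved (normal, percentile, BC, BCa)
    estat bootstrap, all
    ```

    Individual interval types can be requested instead of all, e.g. -estat bootstrap, percentile bc-.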


    • #3
      Dear Clyde,

      Now that I understand that -nlcom- treats the coefficient estimates as constants, it makes perfect sense that the bootstrap wouldn't work at that stage of the analysis.

      I have used the bootstrap with -nl- to save the _b estimates for the bootstrap samples, import them into Excel, compute nonlinear expressions of the full-sample estimates, and apply the same expressions to the bootstrap estimates to derive bias-corrected percentile-based confidence intervals (Excel has some pretty neat functions that facilitate this process). I have written articles in which I direct readers to this procedure, but I have found that people often want to obtain all of the relevant results from the statistical package they use, as opposed to taking the trouble to import results into Excel (even though they can download the workbook from my website). This is why I'm trying to get Stata to produce results such as those in my post, which your program handled like a charm!

      Here is the code I ran. Note that I inserted my original -nl- expression and used my target data, which is in a file called raceext.dta.

      Code:
      capture program drop my_program
      program define my_program, rclass
          nl (JOBSAT={b0}+{b1}*ALTHC+{b2}*ALTWC+{b3}*(ALTWC-{c0}-{c1}*ALTHC)*(ALTWC<{c0}+{c1}*ALTHC)), initial(b0 3.844 b1 0.588 b2 -0.222 b3 0.643 c0 0 c1 1) iter(1000)
          nlcom _b[b0:_cons] + _b[b1:_cons]*_b[b2:_cons]
          tempname b
          matrix `b' = r(b)
          return scalar my_combination = `b'[1,1]
          exit
      end

      use raceext, clear
      bootstrap r(my_combination), reps(25) seed(1234): my_program
      estat bootstrap

      I am relatively new to Stata, and I have not yet written a program on my own, but I have located the "Programming Stata" chapter and will use it as a reference as I study your program to better understand how it works.

      Incidentally, in the article that presents these analyses, I need to compute more than one nonlinear combination of coefficients. I tinkered with your program and computed two nonlinear combinations named int and slope. There might be a more elegant solution, but the program worked!

      You will be duly acknowledged when the article goes to press.

      Thanks much,

      Jeff

      Code:
      capture program drop my_program
      program define my_program, rclass
          nl (JOBSAT={b0}+{b1}*ALTHC+{b2}*ALTWC+{b3}*(ALTWC-{c0}-{c1}*ALTHC)*(ALTWC<{c0}+{c1}*ALTHC)), initial(b0 3.844 b1 0.588 b2 -0.222 b3 0.643 c0 0 c1 1) iter(1000)
          nlcom _b[b0:_cons] + _b[b1:_cons]*_b[c0:_cons]
          tempname int
          matrix `int' = r(b)
          return scalar int = `int'[1,1]
          nlcom _b[b1:_cons] + _b[b2:_cons]*_b[c1:_cons]
          tempname slope
          matrix `slope' = r(b)
          return scalar slope = `slope'[1,1]
          exit
      end

      use raceext, clear
      bootstrap r(int) r(slope), reps(25) seed(1234): my_program
      estat bootstrap



      • #4
        Hi Clyde,

        I took another look at the program on this site:

        http://www.ats.ucla.edu/stat/stata/faq/modmed.htm

        Using their program syntax as a model, I was able to simplify the program you developed for me. Here it is, and I'm pleased to say that it produces the same results.

        Thanks again,

        Jeff

        Code:
        capture program drop my_program
        program define my_program, rclass
            nl (JOBSAT={b0}+{b1}*ALTHC+{b2}*ALTWC+{b3}*(ALTWC-{c0}-{c1}*ALTHC)*(ALTWC<{c0}+{c1}*ALTHC)), initial(b0 3.844 b1 0.588 b2 -0.222 b3 0.643 c0 0 c1 1) iter(1000)
            return scalar int = _b[b0:_cons] + _b[b1:_cons]*_b[c0:_cons]
            return scalar slope = _b[b1:_cons] + _b[b2:_cons]*_b[c1:_cons]
            exit
        end

        use raceext, clear
        bootstrap r(int) r(slope), reps(25) seed(1234): my_program
        estat bootstrap


        • #5
          Ah yes, you are quite right, there is no need to run -nlcom- to calculate those values. -nlcom- would only be needed if you wanted to use its standard error calculations, whereas your whole point is that you don't want those! Glad you were able to use, and then improve, the code I gave you.

          I would certainly be one of the people you refer to when you say:
          ... I have found that people often want to obtain all of the relevant results from the statistical package they use, as opposed to taking the trouble to import results into Excel
          It's not about taking the trouble to put the results into Excel. Stata's -export excel- and -putexcel- commands are quick, accurate, and painless to use. The problem is that I would never trust an analysis done in Excel. Certainly in its earlier versions, many of its statistical calculations were simply wrong. While I'm told they've cleaned that up quite a bit, there is another, equally important problem. Excel leaves no coherent audit trail of what it does, nor of the way in which its cells get populated with data. To me it is axiomatic that all data manipulations require accurate, complete, and comprehensible documentation--you just don't get that with a spreadsheet. A spreadsheet is, in effect, a black box as to its data source and calculations.
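
          Since the wrapper program now only evaluates an expression in _b[], another option may be to skip the program and hand the expression to -bootstrap- directly in its expression list. A minimal sketch with the toy auto-data model from #2: the name combo is made up for illustration, and the _b[b0:_cons] notation is the one used in this thread (newer Stata releases may display -nl- parameters as /b0 instead).

          ```stata
          sysuse auto, clear
          // combo is a hypothetical label for the bootstrapped expression;
          // -bootstrap- re-runs -nl- on each resample and evaluates it afterwards
          bootstrap combo = (_b[b0:_cons] + _b[b1:_cons]*_b[b2:_cons]), reps(25) seed(1234): nl (price = {b0} + {b1}*mpg + {b2}*log(headroom)*weight^2)
          estat bootstrap
          ```

          Several named expressions can be listed before the comma if more than one combination is needed.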


          • #6
            Well put. I should note that the audience for my work consists of applied researchers in the social and behavioral sciences, who have limited tolerance for things such as manipulating bootstrap results to compute nonlinear expressions and derive percentile-based confidence intervals. The doctoral students I teach are relatively free of this aversion, but researchers I advise individually and through workshops tend to want tools that basically give them the answers they seek. I don't like this mentality, and I consistently try to convince these researchers that they should master the techniques that produce their results, but sometimes I am still confronted with blank stares. Your Stata code will help in this regard, and I'm attacking the same problem in SPSS (syntax for both programs will be in the appendix of the article). Like you, I have heard about (and witnessed) errors in Excel, but fortunately, the functions I am using are pretty simple, and I have verified that they are accurate, at least in my particular application. Thanks again for your help!


            • #7
              Hi everyone, I realize this thread is quite old, but I hope someone can help with a related (statistics) question.

              I also calculated indirect effects using the nlcom command and was considering whether these effects should be tested using e.g. bootstraps. However, I am unsure why exactly this is necessary. Above, Jeff writes:
              the confidence intervals reported by Stata (which are based on normal theory) are incorrect from a technical standpoint (i.e., if the coefficients are normally distributed, products of the coefficients such as the ones I am computing will not be normally distributed)
              From what I understand, bootstraps are used when there is reason to suspect that certain assumptions no longer hold, which is what I guess Jeff is referring to. However, if the problem is mainly with normality and the calculated standard errors, is this not overcome by simply using vce(robust) to estimate the model preceding the use of nlcom? And how big a problem will the violation of assumptions be in large samples?

              Thank you in advance,
              Anne
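
              The quoted point can be illustrated with a small simulation. This is only a sketch: the means (0.3) and standard deviations (0.10 and 0.01) are invented stand-ins for the sampling distributions of two coefficient estimates, with the smaller standard deviation mimicking a larger sample. Note that vce(robust) changes the estimated standard errors, but -nlcom- would still build a symmetric, normal-based interval around the product.

              ```stata
              clear
              set seed 1234
              set obs 100000
              // hypothetical sampling distributions of two coefficient estimates
              generate a_small = rnormal(0.3, 0.10)   // larger SE, as in a small sample
              generate b_small = rnormal(0.3, 0.10)
              generate a_large = rnormal(0.3, 0.01)   // smaller SE, as in a large sample
              generate b_large = rnormal(0.3, 0.01)
              // products of the two normally distributed estimates
              generate prod_small = a_small*b_small
              generate prod_large = a_large*b_large
              summarize prod_large, detail
              display "skewness, smaller SEs: " r(skewness)
              summarize prod_small, detail
              display "skewness, larger SEs:  " r(skewness)
              ```

              If the skewness of the product is clearly nonzero with the larger standard errors and much closer to zero with the smaller ones, that suggests the normal-based interval is a poorer approximation when standard errors are large relative to the coefficients and improves as the sample grows.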


              • #8
                If possible, I would like to revisit this post with a particular question: I created an .ado program that posts e(V) and e(b) using bootstrap. I tried to use nlcom for post-estimation purposes, but it only generates point estimates, not standard errors.

                Based on the example above, it seems that nlcom cannot use the delta method when the output of a program was produced using bootstrap, but I do not understand why. The program in question is quaidsce (available via ssc). Any advice would be extremely helpful.

                Best,


                • #9
                  Can you show an example of what you are doing, the basic output you obtain, and the problem?
                  Ideally, a reproducible example.
