Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • bootstrap to get the confidence intervals for a linear cobmination of regression coefficients

    I am estimating a variable called tot, tot=t1+t2
    t1=p1*c1 p1 is the mean of a variable, c1 is the coefficient of a regression for this variable
    t2=p2*c2 the same as t1 with a different variable
    tot=t1+t2

    I calculated the value of tot by the linear combinations of the numbers. My coauthor asked me to get the variance for the value of tot.

    I created a bootstrap program to calculate the confidence intervals of tot, but there are many error messages. I am wondering whether I am at the right direction. Here are the codes
    Code:
    capture program drop bootc
    program define bootc,rclass
    drop quant p1 p2 c1 c2 t1 t2 tot
    
    gen quant=40000
    
    mean cond1
    gen p1=r(mean)
    mean cond2
    gen p2=r(mean)
    
         
    glm cost i.cond $char, family(gamma) link(log) le(95)
    gen c1=_b[1.cond]
    gen c2=_b[2.cond]
    
    
    gen t1=quant*p1*c1
    gen t2=quant*p2*c2
    
    gen tot=t1+t2
    
    end
    
    simulate tot=r(tot), reps(4) seed(12345): bootc
    estat bootstrap, all
    Thank you very much!

  • #2
    You say there are many error messages, but you don't even show us one, and you provide no example data to work with. You have a global macro $char that you refer to that, depending what it contains, could cause all sorts of trouble--but you don't show us how it was defined.

    I'll just start by pointing out one obvious error: your program bootc doesn't actually return any values, so r(tot) is going to be undefined when bootstrap tries to use it.

    Anything else I might see would be purely guesswork. So if you need more help, please post back with more information. Be sure to use the -dataex- command to show the example data. If you are running version 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    When asking for help with code, always show example data. When showing example data, always use -dataex-.


    Comment


    • #3
      Hi Clyde,

      Thank you very much for your response!

      I am using stata version 14.
      here is the example of data for observation 1/5
      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input float(quant condition1 condition2 condition comorbidity1 comorbidity2 age_reg region1) double year
      40000 0 0 4 2 1 2 3 2013
      40000 0 0 4 2 2 4 2 2011
      40000 0 0 4 2 2 2 2 2014
      40000 0 0 4 2 2 5 2 2012
      40000 0 0 3 2 2 2 2 2013
      end
      I also changed the coding for the program and it is now running
      Code:
      capture program drop bootc
      program define bootc,rclass
      drop p1 p2 c1 c2 t1 t2 t3
      sum condition1
      gen p1=r(mean)
      sum condition2
      gen p2=r(mean)
      
      glm tot ib4.condition $chart, family(gamma) link(log) le(95)
      gen c1=_b[1.condition]
      gen c2=_b[2.condition]
      gen t1=p1*c1
      gen t2=p2*c2
      gen t3=t1+t2
      
      sum p1 p2 c1 c2 t2 t2
      sum t3
      return scalar m_t3=r(mean)
      
      end
      
      bootstrap "bootc" r(m_t3) , reps(4)  dots noesample
      but the results might not be what I want.
      Here is the output for a single run
      Code:
      . sum condition1
      
          Variable |        Obs        Mean    Std. Dev.       Min        Max
      -------------+---------------------------------------------------------
        condition1 |    237,305    .0630117    .2429846          0          1
      
      .
      . gen p1=r(mean)
      
      .
      . sum condition2
      
          Variable |        Obs        Mean    Std. Dev.       Min        Max
      -------------+---------------------------------------------------------
        condition2 |    237,305    .0429194     .202676          0          1
      
      .
      . gen p2=r(mean)
      
      .
      .
      .
      . glm tot ib4.condition $chart, family(gamma) link(log) le(95)
      
      Iteration 0:   log likelihood = -2539490.6 
      Iteration 1:   log likelihood = -2537920.2 
      Iteration 2:   log likelihood = -2537914.7 
      Iteration 3:   log likelihood = -2537914.7 
      
      Generalized linear models                         No. of obs      =    231,958
      Optimization     : ML                             Residual df     =    231,940
                                                        Scale parameter =   .8473622
      Deviance         =  72342.00066                   (1/df) Deviance =   .3118996
      Pearson          =  196537.1878                   (1/df) Pearson  =   .8473622
      
      Variance function: V(u) = u^2                     [Gamma]
      Link function    : g(u) = ln(u)                   [Log]
      
                                                        AIC             =   21.88269
      Log likelihood   = -2537914.731                   BIC             =   -2793117
      
      --------------------------------------------------------------------------------
                     |                 OIM
                 tot |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      ---------------+----------------------------------------------------------------
           condition |
                  1  |   .3846174    .007918    48.58   0.000     .3690984    .4001364
                  2  |   .2649116   .0095251    27.81   0.000     .2462427    .2835805
                  3  |   .1033199   .0106751     9.68   0.000      .082397    .1242427
                     |
      2.comorbidity1 |  -.3242158   .0121272   -26.73   0.000    -.3479847   -.3004469
      2.comorbidity2 |  -.1157325   .0073616   -15.72   0.000     -.130161   -.1013041
                     |
             age_reg |
                  2  |   .0324376   .0105722     3.07   0.002     .0117165    .0531586
                  3  |    .039536   .0099123     3.99   0.000     .0201084    .0589637
                  4  |   .1044109   .0098414    10.61   0.000      .085122    .1236997
                  5  |   .2134998   .0104579    20.42   0.000     .1930027     .233997
                  6  |   .3027203   .0139648    21.68   0.000     .2753497    .3300909
                     |
             region1 |
                  2  |  -.2195618    .006267   -35.03   0.000    -.2318448   -.2072788
                  3  |  -.2545354   .0058274   -43.68   0.000    -.2659569   -.2431139
                  4  |  -.0555107   .0068679    -8.08   0.000    -.0689716   -.0420498
                     |
                year |
               2010  |  -.0251801   .0077155    -3.26   0.001    -.0403023    -.010058
               2011  |   .0136008    .007603     1.79   0.074    -.0013009    .0285025
               2012  |   .0523197   .0076892     6.80   0.000     .0372491    .0673903
               2013  |   .0651913   .0076962     8.47   0.000      .050107    .0802755
                     |
               _cons |   10.37795   .0171021   606.82   0.000     10.34443    10.41147
      --------------------------------------------------------------------------------
      
      .
      . gen c1=_b[1.condition]
      
      .
      . gen c2=_b[2.condition]
      
      .
      . gen t1=p1*c1
      
      .
      . gen t2=p2*c2
      
      .
      . gen t3=t1+t2
      
      .
      .
      .
      . sum p1 p2 c1 c2 t2 t2
      
          Variable |        Obs        Mean    Std. Dev.       Min        Max
      -------------+---------------------------------------------------------
                p1 |    237,305    .0630117           0   .0630117   .0630117
                p2 |    237,305    .0429194           0   .0429194   .0429194
                c1 |    237,305    .3846174           0   .3846174   .3846174
                c2 |    237,305    .2649116           0   .2649116   .2649116
                t2 |    237,305    .0113699           0   .0113699   .0113699
      -------------+---------------------------------------------------------
                t2 |    237,305    .0113699           0   .0113699   .0113699
      
      .
      . sum t3
      
          Variable |        Obs        Mean    Std. Dev.       Min        Max
      -------------+---------------------------------------------------------
                t3 |    237,305    .0356053           0   .0356053   .0356053
      why the values I got for p1, p2, c1,c3, t1 and t2 are all having standard deviation of 0? How can these numbers include the standard errors when they were derived from sum command and regression models?

      I am trying to get the variance of t3, which equals to the percent of condition 1 in the sample multiplying by the coefficient in the regression model for condition 1, plus the percent of condition 2 in the sample multiplying by the regression coefficient for condition2. It seems that after one bootstrap, I get a single value of t3, then after all the runs of bootstrap, I can get the mean and se of t3. Is it the correct way to do it? Can I generate a distribution for p1, p2, c1, c2 based on their estimates and SEs from the descriptive analysis and regression model, then directly estimate t3 as a linear combination of the scalars?

      Many thanks!

      Comment


      • #4
        I can't help you unless you provide example data that fits your code. Your example doesn't contain the variable tot, so the -glm- command breaks execution. To be honest, I find it hard to believe that your code runs with your real data either. For example, the -drop p1 p2 c1 c2 t1 t2 t3- will break on the first iteration because none of those variables exist at that point in time. The -glm- command you show can't possibly work, because the term ib4.condition is ambiguous: there are two variables, condition1 and condition2 in your data that could match that so Stata won't know which you mean and will complain and halt.

        It is a waste of your time and mine to try to work with variants of the code and data that are incompatible with each other and which, presumably, aren't your real situation anyway. Please post back with an example data set that contains all the necessary variables. Also, provide more than 5 observations so that a -glm- can actually be run with the number of predictors you want to use. And if you have code that actually does run with the data, show that, not some modified version that does not. If you don't have code that actually runs with your data, then show the code you have and say that it doesn't run (and explain at what point it first breaks down, and in what way.)

        Comment


        • #5
          Hi Clyde,

          Sorry that I forgot to add the variable tot in the dataex command. I did not quite understand the purpose you asked me to use dataex. I thought you wanted to take a look of what the data looked like. I did use help dataex as you suggested and the example used dataext [variabes] in 1/5. So I just copied that.

          The program I wrote did run. Since I ran the single lines first to make sure that the program was working, the variables p1 p2 c1 c2 t1 t2 t3 already existed when I ran bootstrap. That is the reason in my boostrap program I put the -drop to the front of the program. The version of the stata I used seemed to be ok with variables names condition1 condition2 and condition and treated them as different variables. The reason I created variables condition1 and conditon2 from condition variable is because previous I used _mean and it seemed not to handle the reference group for a factor variable.

          Here are the dataex with 100 observations and the updated program.
          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input float(tot condition1 condition2 condition comorbidity1 comorbidity2 age_reg region1) double year
          12230.784 0 0 4 2 1 2 3 2013
           27138.45 0 0 4 2 2 4 2 2011
                  . 0 0 4 2 2 2 2 2014
          22133.076 0 0 4 2 2 5 2 2012
           16084.31 0 0 3 2 2 2 2 2013
          14460.688 0 0 4 2 2 1 2 2012
          21720.744 0 0 4 2 2 4 2 2009
          15751.087 0 0 4 2 2 1 2 2012
          23999.094 0 0 4 2 2 2 2 2013
          12221.836 0 0 4 2 2 1 2 2009
           10990.61 0 0 4 2 2 2 2 2010
          20045.705 0 0 4 2 2 2 2 2011
           39123.56 0 0 4 2 2 1 2 2010
          17267.063 0 0 4 2 2 2 2 2013
          35509.125 0 0 4 2 2 2 3 2010
          27885.385 0 0 4 2 2 1 3 2010
          15218.536 0 0 3 2 2 1 3 2010
           18491.17 0 0 4 2 2 2 2 2010
           20955.22 0 0 4 2 2 2 3 2013
           501.6274 0 0 4 2 2 1 3 2009
          16260.028 0 0 4 2 2 4 2 2013
          19279.207 0 0 4 2 2 2 3 2010
          14529.271 0 0 4 2 2 2 3 2011
          18348.807 1 0 1 2 2 2 3 2013
           17623.72 0 0 4 2 2 2 3 2010
           10866.12 0 0 4 2 2 6 3 2011
           9120.884 0 0 4 2 2 1 3 2010
          1124.7971 0 0 4 2 2 2 3 2011
          10813.613 0 0 3 2 2 2 3 2011
          10580.297 0 0 4 2 2 2 2 2013
          19772.094 0 0 4 2 2 2 3 2013
          17197.434 0 0 4 2 2 2 3 2010
           20625.27 1 0 1 2 2 2 3 2009
           9502.901 0 0 4 2 1 5 3 2012
          30841.314 0 0 4 2 2 2 3 2011
          26660.605 0 0 4 2 2 2 2 2012
          13079.123 0 0 4 2 2 2 3 2013
           23844.43 0 0 4 2 2 6 4 2011
           20748.48 0 0 4 2 2 5 2 2009
           9687.315 0 0 4 2 2 2 3 2010
            19082.1 0 0 4 2 2 3 3 2011
           25679.48 0 0 4 2 2 2 2 2012
           9133.064 0 0 4 2 2 1 3 2013
          15799.744 0 1 2 2 2 1 3 2010
          136807.42 0 0 3 2 2 6 3 2011
          17688.082 0 0 4 2 2 2 2 2012
          18604.977 0 0 4 2 1 5 2 2010
           58454.69 0 0 4 1 1 4 2 2011
          15568.723 0 0 4 2 2 2 3 2012
            7277.22 0 0 4 2 2 1 3 2010
          11476.823 0 0 4 2 2 1 3 2010
          25596.783 0 1 2 2 2 2 3 2010
          16735.883 0 0 4 2 2 2 3 2013
           9436.415 0 0 4 2 2 1 3 2013
          16778.453 0 0 4 2 2 1 3 2013
           72456.75 0 0 4 2 2 2 3 2013
          15899.514 0 0 4 2 2 2 3 2012
          18538.082 0 0 4 2 2 2 3 2011
                  . 0 0 4 2 1 5 3 2014
          13459.928 0 0 4 2 2 1 3 2011
           8709.145 0 0 4 2 2 1 3 2011
          17641.574 0 0 4 2 2 2 2 2013
          18484.951 0 1 2 2 2 2 3 2012
           7906.807 0 0 4 2 2 2 3 2010
          18142.637 0 0 4 2 2 2 3 2012
          14751.168 0 0 4 2 2 2 3 2011
          12929.168 0 0 3 2 2 2 3 2011
          16360.733 0 0 4 2 2 2 3 2012
          17765.434 0 0 4 2 2 2 3 2013
          11128.518 0 0 4 2 2 1 3 2011
           8507.135 0 0 4 2 2 2 2 2012
           85945.82 0 0 4 2 2 2 2 2013
          18480.865 0 0 4 2 2 2 3 2010
          14985.722 0 0 4 2 2 1 3 2009
          19835.693 0 0 4 2 2 2 3 2012
          26443.623 0 0 4 2 1 2 2 2011
            12182.3 0 0 4 2 2 1 3 2011
          36349.105 0 0 4 2 2 1 3 2011
          17491.602 1 0 1 2 2 2 3 2012
           34568.88 0 0 4 2 2 2 3 2010
          12187.305 0 0 4 2 1 6 3 2009
           711.8081 0 0 4 2 2 2 3 2012
          17833.396 0 0 4 2 1 6 2 2010
            11355.7 0 0 4 2 2 1 3 2013
          13488.903 0 0 4 2 2 2 2 2011
          21185.283 0 0 4 2 1 2 3 2010
           45398.45 0 0 4 2 2 2 2 2013
           15764.18 0 0 4 2 2 5 3 2011
           18724.68 1 0 1 2 2 2 3 2010
           7268.485 0 0 4 2 2 2 3 2013
           8889.855 0 0 4 2 2 2 3 2013
          19704.146 0 0 4 2 2 2 3 2010
           18415.58 0 0 4 2 2 2 3 2011
           23694.56 0 0 4 2 2 6 2 2011
          2915.6025 1 0 1 2 2 2 3 2010
           7419.436 0 0 4 2 2 1 3 2010
          13939.187 0 0 4 2 2 1 3 2011
          175512.55 0 0 4 2 2 5 3 2013
           27993.58 0 0 4 2 2 2 2 2011
          23301.676 0 1 2 2 2 1 3 2010
          end
          Code:
          global chart i.comorbidity1 i.comorbidity2 i.age_reg i.region1 i.year
          capture program drop bootc
          program define bootc,rclass
          
          sum condition1
          gen p1=r(mean)
          sum condition2
          gen p2=r(mean)
          
          glm tot ib4.condition $chart, family(gamma) link(log) le(95)
          
          gen c1=_b[1.condition]
          gen c2=_b[2.condition]
          gen t1=p1*c1
          gen t2=p2*c2
          gen t3=t1+t2
          
          sum p1 p2 c1 c2 t2 t2
          sum t3
          return scalar m_t3=r(mean)
          drop p1 p2 c1 c2 t1 t2 t3
          end
          
          bootstrap "bootc" r(m_t3) , reps(4)  dots noesample
          Thanks!

          Rui

          Comment


          • #6
            This thread began with you posting some code that was not working for you. I asked you to respond with data (using -dataex-) so that I could try to run the code to figure out where it is going wrong and how to fix it.

            Now, with the data and code in #5 everything seems to be fine: it runs without error messages. And I don't see anything obviously wrong with the results it produces. Is there anything I can do that would be helpful at this point?

            Comment


            • #7
              Hi Clyde,

              If you don't see anything obviously wrong with the results, that is good news! I was not sure whether I was on the right track.

              Thank you very much for your help! I really appreciate your time and effort to try to help everybody on this forum!

              Rui

              Comment

              Working...
              X