Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • the smae code and the same data, but the result is different using STATA 13,15 and 16

    In Stata 16, the code below cannot run normally, but in Stata 15 and Stata 13, the code can run correctly, however, the results between them is different.
    Which result should I believe?
    Last edited by Fred Lee; 23 Jul 2019, 02:40.

  • #2
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input double(dv id serial) float(iv med mod1 newid)
      4  1 1     .12499996    .5486109 -.05694437  1
      4  1 2    -.07500005  -.17361127 -.05694437  1
      4  1 3    -.07500005    .5486109 -.05694437  1
    3.8  1 4    -.07500005  -.06250016 -.05694437  1
      4  1 5    -.07500005    .3263887 -.05694437  1
      3  1 6    -.07500005   -.3958335 -.05694437  1
      4  1 7          .325   -.3958335 -.05694437  1
      4  1 8    -.07500005   -.3958335 -.05694437  1
    3.2  2 1         1.725   -.4375001  .14305563  2
    3.7  2 2          .725   -.3819445  .14305563  2
    3.7  2 3     -.6750001  .006944391  .14305563  2
    3.4  2 4    -.07500005    .2291666  .14305563  2
    3.3  2 5        -1.075  -.54861116  .14305563  2
      3  2 6          .325    .2291666  .14305563  2
    4.7  2 7    -.27500004    .4513888  .14305563  2
      5  2 8     -.6750001    .4513888  .14305563  2
      4  3 1         -.125 -.013888836  -.3569444  3
    3.5  3 2          .075 -.013888836  -.3569444  3
    3.8  3 3         -.125 -.013888836  -.3569444  3
    3.6  3 4         -.125   .09722228  -.3569444  3
    3.8  3 5          .075 -.013888836  -.3569444  3
      4  3 6          .075 -.013888836  -.3569444  3
    3.8  3 7          .075 -.013888836  -.3569444  3
      4  3 8          .075 -.013888836  -.3569444  3
    3.2  4 1    -.17142864 -.007936531  -.4569444  4
    3.7  4 2     -.7714286   .04761903  -.4569444  4
    2.7  4 3     .02857137  -.06349209  -.4569444  4
    2.2  4 4      .4285714  -.11904764  -.4569444  4
    2.4  4 5      .6285714   .10317458  -.4569444  4
    2.9  4 6     .22857137    .3253968  -.4569444  4
      .  4 7             .           .  -.4569444  4
    3.8  4 8     -.3714286   -.2857143  -.4569444  4
    3.9  5 1    .025000047   .58333325 -.15694436  5
    4.3  5 2    -.17499995   .02777767 -.15694436  5
    4.1  5 3    -.17499995    .4166666 -.15694436  5
    3.4  5 4    .025000047  -.25000012 -.15694436  5
    3.5  5 5          .425   -.3611112 -.15694436  5
    3.8  5 6     .22500005  -.19444455 -.15694436  5
    3.1  5 7    -.17499995  -.25000012 -.15694436  5
    4.3  5 8    -.17499995   .02777767 -.15694436  5
      4  6 1      .7000001  -.08333354  .14305563  6
    4.2  6 2      .3000001     .472222  .14305563  6
      4  6 3      .1000001  .027777566  .14305563  6
    3.6  6 4     -.2999999  -.08333354  .14305563  6
    3.3  6 5     -.2999999  -.08333354  .14305563  6
    3.8  6 6     -.2999999   -.1388891  .14305563  6
    3.3  6 7      .1000001  -.08333354  .14305563  6
    3.3  6 8     -.2999999  -.02777799  .14305563  6
    3.8  7 1     -.0999999   .06250016  .14305563  7
    3.5  7 2     -.0999999    .3402779  .14305563  7
    3.3  7 3     -.0999999   .11805572  .14305563  7
    2.8  7 4     -.0999999   -.1041665  .14305563  7
      3  7 5      .3000001   -.1041665  .14305563  7
      3  7 6      .1000001   -.1041665  .14305563  7
    3.2  7 7      .1000001   -.1041665  .14305563  7
      3  7 8     -.0999999   -.1041665  .14305563  7
      .  8 1             .           .   .4430556  8
      .  8 2             .           .   .4430556  8
    4.5  8 3    .033333253    -.111111   .4430556  8
    3.6  8 4     -.3666667    -.111111   .4430556  8
    3.4  8 5    .033333253  -.05555545   .4430556  8
    3.1  8 6    -.16666675   .22222233   .4430556  8
    3.4  8 7     .43333325   .16666678   .4430556  8
    3.6  8 8    .033333253    -.111111   .4430556  8
    4.1  9 1     .12500003    .3680556 -.05694437  9
    4.6  9 2     .12500003   .47916675 -.05694437  9
    4.5  9 3    -.27499998  -2.1319444 -.05694437  9
      4  9 4     .12500003   .14583342 -.05694437  9
    4.5  9 5     .12500003   .14583342 -.05694437  9
      4  9 6    -.27499998    .3680556 -.05694437  9
    4.5  9 7     .12500003   .25694454 -.05694437  9
    4.5  9 8    -.07499997    .3680556 -.05694437  9
    3.4 10 1            .2           0   .8430556 10
    3.5 10 2            .2           0   .8430556 10
    3.7 10 3            .2           0   .8430556 10
      3 10 4             0           0   .8430556 10
    3.3 10 5            .2           0   .8430556 10
    3.6 10 6           -.8           0   .8430556 10
      3 10 7             0           0   .8430556 10
      3 10 8             0           0   .8430556 10
    3.8 11 1            .8           0  .24305563 11
    3.8 11 2    -.20000005           0  .24305563 11
    3.4 11 3    -.20000005           0  .24305563 11
    3.8 11 4    -.20000005           0  .24305563 11
    3.6 11 5 -4.768372e-08           0  .24305563 11
      4 11 6    -.20000005           0  .24305563 11
    3.6 11 7 -4.768372e-08           0  .24305563 11
    3.4 11 8 -4.768372e-08           0  .24305563 11
    4.1 12 1          -.75           0 -.05694437 12
    3.3 12 2          -.55           0 -.05694437 12
    3.8 12 3           .25           0 -.05694437 12
    3.9 12 4          -.75           0 -.05694437 12
    3.9 12 5          -.35           0 -.05694437 12
    3.6 12 6           .65           0 -.05694437 12
      3 12 7           .25           0 -.05694437 12
    4.3 12 8          1.25           0 -.05694437 12
      3 13 1             0 -.009259198  -.6569444 13
      3 13 2             0 -.009259198  -.6569444 13
      3 13 3             0 -.064814754  -.6569444 13
      3 13 4             0 -.009259198  -.6569444 13
    end
    xtset id serial
    set seed 10000
    capture program drop MoDMed
    program define MoDMed, rclass
     quietly summarize mod1
        return list
        global m=r(mean)
        global sd=r(sd)
    mixed med  l.iv mod1 cl.iv#c.mod1 || id: l.iv, var cov(exc) iter(50)
        return scalar al=(_b[med:L.iv]+_b[med:cL.iv#c.mod1]*($m-$sd))
        return scalar ah=(_b[med:L.iv]+_b[med:cL.iv#c.mod1]*($m+$sd))
        
    mixed dv  med l.iv   || id: l.iv med, var  cov(exc) iter(50)
        return scalar b=(_b[dv:med])
    
    end
    
    bootstrap indL=(r(al)*r(b)) indH=(r(ah)*r(b)) diff=(r(ah)*r(b)-r(al)*r(b)), reps(50) reject(e(converged)!=1) cluster(id) idcluster(newid) group(serial): MoDMed
    estat bootstrap
    program drop MoDMed

    Comment


    • #3
      There are a couple of issues here. First, if I run your entire code block in Stata 16, it runs perfectly fine. This is also the same set of results that I get when running on Stata 15.1. The results are shown below.

      Code:
      Bootstrap replications (50)
      ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
      ..................................................    50
      
      Bootstrap results                               Number of obs     =         83
                                                      Replications      =         50
      
            command:  MoDMed
               indL:  r(al)*r(b)
               indH:  r(ah)*r(b)
               diff:  r(ah)*r(b)-r(al)*r(b)
      
                                           (Replications based on 13 clusters in id)
      ------------------------------------------------------------------------------
                   |   Observed   Bootstrap                         Normal-based
                   |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
              indL |  -.0221546   .1279645    -0.17   0.863    -.2729605    .2286513
              indH |  -.0682389   .0730797    -0.93   0.350    -.2114725    .0749947
              diff |  -.0460843   .1628654    -0.28   0.777    -.3652946     .273126
      ------------------------------------------------------------------------------
      
      . estat bootstrap
      
      Bootstrap results                               Number of obs     =         83
                                                      Replications      =         50
      
            command:  MoDMed
               indL:  r(al)*r(b)
               indH:  r(ah)*r(b)
               diff:  r(ah)*r(b)-r(al)*r(b)
      
                                           (Replications based on 13 clusters in id)
      ------------------------------------------------------------------------------
                   |    Observed               Bootstrap
                   |       Coef.       Bias    Std. Err.  [95% Conf. Interval]
      -------------+----------------------------------------------------------------
              indL |  -.02215458  -.0104108   .12796453   -.5778048   .1074098  (BC)
              indH |   -.0682389  -.0272837   .07307971   -.2348322  -.0035386  (BC)
              diff |  -.04608432  -.0168729   .16286538   -.1381491   .5742661  (BC)
      ------------------------------------------------------------------------------
      (BC)   bias-corrected confidence interval
      Since you are trying the code under different versions of Stata, it is important to learn about the built-in version control (see the output of -help version-). By setting the version number at the top of your do file, you ensure that Stata will use the correct version of the interpreter when executing commands. In some cases, these version statements will have little impact because a program may have been written with a very old version of Stata in mind, but simply ignoring it doesn't ensure backwards compatibility.

      One area where this is noticeably impactful is the transition from Stata 13 to 14. The default random number generator changed to the Marsenne Twister in Stata 14 (see this Stata blog entry for an example), which changed from the 32-bit KISS algorithm in Stata 13 and earlier. This is also discussed in the output of -help set_rng-.

      What should you do? If you want your results to be compatible with older versions of Stata, use a -version 13- statement at the top of your code. (If you do this, you indeed get different results from above.) However, it's more likely that you would want your code to be run with at least Stata 14, so just change the number accordingly. Here is how you would do that.

      Code:
      clear
      version 15.1   //sets the version of the interpreter to use for the entirety of the do file
      ...
      xtset id serial
      set seed 10000
      capture program drop MoDMed
      program define MoDMed, rclass
       quietly summarize mod1
          return list
          global m=r(mean)
          global sd=r(sd)
      mixed med  l.iv mod1 cl.iv#c.mod1 || id: l.iv, var cov(exc) iter(50)
          return scalar al=(_b[med:L.iv]+_b[med:cL.iv#c.mod1]*($m-$sd))
          return scalar ah=(_b[med:L.iv]+_b[med:cL.iv#c.mod1]*($m+$sd))
          
      mixed dv  med l.iv   || id: l.iv med, var  cov(exc) iter(50)
          return scalar b=(_b[dv:med])
      
      end
      
      bootstrap indL=(r(al)*r(b)) indH=(r(ah)*r(b)) diff=(r(ah)*r(b)-r(al)*r(b)), reps(50) reject(e(converged)!=1) cluster(id) idcluster(newid) group(serial): MoDMed
      estat bootstrap
      program drop MoDMed
      Lastly, note the first two remarks about version control:
      1) All programs and do-files written for the current version of Stata should include version <your current version> as the first executable statement.
      2) Programs and do-files written for earlier releases should include the appropriate version line at the top.

      Comment


      • #4
        Thanks Leonardo Guizzetti!
        In Stata 15.1, I got the same result as you. However, in Stata 16, the Stata didn't run perfectly fine. It says as below:
        Click image for larger version

Name:	1.png
Views:	1
Size:	12.2 KB
ID:	1509058

        Comment


        • #5
          Originally posted by Leonardo Guizzetti View Post
          There are a couple of issues here. First, if I run your entire code block in Stata 16, it runs perfectly fine.
          Why there exsit some problem in my Stata 16?

          Comment


          • #6
            For what it's worth, if I run the same block of code in Stata 16 (latest update), I have the same error (see below). However, it works in Stata 15.1 and Stata 16 under version control (version 15), giving apparently the same results. In case it's significant, I'm on Windows.

            Code:
            (running MoDMed on estimation sample)
            
            Bootstrap replications (50)
            ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
            xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx    50
            insufficient observations to compute bootstrap standard errors
            no results will be saved
            r(2000);
            Last edited by Jean-Claude Arbaut; 23 Jul 2019, 09:05.

            Comment


            • #7
              I retract my comment about it working on Stata 16, I was accidentally using version control. I now get the same error as you both.

              Here's the issue that I can see...bootstrap in version 16 is not sorting the resulting data set correctly. When running this in Stata 16 version controlled to Stata 15.1, each bootstrap resampled dataset is sorted by -id- and -serial-, whereas in Stata 16, it is sorted by -newid-. So it seems like bootstrap forgets about the -group()- option?

              If I change your program to the following, it will now work in Stata 16.

              Code:
              capture program drop MoDMed
              program define MoDMed, rclass
                sort id serial
               quietly summarize mod1
                  return list
                  global m=r(mean)
                  global sd=r(sd)
              mixed med  l.iv mod1 cl.iv#c.mod1 || id: l.iv, var cov(exc) iter(50)
                  return scalar al=(_b[med:L.iv]+_b[med:cL.iv#c.mod1]*($m-$sd))
                  return scalar ah=(_b[med:L.iv]+_b[med:cL.iv#c.mod1]*($m+$sd))
                  
              mixed dv  med l.iv   || id: l.iv med, var  cov(exc) iter(50)
                  return scalar b=(_b[dv:med])
              
              end
              
              bootstrap indL=(r(al)*r(b)) indH=(r(ah)*r(b)) diff=(r(ah)*r(b)-r(al)*r(b)), reps(50) reject(e(converged)!=1) cluster(id) idcluster(newid) group(serial): MoDMed
              estat bootstrap
              program drop MoDMed

              Comment


              • #8
                Also, as an aside, it's generally a good practice to store scalar results as local (rather than global) in your programs. The following change could be made in the relevant lines above.

                Code:
                scalar m=r(mean)
                scalar sd=r(sd)
                return scalar al=(_b[med:L.iv]+_b[med:cL.iv#c.mod1]*(m-sd))
                return scalar ah=(_b[med:L.iv]+_b[med:cL.iv#c.mod1]*(m+sd))

                Comment


                • #9
                  Thanks a lot, Leonardo Guizzetti.

                  Comment


                  • #10
                    Originally posted by Leonardo Guizzetti View Post
                    So it seems like bootstrap forgets about the -group()- option?
                    Maybe you can report this issue to the developers.

                    Comment


                    • #11
                      You're welcome, Fred. I have reported it and will report back what they say.

                      Comment


                      • #12
                        Hi everybody,

                        The problem as described in this thread is not a bug.
                        Rather, it is a documentation oversight on our end. We had to make some
                        changes to bootstrap in the context of newly implemented features in
                        Stata 16, and now if you have a self-written command that is supposed to
                        work with the bootstrap prefix, and which utilizes panel-data or
                        multilevel models that also use time-series operators, there will now
                        have to be a certain program property associated with the self-written
                        program. We will add this to the following help page and corresponding
                        manual entry:
                        Code:
                        help program properties
                        The name of the property is xtbs, so if you would define something like
                        Code:
                        program define mybs
                            mixed y L.x || id:
                        end
                        Then this program would now have to look like:
                        Code:
                        program define mybs, properties(xtbs)
                            mixed y L.x || id:
                        end
                        In addition to that, we decided to add an option to bootstrap that
                        will enforce checking of xt settings, and if that option is used, there
                        will not have to be an associated program property.

                        Now, with that being said, while looking into this issue we actually
                        discovered a bug in bootstrap that materializes in the case when a
                        self-written program uses a command under the hood that relies on
                        xtsettings, as is the case if you have panel/multilevel data and also
                        use time-series operators. The bug is basically the same as the one we
                        fixed as of the update from 20 Sep 2018 for our official panel
                        data/multilevel data estimation commands that allow specification of
                        time-series operators. If you have a look at
                        Code:
                        help whatsnew15
                        it is described under update 20sep2018, item #2. We will provide a fix
                        in a future update that will also capture the case of self-written
                        programs that use both panel-data estimators and time-series operators.
                        At this point, however, it is not possible to run this type of bootstrap
                        correctly using a self-written program along with the bootstrap
                        prefix, neither in Stata 16 nor in Stata 15. The only way to perform
                        this kind of bootstrap at the moment is to roll your own outside of the
                        bootstrap prefix command.

                        Before I will show an example for that in just a moment, I would
                        like to point out that the way that bootstrap is specified in
                        the above example is most likely not leading to the intended results. That is,
                        if you perform a clustered bootstrap and want to have the clustering
                        units re-uniqueified for each of the bootstrap replication samples, then
                        if these were your xtsettings
                        Code:
                        xtset id t
                        instead of doing
                        Code:
                        bootstrap ... , ... cluster(id) idcluster(newid) group(t)
                        you would need to specify
                        Code:
                        bootstrap ... , ... cluster(id) idcluster(newid) group(id)
                        Otherwise, you would create new clusters that are combinations of the
                        non-uniqueified clustering units and the time points, which is probably
                        not what you want here.

                        Now with that out of the way, here is an example where we bootstrap the
                        standard errors of some parameters from a linear mixed model using a
                        self-written program that utilizes the bsample command, and we use
                        simulate to bootstrap the standard errors:

                        Code:
                        * Toy data:
                        set seed 1234
                        set obs 50 
                        gen id = _n
                        gen u  = rnormal(0.0,0.5)
                        gen v  = rnormal(0.0,0.2)
                        expand 10
                        bys id : gen t = _n
                        gen x1 = runiform()
                        gen e  = rnormal()
                        tsset id t
                        gen y  = 5 + L.x1 + L.x1*v + u + e
                        drop v u e
                        
                        * Program for bootstrapping:
                        program define mybs, rclass
                            preserve
                            bsample, cluster(id) idcluster(newid)
                            sort newid id, stable
                            qui by newid id: gen newgroup = _n == 1
                            qui replace newgroup = sum(newgroup)
                            drop id
                            rename newgroup id
                            tsset id t
                            mixed y L.x1 || id: L.x1
                            return scalar b1 = _b[L.x1]
                            return scalar b0 = _b[_cons]
                            return scalar var_i = exp(_b[lns1_1_2:_cons])^2
                            return scalar var_c = exp(_b[lns1_1_1:_cons])^2
                            return scalar var_e = exp(_b[lnsig_e:_cons])^2
                        end
                        
                        * Point estimates:
                        mixed y L.x1 || id: L.x1
                        
                        * Bootstrapping standard errors:
                        simulate b1=r(b1) b0=r(b0) var_c=r(var_c) var_i=r(var_i) var_e=r(var_e), ///
                                 reps(10) seed(123) : mybs
                        
                        * Show results:
                        foreach v of varlist _all {
                            qui sum `v'
                            di as txt "SE `v'" _col(11) "= " as res `r(sd)'
                        }
                        For the sake of example, I only specified 10 replications here.
                        Obviously, this number should be much higher for an actual analysis.

                        We apologize for the inconvenience, and as I said, we will provide a
                        fix for this in an upcoming update.

                        Best,
                        Joerg

                        Comment


                        • #13
                          Hi everybody,

                          FYI, the problems described in this thread concerning Stata 15 and 16 are now fixed as of the Stata 16 update from 23 August 2019 and the Stata 15 update from 26 August 2019. I mentioned in my above post that an additional program property is needed with user-defined commands that make use of time-series operators, and that we would add an option to bootstrap that checks for xt settings. However, we found a better way to deal with these issues and no additional program property or bootstrap option is needed. To replicate my manual bootstrap from above, we can now simply do it the following way:
                          Code:
                          * Toy data:
                          clear
                          set seed 1234
                          set obs 50 
                          gen id = _n
                          gen u  = rnormal(0.0,0.5)
                          gen v  = rnormal(0.0,0.2)
                          expand 10
                          bys id : gen t = _n
                          gen x1 = runiform()
                          gen e  = rnormal()
                          tsset id t
                          gen y  = 5 + L.x1 + L.x1*v + u + e
                          drop v u e
                          
                          * Program for -bootstrap-:
                          program define mybs2, rclass
                              mixed y L.x1 || id: L.x1
                              return scalar b1    = _b[L.x1]
                              return scalar b0    = _b[_cons]
                              return scalar var_i = exp(_b[lns1_1_2:_cons])^2
                              return scalar var_c = exp(_b[lns1_1_1:_cons])^2
                              return scalar var_e = exp(_b[lnsig_e:_cons])^2
                          end
                          
                          * Run -bootstrap-:
                          bootstrap b1=r(b1) b0=r(b0) var_c=r(var_c) var_i=r(var_i) var_e=r(var_e), ///
                                    reps(10) seed(123) cluster(id) idcluster(newid) group(id) : mybs2
                          Best,
                          Joerg

                          Comment

                          Working...
                          X