the smae code and the same data, but the result is different using STATA 13,15 and 16

Fred Lee

Join Date: Nov 2017

Posts: 473
#1

the smae code and the same data, but the result is different using STATA 13,15 and 16

23 Jul 2019, 02:33

In Stata 16, the code below cannot run normally, but in Stata 15 and Stata 13, the code can run correctly, however, the results between them is different.
Which result should I believe?

Last edited by Fred Lee; 23 Jul 2019, 02:40.
Tags: None

Fred Lee

Join Date: Nov 2017
Posts: 473

23 Jul 2019, 02:39

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input double(dv id serial) float(iv med mod1 newid)
  4  1 1     .12499996    .5486109 -.05694437  1
  4  1 2    -.07500005  -.17361127 -.05694437  1
  4  1 3    -.07500005    .5486109 -.05694437  1
3.8  1 4    -.07500005  -.06250016 -.05694437  1
  4  1 5    -.07500005    .3263887 -.05694437  1
  3  1 6    -.07500005   -.3958335 -.05694437  1
  4  1 7          .325   -.3958335 -.05694437  1
  4  1 8    -.07500005   -.3958335 -.05694437  1
3.2  2 1         1.725   -.4375001  .14305563  2
3.7  2 2          .725   -.3819445  .14305563  2
3.7  2 3     -.6750001  .006944391  .14305563  2
3.4  2 4    -.07500005    .2291666  .14305563  2
3.3  2 5        -1.075  -.54861116  .14305563  2
  3  2 6          .325    .2291666  .14305563  2
4.7  2 7    -.27500004    .4513888  .14305563  2
  5  2 8     -.6750001    .4513888  .14305563  2
  4  3 1         -.125 -.013888836  -.3569444  3
3.5  3 2          .075 -.013888836  -.3569444  3
3.8  3 3         -.125 -.013888836  -.3569444  3
3.6  3 4         -.125   .09722228  -.3569444  3
3.8  3 5          .075 -.013888836  -.3569444  3
  4  3 6          .075 -.013888836  -.3569444  3
3.8  3 7          .075 -.013888836  -.3569444  3
  4  3 8          .075 -.013888836  -.3569444  3
3.2  4 1    -.17142864 -.007936531  -.4569444  4
3.7  4 2     -.7714286   .04761903  -.4569444  4
2.7  4 3     .02857137  -.06349209  -.4569444  4
2.2  4 4      .4285714  -.11904764  -.4569444  4
2.4  4 5      .6285714   .10317458  -.4569444  4
2.9  4 6     .22857137    .3253968  -.4569444  4
  .  4 7             .           .  -.4569444  4
3.8  4 8     -.3714286   -.2857143  -.4569444  4
3.9  5 1    .025000047   .58333325 -.15694436  5
4.3  5 2    -.17499995   .02777767 -.15694436  5
4.1  5 3    -.17499995    .4166666 -.15694436  5
3.4  5 4    .025000047  -.25000012 -.15694436  5
3.5  5 5          .425   -.3611112 -.15694436  5
3.8  5 6     .22500005  -.19444455 -.15694436  5
3.1  5 7    -.17499995  -.25000012 -.15694436  5
4.3  5 8    -.17499995   .02777767 -.15694436  5
  4  6 1      .7000001  -.08333354  .14305563  6
4.2  6 2      .3000001     .472222  .14305563  6
  4  6 3      .1000001  .027777566  .14305563  6
3.6  6 4     -.2999999  -.08333354  .14305563  6
3.3  6 5     -.2999999  -.08333354  .14305563  6
3.8  6 6     -.2999999   -.1388891  .14305563  6
3.3  6 7      .1000001  -.08333354  .14305563  6
3.3  6 8     -.2999999  -.02777799  .14305563  6
3.8  7 1     -.0999999   .06250016  .14305563  7
3.5  7 2     -.0999999    .3402779  .14305563  7
3.3  7 3     -.0999999   .11805572  .14305563  7
2.8  7 4     -.0999999   -.1041665  .14305563  7
  3  7 5      .3000001   -.1041665  .14305563  7
  3  7 6      .1000001   -.1041665  .14305563  7
3.2  7 7      .1000001   -.1041665  .14305563  7
  3  7 8     -.0999999   -.1041665  .14305563  7
  .  8 1             .           .   .4430556  8
  .  8 2             .           .   .4430556  8
4.5  8 3    .033333253    -.111111   .4430556  8
3.6  8 4     -.3666667    -.111111   .4430556  8
3.4  8 5    .033333253  -.05555545   .4430556  8
3.1  8 6    -.16666675   .22222233   .4430556  8
3.4  8 7     .43333325   .16666678   .4430556  8
3.6  8 8    .033333253    -.111111   .4430556  8
4.1  9 1     .12500003    .3680556 -.05694437  9
4.6  9 2     .12500003   .47916675 -.05694437  9
4.5  9 3    -.27499998  -2.1319444 -.05694437  9
  4  9 4     .12500003   .14583342 -.05694437  9
4.5  9 5     .12500003   .14583342 -.05694437  9
  4  9 6    -.27499998    .3680556 -.05694437  9
4.5  9 7     .12500003   .25694454 -.05694437  9
4.5  9 8    -.07499997    .3680556 -.05694437  9
3.4 10 1            .2           0   .8430556 10
3.5 10 2            .2           0   .8430556 10
3.7 10 3            .2           0   .8430556 10
  3 10 4             0           0   .8430556 10
3.3 10 5            .2           0   .8430556 10
3.6 10 6           -.8           0   .8430556 10
  3 10 7             0           0   .8430556 10
  3 10 8             0           0   .8430556 10
3.8 11 1            .8           0  .24305563 11
3.8 11 2    -.20000005           0  .24305563 11
3.4 11 3    -.20000005           0  .24305563 11
3.8 11 4    -.20000005           0  .24305563 11
3.6 11 5 -4.768372e-08           0  .24305563 11
  4 11 6    -.20000005           0  .24305563 11
3.6 11 7 -4.768372e-08           0  .24305563 11
3.4 11 8 -4.768372e-08           0  .24305563 11
4.1 12 1          -.75           0 -.05694437 12
3.3 12 2          -.55           0 -.05694437 12
3.8 12 3           .25           0 -.05694437 12
3.9 12 4          -.75           0 -.05694437 12
3.9 12 5          -.35           0 -.05694437 12
3.6 12 6           .65           0 -.05694437 12
  3 12 7           .25           0 -.05694437 12
4.3 12 8          1.25           0 -.05694437 12
  3 13 1             0 -.009259198  -.6569444 13
  3 13 2             0 -.009259198  -.6569444 13
  3 13 3             0 -.064814754  -.6569444 13
  3 13 4             0 -.009259198  -.6569444 13
end
xtset id serial
set seed 10000
capture program drop MoDMed
program define MoDMed, rclass
 quietly summarize mod1
    return list
    global m=r(mean)
    global sd=r(sd)
mixed med  l.iv mod1 cl.iv#c.mod1 || id: l.iv, var cov(exc) iter(50)
    return scalar al=(_b[med:L.iv]+_b[med:cL.iv#c.mod1]*($m-$sd))
    return scalar ah=(_b[med:L.iv]+_b[med:cL.iv#c.mod1]*($m+$sd))
    
mixed dv  med l.iv   || id: l.iv med, var  cov(exc) iter(50)
    return scalar b=(_b[dv:med])

end

bootstrap indL=(r(al)*r(b)) indH=(r(ah)*r(b)) diff=(r(ah)*r(b)-r(al)*r(b)), reps(50) reject(e(converged)!=1) cluster(id) idcluster(newid) group(serial): MoDMed
estat bootstrap
program drop MoDMed

Comment

Leonardo Guizzetti

Join Date: Jul 2016
Posts: 2406

23 Jul 2019, 07:21

There are a couple of issues here. First, if I run your entire code block in Stata 16, it runs perfectly fine. This is also the same set of results that I get when running on Stata 15.1. The results are shown below.

Code:

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50

Bootstrap results                               Number of obs     =         83
                                                Replications      =         50

      command:  MoDMed
         indL:  r(al)*r(b)
         indH:  r(ah)*r(b)
         diff:  r(ah)*r(b)-r(al)*r(b)

                                     (Replications based on 13 clusters in id)
------------------------------------------------------------------------------
             |   Observed   Bootstrap                         Normal-based
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        indL |  -.0221546   .1279645    -0.17   0.863    -.2729605    .2286513
        indH |  -.0682389   .0730797    -0.93   0.350    -.2114725    .0749947
        diff |  -.0460843   .1628654    -0.28   0.777    -.3652946     .273126
------------------------------------------------------------------------------

. estat bootstrap

Bootstrap results                               Number of obs     =         83
                                                Replications      =         50

      command:  MoDMed
         indL:  r(al)*r(b)
         indH:  r(ah)*r(b)
         diff:  r(ah)*r(b)-r(al)*r(b)

                                     (Replications based on 13 clusters in id)
------------------------------------------------------------------------------
             |    Observed               Bootstrap
             |       Coef.       Bias    Std. Err.  [95% Conf. Interval]
-------------+----------------------------------------------------------------
        indL |  -.02215458  -.0104108   .12796453   -.5778048   .1074098  (BC)
        indH |   -.0682389  -.0272837   .07307971   -.2348322  -.0035386  (BC)
        diff |  -.04608432  -.0168729   .16286538   -.1381491   .5742661  (BC)
------------------------------------------------------------------------------
(BC)   bias-corrected confidence interval

Since you are trying the code under different versions of Stata, it is important to learn about the built-in version control (see the output of -help version-). By setting the version number at the top of your do file, you ensure that Stata will use the correct version of the interpreter when executing commands. In some cases, these version statements will have little impact because a program may have been written with a very old version of Stata in mind, but simply ignoring it doesn't ensure backwards compatibility.

One area where this is noticeably impactful is the transition from Stata 13 to 14. The default random number generator changed to the Marsenne Twister in Stata 14 (see this Stata blog entry for an example), which changed from the 32-bit KISS algorithm in Stata 13 and earlier. This is also discussed in the output of -help set_rng-.

What should you do? If you want your results to be compatible with older versions of Stata, use a -version 13- statement at the top of your code. (If you do this, you indeed get different results from above.) However, it's more likely that you would want your code to be run with at least Stata 14, so just change the number accordingly. Here is how you would do that.

Code:

clear
version 15.1   //sets the version of the interpreter to use for the entirety of the do file
...
xtset id serial
set seed 10000
capture program drop MoDMed
program define MoDMed, rclass
 quietly summarize mod1
    return list
    global m=r(mean)
    global sd=r(sd)
mixed med  l.iv mod1 cl.iv#c.mod1 || id: l.iv, var cov(exc) iter(50)
    return scalar al=(_b[med:L.iv]+_b[med:cL.iv#c.mod1]*($m-$sd))
    return scalar ah=(_b[med:L.iv]+_b[med:cL.iv#c.mod1]*($m+$sd))
    
mixed dv  med l.iv   || id: l.iv med, var  cov(exc) iter(50)
    return scalar b=(_b[dv:med])

end

bootstrap indL=(r(al)*r(b)) indH=(r(ah)*r(b)) diff=(r(ah)*r(b)-r(al)*r(b)), reps(50) reject(e(converged)!=1) cluster(id) idcluster(newid) group(serial): MoDMed
estat bootstrap
program drop MoDMed

Lastly, note the first two remarks about version control:
1) All programs and do-files written for the current version of Stata should include version <your current version> as the first executable statement.
2) Programs and do-files written for earlier releases should include the appropriate version line at the top.

Comment

Fred Lee

Join Date: Nov 2017

Posts: 473
#4

23 Jul 2019, 08:04

Thanks Leonardo Guizzetti!
In Stata 15.1, I got the same result as you. However, in Stata 16, the Stata didn't run perfectly fine. It says as below:
Comment
Fred Lee

Join Date: Nov 2017

Posts: 473
#5

23 Jul 2019, 08:14

Originally posted by Leonardo Guizzetti View Post

There are a couple of issues here. First, if I run your entire code block in Stata 16, it runs perfectly fine.

Why there exsit some problem in my Stata 16?
Comment
Jean-Claude Arbaut

Join Date: Jul 2017

Posts: 209
#6

23 Jul 2019, 08:54

For what it's worth, if I run the same block of code in Stata 16 (latest update), I have the same error (see below). However, it works in Stata 15.1 and Stata 16 under version control (version 15), giving apparently the same results. In case it's significant, I'm on Windows.

Code:

(running MoDMed on estimation sample) Bootstrap replications (50) ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 50 insufficient observations to compute bootstrap standard errors no results will be saved r(2000);

Last edited by Jean-Claude Arbaut; 23 Jul 2019, 09:05.
Comment

Leonardo Guizzetti

Join Date: Jul 2016
Posts: 2406

23 Jul 2019, 09:11

I retract my comment about it working on Stata 16, I was accidentally using version control. I now get the same error as you both.

Here's the issue that I can see...bootstrap in version 16 is not sorting the resulting data set correctly. When running this in Stata 16 version controlled to Stata 15.1, each bootstrap resampled dataset is sorted by -id- and -serial-, whereas in Stata 16, it is sorted by -newid-. So it seems like bootstrap forgets about the -group()- option?

If I change your program to the following, it will now work in Stata 16.

Code:

capture program drop MoDMed
program define MoDMed, rclass
  sort id serial
 quietly summarize mod1
    return list
    global m=r(mean)
    global sd=r(sd)
mixed med  l.iv mod1 cl.iv#c.mod1 || id: l.iv, var cov(exc) iter(50)
    return scalar al=(_b[med:L.iv]+_b[med:cL.iv#c.mod1]*($m-$sd))
    return scalar ah=(_b[med:L.iv]+_b[med:cL.iv#c.mod1]*($m+$sd))
    
mixed dv  med l.iv   || id: l.iv med, var  cov(exc) iter(50)
    return scalar b=(_b[dv:med])

end

bootstrap indL=(r(al)*r(b)) indH=(r(ah)*r(b)) diff=(r(ah)*r(b)-r(al)*r(b)), reps(50) reject(e(converged)!=1) cluster(id) idcluster(newid) group(serial): MoDMed
estat bootstrap
program drop MoDMed

Comment

Leonardo Guizzetti

Join Date: Jul 2016

Posts: 2406
#8

23 Jul 2019, 09:13

Also, as an aside, it's generally a good practice to store scalar results as local (rather than global) in your programs. The following change could be made in the relevant lines above.

Code:

scalar m=r(mean) scalar sd=r(sd) return scalar al=(_b[med:L.iv]+_b[med:cL.iv#c.mod1]*(m-sd)) return scalar ah=(_b[med:L.iv]+_b[med:cL.iv#c.mod1]*(m+sd))
1 like
Comment
Fred Lee

Join Date: Nov 2017

Posts: 473
#9

23 Jul 2019, 10:48

Thanks a lot, Leonardo Guizzetti.
Comment
Fred Lee

Join Date: Nov 2017

Posts: 473
#10

23 Jul 2019, 11:04

Originally posted by Leonardo Guizzetti View Post

So it seems like bootstrap forgets about the -group()- option?

Maybe you can report this issue to the developers.
Comment
Leonardo Guizzetti

Join Date: Jul 2016

Posts: 2406
#11

23 Jul 2019, 11:19

You're welcome, Fred. I have reported it and will report back what they say.
Comment
Joerg Luedicke (StataCorp)

StataCorp Employee

Join Date: Apr 2014

Posts: 117
#12

25 Jul 2019, 10:27

Hi everybody,

The problem as described in this thread is not a bug.
Rather, it is a documentation oversight on our end. We had to make some
changes to bootstrap in the context of newly implemented features in
Stata 16, and now if you have a self-written command that is supposed to
work with the bootstrap prefix, and which utilizes panel-data or
multilevel models that also use time-series operators, there will now
have to be a certain program property associated with the self-written
program. We will add this to the following help page and corresponding
manual entry:

Code:

help program properties

The name of the property is xtbs, so if you would define something like

Code:

program define mybs mixed y L.x || id: end

Then this program would now have to look like:

Code:

program define mybs, properties(xtbs) mixed y L.x || id: end

In addition to that, we decided to add an option to bootstrap that
will enforce checking of xt settings, and if that option is used, there
will not have to be an associated program property.

Now, with that being said, while looking into this issue we actually
discovered a bug in bootstrap that materializes in the case when a
self-written program uses a command under the hood that relies on
xtsettings, as is the case if you have panel/multilevel data and also
use time-series operators. The bug is basically the same as the one we
fixed as of the update from 20 Sep 2018 for our official panel
data/multilevel data estimation commands that allow specification of
time-series operators. If you have a look at

Code:

help whatsnew15

it is described under update 20sep2018, item #2. We will provide a fix
in a future update that will also capture the case of self-written
programs that use both panel-data estimators and time-series operators.
At this point, however, it is not possible to run this type of bootstrap
correctly using a self-written program along with the bootstrap
prefix, neither in Stata 16 nor in Stata 15. The only way to perform
this kind of bootstrap at the moment is to roll your own outside of the
bootstrap prefix command.

Before I will show an example for that in just a moment, I would
like to point out that the way that bootstrap is specified in
the above example is most likely not leading to the intended results. That is,
if you perform a clustered bootstrap and want to have the clustering
units re-uniqueified for each of the bootstrap replication samples, then
if these were your xtsettings

Code:

xtset id t

instead of doing

Code:

bootstrap ... , ... cluster(id) idcluster(newid) group(t)

you would need to specify

Code:

bootstrap ... , ... cluster(id) idcluster(newid) group(id)

Otherwise, you would create new clusters that are combinations of the
non-uniqueified clustering units and the time points, which is probably
not what you want here.

Now with that out of the way, here is an example where we bootstrap the
standard errors of some parameters from a linear mixed model using a
self-written program that utilizes the bsample command, and we use
simulate to bootstrap the standard errors:

Code:

* Toy data: set seed 1234 set obs 50 gen id = _n gen u = rnormal(0.0,0.5) gen v = rnormal(0.0,0.2) expand 10 bys id : gen t = _n gen x1 = runiform() gen e = rnormal() tsset id t gen y = 5 + L.x1 + L.x1*v + u + e drop v u e * Program for bootstrapping: program define mybs, rclass preserve bsample, cluster(id) idcluster(newid) sort newid id, stable qui by newid id: gen newgroup = _n == 1 qui replace newgroup = sum(newgroup) drop id rename newgroup id tsset id t mixed y L.x1 || id: L.x1 return scalar b1 = _b[L.x1] return scalar b0 = _b[_cons] return scalar var_i = exp(_b[lns1_1_2:_cons])^2 return scalar var_c = exp(_b[lns1_1_1:_cons])^2 return scalar var_e = exp(_b[lnsig_e:_cons])^2 end * Point estimates: mixed y L.x1 || id: L.x1 * Bootstrapping standard errors: simulate b1=r(b1) b0=r(b0) var_c=r(var_c) var_i=r(var_i) var_e=r(var_e), /// reps(10) seed(123) : mybs * Show results: foreach v of varlist _all { qui sum `v' di as txt "SE `v'" _col(11) "= " as res `r(sd)' }

For the sake of example, I only specified 10 replications here.
Obviously, this number should be much higher for an actual analysis.

We apologize for the inconvenience, and as I said, we will provide a
fix for this in an upcoming update.

Best,
Joerg
1 like
Comment
Joerg Luedicke (StataCorp)

StataCorp Employee

Join Date: Apr 2014

Posts: 117
#13

28 Aug 2019, 09:34

Hi everybody,

FYI, the problems described in this thread concerning Stata 15 and 16 are now fixed as of the Stata 16 update from 23 August 2019 and the Stata 15 update from 26 August 2019. I mentioned in my above post that an additional program property is needed with user-defined commands that make use of time-series operators, and that we would add an option to bootstrap that checks for xt settings. However, we found a better way to deal with these issues and no additional program property or bootstrap option is needed. To replicate my manual bootstrap from above, we can now simply do it the following way:

Code:

* Toy data: clear set seed 1234 set obs 50 gen id = _n gen u = rnormal(0.0,0.5) gen v = rnormal(0.0,0.2) expand 10 bys id : gen t = _n gen x1 = runiform() gen e = rnormal() tsset id t gen y = 5 + L.x1 + L.x1*v + u + e drop v u e * Program for -bootstrap-: program define mybs2, rclass mixed y L.x1 || id: L.x1 return scalar b1 = _b[L.x1] return scalar b0 = _b[_cons] return scalar var_i = exp(_b[lns1_1_2:_cons])^2 return scalar var_c = exp(_b[lns1_1_1:_cons])^2 return scalar var_e = exp(_b[lnsig_e:_cons])^2 end * Run -bootstrap-: bootstrap b1=r(b1) b0=r(b0) var_c=r(var_c) var_i=r(var_i) var_e=r(var_e), /// reps(10) seed(123) cluster(id) idcluster(newid) group(id) : mybs2

Best,
Joerg
1 like
Comment

Announcement