Wishlist for Stata 16

Belinda Foster

Join Date: Jul 2016

Posts: 132
#1

06 Jun 2017, 08:59

As per Nick Cox's "request"!
Richard Williams

Join Date: Apr 2014

Posts: 5012
#2

06 Jun 2017, 09:54

Start updating web pages an hour earlier when Stata 16 is released so as not to prolong the suspense.

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
WWW: https://academicweb.nd.edu/~rwilliam/
2 likes
Rich Goldstein

Join Date: Mar 2014

Posts: 4485
#3

06 Jun 2017, 13:03

Here are some issues, especially related to the design aspect of studies:

randomization plans, including adaptive randomization (yes, I am familiar with the user-written routines -ralloc-, -rct_minim- and -randomize-)

programs for various kinds of matching (introductions to various forms can be found in (a) Rosenbaum, P.R. (2010), Design of Observational Studies, Springer, or (b) Stuart, E.A. (2010), Matching Methods for Causal Inference: A Review and a Look Forward, Statistical Science 25(1): 1-21). I do recognize that there are user-written routines (e.g., I have used -vmatch- several times), but these are quite limited

re: MI, StataCorp tech support told me that I could not do the following; I then learned (from Ian White) that -ice- can handle it: I have a categorical variable with dozens of categories (say, numbered 1-99); there are two types of missing values: (a) standard missing and (b) "60", a special category meaning that the value is supposed to be one of 61-69 but we don't know which; I want to be able to impute both at the same time

for power analysis: I would like to compute power for a certain width of a confidence interval (i.e., I want the power for an "accurate" estimate of, say, an important coefficient)
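The quantity in the last item can be computed by hand in a simple case. The following is only a rough sketch, assuming normally distributed data, a t-based 95% confidence interval for a single mean, and made-up planning values (sigma and w are hypothetical, not taken from any real study). It relies on the fact that (n-1)s^2/sigma^2 follows a chi-squared distribution with n-1 degrees of freedom, so the probability that the interval is no wider than w can be read off the chi-squared CDF.

```stata
* Hypothetical planning values (assumptions for illustration only):
* sigma is the assumed population SD, w the target full width of the 95% CI.
local sigma = 10
local w     = 4

foreach n of numlist 100 110 120 130 {
    * t critical value for a 95% interval with n-1 degrees of freedom
    local t    = invttail(`n' - 1, 0.025)
    * largest sample SD for which the interval width 2*t*s/sqrt(n) is <= w
    local smax = `w' * sqrt(`n') / (2 * `t')
    * Pr(s <= smax), using (n-1)s^2/sigma^2 ~ chi-squared(n-1)
    local p    = chi2(`n' - 1, (`n' - 1) * (`smax' / `sigma')^2)
    display "n = `n'   Pr(width <= `w') = " %5.3f `p'
}
```

A coefficient from a regression model would need a different calculation, so this covers only the simplest version of the request.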

Last edited by Rich Goldstein; 06 Jun 2017, 13:43.
Richard Williams

Join Date: Apr 2014

Posts: 5012
#4

06 Jun 2017, 21:04

Maybe it is in there somewhere in the new docs, but I am not seeing anything about faster execution. I don't know how it does it, but Mplus is often much faster than Stata's sem (for that matter, Mplus may be by far the fastest of all SEM programs). Stata has gotten better, but if I could get the speed of Mplus and the ease of use and integration of features that Stata has, I would be very happy.

3 likes
Cynthia Inglesias

Join Date: Aug 2015

Posts: 86
#5

07 Jun 2017, 03:32

In terms of methods Stata 15 is remarkable indeed. But Mata only got a handful of functions, while no effort at all was made to document the graphics language. With the latter, StataCorp can begin small (e.g. one or two chapters in every release). I really think that for power users this would be a great addition. The built-in editor got some love but still no auto save function or syntax auto-complete.

I also find it strange that StataCorp ignores people's advice for speeding up a number of built-in commands. Sure, increasing the number of variables in Stata MP is helpful, but it would be a lot more useful, in my opinion, to speed up data management without necessarily using parallelization. Collectively, the time required to run certain commands accumulates fast with big data.

Stata is a great piece of software and with relatively few changes it could become a lot better. Some of the decisions StataCorp makes are truly puzzling. I hope Stata 16 will be the version where I can do 99% of my work in a single environment.

Last edited by Cynthia Inglesias; 07 Jun 2017, 03:35.
1 like
Richard Williams

Join Date: Apr 2014

Posts: 5012
#6

07 Jun 2017, 08:01

"I also find it strange that StataCorp ignores people's advice for speeding up a number of built-in commands."

In fairness, Stata 14.2 did make some nice speed improvements in -sem-, especially when fiml is used. As I said before, Mplus is amazingly fast, but I think it is faster than everyone. It would be helpful to know if other packages are faster than Stata. If I am not running something too monstrous, I find Stata quite zippy, seemingly faster than SPSS (but I haven't used SPSS in years).

In short, I would love to see speed improvements in Stata, but users of other packages may want speed improvements too.

Incidentally, my ongoing hope is that Stata buys out Mplus and steals all their algorithms! Or copies whatever is the Mplus secret of success.

1 like
Isaac Maddow-Zimet

Join Date: Apr 2014

Posts: 70
#7

07 Jun 2017, 09:53

Richard Williams, I've had similar experiences with sem (and especially gsem) being very slow compared to Mplus. I've used R's lavaan package in the past, and I've found it much more comparable, speedwise, to Mplus than to Stata for fitting structural equation models. But it would be great to have a speedy implementation in Stata as well.

Bruce Weaver

Join Date: May 2014
Posts: 1139

23 Aug 2017, 07:18

I would like to see an option for the N-1 Chi-square added to tabulate. As Ian Campbell's simulation study has shown, it is a better choice than the Fisher-Irwin test (aka Fisher's exact test) for analyzing 2x2 tables when the marginal totals are not fixed in advance by design.

Of course, it is possible to calculate the N-1 Chi-square oneself by various means (see below). But I think it would be better to have a built-in option.

Cheers,
Bruce

Code:

. * Three ways to compute the N-1 Chi-square with Stata.
.
. * First, generate a data set containing the 4 cell counts.
. * I'll use the well-known 2x2 table showing the relationship
. * between type of feeding (breast vs bottle) and malocclusion
. * of the teeth in infants (see Yates, 1934; Kendall & Stuart,
. * 1967; Campbell, 2007).
.
. clear all

. input rowvar colvar N

        rowvar     colvar          N
  1. 0 0 4
  2. 0 1 16
  3. 1 0 1
  4. 1 1 21
  5. end

. list

     +----------------------+
     | rowvar   colvar    N |
     |----------------------|
  1. |      0        0    4 |
  2. |      0        1   16 |
  3. |      1        0    1 |
  4. |      1        1   21 |
     +----------------------+

.
. * METHOD 1.
.
. * Write a small program to compute E.S. Pearson's N-1 Chi-square test
. * using stored results from the 'tabulate' command.
. * Program name:  ESPChiSq, short for Egon S. Pearson's N-1 Chi-Square.
. capture program drop ESPChiSq

. quietly program ESPChiSq

.
. * Use tabulate command to compute Pearson's Chi-square.
. tabulate rowvar colvar [fweight = N], chi2

           |        colvar
    rowvar |         0          1 |     Total
-----------+----------------------+----------
         0 |         4         16 |        20
         1 |         1         21 |        22
-----------+----------------------+----------
     Total |         5         37 |        42

          Pearson chi2(1) =   2.3858   Pr = 0.122

. ESPChiSq
Egon S. Pearson's N-1 Chi-Square Test
N-1 ChiSq   df     p-value
----------------------------
2.3290418    1    .12698002
----------------------------

.
. * Could also be done using tabi (i.e., immediate form of tabulate).
.
. tabi 4 16 \ 1 21, chi2

           |          col
       row |         1          2 |     Total
-----------+----------------------+----------
         1 |         4         16 |        20
         2 |         1         21 |        22
-----------+----------------------+----------
     Total |         5         37 |        42

          Pearson chi2(1) =   2.3858   Pr = 0.122

. ESPChiSq
Egon S. Pearson's N-1 Chi-Square Test
N-1 ChiSq   df     p-value
----------------------------
2.3290418    1    .12698002
----------------------------

.
. * ----------------------------------------------
.
. * METHOD 2.
.
. * Compute a constant stratum variable
. generate Stratum = 0   

. list

     +--------------------------------+
     | rowvar   colvar    N   Stratum |
     |--------------------------------|
  1. |      0        0    4         0 |
  2. |      0        1   16         0 |
  3. |      1        0    1         0 |
  4. |      1        1   21         0 |
     +--------------------------------+

.
. * Use tab3way to display the contingency table.
. tab3way rowvar colvar Stratum [fweight=N] , rowtot coltot


Frequency weights are based on the expression: N
Table entries are cell frequencies
Missing categories ignored

-------------------------------
          | Stratum and colvar
          | -------- 0 --------
   rowvar |     0      1  TOTAL
----------+--------------------
        0 |     4     16     20
        1 |     1     21     22
    TOTAL |     5     37     42
-------------------------------

. * Use the cc command to compute the Mantel-Haenszel statistic & p-value.
. cc rowvar colvar [fweight=N], by(Stratum)

         Stratum |       OR       [95% Conf. Interval]   M-H Weight
-----------------+-------------------------------------------------
               0 |       5.25      .4440375    270.558     .3809524 (exact)
-----------------+-------------------------------------------------
           Crude |       5.25      .4440375    270.558              (exact)
    M-H combined |       5.25      .5338913   51.62568              
-------------------------------------------------------------------

                   Test that combined OR = 1:
                                Mantel-Haenszel chi2(1) =      2.33
                                                Pr>chi2 =    0.1270

.
. * The M-H test above is matching the Linear-by-linear association test
. * from SPSS.
.
. * Now see what happens if the stratification variable is omitted.
. cc row col [fweight=N]
                                                         Proportion
                 |   Exposed   Unexposed  |      Total     Exposed
-----------------+------------------------+------------------------
           Cases |        21           1  |         22       0.9545
        Controls |        16           4  |         20       0.8000
-----------------+------------------------+------------------------
           Total |        37           5  |         42       0.8810
                 |                        |
                 |      Point estimate    |    [95% Conf. Interval]
                 |------------------------+------------------------
      Odds ratio |             5.25       |    .4440375     270.558 (exact)
 Attr. frac. ex. |         .8095238       |   -1.252062    .9963039 (exact)
 Attr. frac. pop |         .7727273       |
                 +-------------------------------------------------
                               chi2(1) =     2.39  Pr>chi2 = 0.1224

. * If I omit the constant Stratum variable, Pearson's Chi-square is computed.
.
. * ----------------------------------------------
.
. * METHOD 3.
.
. * As Howell's notes below show, Mantel's Chi-square for linear trend
. * (aka., the test of linear-by-linear association in SPSS) is equal
. * to Pearson's r-squared * (N-1).
. * https://www.uvm.edu/~dhowell/methods7/Supplements/OrdinalChiSq.html
.
. quietly correlate rowvar colvar [fweight = N]

. * return list
. local Linear = (r(N)-1)*r(rho)^2

. local dfLinear = 1

. display "N-1 Chi-square = " `Linear'
N-1 Chi-square = 2.3290418

. display "             p = " chi2tail(1,`Linear')
             p = .12698002

. * ----------------------------------------------

Here is the ESPChiSq program code that did not appear in that output due to the use of quietly.

Code:

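capture program drop ESPChiSq
* Run ESPChiSq immediately after -tabulate ..., chi2- or -tabi ..., chi2-:
* it reads r(N), r(chi2), r(r) and r(c) left behind by that command.
quietly program ESPChiSq
 display "Egon S. Pearson's N-1 Chi-Square Test"
 display "N-1 ChiSq   df     p-value"
 display "----------------------------"
 * N-1 chi-square = Pearson chi-square * (N-1)/N; the p-value uses 1 df,
 * which is the right value for the 2x2 tables this test is meant for.
 display (r(N)-1)/r(N)*r(chi2) "    " (r(r)-1)*(r(c)-1) ///
  "    " chi2tail(1,(r(N)-1)/r(N)*r(chi2))
 display "----------------------------"
end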
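As a hand check on the program above, the rescaling it applies can be written out for a 2x2 table with cell counts a, b, c, d. The cell labels are introduced here for illustration only; the counts are the ones from the feeding-by-malocclusion table in the log.

```latex
\chi^2_{N-1} \;=\; \frac{N-1}{N}\,\chi^2_{\text{Pearson}}
            \;=\; \frac{(N-1)\,(ad-bc)^2}{(a+b)(c+d)(a+c)(b+d)}

% With a = 4, b = 16, c = 1, d = 21 and N = 42:
\chi^2_{N-1} \;=\; \frac{41 \times (4 \cdot 21 - 16 \cdot 1)^2}{20 \times 22 \times 5 \times 37}
            \;=\; \frac{41 \times 4624}{81400}
            \;\approx\; 2.3290
```

This agrees with the value reported by all three methods in the log, and replacing 41 with 42 in the numerator gives the ordinary Pearson chi-square of 2.3858.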

--
Bruce Weaver
Version: Stata/MP 19.5 (Windows)


Richard Williams

Join Date: Apr 2014

Posts: 5012
#9

23 Aug 2017, 07:24

Ability to directly read in more data files in other formats, e.g. SPSS.

5 likes
Nigel Moore

Join Date: Apr 2016

Posts: 79
#10

23 Aug 2017, 07:33

More options for post hoc analysis, especially with unbalanced data sets:
Dunnett's test (AFAIK, the two Dunnett's tests supported by Stata require balanced data sets; certainly Theresa Powell's does)

Steel's test

Dwass-Steel-Critchlow-Fligner

Shirley's test

Williams' test

Oh, and a way of running Bartlett's without -oneway-, or appending -pwcompare- to -oneway-. Currently, we have to run ANOVA twice, -oneway- for Bartlett's and -anova- for -pwcompare-. How on earth can that make sense?
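For readers who have not run into this, the two-step workflow described above looks roughly like the following. It is only a sketch using the auto dataset shipped with Stata; the choice of price and rep78 and the Bonferroni adjustment are arbitrary picks for illustration, not a recommendation.

```stata
sysuse auto, clear

* Step 1: -oneway- reports Bartlett's test for equal variances
* underneath its ANOVA table.
oneway price rep78

* Step 2: -pwcompare- is a postestimation command, so the same model
* is fitted a second time with -anova- before it can be used.
anova price rep78
pwcompare rep78, mcompare(bonferroni) effects
```

The one-way model is fitted twice here, which is exactly the duplication the wish is about.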

Stata 14.2MP
OS X
1 like
Christoph Thewes

Join Date: Jun 2014

Posts: 33
#11

23 Aug 2017, 08:37

Nothing you can't do with other commands/ados, but this would be nice and easy:
An option for -tab- to display the numeric value before the value label. -nolabel- displays numeric codes rather than value labels, but what I normally want is both!

So exactly what "numlabel VAR, add" is doing, but as a display option for tab.
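The existing workaround can be sketched as follows, assuming the auto dataset shipped with Stata, in which the variable foreign carries the value label origin. Note that numlabel takes value-label names rather than variable names, so the label name is what gets passed here.

```stata
sysuse auto, clear

* Prefix each label text with its numeric code, so the table shows
* "0. Domestic" and "1. Foreign" instead of the bare label text.
numlabel origin, add
tabulate foreign

* Put the label text back the way it was.
numlabel origin, remove
```

The drawback is that this changes the label itself, if only temporarily, rather than just the display.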
3 likes
Andrew Wade

Join Date: Aug 2017

Posts: 28
#12

24 Aug 2017, 19:24

Hi,
Having just had trouble replicating results first generated in SAS, I suggest that the egen suite of functions have the option of using weights.
This may of course not be relevant for all of the egen functions. But it was for what I was doing, namely using the std() function.
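In the meantime, a weighted standardization can be done by hand. This is only a sketch: it uses the auto dataset shipped with Stata, the weight variable wt is made up for illustration, and it treats the weights as analytic weights. Whether the result matches a given SAS run depends on how SAS was told to compute the weighted variance, so the two should be compared on a small example first.

```stata
sysuse auto, clear

* Hypothetical weight variable, created only so the example runs.
generate double wt = weight / 1000

* Weighted mean and SD of price, using wt as analytic weights.
quietly summarize price [aweight = wt]

* Standardize with the weighted mean and SD left in r().
generate double z_price = (price - r(mean)) / r(sd)
```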
Regards,
Andrew
Dario Maimone Ansaldo Patti

Join Date: Aug 2014

Posts: 508
#13

24 Aug 2017, 20:31

I would like to see more models that can be estimated using bayes. In particular, I would like to estimate Bayesian spatial models using Stata instead of being forced to use MATLAB or R.
Cynthia Inglesias

Join Date: Aug 2015

Posts: 86
#14

26 Aug 2017, 05:05

Richard Williams, recently there was an interesting presentation by Sergio Correia on speeding up inefficient Stata commands: https://www.stata.com/meeting/baltim...17_Correia.pdf

StataCorp should finally implement these suggestions in Stata. I understand that the company has certain priorities, but the latest version was a disappointment with regard to speed improvements and Mata. Not everyone has access to Stata/MP, and datasets keep getting bigger. Asking people to buy Stata/MP for data manipulation is ridiculous.
1 like
Richard Williams

Join Date: Apr 2014

Posts: 5012
#15

26 Aug 2017, 06:36

Cynthia Inglesias, I'll add that Stata/MP doesn't seem to speed up sem, at least not the sem models I run. Indeed, when I run sem on these monster UNIX machines, tasks with enormous data sets run much faster but sem doesn't. As I understand it, the UNIX machines have enormous amounts of memory but their processors are no faster than my desktop, maybe even slower.

So yes, it would be great if Stata were faster with large data sets. But there are other programs, such as sem, which also would be nice to speed up, presumably by better algorithms. (I repeat my request that Stata buy out Mplus or else figure out how to reverse-engineer whatever it is it does to zip past the competition -- not only Stata, but perhaps everybody else.)
