Wish list for Stata 14

Jonathan Horowitz

Join Date: Apr 2015

Posts: 102
#211

04 Apr 2015, 09:19

Originally posted by Richard Williams

I would like to see much better support for Full Information Maximum Likelihood (fiml). Some Stata routines, e.g. SEM, provide some support for fiml (which Stata calls mlmv).

I second this, but would also like to see this implemented for non-SEM routines. I realize this is probably a much bigger challenge than implementing it for SEM, but fiml is often the best way to handle missing data (see: http://www.statisticalhorizons.com/w...ngDataByML.pdf) and it would be great to see it become standard.

I also second/third/fourth everyone who wants the error message to reference the line in the do file.

Finally, Satorra-Bentler for -gsem- would be outstanding.
Comment
László Sándor

Join Date: Apr 2014

Posts: 120
#212

07 Apr 2015, 08:56

So Stata 14 was announced today. I think last week's Stata Facebook post revealed the main new features. Notice no performance improvements or infrastructure changes for big data:

The votes are coming in! Just a reminder, go cast your vote for which of the following features you would most like to see in the next version of Stata.
59.48% Bayesian analysis
31.90% Panel and multilevel survival models
28.45% Survey for multilevel models
24.14% Endogenous treatment effects
19.83% Treatment effects for survival models
18.10% Regression models for fractional data
18.10% Markov-switching models
15.52% Power and sample size for survival analysis...
13.79% IRT (item response theory)
13.79% Unicode
08.62% Balance diagnostics for treatment effects
08.62% Satorra-Bentler for SEM
07.76% Censored Poisson model
03.45% Small-sample inference for mixed models
Comment
Sergio Correia

Join Date: Apr 2014

Posts: 420
#213

07 Apr 2015, 09:56

Hi László,

I was thinking the same about both the Facebook post and the infrastructure part.

I have somewhat mixed feelings about this update, although I can see the business rationale. If you want to fight for the marginal customer, the "battle" will be fought over stuff like IRT that some fields may use a lot (not economics though).

However, as a user I'm a bit underwhelmed. For me, there are two use cases for Stata: one, manipulating data; two, running regressions. For the first, I'm starting to use Python (or SQL, R, etc.) a lot more, as commands like reshape or collapse are extremely slow compared to what they could be. For instance, collapsing data into a small dataset should only require two passes over the data: one to get the groups on which we collapse, and another to compute the statistics (assuming count/mean/total). Instead, Stata sorts the data, which is O(N log N) and a lot slower. As I recall, it is also less memory-efficient, because it creates many variables as doubles that I may not want as such.
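A minimal sketch of that two-pass idea, written in Python purely for illustration. The function name `collapse_two_pass` and the sample rows are hypothetical, and it assumes count/mean/total statistics with no missing values; groups are tracked in a hash table, so no sort is needed:

```python
from collections import OrderedDict


def collapse_two_pass(rows, by, var):
    """Collapse `rows` (a list of dicts) to one row per value of `by`.

    Hypothetical helper, not a Stata or pandas API: it computes the count,
    total and mean of `var` without sorting the data.
    """
    # Pass 1: collect the distinct groups (hash lookups, O(N) on average).
    groups = OrderedDict()
    for row in rows:
        groups.setdefault(row[by], {"count": 0, "total": 0.0})

    # Pass 2: accumulate the statistics for each group.
    for row in rows:
        acc = groups[row[by]]
        acc["count"] += 1
        acc["total"] += row[var]

    # One output row per group, in order of first appearance.
    return [
        {by: key, "count": acc["count"], "total": acc["total"],
         "mean": acc["total"] / acc["count"]}
        for key, acc in groups.items()
    ]


# Made-up sample data, for illustration only.
data = [
    {"firm": "a", "sales": 10.0},
    {"firm": "b", "sales": 4.0},
    {"firm": "a", "sales": 20.0},
]
result = collapse_two_pass(data, by="firm", var="sales")
```

Both passes are linear in the number of rows, and the extra memory grows with the number of groups rather than with N.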

For the second use, regressions, Stata still has the lead over other programs, but that lead is narrowing. Moreover, the speed advantage that it enjoys over e.g. R in commands like -regress- disappears as we use more higher-level commands. For instance, reghdfe is 50% slower than -lfe- (its R alternative), and there are several *easy* ways to increase its speed, but there is no easy way to write threads in Mata, or to drop down to C, or even CUDA. Thus, I end up out of options for speedups.

I think Matthieu's benchmark is incredibly useful in noting these differences, which again won't matter for the marginal consumers but will matter for us, as more advanced users.

Best,
Sergio

PS: I was hoping at least for two-dataset-support, as that would allow users to code a few improvements by themselves, such as a collapse replacement.
1 like
Comment
László Sándor

Join Date: Apr 2014

Posts: 120
#214

07 Apr 2015, 11:41

Sergio Correia, the stats are fascinating, and your points are spot on.

I'd add one more thing though: I think the distinction between data-wrangling and analysis is spurious, so it is small comfort that Stata is fast on the latter. It is unrealistic, impractical or even downright wasteful (esp. with Stata's memory model) to hope to generate every construct of the data you'd ever think of using, and then start analyzing it. Most variables (incl. dummies, interactions, or more complicated constructs like leave-out means) come and go during a developing analysis, and cannot just be kept on disk (from which it is slow to merge anyway), let alone in RAM. So Python is a substitute for the initial data import from text files, but hardly for all that comes after that.
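For concreteness, a leave-out mean is the group mean computed without the observation itself, i.e. (group total - own value) / (group size - 1). A rough Python sketch follows; the function name and sample values are made up, and returning None for singleton groups is an arbitrary choice:

```python
from collections import defaultdict


def leave_out_means(groups, values):
    """Return, for each observation, the mean of the other observations
    in its group. Hypothetical helper for illustration only."""
    totals = defaultdict(float)
    counts = defaultdict(int)

    # First pass: group totals and group sizes.
    for g, v in zip(groups, values):
        totals[g] += v
        counts[g] += 1

    # Second pass: subtract the observation's own value from its group total.
    out = []
    for g, v in zip(groups, values):
        n = counts[g]
        out.append((totals[g] - v) / (n - 1) if n > 1 else None)
    return out


# Made-up sample data, for illustration only.
groups = ["a", "a", "a", "b"]
values = [1.0, 2.0, 6.0, 5.0]
loo = leave_out_means(groups, values)
```

A construct like this depends on the current sample, so it has to be rebuilt whenever the sample restriction changes, which is why it is awkward to precompute once and store.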

I would have thought StataCorp's marginal revenue would come from upgrades, not a new user picking up Stata for IRT. So selling more upgrade licenses (every cycle, or even at higher prices early on) because users cannot wait to get the latest performance improvements would sound like a reasonable business model to me.
Comment
Richard Williams

Join Date: Apr 2014

Posts: 5025
#215

07 Apr 2015, 12:10

We'll have to start a wish list for Stata 15 thread. ;-) For my own part, I am very happy about some of the enhancements to the margins command. It looks like it will be much easier to use after multiple-outcome commands like ologit. I'll be curious to see if the documentation for margins has improved -- I've always thought it needed more command-specific help, e.g. the margins help for xtlogit should not be the same as the help for logit. People are always getting confused because margins isn't giving them what they expect.

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
EMAIL: [email protected]
WWW: https://academicweb.nd.edu/~rwilliam/
Comment
Rich Goldstein

Join Date: Mar 2014

Posts: 4494
#216

07 Apr 2015, 12:17

Rich apparently thinks he's joking about a wish list for Stata 15; however, I believe that the company starts planning early, and anything big needs to be wished for within the next 60 days (30 days even better) or there is a good chance it won't make it. Now (well, the next few weeks) is the time to start wishing.
1 like
Comment

Jeff Pitblado (StataCorp)

StataCorp Employee

Join Date: Mar 2014
Posts: 707

#217

07 Apr 2015, 12:31

The postestimation manual entries and help files have been updated to include
margins-specific information. Here is a quick peek:

http://www.stata.com/help.cgi?mlogit...mation#margins

Just like for predict, these manual entries now have a section for margins that details
which statistics are supported by margins. Also mentioned is the default prediction.
As you will notice for mlogit, margins now defaults to probabilities for each (all) outcomes.

Here is a quick example.

Code:

. sysuse auto
(1978 Automobile Data)

. mlogit rep turn trunk
(output omitted)

. margins

Predictive margins                              Number of obs     =         69
Model VCE    : OIM

1._predict   : Pr(rep78==1), predict(pr outcome(1))
2._predict   : Pr(rep78==2), predict(pr outcome(2))
3._predict   : Pr(rep78==3), predict(pr outcome(3))
4._predict   : Pr(rep78==4), predict(pr outcome(4))
5._predict   : Pr(rep78==5), predict(pr outcome(5))

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    _predict |
          1  |   .0289855    .017518     1.65   0.098    -.0053491    .0633201
          2  |    .115942   .0358862     3.23   0.001     .0456063    .1862778
          3  |   .4347826   .0563016     7.72   0.000     .3244335    .5451318
          4  |   .2608696   .0517527     5.04   0.000     .1594361    .3623031
          5  |   .1594203   .0394829     4.04   0.000     .0820353    .2368053
------------------------------------------------------------------------------

Comment

Jonathan Horowitz

Join Date: Apr 2015

Posts: 102
#218

07 Apr 2015, 16:19

It seems that most of the wish list items were either a) suggestions to make existing commands stronger or b) workflow-type issues. I've been browsing through the manual's "What's New" section and I can't find a whole lot in either of those categories. Maybe I'm just looking in the wrong place? The exception, though, seems to be survival models (which look stronger than they've ever been before).
Comment
Sergio Correia

Join Date: Apr 2014

Posts: 420
#219

07 Apr 2015, 16:31

Does anyone know what the internal xtreg changes are?

Code:

xtreg, fe is now orders of magnitude faster when there are many panels, and there always are.

(From http://www.stata.com/help.cgi?whatsnew13to14)

Also, nice that there is programmatic PDF support!
Comment
László Sándor

Join Date: Apr 2014

Posts: 120
#220

07 Apr 2015, 17:23

Is -xtreg, fe- faster now than -_areg-, Sergio Correia? Unlikely, right?

Originally posted by Sergio Correia

Does anyone know what the internal xtreg changes are?

Code:

xtreg, fe is now orders of magnitude faster when there are many panels, and there always are.

(From http://www.stata.com/help.cgi?whatsnew13to14)

Also, nice that there is programmatic PDF support!
Comment
Sergio Correia

Join Date: Apr 2014

Posts: 420
#221

07 Apr 2015, 19:20

Since both areg and xtreg_fe are built on top of _regress, that would be extremely unlikely. However, I remember past concerns about how xtreg_fe was *much* slower than areg, so the manual is probably referring to that.

What we really need (or at least I need) is
a) a faster way to manipulate data (collapse, egen, tabulate, merge and sort are simply too slow compared to e.g. plyr).
b) low-level commands that allow users to improve Stata.

I'm not sure if StataCorp can keep up with the OSS alternatives by itself. Ten years ago, ggplot, plyr, pandas, scipy, julia, etc. were not a thing, and now they each have some really nice features that I wish I could use in Stata, but can't. There are still strong reasons for preferring e.g. Stata to R, but at some point the cons may outweigh the pros.

(Also, while I'm in rant mode, is there a way to fix the forum? Error messages, double posting, etc. make it quite hard to use.)
Comment
László Sándor

Join Date: Apr 2014

Posts: 120
#222

08 Apr 2015, 15:09

By the way, there is some interesting discussion in the research computing (High Performance Computing) community about wasted opportunities, jealousy and Not Invented Here: http://www.dursi.ca/hpc-is-dying-and-mpi-is-killing-it/

I guess Stata/MP is close to MPI.
1 like
Comment
László Sándor

Join Date: Apr 2014

Posts: 120
#223

08 May 2015, 20:59

Re some earlier posts (and since there is no new topic about Stata 15 yet), here is some easily distributed data science that our community could catch up to:
http://amplab-extras.github.io/SparkR-pkg/
https://spark.apache.org/sql/
Some of you might also enjoy the last two episodes on this podcast: http://www.rce-cast.com/
1 like
Comment
Alexander Wuttke

Join Date: May 2014

Posts: 44
#224

05 Aug 2015, 10:12

My biggest wishes (for Stata 15 now) concern the Do-File Editor:
Auto-Save
and some kind of navigation, such as clickable anchors, so that it's easier to get to the segment I am looking for
2 likes
Comment
Evan Sommer

Join Date: Jun 2015

Posts: 18
#225

31 Aug 2016, 17:07

Originally posted by Jonathan Horowitz

I second this, but would also like to see this implemented for non-SEM routines. I realize this is probably a much bigger challenge than implementing it for SEM, but fiml is often the best way to handle missing data (see: http://www.statisticalhorizons.com/w...ngDataByML.pdf) and it would be great to see it become standard.

...

Over a year later and I third this! Especially for xtmixed.
Comment
