Wish list for Stata 14

Sergio Correia replied

28 Mar 2015, 11:02
Originally posted by Michael Stepner

If it became the default xtile program in Stata 14, that would speed up a whole host of programs that call xtile.

+1 on that.

In general there is a huge amount of speedup potential in many common functions. A quick glance at https://github.com/matthieugomez/benchmark-stata-r shows some of the main culprits, including reshape, merge, and most of egen.

Speaking of -merge-, could we have a -sortpreserve- option? Most of the time I do merge, it changes the sort order of the data, which I then have to undo afterwards. Currently, I'm just prefixing merge with a simple ado that does that, but I feel it should be an option as it saves a lot of time on large datasets (and one line of code).
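To make the request concrete, here is a minimal sketch of the "remember the order, merge, restore the order" idea. It is written in Python rather than Stata, and every name in it (`merge_preserving_order`, `master`, `using`) is hypothetical; it illustrates the workaround pattern only and is not the ado mentioned in the post.

```python
def merge_preserving_order(master, using, key):
    """Left-join `using` onto `master` by `key`, keeping master's row order."""
    # Tag every master row with its original position before anything is reordered.
    tagged = list(enumerate(master))
    # Stand-in for a merge that re-sorts the data by the key variable.
    tagged.sort(key=lambda item: item[1][key])
    lookup = {row[key]: row for row in using}
    merged = [(pos, {**row, **lookup.get(row[key], {})}) for pos, row in tagged]
    # Undo the reordering by sorting on the saved positions.
    merged.sort(key=lambda item: item[0])
    return [row for _, row in merged]


master = [{"id": 3, "x": "c"}, {"id": 1, "x": "a"}, {"id": 2, "x": "b"}]
using = [{"id": 1, "y": 10}, {"id": 2, "y": 20}, {"id": 3, "y": 30}]
result = merge_preserving_order(master, using, "id")
```

A built-in option would fold the tagging and the final re-sort into the command itself, which is where the saving on large datasets would come from.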
Michael Stepner replied

28 Mar 2015, 07:42
Repeating a minor wish list item that I've mentioned in person:

I believe that my SSC program fastxtile produces identical output to the built-in program xtile, but runs much faster. If it became the default xtile program in Stata 14, that would speed up a whole host of programs that call xtile.
Clyde Schechter replied

27 Mar 2015, 21:24
Originally posted by Sergiy Radyakin

There is another radical solution: make the -stable- option the default. Add another option (please don't call it fast; call it randomties or something like that to illustrate the point). Some programs will slow down, but there is no risk of misunderstanding.

Well, I disagree with making the stable option the default. I agree with StataCorp that it is a good thing that sorting ties are broken randomly, resulting in indeterminacy of later calculations that are sensitive to the sort order. Anyone who is applying such calculations after an under-defined sort is generating indeterminate results--there are very, very few circumstances where this is not an error. The -stable- option papers over the problem. With the current default you will at least realize what you have done and you can then either fully specify a sort key that uniquely identifies the observations, or switch your calculations to procedures that are insensitive to sort order (depending on which was the source of the error). If -stable- is made the default, most of these errors will go undetected for a very long time, and people may have already relied on the spurious results by the time the problem is discovered.
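The indeterminacy described here can be illustrated with a small sketch. It is in Python, not Stata, and it imitates random tie-breaking by shuffling before a sort on the key; the data and the helper names (`sort_random_ties`, `first_value_per_group`) are made up for the example.

```python
import random

rows = [("A", 1), ("A", 2), ("B", 3), ("B", 4)]  # (group, value)


def sort_random_ties(data, rng):
    # Shuffle first, then sort on the group only: tied rows land in random order.
    shuffled = data[:]
    rng.shuffle(shuffled)
    return sorted(shuffled, key=lambda r: r[0])


def first_value_per_group(data):
    # An order-sensitive calculation: keep the first value seen in each group.
    first = {}
    for group, value in data:
        first.setdefault(group, value)
    return first


# Repeat the under-specified sort many times and collect the distinct answers.
outcomes = set()
for seed in range(50):
    ordered = sort_random_ties(rows, random.Random(seed))
    outcomes.add(tuple(sorted(first_value_per_group(ordered).items())))

# Sorting on a key that uniquely identifies each row gives a single answer.
full_key_result = first_value_per_group(sorted(rows))
```

With ties broken at random, `outcomes` ends up holding more than one answer, so the problem is visible on a rerun; a stable sort would return the same answer every time and hide the fact that the key was incomplete.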

What I would endorse, along Sergiy's line of reasoning, is to make it like, for example, -destring-, which requires the specification of either the -generate()- or -replace- option. One could require that when -sort- is used, one must specify either -stable- or -randomties- as an option. At least the user is forced for a moment to think about the issue this way. I might prefer a different word than -stable-, which sounds desirable. Maybe -deceptivelystable-, or -sweepitundertherug-.

Of course, I have no idea whether this can be implemented in a way that does not break large amounts of legacy code. I imagine that -sort- is one of the most frequently used commands in ado files.
Sergiy Radyakin replied

27 Mar 2015, 20:32
Originally posted by daniel klein

Clyde certainly has a point, but I fear this behavior would require lots of quietly statements in (already written) ado-files, where you do not want such messages to appear, especially if the sort is not directly visible to the user...

There is another radical solution: make the -stable- option the default. Add another option (please don't call it fast; call it randomties or something like that to illustrate the point). Some programs will slow down, but there is no risk of misunderstanding.

A similar trap is the default float storage type. Every user converting time from a string to a formatted number writes gen time=... without specifying the type double, which as we know results in loss of precision and complaints of the sort "Stata has lost my data". A few other data-handling programs either don't bother about types at all, or provide a wide-enough default so that the user doesn't have to: SPSS, Excel, Limdep, NLogit, etc.

Don't get me wrong, I love Stata's storage types, and each one of them is dear to me. But double does look like a safer default than float.
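The size of the precision loss can be checked with a short sketch. It is in Python and assumes the time is a clock value counted in milliseconds since 01jan1960, as Stata's %tc format does; the particular value is hypothetical, and `to_float32` is a helper written for this example that imitates 32-bit storage.

```python
import struct


def to_float32(x):
    """Round a 64-bit Python float to the nearest 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# Hypothetical clock value: milliseconds in 55 years of 365 days, plus 1 ms.
clock_ms = 55 * 365 * 24 * 60 * 60 * 1000 + 1

as_double = float(clock_ms)       # 64-bit storage keeps this integer exactly
as_float = to_float32(as_double)  # 32-bit storage has a 24-bit significand

# At this magnitude adjacent 32-bit floats are 2**17 ms (about 131 seconds)
# apart, so the stored time can be off by up to 2**16 ms.
error_ms = abs(as_float - clock_ms)
```

Integers are exact in a 32-bit float only up to 2**24 = 16,777,216, which is why a millisecond clock value stored without double comes back as a different time.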

Best, Sergiy
Lucas Mation replied

27 Mar 2015, 08:37
Improvements to the do-file editor (after using RStudio, Stata's text editor becomes a pain...):
- Autocompletion of closing parentheses and quotation marks (even if as an opt-in option, not the default)
- Make syntax highlighting of macros ( `a' $a) work inside quotation marks
I know I can use an editor of my choice, but these should come out of the box.

Last edited by Lucas Mation; 27 Mar 2015, 08:48.
Richard Williams replied

26 Mar 2015, 11:43
Sort's unstable sorting is wildly counter-intuitive but I have become convinced it is right. Explaining that in a simple warning message may be very difficult though.
daniel klein replied

26 Mar 2015, 11:21
Clyde certainly has a point, but I fear this behavior would require lots of quietly statements in (already written) ado-files, where you do not want such messages to appear, especially if the sort is not directly visible to the user. I would suggest making this point more salient in the help files, but on the other hand almost half the entry already explains the stable option's purpose with illustrative examples.

Best
Daniel
Clyde Schechter replied

26 Mar 2015, 11:09
Given the frequency with which we get posts on Statalist from people who have gotten irreproducible results because of a -sort- on a list of variables that do not uniquely identify the observations, it might make sense for the -sort- command to issue a warning like "The variable(s) in the sort key do not uniquely identify the observations; the resulting sorted order is not reproducible."
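As a rough sketch of what such a check involves, the Python below counts duplicate key values before sorting and warns when the key is not unique. The function `sort_with_warning` and the sample data are hypothetical, and because Python's own sort is stable, the warning here only illustrates the uniqueness test, not Stata's tie-breaking.

```python
import warnings
from collections import Counter

MESSAGE = (
    "The variable(s) in the sort key do not uniquely identify the "
    "observations; the resulting sorted order is not reproducible."
)


def sort_with_warning(rows, key_fields):
    """Sort rows by key_fields, warning if the key has duplicate values."""
    keys = [tuple(row[f] for f in key_fields) for row in rows]
    if any(count > 1 for count in Counter(keys).values()):
        warnings.warn(MESSAGE)
    return sorted(rows, key=lambda row: tuple(row[f] for f in key_fields))


data = [{"id": 1, "visit": 2}, {"id": 1, "visit": 1}, {"id": 2, "visit": 1}]

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    sort_with_warning(data, ["id"])           # id alone repeats: warns
    n_partial = len(caught)
    sort_with_warning(data, ["id", "visit"])  # id + visit is unique: silent
    n_full = len(caught) - n_partial
```

The check costs one extra pass over the key values, which is the kind of overhead a built-in warning would add to every sort.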
Ronán Conroy replied

16 Mar 2015, 06:32
Something that my students have found a little confusing is that the -over- option can be concealed under names like "categories" in the dialogues. Making sure all dialogues are consistent with Stata syntax and with each other would be helpful.

I understand that there are plans afoot to revise the epidemiology commands, and I applaud this. The dialogues for some of these commands are bewildering, notably -tabodds- and -mcc-.

And please, StataCorp, why is it necessary for the -tabulate- dialogue to refer to "within-column relative frequencies"? A relative frequency scaled 0-100 is a percentage. They are column percents, which is not only much easier for my poor students but also more precise.
Clyde Schechter replied

12 Mar 2015, 11:36
I have a request for the do-file editor. I wish that in its open and save functions it acted more like a part of Stata and less like an independent program. If I launch Stata by double-clicking on a data set, as I often do, Stata opens with the working directory set to the directory where that data set is located. Great! Now if I open the do-editor from within Stata, and then try to open a do-file, or if I start a new do-file and try to save it, the do-file editor doesn't seem to know what Stata's working directory is: it just remembers the directory it was last used in. So I have to then navigate to the directory I want. Maybe that's functional for some people--but for my workflow where data sets and the do-files that created and analyzed them are almost always in the same directory, it's a nuisance. Actually, it's more than a nuisance because sometimes I don't quite notice that the do-file editor is in the "wrong" directory and end up saving my do file there. Then, later on, I can't find it in the directory where I thought it would be and have to go searching around for it.
Charlie Joyez replied

12 Mar 2015, 04:48
Since I've had no answers about the impossibility of computing odds ratios after a nested logit (see my post here),
I'd be grateful if Stata 14 could incorporate an "or" option after nested logits, so that we can properly interpret interaction terms among the explanatory variables.

Thanks
Charlie
leetaey replied

11 Mar 2015, 23:47
1. Network analysis
2. Machine learning
3. Graph command export
Sergiy Radyakin replied

10 Mar 2015, 19:13
I'd second Matthew White's request regarding .stpr files but for a different reason: they are source files and are committed to source repositories, and as such must be versioned. Binary files are not versioned well, as we know. Having something in a text format similar to Visual Studio's project files would be better.
Thank you, Sergiy.
Jeph Herrin replied

10 Mar 2015, 13:43
This is a big wish, but as long as we're wishing...

I've been using MCMC estimation more and more often, and (as far as I can tell) Stata is largely limited to making calls to WinBUGS. I've been using Stan (or RStan, via R) and SAS's PROC MCMC, both of which are very powerful and generic, and each time I use either I wonder when Stata will have something similar.
Carlos M. Urzúa replied

25 Feb 2015, 12:54
It would be nice to have in Stata 14 at least one pseudo-random uniform number generator that has a very long period and a high order of equidistribution. The Mersenne Twister (due to Matsumoto and Nishimura) would be my first choice.
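For a sense of scale, Python's standard `random` module is built on the MT19937 variant of the Mersenne Twister, so it can serve as a reference point. The sketch below only inspects that existing generator; it says nothing about how Stata might implement one.

```python
import random

# Two generators seeded identically produce identical streams.
rng_a = random.Random(12345)
rng_b = random.Random(12345)
draws_a = [rng_a.random() for _ in range(5)]
draws_b = [rng_b.random() for _ in range(5)]

# The internal state is 624 32-bit words plus a position index.
state_words = len(rng_a.getstate()[1]) - 1

# The period of MT19937 is the Mersenne prime 2**19937 - 1.
period = 2 ** 19937 - 1
period_bits = period.bit_length()
```

The 624-word state is what buys the long period, at the cost of more memory per generator than a simple congruential scheme needs.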