Wishlist for Stata 15

Daniel Feenberg

Join Date: Oct 2014

Posts: 328
#16

23 Oct 2015, 09:57

More wishes for Stata 15:

0) Allow -if- and -in- qualifiers to the -append- statement.
1) Add support for varlist and -if- and -in- qualifiers to the -save- statement.
2) Allow user to provide a memory limit for the -merge- statement, so that
-merge- does not have to assume every possible observation will merge,
which may reserve far more memory than is necessary.
3) Speed up -merge- to be similar in speed to -use-.
4) Allow decreasing alphabetic sorts in -sort-.
5) Reading and writing .raw and .dta files from and to a pipe. (This was available
through Stata 12).
6) A traceback (treating .do and .ado files like subroutines) for fatal errors showing
the sequence of calls to the aborting routine (with line numbers).
7) Augment "NNN missing values generated" with the variable name so
that you can readily identify which of a series of assignments had a
problem.

The rationale for these is given at http://www.nber.org/stata/efficient
1 like
Comment
Richard Williams

Join Date: Apr 2014

Posts: 5010
#17

23 Oct 2015, 16:04

4) Allow decreasing alphabetic sorts in -sort-.

Check out -gsort-
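
A minimal sketch of what -gsort- does here, assuming the auto dataset shipped with Stata is an acceptable stand-in for your own data:

```stata
* Load the example dataset that ships with Stata
sysuse auto, clear

* A leading minus sign requests descending order; it works on string variables too
gsort -make
list make in 1/5

* Ascending and descending keys can be mixed in one call
gsort +foreign -make
```
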

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
WWW: https://academicweb.nd.edu/~rwilliam/
Comment
Adrian Esterman

Join Date: May 2014

Posts: 3
#18

23 Oct 2015, 17:21

I would love to see a package for predictive modelling similar to Frank Harrell's rms R package
Comment
Megan Dyfvermark

Join Date: Oct 2015

Posts: 2
#19

23 Oct 2015, 21:19

Automatic notifications when pieces of a .do file finish and/or when a computation in the current window is done (like the user-created "beep" command), built into Stata preferences under a menu item such as alerts and sounds.
Comment
Robert Picard

Join Date: Mar 2014

Posts: 1536
#20

24 Oct 2015, 12:51

It would be nice if the behavior of the Do button in Stata's do-file editor were changed to execute the selected text using the include command instead of the do command. Currently, the selected text is copied to a temporary file and the commands are executed using something like:

Code:

do "/var/folders/cp/z8cssshn6935x9p181c71_7m0000gn/T//SD00570.000000"

The problem with this approach is that local macros do not survive once the temporary do-file terminates and never make it to the interactive session. By switching to

Code:

include "/var/folders/cp/z8cssshn6935x9p181c71_7m0000gn/T//SD00570.000000"

all local macros would be defined in the interactive session. I don't see any downside since people who work this way think that they are running parts of a do-file interactively.
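
The difference can be seen with a small sketch. It assumes the whole block is run in one go as a single do-file; the macro name greeting and the temporary file are made up for illustration:

```stata
* Write a one-line do-file to a temporary file (hypothetical stand-in for the editor's temp file)
tempfile snippet
tempname fh
file open `fh' using "`snippet'", write text replace
file write `fh' "local greeting hello" _n
file close `fh'

* -do- runs the file in its own scope, so the local vanishes when the file ends
do "`snippet'"
display "after do: [`greeting']"

* -include- runs the file in the caller's scope, so the local is still defined afterwards
include "`snippet'"
display "after include: [`greeting']"
```

The first display should show empty brackets and the second should show the value set inside the file.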
4 likes
Comment
Erika Kociolek

Join Date: Apr 2014

Posts: 83
#21

17 Nov 2015, 17:11

It would be nice to have:
A variable format for percents similar to what is available for commas.

A more straightforward way to add information to graphs that isn't necessarily what is being shown in the graph itself (see this link).

Graphics that are cleaner and a bit more modern-looking.

Outputs that can be easily dropped into Word or other programs without too much formatting.

I agree with Nils Enevoldsen (#14, above) that a feature request forum would be so helpful. There are a lot of good ideas in a lot of different spots, including here, on another Stata 15 thread, and even on the Stata 14 wishlist.

Another suggestion is a feature request forum for the Statalist forum itself . . .
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35754
#22

17 Nov 2015, 17:23

Graphics that are cleaner and a bit more modern-looking

Please expand. What's superfluous or old-fashioned precisely?
Comment
Erika Kociolek

Join Date: Apr 2014

Posts: 83
#23

23 Nov 2015, 21:12

In response to #22 above, nice graphs (such as this one and this one) are possible to create using Stata, but they take work. It would be nice to have simpler, cleaner graphs (no blue background and toned down default colors for graph elements) as the default or as schemes maintained by Stata. The s1color and s1mono schemes are a good start, but could be improved. In particular, the colors in s1color could be toned down so the graphs created using that scheme work well with other text, graphs, and tables and don't stand out unnecessarily.
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35754
#24

24 Nov 2015, 01:54

Erika: Thanks for expanding on your previous post.

I dislike the blue background too. Every graph I post here is based on the s1color scheme and that is set as the default for my sessions in my own profile.do (and even on our own university system). That and indeed most other graph schemes switch off the blue background. Such schemes have been available since Stata 8 and it's easy to set up your own default colours if you want something more subdued. So, while I agree broadly with your preferences (I am not clear why the blue and red you choose for one graph qualify as subdued) I can't see that there is any to-do list here for StataCorp for Stata 15.

I've commended use of grey in various places, e.g. in posts here urging minimal use of thin light grey grid lines and in

http://www.stata-journal.com/sjpdf.h...iclenum=gr0040

http://www.stata-journal.com/sjpdf.h...iclenum=gr0023

See also Billy Buchanan's brewscheme (SSC etc.). He posts here as wbuchanan and the package is mentioned in several threads here.

Last edited by Nick Cox; 24 Nov 2015, 02:23.
Comment
Michael Anbar

Join Date: Aug 2014

Posts: 116
#25

09 Dec 2015, 13:29

It would be helpful to have

1. A change to the code of the -tabstat- command to allow the full variable name to be displayed. This bug (and it is a bug because the limit of 16 characters is arbitrary) is documented here: http://www.statalist.org/forums/foru...in-tabstat-ado This should be trivial to change, since changing it wouldn't affect the default and therefore wouldn't break existing code.

2. Informative error messages. For example, some error messages related to writing to Excel are grossly unhelpful, e.g. http://www.statalist.org/forums/foru...-uninformative. As has been discussed on the forum and mailing lists before, uninformative error messages are a general problem with Stata, which might be a symptom of error categories that are too general and thus coupled with messages that are too general to be helpful.

3. The feature to generate cross-moment matrices and then to use those matrices with the -regress- command, without having to manually calculate betas, standard errors, etc. The benefits to this are documented here: http://www.statalist.org/forums/forum/general-stata-discussion/general/1318922-can-i-generate-a-cross-moment-matrix-and-then-use-it-in-repeated-regressions. This is a highly useful feature that, in my experience, other competing programs like RATS have had for over a decade. (I realize that RATS is slightly more specialized, but it's usually decades ahead of many other programs in implementing canned routines for econometrics)

4. Debugging, especially in Mata, but this would also be extremely helpful in Stata. I and others have raised this on the list for years. Comparing the experience of setting breakpoints, stepping through code line by line, etc. in MATLAB, Visual Studio, and other IDEs with -set trace on- and -pause- in Stata demonstrates why this is so useful. Stata's primitive debugging abilities are years out of date and pale in comparison to modern languages and their IDEs.

5. Code completion in the default do-editor. Especially for functions, variable names, etc.

6. As was raised in this post on the previous wishlist (http://www.statalist.org/forums/foru...015#post150015 ), a vertical select in the do-file editor would be extremely handy.

7. Quoting from a post on the Stata 14 wishlist (http://www.statalist.org/forums/foru...=2138#post2138 ): "Some version of the -case- (also called -switch- or -select-) command that is available in other programming languages. I usually accomplish the same thing with either a series of -if- statements or set of nested -cond()- calls; neither is very easy to read."

8. Again, as discussed in the previous wishlist, sparse matrices in Mata! http://www.statalist.org/forums/foru...340#post256340

9. I agree with the previous posts about a feature request forum, e.g. uservoice. Something like this would make feature requests much easier to submit and much more transparent.
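
To illustrate the two workarounds quoted in item 7, here is a minimal sketch. The macro model, the variable size, and the cutoffs are invented for illustration, and the auto dataset is only a convenient example:

```stata
* Workaround 1: a chain of -if- / -else if- commands on a macro value
local model "logit"
if "`model'" == "ols" {
    display "would run -regress-"
}
else if "`model'" == "logit" {
    display "would run -logit-"
}
else {
    display as error "unknown model: `model'"
}

* Workaround 2: nested -cond()- calls inside an expression
sysuse auto, clear
generate str6 size = cond(weight < 2500, "light", cond(weight < 3500, "medium", "heavy"))
tabulate size
```

Both run as written, but each extra branch adds another level of nesting or another block, which is the readability problem a -case- command would address.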

I realize that many of these features aren't the type that sell more copies of Stata by appealing to new users (like Bayesian estimation and IRT in the most recent version) and therefore may be a hard sell from a sales point of view, but for existing users of Stata they would be extremely helpful. My institution has several hundred heavy Stata users, and I've tried to address the limitations that we come across when using it.

Last edited by Michael Anbar; 09 Dec 2015, 13:46.
3 likes
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30147
#26

09 Dec 2015, 13:54

Michael Anbar Regarding #3: can't you accomplish this by using -sem- with its summary statistics data options?
Comment
Michael Anbar

Join Date: Aug 2014

Posts: 116
#27

09 Dec 2015, 14:11

Originally posted by Clyde Schechter

Michael Anbar Regarding #3: can't you accomplish this by using -sem- with its summary statistics data options?

I've never used the -sem- command, but I'll look into those and see if that's what I'm looking for (and post any updates in the other threads, both here and on StackOverflow). I'm not sure these are 100% equivalent because
a) -sem- doesn't support factor-variable notation (according to the linear regression example in [sem] intro 6, Structural models 1). I can bypass this by using -xi-, but as the documentation states, factor variables are the recommended method (unless, of course, the command doesn't support them). Since -gsem- supports them, though, maybe that's where I should look.
b) since the standard errors are calculated a little differently, the results won't be 100% the same between the -regress- and -sem- commands.

Also, does -sem- have similar optimizations to -regress- internally? For example, -regress- is usually very fast, but commands like -areg- are considerably slower because they're mostly implemented in Stata's ado language, and this is a critical issue on midsize datasets (e.g. 10-20 GB). If -sem- uses a slower implementation, that's a considerable drawback.
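
For reference, the manual route mentioned in item 3 of #25 can be sketched with -matrix accum-. This is only an illustration on the auto dataset with an arbitrary choice of variables; it recovers the coefficients but not the standard errors:

```stata
sysuse auto, clear

* Cross-product matrix of (price, weight, mpg, _cons); -matrix accum- adds the constant
matrix accum A = price weight mpg

* Split it into X'X (regressors plus constant) and X'y (first column is price)
matrix XX = A[2..., 2...]
matrix Xy = A[2..., 1..1]

* OLS coefficients from the cross moments
matrix b = invsym(XX) * Xy
matrix list b

* Compare with the built-in command
regress price weight mpg
```

The wish is for -regress- to accept a matrix like A directly, so that the same cross moments could be reused across many regressions without redoing this algebra by hand.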

Last edited by Michael Anbar; 09 Dec 2015, 14:19.
Comment
Michael Anbar

Join Date: Aug 2014

Posts: 116
#28

09 Dec 2015, 15:52

I'll add something simpler: an option to insert a certain number of spaces into the do-file editor instead of a tab character. For example, many text editors allow me to set the Tab key to insert four spaces in place of the actual tab character. This is useful for formatting output in the display window and across multiple systems (since different systems display tab characters differently).

Last edited by Michael Anbar; 09 Dec 2015, 15:54.
Comment
Michael Anbar

Join Date: Aug 2014

Posts: 116
#29

17 Dec 2015, 08:59

Originally posted by [email protected]

More wishes for Stata 15:

0) Allow -if- and -in- qualifiers to the -append- statement.
1) Add support for varlist and -if- and -in- qualifiers to the -save- statement.
2) Allow user to provide a memory limit for the -merge- statement, so that
-merge- does not have to assume every possible observation will merge,
which may reserve far more memory than is necessary.
3) Speed up -merge- to be similar in speed to -use-.
4) Allow decreasing alphabetic sorts in -sort-.
5) Reading and writing .raw and .dta files from and to a pipe. (This was available
through Stata 12).
6) A traceback (treating .do and .ado files like subroutines) for fatal errors showing
the sequence of calls to the aborting routine (with line numbers).
7) Augment "NNN missing values generated" with the variable name so
that you can readily identify which of a series of assignments had a
problem.

The rationale for these is given at http://www.nber.org/stata/efficient

I'd add efficient reshapes to this list (which comes directly from the NBER link). The link sums it up best:

The reshape command is inexplicably slow. Take a million observation dataset with variables id, year and x2001-x2010. Then the command to reshape wide to long format:

Code:

xtset id year
reshape long x, i(id) j(year)

takes about 20 seconds per million observations. But you can write out a separate file for each year of data, and then concatenate them into one long dataset in about 2 seconds. For example:

Code:

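* Write one file per year, each holding id, year, and that year's x
forvalues year = 2001/2010 {
    use id year x`year' using "/tmp/reshape", clear
    rename x`year' x
    save "/tmp/reshape`year'", replace
}
* Stack the yearly files into one long dataset
clear
forvalues year = 2001/2010 {
    append using "/tmp/reshape`year'"
}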

Long back to wide might be more difficult, since it would require a merge command, and the Stata merge command is quite slow compared to -use-.

The completion times quoted in that post may have changed as machines have become faster, but the point still stands. Especially when compared to languages like R and Python, Stata's -reshape- command is painfully slow. As I've said before, this might not be something that's as marketable to new users as large classes of new features, but it would certainly be helpful to existing users who (attempt to) use Stata on anything more than small datasets.
Comment
wbuchanan

Join Date: Mar 2014

Posts: 1362
#30

17 Dec 2015, 10:04

1. User defined functions - Sometimes it'd be nice to be able to write a function that can be evaluated inline by existing programs (e.g., so a user could log transform a variable in a regression without having to explicitly create the variable first).
2. Make more of the data management/native commands r-class - There is often great interactive functionality with some of the commands, but having them return values to macros/matrices provides a lot more flexibility (e.g., if -ls- returned the file system properties in a matrix instead of displaying on screen only)
3. Better documentation for the graphics system - this can help user-programmers to expand existing capabilities by understanding a bit more about the undocumented functions and/or migrating to a web-based format
4. Native JDBC support similar to the existing ODBC support.
2 likes
Comment
