Wish list for Stata 14

Richard Williams

Join Date: Apr 2014

Posts: 5043
#31

01 Aug 2014, 05:36

Following up on Daniel's comment, I'd like it if the documentation did more to at least warn you about things that may be bad ideas. For example, with MI, various sources say passive imputation is a bad idea. mi impute allows pweights but again, some say you shouldn't use them. Users have to know what they're doing, but a little more guidance on whether something is a good or bad idea (perhaps even just one or two sentences with references to learn more) could help.

-------------------------------------------
Richard Williams
Professor Emeritus of Sociology
University of Notre Dame
StataNow Version: 19.5 MP (2 processor)
WWW: https://academicweb.nd.edu/~rwilliam/
Richard Williams

Join Date: Apr 2014

Posts: 5043
#32

01 Aug 2014, 05:43

This may be more of a request for a FAQ than for changes in Stata 14. Stata has mi, svy, and xt. I get confused over how and how much I can mix and match these things. Rather than scanning through different manuals, it would be nice to have a single FAQ that showed, say, how I can use (or can't use) mi with xt, what mi commands can and cannot be combined with svy, using survey weights with xt data, etc. Just saying that it can't be done may actually be especially helpful, since otherwise you may go scouring the manuals and the web to figure out how it could be done.

László Sándor

Join Date: Apr 2014

Posts: 120
#33

01 Aug 2014, 06:09

A big thing, a substantial rewrite worthy of a major new version: column-based storage, or even more fundamentally different data structures. Curiously, this seems to me the bottleneck of Stata with N > 10^7.

A small thing: kill "save, replace." It is a disaster if it is invoked with an unset local, say, "save `neverset,maybetypo', replace"

Or maybe addressing other gotchas: http://www.ifs.org.uk/docs/stata_gotchasJan2014.pdf

SVG graphics for modern, HTML5-ready browsers.

An official (and as-fast-as-it-gets) implementation of -rdrobust- and -binscatter-. (Honorable mentions: -psacalc-, -estrat-, or maybe some machine learning tools, at least lasso, and a powerful cross-validation wrapper.)

And of course, the old thread: http://www.stata.com/statalist/archi.../msg00057.html
László Sándor

Join Date: Apr 2014

Posts: 120
#34

01 Aug 2014, 06:41

Something wilder: support for git or even GitHub. Though this does not need to be more than a promotion of the site and service. But as many social scientists do their first or only programming in Stata, Stata should think about helping them develop best practices hardcore coders would adopt anyway.

Being able to mark up do-files with links to GitHub issues or pull requests, or Asana/Google tasks would be another amazing thing, though very unlikely.
Konrad Zdeb

Join Date: Apr 2014

Posts: 496
#35

02 Aug 2014, 09:31

I want to add my two pence:
A less monstrous syntax editor with nice predefined syntax colouring schemes, and the ability to load new colouring schemes as sensible XML or text files (suggested extension *.stheme) in which colour palettes could be easily modified by typing numerical values.

Support for open data. The more that is done here the better, as this is where data is going, especially in governance.

spmap (SSC) should be introduced as part of the base package and treated with deserved respect

Proper anti-aliasing and improved export for vector graphics.
Ideally, I would like to see something along the lines of the R graphics device, where one could easily define picture size and resolution.

Nice code completion. In particular, I would like an equivalent of RStudio's list of options after typing the comma for each command. So, for instance, hitting Tab after typing graph box, would open a list of available options.

A generic option to get variable labels instead of variable names whenever desired. This shouldn't be so much hassle. If people want opulent tables with long labels squeezed into table cells, they should be able to get them.

Interactive plotting, though I think something along those lines has appeared in the programmers' community (or maybe I'm wrong).

If I remember well, old versions of Stata used to have something like tutorials that one could run inside Stata. I think this idea got a second life with the advent of RMarkdown and the fashion for "reproducible research". It's a nice thing and definitely worth considering.

Kind regards,
Konrad
Version: Stata/IC 13.1
Konrad Zdeb

Join Date: Apr 2014

Posts: 496
#36

02 Aug 2014, 09:36

A few more things:

I would like to be able to read external data files and store them as objects without saving:

Code:

object1 <- insheet using some_file.extension
object2 <- insheet using some_file2.extension
masterdata: supermerge object1 object2, by(id)
etc
describe masterdata
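For readers on later releases: frames, added in Stata 16, cover much of this wish, although they did not exist when this thread was written. The sketch below is only an illustration of that later feature, not the syntax proposed above. The file names and the variables id and income are hypothetical.

```stata
* Requires Stata 16 or later. File and variable names are hypothetical.
frame create object1
frame object1: import delimited using "some_file.csv", clear
frame create object2
frame object2: import delimited using "some_file2.csv", clear

* Link the two frames on id and copy one variable across,
* without saving either dataset to disk first.
frame change object1
frlink 1:1 id, frame(object2)
frget income, from(object2)
describe
```

Here frlink creates a link variable named after the linked frame (object2), and frget uses that link to copy income into the current frame.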

Code auto-completion should give me all options after the first few letters, so typing gr+Tab would show a list starting with the most popular command (graph ...).

Originally posted by László

Something wilder: support for git or even GitHub. Though this does not need to be more than a promotion of the site and service. But as many social scientists do their first or only programming in Stata, Stata should think about helping them develop best practices hardcore coders would adopt anyway.

Being able to mark up do-files with links to GitHub issues or pull requests, or Asana/Google tasks would be another amazing thing, though very unlikely.

I agree with this; RStudio provides this functionality.

Last edited by Konrad Zdeb; 02 Aug 2014, 09:43. Reason: Quote.

Kind regards,
Konrad
Version: Stata/IC 13.1
Clyde Schechter

Join Date: Apr 2014

Posts: 30360
#37

02 Aug 2014, 12:43

I endorse Laszlo's wish to eliminate -save, replace-. In my experience, local macro references are the most at-risk for typographical errors of any part of Stata syntax, because to reach the left-quote key, you have to take your fingers off of the home keys. They may return to the wrong place, and then you are likely to mistype the local macro name. If that typo does not constitute a defined macro, as it likely won't, you then clobber a data set inadvertently.

Or perhaps the solution is to find some other notation for dereferencing macros that doesn't require going to the far reaches of the keyboard. If global macros can be dereferenced with $, why can't local macros be de-referenced with @, or something like that?
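Until something like that exists, a defensive check before each save is one workaround. This is a minimal sketch, assuming a hypothetical local named outfile and a hypothetical file name. It relies on the fact that an undefined local expands to nothing, so the filename argument silently disappears and save, replace writes over the dataset currently in use.

```stata
sysuse auto, clear

* Hypothetical local; in real code it would be set earlier in the do-file.
local outfile "cleaned_data"

* Refuse to save if the local is empty or its name was mistyped.
if `"`outfile'"' == "" {
    display as error "local outfile is empty; refusing to save"
    exit 198
}
save `"`outfile'"', replace
```

The check cannot tell an undefined local from one deliberately set to empty, since Stata treats the two identically, but for a filename both cases should stop the save.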
Clyde Schechter

Join Date: Apr 2014

Posts: 30360
#38

02 Aug 2014, 13:06

Something just dawned on me. I have, in previous versions' wish lists, requested that dereferencing an undefined macro be made illegal (or at least that there be a setting to do that), rather than returning an empty string. A compelling objection to that is that it would break enormous amounts of existing code that rely on an undefined macro's evaluating to empty.

It might be possible to keep the existing `' system, and also introduce @-dereferencing, as suggested in my earlier post, with @-dereferencing of an undefined macro being illegal. That way those of us who would prefer safer macros could use the @ method, and those who prefer having undefined macros usable as empty strings could stick with `'. And the change wouldn't break a single program. Also, this method of dereferencing would reduce or eliminate all those unreadable (to the human eye) sequences of `"``...''"' that currently populate our programs.
daniel klein

Join Date: Mar 2014

Posts: 3914
#39

02 Aug 2014, 14:25

It might be possible to keep the existing `' system, and also introduce @-dereferencing, as suggested in my earlier post, with @-dereferencing of an undefined macro being illegal. That way [...] the change wouldn't break a single program.

I know this is a wishlist rather than a discussion, but might I point out that this particular example would break lots of code, including official Stata's reshape and the entire sem suite, in which @ is a legal character with special meaning. I imagine it would be very hard, if not impossible, to find a single character for dereferencing that would not break at least some old code. After all, this character must not be used anywhere in a (a)do-file if it is to issue an error message if whatever follows is an undefined local macro. It might be possible to use characters other than single quotes to surround the local macro name. Another alternative would be a macval2(lmacname) that would issue an error message if lmacname is undefined.
Clyde Schechter

Join Date: Apr 2014

Posts: 30360
#40

02 Aug 2014, 19:10

Good point, Dan. I forgot about the use of the @ character in -reshape- and in -sem-, even though I use both of those regularly! I guess that wouldn't work. macval2(lmacname) would be reasonable, or maybe something a little shorter such as mval(). Or it might be possible to use something like @@ for local macro dereferencing, just as = and == are different. The gist of it, to me, is to provide an alternative way to dereference local macros, easy to type, and have it reject undefined macros.
Richard Williams

Join Date: Apr 2014

Posts: 5043
#41

02 Aug 2014, 19:14

If I could restart the world from scratch and have everything exactly the way I wanted it, we would not have `' . Or at least, we definitely would not have things like `"`'"' or whatever it is. Those things drive me crazy and if they get at all complicated I invariably have to try 5 times to get it right. Unfortunately I am not sure what the world would have instead.

Richard Williams

Join Date: Apr 2014

Posts: 5043
#42

04 Aug 2014, 08:29

I would like to see much better support for Full Information Maximum Likelihood (fiml). Some Stata routines, e.g. SEM, provide some support for fiml (which Stata calls mlmv). But, there are several limitations to Stata fiml support as it now stands.

* As Clyde Schechter points out in this thread, http://www.statalist.org/forums/foru...y-imputed-data, "Stata has -method(mlmv)- which is full information but relies on multivariate normality. MPlus has a full information estimator which is also robust to non-normality."

* As far as I can tell, fiml only works with linear models, e.g. you can't use it for logit.

* fiml could be useful in many more commands, e.g. it would be nice if regress had a fiml option (although maybe that would complicate postestimation commands).

* Also, I understand that some other programs (e.g. MPLUS) let you specify auxiliary variables that help improve the handling of missing data.

I've read in several places that fiml is as good as, if not better than, multiple imputation for handling missing data. It is certainly easier to add fiml as an option than it is to impute a bunch of data sets. See, for example,

http://www.statisticalhorizons.com/ml-better-than-mi

Stata has come a long way in recent years with SEM and GSEM. Support for fiml, however, seems to be one area where it lags behind some of the competition.

Last edited by Richard Williams; 04 Aug 2014, 08:31.

David Poensgen

Join Date: Jun 2014

Posts: 27
#43

07 Aug 2014, 07:01

What I personally would love to have is a replace option for generate. It's a small thing in the grand scheme of things, but the lack of one keeps annoying me - especially when doing ad-hoc trials with .do-files and the like. Also, looping could be simpler in some cases.
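A common stopgap with built-in commands is capture drop followed by generate, which lets a do-file be rerun without a "variable already defined" error. This is a minimal sketch using the shipped auto dataset; the new variable names are hypothetical.

```stata
sysuse auto, clear

* Drop the variable if it exists (capture suppresses the error if it does not),
* then create it afresh.
capture drop price_k
generate price_k = price / 1000

* The same idiom inside a loop.
foreach v of varlist price weight {
    capture drop `v'_log
    generate `v'_log = ln(`v')
}
```

Note that capture hides every error from drop, not only "variable not found", so this is a convenience for ad-hoc work rather than a full substitute for a replace option.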

Somewhat similar is the possibility to save empty data sets - yes, I know there's an addon to do it and also easy ways to work around, but it would make some things much more elegant to be able to do so from scratch.
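One built-in route that may help here: to the best of my knowledge, save accepts an emptyok option, described in help save as a programmer's option, which allows saving a dataset with zero observations and zero variables. The sketch below assumes that option is available in your version; the file name is hypothetical.

```stata
* Start from an empty dataset and save it anyway.
clear
save "empty_template", emptyok replace

* Reload it to confirm: describe should report no observations or variables.
use "empty_template", clear
describe
```

Such a file can then serve as a starting point for append inside a loop.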

Last but not least I would love some possibility to temporarily or permanently change the accuracy with which relational operators are evaluated. I recently did learn how to use the -float- function and all, but it seems somewhat tedious, and I keep forgetting and then wasting time finding my mistake... I would imagine there are countless cases where the current behaviour is both unwanted and unexpected by the user.

Last edited by David Poensgen; 07 Aug 2014, 07:08.
daniel klein

Join Date: Mar 2014

Posts: 3914
#44

07 Aug 2014, 07:30

What I personally would love to have is a replace option for generate. It's a small thing in the grand scheme of things, but the lack of one keeps annoying me - especially when doing ad-hoc trials with .do-files and the like. Also, looping could be simpler in some cases.

In the meantime check out regen and from SSC and/or cmpute from SJ.

I would like to add an ascii() function (like the Mata equivalent). More general would be the possibility to write one's own Stata functions - although I think this has been ruled out for reasons I do not remember.

Best
Daniel
Nick Cox

Join Date: Mar 2014

Posts: 36074
#45

07 Aug 2014, 07:53

Last but not least I would love some possibility to temporarily or permanently change the accuracy with which relational operators are evaluated. I recently did learn how to use the -float- function and all, but it seems somewhat tedious, and I keep forgetting and then wasting time finding my mistake... I would imagine there are countless cases where the current behaviour is both unwanted and unexpected by the user.

You are presumably referring to ==, >, <, >=, <=, !=.

The problem you identify isn't (to me) at all clear. Being unexpected, unfortunately, means more often that the user doesn't understand Stata yet (and that happens all the time to very experienced users too).

If you want to take control and allow some fuzziness in comparisons, I would start with c(epsfloat) and c(epsdouble) as accessible constants. It's not clear precisely what you expect StataCorp to implement that isn't already under user control.
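For concreteness, here is a small sketch of the usual surprise and of the two ways to handle it mentioned in this exchange. It assumes the default storage type is float (that is, set type has not been changed) and is meant to be run from a do-file, since // comments are not accepted interactively.

```stata
clear
set obs 1
generate x = 0.1                        // stored as float by default

count if x == 0.1                       // 0: the float in x is not the double 0.1
count if x == float(0.1)                // 1: both sides rounded to float precision
count if reldif(x, 0.1) < c(epsfloat)   // 1: comparison with an explicit tolerance
```

The first comparison fails because 0.1 has no exact binary representation, so the float stored in x and the double-precision literal differ slightly. float() fixes the comparison by rounding the literal the same way, while reldif() with c(epsfloat) puts the fuzziness under user control.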