Wish list for Stata 14

Matthew White

Join Date: Apr 2014

Posts: 29
#181

18 Feb 2015, 20:43

I'm a big fan of the Stata 13 Project Manager. It'd be helpful to be able to read/write .stpr files programmatically. There are a few project operations I'd like to automate that aren't currently supported by the Project Manager, for instance, removing all nonexistent files or dynamically linking a directory so that files added to the directory are added to the project. Perhaps it makes sense for StataCorp to implement only a subset of them, but if .stpr files were programmatically readable, I'd be able to implement all of them myself. .stpr files are binary, so perhaps an export to/import from XML or JSON feature would work well. Or perhaps .stpr files could even be exported to or imported from appropriately formatted Stata datasets so that they can be easily manipulated from within Stata.
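To illustrate the last idea, here is a minimal sketch of pruning nonexistent files, assuming the project's file list were available as a Stata dataset. No such export exists as far as I know. The one-observation-per-file layout, the variable name path, and the file names are all made up for the sketch; only fileexists() is an existing Stata function.

```stata
* Hypothetical layout: one observation per project file, stored in a
* string variable named path (an assumption, not an actual .stpr export).
clear
input str60 path
"analysis/clean.do"
"analysis/model.do"
"data/raw.dta"
end

* Drop the entries whose files no longer exist on disk
drop if !fileexists(path)

* What remains would be written back to the project
list path
```

Linking a directory could presumably work the same way: list the directory's files, append the ones not yet in the dataset, and import the result.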

Last edited by Matthew White; 18 Feb 2015, 20:45. Reason: Typo fix
1 like
Comment
Julian Pritsch

Join Date: Apr 2014

Posts: 80
#182

23 Feb 2015, 06:08

In my opinion, a desirable option for the [ME] family of commands would be a visual display of the model/equation one would like to estimate. I am thinking of something along the lines of the visual presentation of the model/equation in HLM. For an example, see the UCLA website, HLM snapshot for Model 4: http://www.ats.ucla.edu/stat/hlm/seminars/hlm_mlm/608/mlm_hlm_seminar_v608.htm

I think this would be very helpful because one is able to actually see the equation, which is often shown (with varying notation) in multilevel textbooks.
Comment
Carlos M. Urzúa

Join Date: Feb 2015

Posts: 1
#183

25 Feb 2015, 12:54

It would be nice to have in Stata 14 at least one pseudo-random uniform number generator that has a very long period and a high order of equidistribution. The Mersenne Twister (due to Matsumoto and Nishimura) would be my first choice.
1 like
Comment
Jeph Herrin

Join Date: Apr 2014

Posts: 335
#184

10 Mar 2015, 13:43

This is a big wish, but as long as we're wishing...

I've been using MCMC estimation more and more often, and (as far as I can tell) Stata is largely limited to making calls to WinBUGS. I've been using Stan (or RStan, via R) and SAS's PROC MCMC, both of which are very powerful and generic, and each time I use either I wonder when Stata will have something similar.
Comment
Sergiy Radyakin

Join Date: Apr 2014

Posts: 1867
#185

10 Mar 2015, 19:13

I'd second Matthew White's request regarding .stpr files but for a different reason: they are source files and are committed to source repositories, and as such must be versioned. Binary files are not versioned well, as we know. Having something in a text format similar to Visual Studio's project files would be better.
Thank you, Sergiy.
Comment
leetaey

Join Date: Apr 2014

Posts: 1
#186

11 Mar 2015, 23:47

1. Network analysis
2. Machine learning
3. Graph command export
1 like
Comment
Charlie Joyez

Join Date: Dec 2014

Posts: 421
#187

12 Mar 2015, 04:48

Since I've had no answers about the impossibility of computing odds ratios after a nested logit (see my post here), I'd be grateful if Stata 14 could incorporate an ``or'' option after nested logits, so that we can properly interpret interaction terms among explanatory variables.

Thanks
Charlie
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30151
#188

12 Mar 2015, 11:36

I have a request for the do-file editor. I wish that in its open and save functions it acted more like a part of Stata and less like an independent program. If I launch Stata by double-clicking on a data set, as I often do, Stata opens with the working directory set to the directory where that data set is located. Great! Now if I open the do-editor from within Stata, and then try to open a do-file, or if I start a new do-file and try to save it, the do-file editor doesn't seem to know what Stata's working directory is: it just remembers the directory it was last used in. So I have to then navigate to the directory I want. Maybe that's functional for some people--but for my workflow where data sets and the do-files that created and analyzed them are almost always in the same directory, it's a nuisance. Actually, it's more than a nuisance because sometimes I don't quite notice that the do-file editor is in the "wrong" directory and end up saving my do file there. Then, later on, I can't find it in the directory where I thought it would be and have to go searching around for it.
4 likes
Comment
Ronán Conroy

Join Date: Apr 2014

Posts: 14
#189

16 Mar 2015, 06:32

Something that my students have found a little confusing is that the -over- option can be concealed under names like "categories" in the dialogues. Making sure all dialogues are consistent with Stata syntax and with each other would be helpful.

I understand that there are plans afoot to revise the epidemiology commands, and I applaud this. The dialogues for some of these commands are bewildering, notably -tabodds- and -mcc-.

And please, StataCorp, why is it necessary for the -tabulate- dialogue to refer to "within-column relative frequencies"? A relative frequency scaled 0-100 is a percentage. They are column percents, which is not only much easier for my poor students but also more precise.
1 like
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30151
#190

26 Mar 2015, 11:09

Given the frequency with which we get posts on Statalist from people who have gotten irreproducible results because of a -sort- on a list of variables that do not uniquely identify the observations, it might make sense for the -sort- command to issue a warning like "The variable(s) in the sort key do not uniquely identify the observations; the resulting sorted order is not reproducible."
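Until something like that exists, the check can be made by hand before sorting. A minimal sketch, using the auto dataset that ships with Stata purely for illustration: -isid- is an existing command that exits with an error when the listed variables do not uniquely identify the observations.

```stata
sysuse auto, clear

* foreign alone leaves many ties, so isid fails and _rc is nonzero
capture isid foreign
if _rc {
    display as text "note: foreign does not uniquely identify the observations"
}

* make is unique in auto.dta, so adding it as a tiebreaker
* makes the sorted order reproducible
isid foreign make
sort foreign make
```

Placing -isid- on the sort key just before the -sort- makes a do-file stop with an error rather than continue silently with an arbitrary order.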
1 like
Comment
daniel klein

Join Date: Mar 2014

Posts: 3874
#191

26 Mar 2015, 11:21

Clyde certainly has a point, but I fear this behavior would require lots of quietly statements in (already written) ado-files, where you do not want such messages to appear, especially if the sort is not directly visible for the user. I would suggest making this point more salient in the help files, but on the other hand almost half the entry already explains the stable option's purpose with illustrative examples.

Best
Daniel
Comment
Richard Williams

Join Date: Apr 2014

Posts: 5012
#192

26 Mar 2015, 11:43

Sort's unstable sorting is wildly counter-intuitive but I have become convinced it is right. Explaining that in a simple warning message may be very difficult though.

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
EMAIL: [email protected]
WWW: https://academicweb.nd.edu/~rwilliam/
Comment
Lucas Mation

Join Date: Mar 2014

Posts: 39
#193

27 Mar 2015, 08:37

Improvements to the do-file editor (after using RStudio, Stata's text editor becomes a pain...):
- Autocompletion of closing parentheses and quotation marks (even if only as an opt-in option, not the default)
- Make syntax highlighting of macros ( `a' $a) work inside quotation marks
I know I can use an editor of my choice, but these should come out of the box

Last edited by Lucas Mation; 27 Mar 2015, 08:48.
1 like
Comment
Sergiy Radyakin

Join Date: Apr 2014

Posts: 1867
#194

27 Mar 2015, 20:32

Originally posted by daniel klein

Clyde certainly has a point, but I fear this behavior would require lots of quietly statements in (already written) ado-files, where you do not want such messages to appear, especially if the sort is not directly visible for the user...

There is another radical solution: make the stable option - default. Make another option (please don't call it fast, call it randomties or something like that to illustrate the point). Some programs will slow down, but there is no risk of misunderstanding.
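For reference, a small sketch of what the stable option does today, using the shipped auto dataset; the variable name before is made up for the illustration.

```stata
sysuse auto, clear

* Record the order of the observations before sorting
generate long before = _n

* With stable, tied observations keep their previous relative order
sort foreign, stable

* Within each value of foreign, the earlier order is preserved
by foreign: assert before > before[_n-1] if _n > 1
```

Without stable, the same assert may or may not pass from one run to the next, because ties are then broken randomly.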

A similar trap is the default float type. Every user converting time from a string to a formatted number writes gen time=... without specifying the type double, which, as we know, results in a loss of precision and complaints of the sort "Stata has lost my data". A few other programs handling data either don't bother about types at all, or provide a wide-enough default so that the user doesn't have to: SPSS, Excel, Limdep, NLogit, etc.

Don't get me wrong, I love Stata's storage types. And each one of them is dear to me. But double does look like a safer default than float.
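A minimal sketch of the problem, assuming the default storage type has not been changed with set type; the timestamp and variable names are arbitrary.

```stata
clear
set obs 1
generate str20 stamp = "27mar2015 20:32:07"

* No type given: the variable is created as float, which cannot hold
* a millisecond clock value of this size exactly
generate t_float = clock(stamp, "DMYhms")

* An explicit double stores the clock value exactly
generate double t_double = clock(stamp, "DMYhms")

format t_float t_double %tc
list stamp t_float t_double

* Same string, same function, different stored values
assert t_float != t_double
```

Clock values for 2015 are around 1.7e12 milliseconds, and at that magnitude consecutive float values are 131,072 ms (a bit over two minutes) apart, so the float version is rounded to the nearest such value.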

Best, Sergiy
1 like
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30151
#195

27 Mar 2015, 21:24

There is another radical solution: make the stable option - default. Make another option (please don't call it fast, call it randomties or something like that to illustrate the point). Some programs will slow down, but there is no risk of misunderstanding.

Well, I disagree with making the stable option the default. I agree with StataCorp that it is a good thing that sorting ties are broken randomly, resulting in indeterminacy of later calculations that are sensitive to the sort order. Anyone who is applying such calculations after an under-defined sort is generating indeterminate results--there are very, very few circumstances where this is not an error. The -stable- option papers over the problem. With the current default you will at least realize what you have done and you can then either fully specify a sort key that uniquely identifies the observations, or switch your calculations to procedures that are insensitive to sort order (depending on which was the source of the error). If -stable- is made the default, most of these errors will go undetected for a very long time, and people may already have relied on the spurious results by the time they are discovered.

What I would endorse, along Sergiy's line of reasoning, is to make it like, for example, -destring-, which requires the specification of either the -generate()- or -replace- option. One could require that when -sort- is used, one must specify either -stable- or -randomties- as an option. That way the user is at least forced to think about the issue for a moment. I might prefer a different word than -stable-, which sounds desirable. Maybe -deceptivelystable-, or -sweepitundertherug-.

Of course, I have no idea whether this can be implemented in a way that does not break large amounts of legacy code. I imagine that -sort- is one of the most frequently used commands in ado files.
1 like
Comment
