  • Sergio Correia
    replied
    Originally posted by Michael Stepner
    If it became the default xtile program in Stata 14, that would speed up a whole host of programs that call xtile.
    +1 on that.

    There is in general a huge amount of speedup potential in many common functions. A quick glance at https://github.com/matthieugomez/benchmark-stata-r shows some of the main culprits, including reshape, merge, and most of egen.

    Speaking of -merge-, could we have a -sortpreserve- option? Most of the time I do merge, it changes the sort order of the data, which I then have to undo afterwards. Currently, I'm just prefixing merge with a simple ado that does that, but I feel it should be an option as it saves a lot of time on large datasets (and one line of code).
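
The workaround described above (remember the current order, merge, then restore it) can be sketched roughly as follows. This is an illustrative Python sketch rather than Stata code; `merge_preserving_order` and the `_row` helper column are hypothetical names, and the merge is a simple many-to-one match on a single key.

```python
def merge_preserving_order(master, using, key):
    """Many-to-one merge of `using` into `master` that keeps master's row order.

    master, using: lists of dicts; `key` uniquely identifies rows in `using`.
    """
    # Tag each master row with its original position (hypothetical helper column).
    tagged = [dict(row, _row=i) for i, row in enumerate(master)]

    # Mimic a merge that sorts by the key first, which destroys the original order.
    tagged.sort(key=lambda r: r[key])
    lookup = {row[key]: row for row in using}
    merged = [{**row, **lookup.get(row[key], {})} for row in tagged]

    # Restore the original order and drop the helper column.
    merged.sort(key=lambda r: r["_row"])
    for row in merged:
        del row["_row"]
    return merged


master = [{"id": 3, "x": "c"}, {"id": 1, "x": "a"}, {"id": 2, "x": "b"}]
using = [{"id": 1, "y": 10}, {"id": 2, "y": 20}, {"id": 3, "y": 30}]
result = merge_preserving_order(master, using, "id")
print([row["id"] for row in result])  # same id order as `master`
```

The extra sort at the end is the cost a built-in option would hide; it is one more pass over the data, but it saves the caller from undoing the reordering by hand.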


  • Michael Stepner
    replied
    Repeating a minor wish list item that I've mentioned in person:

    I believe that my SSC program fastxtile produces identical output to the built-in program xtile, but runs much faster. If it became the default xtile program in Stata 14, that would speed up a whole host of programs that call xtile.


  • Clyde Schechter
    replied
    Originally posted by Sergiy Radyakin
    There is another radical solution: make the stable option the default. Make another option (please don't call it fast, call it randomties or something like that to illustrate the point). Some programs will slow down, but there is no risk of misunderstanding.
    Well, I disagree with making the stable option the default. I agree with Stata Corp. that it is a good thing that sorting ties are broken randomly, resulting in indeterminacy of later calculations that are sensitive to the sort order. Anyone who is applying such calculations after an under-defined sort is generating indeterminate results--there are very, very few circumstances where this is not an error. The -stable- option papers over the problem. With the current default you will at least realize what you have done and you can then either fully specify a sort key that uniquely identifies the observations, or switch your calculations to procedures that are insensitive to sort order (depending on which was the source of the error). If -stable- is made the default, most of these errors will go undetected for a very long time, and people may have already relied on the spurious results when it is discovered.

    What I would endorse, along Sergiy's line of reasoning, is to make it like, for example, -destring-, which requires the specification of either the -generate()- or -replace- option. One could require that when -sort- is used, one must specify either -stable- or -randomties- as an option. At least the user is forced for a moment to think about the issue this way. I might prefer a different word than -stable-, which sounds desirable. Maybe -deceptivelystable-, or -sweepitundertherug-.

    Of course, I have no idea whether this can be implemented in a way that does not break large amounts of legacy code. I imagine that -sort- is one of the most frequently used commands in ado files.
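
To make the required-option idea above concrete, here is a minimal Python sketch (not Stata, and not how -sort- is implemented). The function `sort_rows` and its `ties` argument are hypothetical; the point is only that the caller cannot sort without stating how ties should be handled.

```python
import random


def sort_rows(rows, key, *, ties, rng=None):
    """Sort rows by `key`; the caller must say how ties are handled.

    ties="stable": tied rows keep their current relative order.
    ties="random": tied rows end up in random order among themselves.
    """
    if ties not in ("stable", "random"):
        raise ValueError('ties must be "stable" or "random"')
    rows = list(rows)
    if ties == "random":
        # Shuffle first; the stable sort below then leaves ties in random order.
        (rng or random).shuffle(rows)
    return sorted(rows, key=key)  # Python's sorted() is a stable sort


rows = [("b", 1), ("a", 1), ("c", 0)]
stable_result = sort_rows(rows, key=lambda r: r[1], ties="stable")
random_result = sort_rows(rows, key=lambda r: r[1], ties="random")
print(stable_result)
print(random_result)
```

Because `ties` is keyword-only and has no default, calling `sort_rows(rows, key=...)` without it raises a `TypeError`, which is the "forced to think about it for a moment" behavior described above.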


  • Sergiy Radyakin
    replied
    Originally posted by daniel klein
    Clyde certainly has a point, but I fear this behavior would require lots of quietly statements in (already written) ado-files, where you do not want such messages to appear, especially if the sort is not directly visible for the user...
    There is another radical solution: make the stable option the default. Make another option (please don't call it fast, call it randomties or something like that to illustrate the point). Some programs will slow down, but there is no risk of misunderstanding.

    A similar trap is the default float type. Every user converting time from a string to a formatted number writes gen time=... without specifying the type double, which, as we know, results in a loss of precision and complaints of the sort "Stata has lost my data". A few other programs that handle data either don't bother with types at all or provide a wide-enough default that the user doesn't have to: SPSS, Excel, Limdep, NLogit, etc.

    Don't get me wrong, I love Stata's storage types. And each one of them is dear to me. But double does look like a safer default than float.
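
The precision loss described above can be reproduced outside Stata. The sketch below is Python, not Stata code: it assumes a datetime stored as milliseconds since 01jan1960 (Stata's datetime convention) and rounds it to a 32-bit float with the standard `struct` module. The helper `to_float32` and the example timestamp are illustrative choices only.

```python
import struct
from datetime import datetime, timedelta

STATA_EPOCH = datetime(1960, 1, 1)


def to_float32(x):
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]


# Milliseconds since 01jan1960 for 27 Mar 2015, 08:48:00.
stamp = datetime(2015, 3, 27, 8, 48, 0)
ms_exact = (stamp - STATA_EPOCH) / timedelta(milliseconds=1)

# What survives if the same value is stored in a 32-bit float.
ms_float = to_float32(ms_exact)
error_seconds = abs(ms_float - ms_exact) / 1000

print(f"stored as double: {ms_exact:.0f} ms")
print(f"stored as float:  {ms_float:.0f} ms")
print(f"difference:       {error_seconds:.1f} s")
```

A 32-bit float has a 24-bit significand, so for values of this magnitude adjacent floats are 131,072 ms apart and the stored time can be off by up to about 65 seconds. A double holds the millisecond count exactly.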

    Best, Sergiy


  • Lucas Mation
    replied
    Improvements to the do-file editor (after using RStudio, Stata's text editor becomes a pain...):
    - Autocompletion of closing parenthesis and quotation marks (even if as an opt-in option, not default)
    - Make syntax highlighting of macros ( `a' $a) work inside quotation marks
    I know I can use an editor of my choice, but these should come out of the box
    Last edited by Lucas Mation; 27 Mar 2015, 08:48.


  • Richard Williams
    replied
    Sort's unstable sorting is wildly counter-intuitive but I have become convinced it is right. Explaining that in a simple warning message may be very difficult though.


  • daniel klein
    replied
    Clyde certainly has a point, but I fear this behavior would require lots of quietly statements in (already written) ado-files, where you do not want such messages to appear, especially if the sort is not directly visible to the user. I would suggest making this point more salient in the help files, but on the other hand almost half of the entry already explains the stable option's purpose with illustrative examples.

    Best
    Daniel


  • Clyde Schechter
    replied
    Given the frequency with which we get posts on Statalist from people who have gotten irreproducible results because of a -sort- on a list of variables that do not uniquely identify the observations, it might make sense for the -sort- command to issue a warning like "The variable(s) in the sort key do not uniquely identify the observations; the resulting sorted order is not reproducible."
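
The check behind such a warning is cheap to express. Here is a rough Python sketch (not Stata code, and not how -sort- works internally); `duplicate_sort_keys` is a hypothetical helper that reports which key values are shared by more than one observation. Stata's -isid- command performs a comparable uniqueness check on a list of variables.

```python
from collections import Counter


def duplicate_sort_keys(rows, key):
    """Return the sort-key values shared by more than one row."""
    counts = Counter(key(row) for row in rows)
    return sorted(k for k, n in counts.items() if n > 1)


rows = [
    {"id": 1, "visit": 1},
    {"id": 1, "visit": 2},
    {"id": 2, "visit": 1},
]

# Sorting on id alone leaves the two id == 1 rows in an undefined order.
dups_id = duplicate_sort_keys(rows, key=lambda r: r["id"])
if dups_id:
    print("warning: the sort key does not uniquely identify the observations")

# Adding visit to the key makes the sorted order fully determined.
dups_id_visit = duplicate_sort_keys(rows, key=lambda r: (r["id"], r["visit"]))
print(dups_id, dups_id_visit)
```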


  • Ronán Conroy
    replied
    Something that my students have found a little confusing is that the -over- option can be concealed under names like "categories" in the dialogues. Making sure all dialogues are consistent with Stata syntax and with each other would be helpful.

    I understand that there are plans afoot to revise the epidemiology commands, and I applaud this. The dialogues for some of these commands are bewildering, notably -tabodds- and -mcc-.

    And please, StataCorp, why is it necessary for the -tabulate- dialogue to refer to "within-column relative frequencies"? A relative frequency scaled 0-100 is a percentage. They are column percents, which is not only much easier for my poor students but also more precise.


  • Clyde Schechter
    replied
    I have a request for the do-file editor. I wish that in its open and save functions it acted more like a part of Stata and less like an independent program. If I launch Stata by double-clicking on a data set, as I often do, Stata opens with the working directory set to the directory where that data set is located. Great! Now if I open the do-editor from within Stata, and then try to open a do-file, or if I start a new do-file and try to save it, the do-file editor doesn't seem to know what Stata's working directory is: it just remembers the directory it was last used in. So I have to then navigate to the directory I want. Maybe that's functional for some people--but for my workflow where data sets and the do-files that created and analyzed them are almost always in the same directory, it's a nuisance. Actually, it's more than a nuisance because sometimes I don't quite notice that the do-file editor is in the "wrong" directory and end up saving my do file there. Then, later on, I can't find it in the directory where I thought it would be and have to go searching around for it.


  • Charlie Joyez
    replied
    Since I've had no answers about the impossibility of computing odds ratios after a nested logit (see my post here),
    I'd be grateful if Stata 14 could incorporate an "or" option after nested logits, so that we can properly interpret interaction terms among the explanatory variables.

    Thanks
    Charlie


  • leetaey
    replied
    1. Network analysis
    2. Machine learning
    3. Graph command export


  • Sergiy Radyakin
    replied
    I'd second Matthew White's request regarding .stpr files but for a different reason: they are source files and are committed to source repositories, and as such must be versioned. Binary files are not versioned well, as we know. Having something in a text format similar to Visual Studio's project files would be better.
    Thank you, Sergiy.


  • Jeph Herrin
    replied
    This is a big wish, but as long as we're wishing...

    I've been using MCMC estimation more and more often, and (as far as I can tell) Stata is largely limited to making calls to WinBUGS. I've been using Stan (or RStan, via R) and SAS's PROC MCMC, both of which are very powerful and generic, and each time I use either I wonder when Stata will have something similar.


  • Carlos M. Urzúa
    replied
    It would be nice to have in Stata 14 at least one pseudo-random uniform number generator that has a very long period and a high order of equidistribution. The Mersenne Twister (due to Matsumoto and Nishimura) would be my first choice.
