  • Richard Williams
    started a topic Wish list for Stata 14

    Wish list for Stata 14

    As far as I can tell, nobody else has started a thread like this on the new board, so I figured I would go first. Having all the wishes compiled in a single thread might be helpful.

    My wishes include:

    Command-specific help for margins (or else some FAQs). People are always asking why margins does or doesn't do this or that after some command. Panel data techniques seem especially problematic. The answer is usually that some option is not appropriate given the technique used. Stats geniuses might already realize this, but for the rest of us at least a brief explanation would help.

    Better support for margins with multiple outcome commands. After commands like ologit and mlogit, you have to run a separate margins command for each outcome of the dependent variable. I'd like to have margins do it with one command.

    More powerful factor variables. I'd like to be able to specify more functions of independent variables, e.g. the log of a variable, the cube root, whatever. Then have margins realize that functions of the same variable are related to each other, e.g. the value of X is related to the value of the square root of X.
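
    The factor-variable wish can be illustrated with a small sketch. It assumes the auto dataset that ships with Stata; the variable name lnmpg and the at() values are arbitrary choices for illustration. Polynomial terms written in factor-variable notation are tracked by margins, while a hand-generated log term is not.

```stata
sysuse auto, clear

* Polynomial term in factor-variable notation:
* margins knows that mpg and mpg^2 move together.
regress price c.mpg##c.mpg
margins, at(mpg=(15(5)35))

* Other functions must be generated by hand (lnmpg is an illustrative name).
* margins treats lnmpg as an unrelated covariate, so it stays at its
* observed values while mpg is moved by at().
generate double lnmpg = ln(mpg)
regress price mpg lnmpg
margins, at(mpg=(15(5)35))
```

    The second margins call runs without error, which is part of the problem: nothing warns that lnmpg was not updated along with mpg.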

  • Mr Nute
    replied
    Originally posted by Alexander Wuttke

    My biggest wishes (for Stata 15 now) concern the Do-File Editor:
    Auto-Save
    and some kind of navigation, such as clickable anchors, so that it's easier to get to the segment I am looking for
    Seconded, almost two full years later. Maybe it's a user error issue in my case, but Stata crashes too often to not have an auto-save option for the do-file editor.


  • Evan Sommer
    replied
    Originally posted by Jonathan Horowitz

    I second this, but would also like to see this implemented for non-SEM routines. I realize this is probably a much bigger challenge than implementing it for SEM, but FIML is often the best way to handle missing data (see: http://www.statisticalhorizons.com/w...ngDataByML.pdf) and it would be great to see it become standard.

    ...
    Over a year later and I third this! Especially for xtmixed.


  • Alexander Wuttke
    replied
    My biggest wishes (for Stata 15 now) concern the Do-File Editor:
    Auto-Save
    and some kind of navigation, such as clickable anchors, so that it's easier to get to the segment I am looking for


  • László Sándor
    replied
    Re some earlier posts (and since there is no new topic about Stata 15), here are some easily distributed data science tools our community could catch up to:
    http://amplab-extras.github.io/SparkR-pkg/
    https://spark.apache.org/sql/
    Some of you might also enjoy the last two episodes on this podcast: http://www.rce-cast.com/


  • László Sándor
    replied
    By the way, there is some interesting discussion in the research computing (High Performance Computing) community about wasted opportunities, jealousy and Not Invented Here: http://www.dursi.ca/hpc-is-dying-and-mpi-is-killing-it/

    I guess Stata/MP is close to MPI.


  • Sergio Correia
    replied
    Since both areg and xtreg_fe are built on top of _regress, that would be extremely unlikely. However, I remember past concerns about how xtreg_fe was *much* slower than areg, so the manual is probably referring to that.
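
    One way to check the speed question locally is a rough timing comparison. This is only a sketch: the dataset (nlswork, fetched with webuse, so it needs an internet connection) and the covariates are arbitrary choices for illustration, and timings will vary by machine and Stata version.

```stata
webuse nlswork, clear
xtset idcode

timer clear

* Timer 1: fixed effects absorbed with areg
timer on 1
quietly areg ln_wage age tenure, absorb(idcode)
timer off 1
scalar b_areg = _b[tenure]

* Timer 2: the same model through xtreg, fe
timer on 2
quietly xtreg ln_wage age tenure, fe
timer off 2
scalar b_xtreg = _b[tenure]

* Elapsed seconds for each timer
timer list

* Both commands should report the same slope coefficient
display b_areg "  " b_xtreg
```

    With only a few thousand panels the difference may be small; a dataset with many more panels would be a fairer test of the claim.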

    What we really need (or at least I need) is
    a) a faster way to manipulate data (collapse, egen, tabulate, merge and sort are simply too slow compared to e.g. plyr).
    b) low-level commands that allow users to improve Stata.

    I'm not sure if StataCorp can keep up with the OSS alternatives by itself. Ten years ago, ggplot, plyr, pandas, scipy, julia, etc. were not a thing, and now they each have some really nice features that I wish I could use in Stata, but can't. There are still strong reasons for preferring e.g. Stata to R, but at some point the cons may outweigh the pros.

    (Also, while I'm in rant mode, is there a way to fix the forum? Error messages, double posting, etc. make it quite hard to use.)


  • László Sándor
    replied
    Is -xtreg, fe- faster now than -_areg-, Sergio Correia? Unlikely, right?
    Originally posted by Sergio Correia
    Does anyone know what the internal xtreg changes are?
    Code:
    xtreg, fe is now orders of magnitude faster when there are many panels, and there always are.
    (From http://www.stata.com/help.cgi?whatsnew13to14)

    Also, nice that there is programmatic PDF support!


  • Sergio Correia
    replied
    Does anyone know what the internal xtreg changes are?
    Code:
    xtreg, fe is now orders of magnitude faster when there are many panels, and there always are.
    (From http://www.stata.com/help.cgi?whatsnew13to14)

    Also, nice that there is programmatic PDF support!


  • Jonathan Horowitz
    replied
    It seems that most of the wish list items were either a) suggestions to make existing commands stronger or b) workflow-type issues. I've been browsing through the manual's "What's New" section and I can't find a whole lot in either of those categories. Maybe I'm just looking in the wrong place? The exception, though, seems to be survival models (which look stronger than they've ever been before).


  • Jeff Pitblado (StataCorp)
    replied
    The postestimation manual entries and help files have been updated to include
    margins-specific information. Here is a quick peek:

    http://www.stata.com/help.cgi?mlogit...mation#margins

    Just like for predict, these manual entries now have a section for margins that details
    which statistics are supported by margins. The default prediction is also mentioned.
    As you will notice for mlogit, margins now defaults to the probabilities for all outcomes.

    Here is a quick example.

    Code:
    . sysuse auto
    (1978 Automobile Data)
    
    . mlogit rep turn trunk
    (output omitted)
    
    . margins
    
    Predictive margins                              Number of obs     =         69
    Model VCE    : OIM
    
    1._predict   : Pr(rep78==1), predict(pr outcome(1))
    2._predict   : Pr(rep78==2), predict(pr outcome(2))
    3._predict   : Pr(rep78==3), predict(pr outcome(3))
    4._predict   : Pr(rep78==4), predict(pr outcome(4))
    5._predict   : Pr(rep78==5), predict(pr outcome(5))
    
    ------------------------------------------------------------------------------
                 |            Delta-method
                 |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
        _predict |
              1  |   .0289855    .017518     1.65   0.098    -.0053491    .0633201
              2  |    .115942   .0358862     3.23   0.001     .0456063    .1862778
              3  |   .4347826   .0563016     7.72   0.000     .3244335    .5451318
              4  |   .2608696   .0517527     5.04   0.000     .1594361    .3623031
              5  |   .1594203   .0394829     4.04   0.000     .0820353    .2368053
    ------------------------------------------------------------------------------
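
    For comparison, a sketch of the older workaround mentioned at the start of the thread, with one margins call per outcome. The local macro name outcomes is an arbitrary choice; the model is the same as in the example above.

```stata
sysuse auto, clear
quietly mlogit rep78 turn trunk

* Collect the outcome values actually used in estimation
levelsof rep78 if e(sample), local(outcomes)

* One margins call per outcome of the dependent variable
foreach k of local outcomes {
    margins, predict(pr outcome(`k'))
}
```

    Each pass through the loop should reproduce one row of the combined table above, at the cost of five separate output blocks.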


  • Rich Goldstein
    replied
    Rich apparently thinks he's joking about a wish list for Stata 15. However, I believe that the company starts planning early, and anything big needs to be wished for within the next 60 days (30 days is even better) or there is a good chance it won't make it. Now (well, the next few weeks) is the time to start wishing.


  • Richard Williams
    replied
    We'll have to start a wish list for Stata 15 thread. ;-) For my own part, I am very happy about some of the enhancements to the margins command. It looks like it will be much easier to use after multiple-outcome commands like ologit. I'll be curious to see if the documentation for margins has improved -- I've always thought it needed more command-specific help, e.g. the margins help for xtlogit should not be the same as the help for logit. People are always getting confused because margins isn't giving them what they expect.


  • László Sándor
    replied
    Sergio Correia, the stats are fascinating, and your points are spot on.

    I'd add one more thing though: I think the distinction between data-wrangling and analysis is spurious, so it is small comfort that Stata is fast on the latter. It is unrealistic, impractical or even downright wasteful (esp. with Stata's memory model) to hope to generate every construct of the data you'd ever think of using, and only then start analyzing it. Most variables (incl. dummies, interactions, or more complicated constructs like leave-out means) come and go during a developing analysis, and cannot just be kept on disk (from which it is slow to merge anyway), let alone in RAM. So Python is a substitute for the initial data import from text files, but hardly for everything that comes after that.
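
    As an illustration of the kind of throwaway construct meant here, a leave-out mean can be sketched as follows. The dataset (auto) and the variable names sum_price, n_price and loo_mean are arbitrary choices for the example.

```stata
sysuse auto, clear

* Group total and group count of price within each level of foreign
bysort foreign: egen double sum_price = total(price)
by foreign: egen long n_price = count(price)

* Leave-out mean: the group mean computed without the current observation
generate double loo_mean = (sum_price - price) / (n_price - 1)

* The helper variables are only needed for this one step
drop sum_price n_price
```

    Even this small construct takes two sorted egen passes and two temporary variables, which is the overhead the paragraph above is complaining about.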

    I would have thought StataCorp's marginal revenue would come from upgrades, not a new user picking up Stata for IRT. So selling more upgrade licenses (every cycle, or even at higher prices early on) because users cannot wait to get the latest performance improvements would sound like a reasonable business model to me.


  • Sergio Correia
    replied
    Hi László,

    I was thinking the same about both the Facebook post and the infrastructure part.

    I have somewhat mixed feelings about this update, although I can see the business rationale. If you want to fight for the marginal customer, the "battle" will be fought over stuff like IRT that some fields may use a lot (not economics though).

    However, as a user I'm a bit underwhelmed. For me, there are two use cases for Stata: one, manipulating data; two, running regressions. For the first, I'm starting to use Python (or SQL, R, etc.) a lot more, as commands like reshape or collapse are extremely slow compared to what they could be. For instance, collapsing data onto a small dataset should only require two passes over the data: one to get the groups on which we collapse, and another to compute the statistics (assuming count/mean/total). Instead, Stata sorts the data, which is O(N log N) and a lot slower. As I recall, it is also less memory-efficient, because it creates many intermediate variables as doubles that I may not want as such.
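
    The idea of grouping without sorting the observations can be sketched in Mata with an associative array. This is a hedged illustration, not how collapse is implemented: the function name hash_means is hypothetical, the example folds the two passes described above into one pass over the data plus one pass over the groups, and it assumes a single numeric by-variable with no missing values.

```stata
clear all
sysuse auto, clear

mata:
// hash_means() is a hypothetical helper: one pass over the N observations
// accumulates (sum, count) per group in a hash table, then one pass over
// the G groups forms the means. Only the G keys are sorted, not the data.
real matrix hash_means(real colvector g, real colvector x)
{
    transmorphic   A
    real scalar    i
    real rowvector acc
    real colvector keys
    real matrix    out

    A = asarray_create("real", 1)
    asarray_notfound(A, (0, 0))      // (sum, count) returned for an unseen group

    for (i = 1; i <= rows(g); i++) {
        acc = asarray(A, g[i])
        asarray(A, g[i], (acc[1] + x[i], acc[2] + 1))
    }

    keys = sort(asarray_keys(A), 1)
    out  = J(rows(keys), 2, .)
    for (i = 1; i <= rows(keys); i++) {
        acc = asarray(A, keys[i])
        out[i, .] = (keys[i], acc[1] / acc[2])
    }
    return(out)
}

// Group means of price by foreign, computed without sorting the data
M = hash_means(st_data(., "foreign"), st_data(., "price"))
M
st_matrix("M", M)
end

* Reference result from the built-in command
collapse (mean) price, by(foreign)
list
```

    The two listings should agree. Whether an interpreted Mata loop like this is actually faster than collapse is a separate question that the sketch does not settle.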

    For the second use, regressions, Stata still has the lead over other programs, but that lead is narrowing. Moreover, the speed advantage that it enjoys over e.g. R in commands like -regress- disappears as we use higher-level commands. For instance, reghdfe is 50% slower than -lfe- (its R alternative), and there are several *easy* ways to increase its speed, but there is no easy way to write threads in Mata, or to drop down to C, or even CUDA. Thus, I end up out of options for speedups.

    I think Matthieu's benchmark is incredibly useful in noting these differences, which, again, won't matter for the marginal consumers but will matter for us as more advanced users.

    Best,
    Sergio

    PS: I was hoping at least for two-dataset-support, as that would allow users to code a few improvements by themselves, such as a collapse replacement.
