  • Originally posted by Richard Williams
    I would like to see much better support for Full Information Maximum Likelihood (fiml). Some Stata routines, e.g. SEM, provide some support for fiml (which Stata calls mlmv).
    I second this, but I would also like to see it implemented for non-SEM routines. I realize this is probably a much bigger challenge than implementing it for SEM, but fiml is often the best way to handle missing data (see: http://www.statisticalhorizons.com/w...ngDataByML.pdf), and it would be great to see it become standard.

    I also second/third/fourth everyone who wants the error message to reference the line in the do file.

    Finally, Satorra-Bentler for -gsem- would be outstanding.



    • So Stata 14 was announced today. I think last week's Stata Facebook post revealed the main new features. Notice that there are no performance improvements or infrastructure changes for big data:
      The votes are coming in! Just a reminder, go cast your vote for which of the following features you would most like to see in the next version of Stata.
      59.48% Bayesian analysis
      31.90% Panel and multilevel survival models
      28.45% Survey for multilevel models
      24.14% Endogenous treatment effects
      19.83% Treatment effects for survival models
      18.10% Regression models for fractional data
      18.10% Markov-switching models
      15.52% Power and sample size for survival analysis...
      13.79% IRT (item response theory)
      13.79% Unicode
      08.62% Balance diagnostics for treatment effects
      08.62% Satorra-Bentler for SEM
      07.76% Censored Poisson model
      03.45% Small-sample inference for mixed models



      • Hi Lazlo,

        I was thinking the same about both the Facebook post and the infrastructure part.

        I have somewhat mixed feelings about this update, although I can see the business rationale. If you want to fight for the marginal customer, the "battle" will be fought over features like IRT that some fields may use a lot (not economics, though).

        However, as a user I'm a bit underwhelmed. For me, there are two use cases for Stata: one, manipulating data; two, running regressions. For the first, I'm starting to use Python (or SQL, R, etc.) a lot more, because commands like reshape or collapse are extremely slow compared to what they could be. For instance, collapsing data into a small dataset should require only two passes over the data: one to find the groups on which we collapse, and another to compute the statistics (assuming count/mean/total). Instead, Stata sorts the data, which is O(N log N) and a lot slower. As I recall, it is also less memory-efficient, because it stores many intermediate results as doubles, which I may not want.
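
To make the two-pass idea above concrete, here is a minimal sketch in Python (chosen only for illustration; it says nothing about how Stata's collapse is actually implemented). The function name collapse_mean and the toy rows are hypothetical. A dictionary stands in for a hash table, so neither pass needs the data to be sorted:

```python
def collapse_mean(rows, by, value):
    """Group means in two linear passes, with no sort (illustrative only)."""
    # Pass 1: discover the distinct group keys and give each one a slot.
    slot = {}
    for row in rows:
        slot.setdefault(row[by], len(slot))

    sums = [0.0] * len(slot)
    counts = [0] * len(slot)

    # Pass 2: accumulate the running total and count for each group.
    for row in rows:
        i = slot[row[by]]
        sums[i] += row[value]
        counts[i] += 1

    return {key: sums[i] / counts[i] for key, i in slot.items()}


# Hypothetical toy data: three observations, two groups.
data = [
    {"firm": "a", "wage": 10.0},
    {"firm": "b", "wage": 20.0},
    {"firm": "a", "wage": 14.0},
]
result = collapse_mean(data, "firm", "wage")
print(result)
```

Both passes are linear in the number of observations, assuming constant-time hash lookups on average, which is the contrast with a sort-based approach that the post draws.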

        For the second use, regressions, Stata still has the lead over other programs, but that lead is narrowing. Moreover, the speed advantage it enjoys over e.g. R in commands like -regress- disappears as we move to higher-level commands. For instance, reghdfe is 50% slower than -lfe- (its R alternative), and there are several *easy* ways to increase its speed, but there is no easy way to write threads in Mata, or to drop down to C, or even CUDA. Thus, I end up out of options for speedups.

        I think Matthieu's benchmark is incredibly useful in highlighting these differences, which, again, won't matter for marginal consumers but will matter for us as more advanced users.

        Best,
        Sergio

        PS: I was hoping for at least two-dataset support, as that would allow users to code a few improvements themselves, such as a collapse replacement.



        • Sergio Correia, the stats are fascinating, and your points are spot on.

          I'd add one more thing, though: I think the distinction between data-wrangling and analysis is spurious, so it is small comfort that Stata is fast on the latter. It is unrealistic, impractical, or even downright wasteful (esp. with Stata's memory model) to hope to generate every construct of the data you'd ever think of using and only then start analyzing it. Most variables (incl. dummies, interactions, or more complicated constructs like leave-out means) come and go during a developing analysis, and cannot just be kept on disk (from which it is slow to merge anyway), let alone in RAM. So Python is a substitute for the initial data import from text files, but hardly for everything that comes after.
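
For readers unfamiliar with the term, a leave-out mean is the mean of a variable over all other observations in the same group, i.e. (group total - own value) / (group size - 1). A minimal Python sketch, with hypothetical names and toy data, shows why such a variable is cheap to regenerate on demand rather than store:

```python
def leave_out_means(groups, values):
    """Leave-one-out group mean for each observation (illustrative only)."""
    totals, counts = {}, {}
    # First pass: group totals and group sizes.
    for g, v in zip(groups, values):
        totals[g] = totals.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1

    out = []
    # Second pass: remove each observation's own value from its group total.
    for g, v in zip(groups, values):
        n = counts[g]
        # A singleton group has no "other" observations, so return None.
        out.append((totals[g] - v) / (n - 1) if n > 1 else None)
    return out


# Hypothetical toy data: group "x" has three members, group "y" has one.
groups = ["x", "x", "x", "y"]
values = [1.0, 2.0, 6.0, 5.0]
loo = leave_out_means(groups, values)
print(loo)
```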

          I would have thought StataCorp's marginal revenue would come from upgrades, not a new user picking up Stata for IRT. So selling more upgrade licenses (every cycle, or even at higher prices early on) because users cannot wait to get the latest performance improvements would sound like a reasonable business model to me.



          • We'll have to start a "wish list for Stata 15" thread. ;-) For my own part, I am very happy about some of the enhancements to the margins command. It looks like it will be much easier to use after multiple-outcome commands like ologit. I'll be curious to see if the documentation for margins has improved. I've always thought it needed more command-specific help, e.g. the margins help for xtlogit should not be the same as the help for logit. People are always getting confused because margins isn't giving them what they expect.
            -------------------------------------------
            Richard Williams, Notre Dame Dept of Sociology
            Stata Version: 17.0 MP (2 processor)

            EMAIL: [email protected]
            WWW: https://www3.nd.edu/~rwilliam



            • Rich apparently thinks he's joking about a wish list for Stata 15. However, I believe that the company starts planning early, and anything big needs to be wished for within the next 60 days (30 days is even better) or there is a good chance it won't make it. Now (well, the next few weeks) is the time to start wishing.



              • The postestimation manual entries and help files have been updated to include
                margins-specific information. Here is a quick peek:

                http://www.stata.com/help.cgi?mlogit...mation#margins

                Just like for predict, these manual entries now have a section for margins that details
                which statistics are supported by margins. Also mentioned is the default prediction.
                As you will notice for mlogit, margins now defaults to probabilities for each (all) outcomes.

                Here is a quick example.

                Code:
                . sysuse auto
                (1978 Automobile Data)
                
                . mlogit rep turn trunk
                (output omitted)
                
                . margins
                
                Predictive margins                              Number of obs     =         69
                Model VCE    : OIM
                
                1._predict   : Pr(rep78==1), predict(pr outcome(1))
                2._predict   : Pr(rep78==2), predict(pr outcome(2))
                3._predict   : Pr(rep78==3), predict(pr outcome(3))
                4._predict   : Pr(rep78==4), predict(pr outcome(4))
                5._predict   : Pr(rep78==5), predict(pr outcome(5))
                
                ------------------------------------------------------------------------------
                             |            Delta-method
                             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
                -------------+----------------------------------------------------------------
                    _predict |
                          1  |   .0289855    .017518     1.65   0.098    -.0053491    .0633201
                          2  |    .115942   .0358862     3.23   0.001     .0456063    .1862778
                          3  |   .4347826   .0563016     7.72   0.000     .3244335    .5451318
                          4  |   .2608696   .0517527     5.04   0.000     .1594361    .3623031
                          5  |   .1594203   .0394829     4.04   0.000     .0820353    .2368053
                ------------------------------------------------------------------------------
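
One way to sanity-check these numbers: each margin times the 69 observations is, up to rounding, a whole number. That is what one would expect if the average predicted probabilities of a multinomial logit with constants reproduce the sample shares of the outcome. A small Python check, using only the figures printed above (the variable names are illustrative):

```python
# Margins and sample size copied from the output above.
margins = [0.0289855, 0.115942, 0.4347826, 0.2608696, 0.1594203]
n_obs = 69

# Implied number of observations in each outcome category.
implied_counts = [round(m * n_obs) for m in margins]

# Largest gap between a printed margin and the share implied by its count.
max_gap = max(abs(c / n_obs - m) for c, m in zip(implied_counts, margins))

print(implied_counts, sum(implied_counts), max_gap)
```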



                • It seems that most of the wish list items were either a) suggestions to make existing commands stronger or b) workflow-type issues. I've been browsing through the manual's "What's New" section and I can't find a whole lot in either of those categories. Maybe I'm just looking in the wrong place? The exception, though, seems to be survival models (which look stronger than they've ever been before).



                  • Does anyone know what the internal xtreg changes are?
                    Code:
                    xtreg, fe is now orders of magnitude faster when there are many panels, and there always are.
                    (From http://www.stata.com/help.cgi?whatsnew13to14)

                    Also, nice that there is programmatic PDF support!



                    • Is -xtreg, fe- faster now than -_areg-, Sergio Correia? Unlikely, right?
                      Originally posted by Sergio Correia
                      Does anyone know what the internal xtreg changes are?
                      Code:
                      xtreg, fe is now orders of magnitude faster when there are many panels, and there always are.
                      (From http://www.stata.com/help.cgi?whatsnew13to14)

                      Also, nice that there is programmatic PDF support!



                      • Since both areg and xtreg_fe are built on top of _regress, that would be extremely unlikely. However, I remember past concerns about how xtreg_fe was *much* slower than areg, so the manual is probably referring to that.
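
As background to the comparison: for a single set of panel effects, the fixed-effects slope is the OLS slope on panel-demeaned data (the within transformation). The sketch below, in plain Python with one regressor, hypothetical names, and toy data, illustrates that calculation only; it makes no claim about how either command is implemented internally:

```python
def within_slope(panel, y, x):
    """Fixed-effects (within) slope for one regressor (illustrative only)."""
    # First pass: per-panel sums of y and x, and panel sizes.
    sums = {}
    for g, yi, xi in zip(panel, y, x):
        s = sums.setdefault(g, [0.0, 0.0, 0])
        s[0] += yi
        s[1] += xi
        s[2] += 1

    # Second pass: demean within panel and accumulate the OLS cross-products.
    num = 0.0
    den = 0.0
    for g, yi, xi in zip(panel, y, x):
        sum_y, sum_x, n = sums[g]
        dy = yi - sum_y / n
        dx = xi - sum_x / n
        num += dx * dy
        den += dx * dx
    return num / den


# Toy data: two panels with different intercepts (10 and -5), common slope 2.
panel = [1, 1, 1, 2, 2, 2]
x = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]
y = [12.0, 14.0, 16.0, -3.0, -1.0, 1.0]
beta = within_slope(panel, y, x)
print(beta)
```

The panel-specific intercepts drop out in the demeaning step, so only the common slope is estimated.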

                        What we really need (or at least I need) is
                        a) a faster way to manipulate data (collapse, egen, tabulate, merge and sort are simply too slow compared to e.g. plyr).
                        b) low-level commands that allow users to improve Stata.

                        I'm not sure StataCorp can keep up with the OSS alternatives by itself. Ten years ago, ggplot, plyr, pandas, scipy, julia, etc. were not a thing, and now they each have some really nice features that I wish I could use in Stata but can't. There are still strong reasons for preferring e.g. Stata to R, but at some point the cons may outweigh the pros.

                        (Also, while I'm in rant mode, is there a way to fix the forum? Error messages, double posting, etc. make it quite hard to use.)



                        • By the way, there is some interesting discussion in the research computing (High Performance Computing) community about wasted opportunities, jealousy and Not Invented Here: http://www.dursi.ca/hpc-is-dying-and-mpi-is-killing-it/

                          I guess Stata/MP is close to MPI.



                          • Re some earlier posts (and with no new topic about Stata 15 yet), here are some easily distributed data science tools our community could catch up with:
                            http://amplab-extras.github.io/SparkR-pkg/
                            https://spark.apache.org/sql/
                            Some of you might also enjoy the last two episodes on this podcast: http://www.rce-cast.com/



                            • My biggest wishes (for Stata 15 now) concern the Do-File Editor:
                              auto-save,
                              and some kind of navigation, such as clickable anchors, so that it's easier to get to the segment I am looking for.



                              • Originally posted by Jonathan Horowitz

                                I second this, but I would also like to see it implemented for non-SEM routines. I realize this is probably a much bigger challenge than implementing it for SEM, but fiml is often the best way to handle missing data (see: http://www.statisticalhorizons.com/w...ngDataByML.pdf), and it would be great to see it become standard.

                                ...
                                Over a year later and I third this! Especially for xtmixed.
