  • Repeating a minor wish list item that I've mentioned in person:

    I believe that my SSC program fastxtile produces identical output to the built-in program xtile, but runs much faster. If it became the default xtile program in Stata 14, that would speed up a whole host of programs that call xtile.


    • Originally posted by Michael Stepner
      If it became the default xtile program in Stata 14, that would speed up a whole host of programs that call xtile.
      +1 on that.

      There is in general a huge amount of speedup potential in many common functions. A quick glance at https://github.com/matthieugomez/benchmark-stata-r shows some of the main culprits, including reshape, merge, and most of egen.

      Speaking of -merge-, could we have a -sortpreserve- option? Most of the time I do merge, it changes the sort order of the data, which I then have to undo afterwards. Currently, I'm just prefixing merge with a simple ado that does that, but I feel it should be an option as it saves a lot of time on large datasets (and one line of code).


      • Speaking of -merge-, could we have a -sortpreserve- option? Most of the time I do merge, it changes the sort order of the data, which I then have to undo afterwards.
        How would that actually work? If there are observations in the -using- data set that are not matched, where do they go? What does it mean to "preserve" the sort order when the dataset itself is different?


        • Originally posted by Clyde Schechter

          How would that actually work? If there are observations in the -using- data set that are not matched, where do they go? What does it mean to "preserve" the sort order when the dataset itself is different?
          They would go at the end, which is (i) already what sortpreserve does in this particular case, and (ii) consistent with what would happen if you were to sort again by the initial sort variables (since the sort variables would be missing for the part of the dataset that comes only from -using-).
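A minimal sketch of the same idea without any extra package, assuming a made-up using dataset; the names extra and original_order are hypothetical. It records the master's order before the merge and restores it afterwards, so the using-only observations, which have a missing order value, sort to the end:

```stata
* Hypothetical using dataset: foreign takes values 1 to 5
clear
set obs 5
generate foreign = _n
generate x = runiform()
tempfile extra
save "`extra'"

* Master dataset, sorted the way we want to keep it
sysuse auto, clear
sort price
generate long original_order = _n

merge m:1 foreign using "`extra'"

* Restore the original order; using-only observations have
* original_order missing, so they end up last
sort original_order
```

This is only meant to show where the unmatched observations land, not to replace the do-file posted in this thread.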



          • To illustrate that they work the same, below is a do-file that does a merge where some obs. only exist in master, some only in using, and some in both. Notice how the datasets match in this case:

            Code:
            * Create using dataset
            clear
            set obs 5
            gen foreign = _n
            gen x = runiform()
            tempfile using
            save "`using'"
            
            * Load master dataset
            sysuse auto, clear
            sort price
            
            * Merge and sort the usual way
            merge m:1 foreign using "`using'"
            sort price
            datasignature
            local sig1 = r(datasignature)
            
            * Install sortpreserve
            net from https://raw.githubusercontent.com/sergiocorreia/stata-misc/master/
            net install sortpreserve
            
            * Alternative merge
            sysuse auto, clear
            sort price
            
            sortpreserve: merge m:1 foreign using "`using'"
            
            * Verify datasets match
            datasignature
            assert r(datasignature)=="`sig1'"
            
            exit
            Note: Things are different in the (quite rare) -match update- and -match conflict- cases. In those cases, I would just do it the normal way.


            • Easier export of output to Word would be the best thing Stata could do.
              There are a number of commands to make this easier. But, I hope one day we can export our results more easily through simple commands/menus. Stata can have a few templates for export and styling of the tables according to the formats common among journals. Of course, there are too many possible formats and styles, but perhaps Stata can cover the most common ones.


              • but perhaps Stata can cover the most common ones.
                That's actually a lot harder than it seems. Stata users are dispersed over numerous disciplines: public health, clinical medicine, biomedicine, econometrics, accounting, finance, sociology, demography, psychology--just to name a few that come quickly to mind. Each discipline has its own journals with their own preferred styles. At most Stata might be able to set up output templates for one or two in each of these disciplines--even that seems unrealistic. This would probably leave pretty much nobody satisfied.

                Stata is a statistics program, not a word processing or document editing program. Trying to give it the features of the latter will inevitably turn it into bloatware. In addition, people wanting to use those features would have to learn new commands or menus to use them--while they still would need to know how to do all the corresponding manipulations in a real word processing/document editing program.

                What I think would be desirable is if the output produced by the ordinary Copy and Copy Table maneuvers were more layout-friendly to word processing programs, formatted so as to make it simple to paste from the Results window into a template table already created in a word processor or spreadsheet. I believe that is the intent of the Copy Table command, but the implementation is flawed, particularly as applied to commands that date back to the earliest versions of Stata.

                Remember, too, that there are several user-written programs that will very flexibly lay out and format the output of estimation commands (outreg2, esttab, estout, etc.). Although I personally don't use them, from the comments seen on this forum, it appears that they can meet most users' needs, though sometimes they fall short or require awkward workarounds for special situations.


                • Something that my students have found a little confusing is that the -over- option can be concealed under names like "categories" in the dialogues. Making sure all dialogues are consistent with Stata syntax and with each other would be helpful.

                  I understand that there are plans afoot to revise the epidemiology commands, and I applaud this. The dialogues for some of these commands are bewildering, notably -tabodds- and -mcc-.

                  And please, StataCorp, why is it necessary for the -tabulate- dialogue to refer to "within-column relative frequencies"? A relative frequency scaled 0-100 is a percentage. They are column percents, which is not only much easier for my poor students but also more precise.


                  • I wish that -by- understood that -sort- was meant. I know that -bysort- exists, but it seems to me to be otiose. -by- only works when the data are sorted, so it should sort the data when invoked. Since Stata knows the sort order of the data, redundant sorting isn't carried out anyway.

                    And I wish that -by- worked with all Stata commands.

                    And, finally, it would be good if -foreach- would echo the Stata commands it's executing in the way that -for:- used to. It can otherwise be difficult to figure out what went on. I like the idea that Stata output should include the precise command that generated each piece of output.


                    • I wish that by understood that sort was meant. I know that bysort exists, but it seems to me to be otiose. by only works when the data are sorted, so it should sort the data when invoked. Since Stata knows the sort order of the data, redundant sorting isn't carried out anyway.
                      But that itself would create a downside. The present syntax is not in use because StataCorp could not program it otherwise. For many, many users it is important that, as far as possible, they know the current sort order, see when it is changed, and change it only consciously. (I'd go so far as to speculate that panel datasets are by far the most common kind now in use.) A large fraction of my posts here hinge on showing how subscripting, itself entirely dependent on observation order, is key to many manipulations.

                      If this request were implemented, then

                      1. It would have to be under version control.

                      2. We get a new kind of question on Statalist: why did my sort order change? Or, more likely, why do I get these bizarre results (which turn out to be a consequence of a change in sort order)?

                      3. We get a new kind of question on Statalist: why do I need to change my sort order? Or, more likely, why do I get these bizarre error messages (which turn out to be a consequence of programs using the old syntax)?

                      I take it Ronán is volunteering to handle all these questions personally!

                      More positively, bysort already does what is desired. It's just a strange and ugly name.
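A small sketch of what bysort already gives you, using the auto data shipped with Stata; the variable names cheapest and rank_in_group are made up for illustration. The change of sort order is explicit in the command, and subscripts such as price[1] then refer to the first observation within each group:

```stata
sysuse auto, clear

* Sort by foreign, and by price within foreign, in one visible step
bysort foreign (price): generate cheapest = price[1]

* _n under by: counts observations within each group
by foreign: generate rank_in_group = _n

list foreign price cheapest rank_in_group in 1/5
```

The second command can use plain by only because the first one has already left the data sorted by foreign.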


                      • And, finally, it would be good if -foreach- would echo the Stata commands it's executing in the way that -for:- used to.
                        I respectfully disagree here. foreach and forvalues are programming tools and as such heavily used within programs and ado-files, where output is not desirable. In fact, I hardly find myself in a situation where I would like to have all the commands in a loop echoed to the screen - except for debugging, in which case I can always set trace on to figure out what exactly Stata did in each iteration.
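A minimal sketch of that debugging approach, with a placeholder loop body. Tracing echoes each command in the loop after macro expansion, and tracedepth limits how far the trace follows into called programs:

```stata
set trace on
set tracedepth 1

forvalues i = 1/3 {
    display `i'^2
}

set trace off
```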

                        Best
                        Daniel


                        • What about relaxing the restriction that factor variables must have non-negative values. For example, in a clinical trial we might get several pre-randomization observations and then several post-randomization interventions. It is natural to designate a time variable with negative numbers for the pre-intervention observations and positive numbers for the post-intervention ones. So, for example, an observation obtained 2 weeks before randomization might have week = -2, and one obtained 3 weeks after might have week = 3. Currently, you can't use i.week in this circumstance. Evidently the workaround is to create a different variable that is re-centered so that 0 corresponds to the lowest value of week, and then slap a value label on that. But it would be more convenient if we could just use i.week for this.
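For anyone who has not seen the workaround, here is a rough sketch on made-up data; the names week_fv, lo and hi are hypothetical. It shifts week so the lowest value becomes 0 and attaches a value label that shows the original week number:

```stata
* Made-up data: weeks -2 through 3
clear
set obs 6
generate week = _n - 3

summarize week, meanonly
local lo = r(min)
local hi = r(max)

* Shift so that the lowest week maps to 0
generate week_fv = week - (`lo')

* Label each shifted value with the original week
forvalues w = `lo'/`hi' {
    label define week_fv `=`w' - (`lo')' "week `w'", modify
}
label values week_fv week_fv

tabulate week_fv
```

After this, i.week_fv can be used where i.week is not allowed, and the output still displays the original week numbers through the label.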


                          • Originally posted by Clyde Schechter
                            What about relaxing the restriction that factor variables must have non-negative values.
                            Completely agree with that, it's extremely annoying when you have pre/post dummies and end up having to add an arbitrary number to make it always positive.
                            It's also hard to work around because -fvrevar- is a built-in.


                            • A bit of a quibble, but an option
                              Code:
                              set default_date_display ISO_8601, permanently
                              or
                              Code:
                              set default_date_display "%tdCY-N-D", permanently
                              would be welcome.

                              It would affect such commands as
                              Code:
                               di "`c(current_date)'"
                              and
                              Code:
                              update
                              and
                              Code:
                              describe
                              and most important
                              Code:
                              translate , translator(smcl2ps) header(on)
                              translate , translator(smcl2pdf) header(on)
                              For the first few, either I can write wrapper workarounds or put up with it as I'm the only one typically seeing it.
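As an illustration of the wrapper kind of workaround for the first case, a sketch that only reformats c(current_date); the local names today and iso are arbitrary. It converts the current date string to a daily date and displays it in an ISO 8601 style format:

```stata
* c(current_date) is a string such as "26 Mar 2014"
local today = date(c(current_date), "DMY")

* Display directly with a year-month-day format
display %tdCCYY-NN-DD `today'

* Or keep the formatted string in a local for later use
local iso : display %tdCCYY-NN-DD `today'
display "`iso'"
```

This does nothing for the headers written by translate, which is the case that matters most here.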

                              But customers often see output, and they've grown to take compliance-to-standards as a given. My option here (header(off)) is to forgo pagination.


                              • Stata should strengthen its nonparametric and semi-parametric methods, Markov switching models, and time-varying coefficient models. All of these are widely used in empirical economics.
