Wish list for Stata 14

Michael Stepner

Join Date: Jul 2014

Posts: 56
#196

28 Mar 2015, 07:42

Repeating a minor wish list item that I've mentioned in person:

I believe that my SSC program fastxtile produces identical output to the built-in program xtile, but runs much faster. If it became the default xtile program in Stata 14, that would speed up a whole host of programs that call xtile.
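Here is a minimal sketch of how someone could check the identical-output claim on their own data. It assumes fastxtile is installed from SSC and accepts the same nquantiles() option as xtile (see -help fastxtile- to confirm). The variable names are only illustrative:

```stata
* fastxtile is user-written; this assumes network access to SSC
ssc install fastxtile, replace

sysuse auto, clear

* Quintiles of price from the built-in command and from fastxtile
xtile price_q_builtin = price, nquantiles(5)
fastxtile price_q_fast = price, nquantiles(5)

* Stops with an error if any observation lands in a different quantile
assert price_q_builtin == price_q_fast
```

On a dataset this small any speed difference is negligible. The point of the check is that a drop-in replacement would need this assert to pass everywhere.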
Sergio Correia

Join Date: Apr 2014

Posts: 420
#197

28 Mar 2015, 11:02

Originally posted by Michael Stepner:

If it became the default xtile program in Stata 14, that would speed up a whole host of programs that call xtile.

+1 on that.

There is in general a huge amount of speedup potential in many common functions. A quick glance at https://github.com/matthieugomez/benchmark-stata-r shows some of the main culprits, including reshape, merge, and most of egen.

Speaking of -merge-, could we have a -sortpreserve- option? Most of the time I do merge, it changes the sort order of the data, which I then have to undo afterwards. Currently, I just prefix merge with a simple ado that does that, but I feel it should be an option, as it would save a lot of time on large datasets (and one line of code).
Clyde Schechter

Join Date: Apr 2014

Posts: 30194
#198

28 Mar 2015, 11:19

Speaking of -merge-, could we have a -sortpreserve- option? Most of the time I do merge, it changes the sort order of the data, which I then have to undo afterwards.

How would that actually work? If there are observations in the -using- data set that are not matched, where do they go? What does it mean to "preserve" the sort order when the dataset itself is different?
Sergio Correia

Join Date: Apr 2014

Posts: 420
#199

28 Mar 2015, 16:17

Originally posted by Clyde Schechter:

How would that actually work? If there are observations in the -using- data set that are not matched, where do they go? What does it mean to "preserve" the sort order when the dataset itself is different?

They would go at the end, which is (i) already what sortpreserve does in this particular case, and (ii) consistent with what would happen if you were to sort again by the initial sort variables (since the sort variables would be missing for the observations that come only from -using-).
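To see why the unmatched -using- observations end up last, here is a minimal sketch of the same idea done by hand, without any extra ado: record the observation order in a variable before -merge-, then sort on it afterwards. The names orig_order and usingdata are just illustrative. Because Stata sorts missing values after all nonmissing ones, the using-only observations (which have a missing orig_order) fall to the bottom:

```stata
* Build a small using dataset keyed on foreign (1-5), so some keys won't match
clear
set obs 5
generate foreign = _n
generate x = runiform()
tempfile usingdata
save "`usingdata'"

* Master data in a deliberate, non-default sort order
sysuse auto, clear
sort price

* Remember the current order before merging
generate long orig_order = _n
merge m:1 foreign using "`usingdata'"

* Restore the original order; using-only observations have a missing
* orig_order and therefore sort to the end
sort orig_order
drop orig_order
```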

Sergio Correia

Join Date: Apr 2014
Posts: 420

#200

28 Mar 2015, 16:29

To illustrate that they work the same, below is a do-file that performs a merge where some observations exist only in master, some only in using, and some in both. Note that the two resulting datasets match in this case:

Code:

* Create using dataset
clear
set obs 5
gen foreign = _n
gen x = runiform()
tempfile using
save "`using'"

* Load master dataset
sysuse auto, clear
sort price

* Merge and sort the usual way
merge m:1 foreign using "`using'"
sort price
datasignature
local sig1 = r(datasignature)

* Install sortpreserve
net from https://raw.githubusercontent.com/sergiocorreia/stata-misc/master/
net install sortpreserve

* Alternative merge
sysuse auto, clear
sort price

sortpreserve: merge m:1 foreign using "`using'"

* Verify datasets match
datasignature
assert r(datasignature)=="`sig1'"

exit

Note: things are different in the (quite rare) -update- cases, where -merge- reports "missing updated" or "nonmissing conflict" matches. In those cases, I would just do it the normal way.


Navid Asgari

Join Date: Sep 2025

Posts: 30
#201

28 Mar 2015, 17:02

Easier export of output to Word would be the best thing Stata could do.
There are a number of commands that make this easier, but I hope one day we can export our results more easily through simple commands or menus. Stata could offer a few templates for exporting and styling tables according to the formats common among journals. Of course, there are too many possible formats and styles, but perhaps Stata can cover the most common ones.
Clyde Schechter

Join Date: Apr 2014

Posts: 30194
#202

28 Mar 2015, 18:54

but perhaps Stata can cover the most common ones.

That's actually a lot harder than it seems. Stata users are dispersed over numerous disciplines: public health, clinical medicine, biomedicine, econometrics, accounting, finance, sociology, demography, psychology--just to name a few that come quickly to mind. Each discipline has its own journals with their own preferred styles. At most Stata might be able to set up output templates for one or two journals in each of these disciplines--and even that seems unrealistic. This would probably leave pretty much nobody satisfied.

Stata is a statistics program, not a word processing or document editing program. Trying to give it the features of the latter will inevitably turn it into bloatware. In addition, people wanting to use those features would have to learn new commands or menus to use them--while they still would need to know how to do all the corresponding manipulations in a real word processing/document editing program.

What I think would be desirable is if the output produced by the ordinary Copy and Copy Table maneuvers were more layout-friendly to word processing programs, formatted so as to make it simple to paste from the Results window into a template table already created in a word processor or spreadsheet. I believe that is the intent of the Copy Table command, but the implementation is flawed, particularly as applied to commands that date back to the earliest versions of Stata.

Remember, too, that there are several user-written programs that will very flexibly lay out and format the output of estimation commands (outreg2, esttab, estout, etc.). Although I personally don't use them, from the comments seen on this forum, it appears that they can meet most users' needs, though sometimes they fall short or require awkward workarounds for special situations.
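As one illustration of the user-written route, here is a minimal sketch using -eststo- and -esttab- from the estout package (on SSC) to write two regressions to an RTF file that Word can open. The file name results.rtf is just a placeholder, and the options shown are only a small subset of what esttab supports:

```stata
* estout is user-written; this assumes network access to SSC
ssc install estout, replace

sysuse auto, clear

* Store two sets of estimates
eststo clear
eststo: regress price mpg
eststo: regress price mpg weight

* Side-by-side table with standard errors (instead of t statistics), saved as RTF
esttab using results.rtf, se replace
```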
Ronán Conroy

Join Date: Apr 2014

Posts: 14
#203

30 Mar 2015, 04:17

Something that my students have found a little confusing is that the -over- option can be concealed under names like "categories" in the dialogues. Making sure all dialogues are consistent with Stata syntax and with each other would be helpful.

I understand that there are plans afoot to revise the epidemiology commands, and I applaud this. The dialogues for some of these commands are bewildering, notably -tabodds- and -mcc-.

And please, StataCorp, why is it necessary for the -tabulate- dialogue to refer to "within-column relative frequencies"? A relative frequency scaled 0-100 is a percentage. These are column percentages, a term that is not only much easier for my poor students but also more precise.
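For comparison, in command syntax the same request is simply the -column- option of -tabulate-. A minimal example with the auto data:

```stata
sysuse auto, clear

* Frequencies plus column percentages (each column sums to 100)
tabulate rep78 foreign, column
```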
Ronán Conroy

Join Date: Apr 2014

Posts: 14
#204

30 Mar 2015, 04:22

I wish that -by- understood that -sort- was meant. I know that -bysort- exists, but it seems to me to be otiose. -by- only works when the data are sorted, so it should sort the data when invoked. Since Stata knows the sort order of the data, redundant sorting isn't carried out anyway.

And I wish that -by- worked with all Stata commands.

And, finally, it would be good if -foreach- would echo the Stata commands it's executing in the way that -for:- used to. It can otherwise be difficult to figure out what went on. I like the idea that Stata output should include the precise command that generated each piece of output.
Nick Cox

Join Date: Mar 2014

Posts: 35811
#205

30 Mar 2015, 04:59

I wish that by understood that sort was meant. I know that bysort exists, but it seems to me to be otiose. by only works when the data are sorted, so it should sort the data when invoked. Since Stata knows the sort order of the data, redundant sorting isn't carried out anyway.

But that itself would create a downside. The present syntax is not the way it is because StataCorp could not program it otherwise. For many, many users it is important, as far as possible, to know the current sort order, to see when it is changed, and to change it only consciously. (I'd go so far as to speculate that panel datasets are by far the most common kind now in use.) A large fraction of my posts here hinge on showing how subscripting, itself entirely dependent on observation order, is key to many manipulations.

If this request were implemented, then

1. It would have to be under version control.

2. We get a new kind of question on Statalist: why did my sort order change? Or more likely: why do I get these bizarre results (which turn out to be a consequence of a change in sort order)?

3. We get a new kind of question on Statalist: why do I need to change my sort order? Or more likely: why do I get these bizarre error messages (which turn out to be a consequence of programs using the old syntax)?

I take it Ronán is volunteering to handle all these questions personally!

More positively, bysort already does what is desired. It's just a strange and ugly name.
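To make the distinction concrete, here is a minimal sketch with the auto data. Plain -by- refuses to run when the data are not sorted on the by-variable. -bysort- (or the equivalent -by ..., sort:-) sorts first and, as noted above, leaves the data in that new order:

```stata
sysuse auto, clear
sort price                          // data now sorted by price, not foreign

* Plain -by- requires the data to be sorted already; capture the error
capture by foreign: summarize mpg
local rc = _rc
display "return code from plain by: `rc'"

* Both of these sort on foreign first and then run the command
bysort foreign: summarize mpg
by foreign, sort: summarize mpg

* Side effect: the sort order has changed
local order : sortedby
display "data are now sorted by: `order'"
```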
daniel klein

Join Date: Mar 2014

Posts: 3890
#206

30 Mar 2015, 05:11

And, finally, it would be good if -foreach- would echo the Stata commands it's executing in the way that -for:- used to.

I respectfully disagree here. foreach and forvalues are programming tools and as such are heavily used within programs and ado-files, where output is not desirable. In fact, I rarely find myself in a situation where I would like to have all the commands in a loop echoed to the screen, except when debugging, in which case I can always set trace on to see exactly what Stata did in each iteration.
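For the debugging case, here is a minimal sketch. With -set trace on-, Stata should echo each line of the loop body as it executes, followed by the line after macro substitution, so you can see which variable each iteration used. Remember to switch it off again:

```stata
sysuse auto, clear

set trace on
foreach v of varlist price mpg weight {
    summarize `v', meanonly
    display "`v': mean = " r(mean)
}
set trace off
```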

Best
Daniel
Clyde Schechter

Join Date: Apr 2014

Posts: 30194
#207

02 Apr 2015, 11:33

What about relaxing the restriction that factor variables must have non-negative values? For example, in a clinical trial we might get several pre-randomization observations and then several post-randomization observations. It is natural to designate a time variable with negative numbers for the pre-randomization observations and positive numbers for the post-randomization ones. So, for example, an observation obtained 2 weeks before randomization might have week = -2, and one obtained 3 weeks after might have week = 3. Currently, you can't use i.week in this circumstance. The obvious workaround is to create a different variable that is re-centered so that 0 corresponds to the lowest value of week, and then slap a value label on that. But it would be more convenient if we could just use i.week for this.
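The workaround just described can be sketched as follows. The data are made up, and the names week0 and weeklbl are illustrative. The idea: shift week so its minimum is 0, then attach a value label so each shifted level still displays as the original week number:

```stata
* Hypothetical trial data: negative weeks are pre-randomization, positive are post
clear
set obs 60
set seed 12345
generate week = mod(_n, 6) - 2          // takes the values -2, -1, 0, 1, 2, 3
generate y = rnormal()

* Shift week so that its smallest value becomes 0, as factor variables require
summarize week, meanonly
local shift = -r(min)
generate week0 = week + `shift'

* Label each shifted level with the original week number
levelsof week, local(weeks)
foreach w of local weeks {
    label define weeklbl `=`w' + `shift'' "`w'", add
}
label values week0 weeklbl

* The re-centered variable works as a factor variable; the value labels
* should make the output show the original week numbers
regress y i.week0
```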
Sergio Correia

Join Date: Apr 2014

Posts: 420
#208

02 Apr 2015, 11:55

Originally posted by Clyde Schechter:

What about relaxing the restriction that factor variables must have non-negative values.

Completely agree with that. It's extremely annoying when you have pre/post dummies and end up having to add an arbitrary number to make the values non-negative.
It's also hard to work around because -fvrevar- is a built-in.
Joseph Coveney

Join Date: Apr 2014

Posts: 4457
#209

02 Apr 2015, 16:45

A bit of a quibble, but an option

Code:

set default_date_display ISO_8601, permanently

or

Code:

set default_date_display "%tdCY-N-D", permanently

would be welcome.

It would affect such commands as

Code:

di "`c(current_date)'"

and

Code:

update

and

Code:

describe

and most important

Code:

translate , translator(smcl2ps) header(on)
translate , translator(smcl2pdf) header(on)

For the first few, either I can write wrapper workarounds or put up with it as I'm the only one typically seeing it.

But customers often see the output, and they've come to take compliance with standards as a given. My fallback here (header(off)) is to forgo pagination.
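As an example of the kind of wrapper workaround mentioned above for the first case, here is a minimal sketch that converts c(current_date), which Stata returns as text such as "28 Mar 2015", into an ISO 8601 string using the %td format codes CCYY, NN and DD. The local names today and iso_today are illustrative:

```stata
* Turn the day-month-year text of c(current_date) into a Stata date
local today = date(c(current_date), "DMY")

* Re-display that date in ISO 8601 form (year-month-day)
local iso_today : display %tdCCYY-NN-DD `today'
display "`iso_today'"
```

This only helps where you control the display yourself; it does nothing for the headers written by -translate-, which is the real point of the request.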
Qunyong Wang

Join Date: Apr 2015

Posts: 1
#210

02 Apr 2015, 19:23

Stata should strengthen its support for nonparametric and semiparametric methods, Markov-switching models, and time-varying-coefficient models. All of these models are widely used in empirical economics.