Wish list for Stata 14

László Sándor replied

07 Apr 2015, 08:56
So Stata 14 is announced today. I think last week's Stata Facebook post revealed the main new features. Notice that it lists no performance improvements or infrastructure changes for big data:

The votes are coming in! Just a reminder, go cast your vote for which of the following features you would most like to see in the next version of Stata.
59.48% Bayesian analysis
31.90% Panel and multilevel survival models
28.45% Survey for multilevel models
24.14% Endogenous treatment effects
19.83% Treatment effects for survival models
18.10% Regression models for fractional data
18.10% Markov-switching models
15.52% Power and sample size for survival analysis...
13.79% IRT (item response theory)
13.79% Unicode
08.62% Balance diagnostics for treatment effects
08.62% Satorra-Bentler for SEM
07.76% Censored Poisson model
03.45% Small-sample inference for mixed models
Jonathan Horowitz replied

04 Apr 2015, 09:19
Originally posted by Richard Williams

I would like to see much better support for Full Information Maximum Likelihood (fiml). Some Stata routines, e.g. SEM, provide some support for fiml (which Stata calls mlmv).

I second this, but would also like to see it implemented for non-SEM routines. I realize this is probably a much bigger challenge than implementing it for SEM, but fiml is often the best way to handle missing data (see: http://www.statisticalhorizons.com/w...ngDataByML.pdf) and it would be great to see it become standard.

I also second/third/fourth everyone who wants the error message to reference the line in the do file.

Finally, Satorra-Bentler for -gsem- would be outstanding.
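
For readers who have not used it, the existing support mentioned above can be sketched as follows. This is only a minimal illustration on the shipped auto dataset, where rep78 has some missing values; the choice of variables is an assumption made for the example, not something from the posts above.

```stata
* A minimal sketch: FIML (mlmv) estimation in -sem- with a partly missing predictor
sysuse auto, clear

* Default estimation drops observations with missing rep78
sem (mpg <- weight rep78)

* method(mlmv) uses all observations, assuming joint normality
* and that values are missing at random
sem (mpg <- weight rep78), method(mlmv)
display e(N)
```

Comparing the reported number of observations between the two runs shows how many cases listwise deletion would have discarded.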
Qunyong Wang replied

02 Apr 2015, 19:23
Stata should strengthen its support for nonparametric and semiparametric methods, Markov-switching models, and time-varying coefficient models. All of these are widely used in empirical economics.
Joseph Coveney replied

02 Apr 2015, 16:45
A bit of a quibble, but an option

Code:

set default_date_display ISO_8601, permanently

or

Code:

set default_date_display "%tdCCYY-NN-DD", permanently

would be welcome.

It would affect such commands as

Code:

di "`c(current_date)'"

and

Code:

update

and

Code:

describe

and most important

Code:

translate , translator(smcl2ps) header(on)
translate , translator(smcl2pdf) header(on)

For the first few, either I can write wrapper workarounds or put up with it as I'm the only one typically seeing it.

But customers often see output, and they've grown to take compliance-to-standards as a given. My option here (header(off)) is to forgo pagination.
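
A wrapper of the kind mentioned for the first few cases might look like the sketch below. It only covers the c(current_date) case, and the local macro names are made up for the illustration; it does nothing for the header printed by -translate-.

```stata
* A minimal sketch: show today's date in ISO 8601 form
* c(current_date) is a string such as "7 Apr 2015"
local today = date("`c(current_date)'", "DMY")

* Format the daily date as CCYY-NN-DD
local iso : display %tdCCYY-NN-DD `today'
display "`iso'"
```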
Sergio Correia replied

02 Apr 2015, 11:55
Originally posted by Clyde Schechter

What about relaxing the restriction that factor variables must have non-negative values?

Completely agree with that. It's extremely annoying when you have pre/post dummies and end up having to add an arbitrary number to make the values always positive.
It's also hard to work around because -fvrevar- is a built-in.
Clyde Schechter replied

02 Apr 2015, 11:33
What about relaxing the restriction that factor variables must have non-negative values? For example, in a clinical trial we might get several pre-randomization observations and then several post-randomization interventions. It is natural to designate a time variable with negative numbers for the pre-intervention observations and positive numbers for the post-intervention ones. So, for example, an observation obtained 2 weeks before randomization might have week = -2, and one obtained 3 weeks after might have week = 3. Currently, you can't use i.week in this circumstance. Evidently the workaround is to create a different variable that is re-centered so that 0 corresponds to the lowest value of week, and then slap a value label on that. But it would be more convenient if we could just use i.week for this.
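
The re-centering workaround described above can be sketched roughly as follows. The data are simulated, and the names week_fv and weeklbl are hypothetical, chosen only for this illustration.

```stata
* A minimal sketch of the workaround: shift week so its minimum is 0,
* then label the shifted values with the original week numbers
clear
set seed 12345
set obs 60
generate week = mod(_n, 6) - 2      // hypothetical weeks -2 through 3
generate y = rnormal()

summarize week, meanonly
local lo = r(min)
local hi = r(max)
generate week_fv = week - `lo'

forvalues w = `lo'/`hi' {
    label define weeklbl `=`w' - `lo'' "`w'", add
}
label values week_fv weeklbl

* i.week is rejected because of the negative values; i.week_fv is accepted
regress y i.week_fv
```

The value label keeps the original week numbers visible in the output, but the underlying levels are still the shifted ones, which matters when referring to them in -margins- or -test-.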
daniel klein replied

30 Mar 2015, 05:11
And, finally, it would be good if -foreach- would echo the Stata commands it's executing in the way that -for:- used to.

I respectfully disagree here. foreach and forvalues are programming tools and as such heavily used within programs and ado-files, where output is not desirable. In fact, I hardly find myself in a situation where I would like to have all the commands in a loop echoed to the screen - except for debugging, in which case I can always set trace on to figure out what exactly Stata did in each iteration.

Best
Daniel
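
As an illustration of the debugging route mentioned above, a minimal sketch on the auto dataset (the loop body is an arbitrary example):

```stata
* A minimal sketch: echo the commands a loop executes while debugging
sysuse auto, clear

set tracedepth 1    // limit how deep the trace descends into called programs
set trace on
foreach v of varlist price mpg {
    summarize `v', meanonly
    display "`v': " r(mean)
}
set trace off
```

With trace on, each line of the loop is shown as it runs, including a version with the macros expanded, so you can see exactly what was executed in each iteration.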
Nick Cox replied

30 Mar 2015, 04:59
I wish that by understood that sort was meant. I know that bysort exists, but it seems to me to be otiose. by only works when the data are sorted, so it should sort the data when invoked. Since Stata knows the sort order of the data, redundant sorting isn't carried out anyway.

But that itself would create a downside. The present syntax is not in use because StataCorp could not program it otherwise. For many, many users it is important, as far as possible, to know the current sort order, to see when it is changed, and to change it only consciously. (I'd go so far as to speculate that panel datasets are by far the most common kind now in use.) Some large fraction of my posts here hinge on showing how subscripting, itself entirely dependent on observation order, is key to many manipulations.

If this request were implemented, then

1. It would have to be under version control.

2. We get a new kind of question on Statalist: why did my sort order change? Or more likely, why do I get these bizarre results (which turn out to be a consequence of a change in sort order)?

3. We get a new kind of question on Statalist: why do I need to change my sort order? Or more likely, why do I get these bizarre error messages (which turn out to be a consequence of programs using the old syntax)?

I take it Ronán is volunteering to handle all these questions personally!

More positively, bysort already does what is desired. It's just a strange and ugly name.
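
A small sketch of the behaviour under discussion, using the auto dataset; the variable name cheapest is made up for the example.

```stata
* A minimal sketch: by requires sorted data, bysort sorts first
sysuse auto, clear
sort price

* by alone stops with a "not sorted" error here; capture intercepts it
capture by foreign: summarize mpg
display _rc

* bysort sorts by foreign, and by price within foreign, before running,
* so price[1] is the lowest price in each group
bysort foreign (price): generate cheapest = price[1]

* The sort order has changed as a side effect
describe, short
```

The last step is the point about subscripting: price[1] means the lowest price within each group only because of the order that bysort imposed.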
Ronán Conroy replied

30 Mar 2015, 04:22
I wish that -by- understood that -sort- was meant. I know that -bysort- exists, but it seems to me to be otiose. -by- only works when the data are sorted, so it should sort the data when invoked. Since Stata knows the sort order of the data, redundant sorting isn't carried out anyway.

And I wish that -by- worked with all Stata commands.

And, finally, it would be good if -foreach- would echo the Stata commands it's executing in the way that -for:- used to. It can otherwise be difficult to figure out what went on. I like the idea that Stata output should include the precise command that generated each piece of output.
Ronán Conroy replied

30 Mar 2015, 04:17
Something that my students have found a little confusing is that the -over- option can be concealed under names like "categories" in the dialogues. Making sure all dialogues are consistent with Stata syntax and with each other would be helpful.

I understand that there are plans afoot to revise the epidemiology commands, and I applaud this. The dialogues for some of these commands are bewildering, notably -tabodds- and -mcc-.

And please, StataCorp, why is it necessary for the -tabulate- dialogue to refer to "within-column relative frequencies"? A relative frequency scaled 0-100 is a percentage. They are column percents, a term that is not only much easier for my poor students but also more precise.
Clyde Schechter replied

28 Mar 2015, 18:54
but perhaps Stata can cover the most common ones.

That's actually a lot harder than it seems. Stata users are dispersed over numerous disciplines: public health, clinical medicine, biomedicine, econometrics, accounting, finance, sociology, demography, psychology--just to name a few that come quickly to mind. Each discipline has its own journals with their own preferred styles. At most Stata might be able to set up output templates for one or two in each of these disciplines--even that seems unrealistic. This would probably leave pretty much nobody satisfied.

Stata is a statistics program, not a word processing or document editing program. Trying to give it the features of the latter will inevitably turn it into bloatware. In addition, people wanting to use those features would have to learn new commands or menus to use them--while they still would need to know how to do all the corresponding manipulations in a real word processing/document editing program.

What I think would be desirable is if the output produced by the ordinary Copy and Copy Table maneuvers were more layout-friendly to word processing programs, formatted so as to make it simple to paste from the Results window into a template table already created in a word processor or spreadsheet. I believe that is the intent of the Copy Table command, but the implementation is flawed, particularly as applied to commands that date back to the earliest versions of Stata.

Remember, too, that there are several user-written programs that will very flexibly lay out and format the output of estimation commands (outreg2, esttab, estout, etc.). Although I personally don't use them, from the comments seen on this forum, it appears that they can meet most users' needs, though sometimes they fall short or require awkward workarounds for special situations.
Navid Asgari replied

28 Mar 2015, 17:02
Easier export of output to Word would be the best thing Stata could do.
There are a number of commands to make this easier. But, I hope one day we can export our results more easily through simple commands/menus. Stata can have a few templates for export and styling of the tables according to the formats common among journals. Of course, there are too many possible formats and styles, but perhaps Stata can cover the most common ones.

Sergio Correia replied

28 Mar 2015, 16:29

To illustrate that they work the same, below is a do-file that does a merge where some obs. only exist in master, some only in using, and some in both. Notice how the datasets match in this case:

Code:

* Create using dataset
clear
set obs 5
gen foreign = _n
gen x = runiform()
tempfile using
save "`using'"

* Load master dataset
sysuse auto, clear
sort price

* Merge and sort the usual way
merge m:1 foreign using "`using'"
sort price
datasignature
local sig1 = r(datasignature)

* Install sortpreserve
net from https://raw.githubusercontent.com/sergiocorreia/stata-misc/master/
net install sortpreserve

* Alternative merge
sysuse auto, clear
sort price

sortpreserve: merge m:1 foreign using "`using'"

* Verify datasets match
datasignature
assert r(datasignature)=="`sig1'"

exit

Note: Things are different in the (quite rare) -match update- and -match conflict- cases. In those cases, I would just do it the normal way.
