  • Clyde Schechter
    replied
    The do-file editor, at least in Windows, allows you to position the cursor somewhere in the file, and then type ctrl+D, and Stata will then execute the file from the line in which the cursor is located on down to the end. That's often convenient.

    What I find I need to do more often, however, is the reverse. It would be nice to have a keyboard shortcut that would allow me to place the cursor at a desired stopping point, type the shortcut and have Stata respond by starting at the top of the do-file and continuing down to where the cursor is, and then halt.



  • Jeff Wooldridge
    replied
    I think almost all commands should have the option of computing cluster-robust variance matrix estimators -- and these allow for various kinds of heteroskedasticity as well. There are two kinds of commands where Stata does not allow either vce(robust) or vce(cluster id).

    1. Commands where they clearly should, because the estimators are consistent with general forms of cluster correlation (including serial correlation) and heteroskedasticity. The commands -sureg- and -reg3- fit into this category.

    2. Commands where the need for vce(robust) or vce(cluster id) is effectively an admission that the estimators are inconsistent, because the very features that make a robust variance matrix necessary violate the underlying assumptions needed for consistency. xtlogit with the fe option and xttobit (which does RE tobit) are two examples. Somewhat puzzling is that xtlogit with the re option does allow a full sandwich variance estimator but xttobit with the RE option does not. Technically, both estimators are inconsistent if anything about the model is misspecified, including serial correlation. (Contrast xtreg and xtpoisson, which are fully robust to serial correlation and any form of heteroskedasticity.) I still prefer allowing computation of robust standard errors even when the parameter estimators are inconsistent. After all, all models are approximations to the truth, and we should compute standard errors that properly account for the sampling uncertainty.

    3. Related to point (1) is that I think the -gmm- command should allow a weighting matrix that leads to the GMM version of three stage least squares. This estimator can have better small-sample properties than GMM with an unrestricted weighting matrix. Of course, one would allow vce(robust) and vce(cluster id) options.
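
    Until -sureg- gains vce(cluster id) natively, one workaround in this spirit is a cluster bootstrap via the -bootstrap- prefix. This is only a sketch, not StataCorp's method; the variables y1, y2, x1-x3 and the cluster variable id are hypothetical:

    Code:
    * cluster bootstrap as a stand-in for the missing vce(cluster id)
    bootstrap, cluster(id) reps(500) seed(12345): ///
        sureg (y1 x1 x2) (y2 x1 x3)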



  • Clyde Schechter
    replied
    re: #415. Rich, thanks for that clarification. It may be that I perceive -mixed- as not slow because I actually end up having to use -melogit- more often than -mixed-, and by comparison, -mixed- is greased lightning and really easy to get to converge.



  • Rich Goldstein
    replied
    re: #'s 413 and 414: there is evidence that Stata is slower than competitors and, to some extent, less likely to converge. See McCoach, D. B., et al. (2018), "Does the package matter? A comparison of five common multilevel modeling software packages," Journal of Educational and Behavioral Statistics, 43(5): 594-627. Re: speed, p. 620 says, "In terms of computational speed, Stata was by far the slowest of the software programs, and the difference was not trivial." The situation is less clear re: convergence, but there is clearly a problem, and there is some evidence of convergence trouble at "boundaries" (e.g., random-effects variances near zero). Note that I sent a pre-print of this to people at StataCorp and have, intermittently, been in touch; while I am assured that work on these issues is ongoing, it is still the case that Stata is slow for many of these models (and the same appears to be true for SEM/GSEM, though I know of no actual comparative data for these).



  • Clyde Schechter
    replied
    Re #413. As one who frequently fits random slopes models, I agree with the importance of this. But I have to say that my experience with -mixed- has been that it runs well and estimates these models quickly, even with large data sets. (That's a huge contrast with, for example, -melogit-, which can take very long times to fit even simple models in large data sets.) I wonder what accounts for the difference in our experiences with this same command.



  • paulvonhippel
    replied
    I think the -mixed- command needs to be optimized to work with random slopes. Although it nominally supports random slopes, in my experience it takes a very long time, or fails to converge, with all but the simplest models. I have found -mixed- to be practically useless for random slope models. -meglm- is no better.

    The problem of fitting random slope models is computationally tractable. For example, HLM software runs them very quickly. So I'm sure Stata can do better if it makes these models a priority.
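
    For anyone who wants to check this on their own machine, a small self-contained timing sketch on simulated data (the specifics are illustrative, not taken from the post above):

    Code:
    clear
    set seed 2021
    set obs 10000
    gen id = ceil(_n/100)                 // 100 groups of 100 observations
    gen x  = rnormal()
    bysort id: gen u0 = rnormal()         // draw a shock per observation...
    bysort id: replace u0 = u0[1]         // ...then make it constant within group
    gen y  = 1 + x + u0 + rnormal()
    timer clear
    timer on 1
    mixed y x || id:                      // random intercept only
    timer off 1
    timer on 2
    mixed y x || id: x, cov(unstructured) // add a random slope
    timer off 2
    timer list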



  • daniel klein
    replied
    Originally posted by Andrew Lover View Post

    Absolutely agree, but does that imply that best practice is:

    Code:
    sort x y z _all
    ?

    No. I guess best practice is

    Code:
    sort all_variables_that_uniquely_identify_the_sort_order_that_I_want_for_whatever_reason
    The point is that if a certain result (not necessarily restricted to estimation) depends on the sort order of the dataset, then all variables that define, i.e., carry substantive meaning for, that particular sort order should be spelled out explicitly. We should know why a result depends on a particular sort order, and the variables that we list should convey that information. Thus, I would extend my initial suggestion to

    Code:
    sort all_variables_that_uniquely_identify_the_sort_order_that_I_want_for_whatever_reason_and_no_variables_that_are_irrelevant_for_what_I_want
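
    A quick way to verify that the variables you list really do pin down a unique order is -isid- with its sort option (panelid and date here are hypothetical placeholders):

    Code:
    * errors out if panelid and date do not uniquely identify the
    * observations, and sorts the data by them if they do
    isid panelid date, sort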

    Edit:

    Of course, others have written about this topic some time ago (e.g., Schumm 2006).


    Schumm, L. P. 2006. Stata tip 28: Precise control of dataset sort order. The Stata Journal 6(1): 144-146.
    Last edited by daniel klein; 04 Jan 2021, 13:45.



  • Andrew Lover
    replied
    Originally posted by daniel klein View Post

    I strongly disagree. The variables you list in the sort command should be both sufficient and necessary to ensure a unique sort order. If the sort order depends on something that is not recorded in your dataset, the dataset is flawed, not the sort command.
    Absolutely agree, but does that imply that best practice is:

    Code:
    sort x y z _all
    ?

    (Genuinely curious here).



  • David Roodman
    replied
    Or, further to my earlier comment: if StataCorp will not add a suite of functions for efficient operations on matrices (which really should be easy, yes?), then give us a way to do it ourselves by allowing/documenting the writing of plugins for Mata.



  • daniel klein
    replied
    Originally posted by Andrew Lover View Post
    Make stable the default for -sort-.
    I strongly disagree. The variables you list in the sort command should be both sufficient and necessary to ensure a unique sort order. If the sort order depends on something that is not recorded in your dataset, the dataset is flawed, not the sort command.
    Last edited by daniel klein; 04 Jan 2021, 00:16.



  • Andrew Lover
    replied
    Make stable the default for -sort-.

    Not using
    Code:
    sort x y z, stable
    can be a source of major headaches.

    The manual says the following:

    [Screenshot of the -sort- manual entry]


    Which is a bit perplexing. When general processor speed was a major constraint this made sense; now that it is not, perhaps those with massive datasets could instead get a "fast" option that carries all the caveats of an m:m merge?
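
    A tiny sketch of the headache in question: with ties on the sort key, the within-tie order after a plain -sort- is not guaranteed, while the stable option preserves it:

    Code:
    clear
    set obs 5
    gen byte x = 1            // every observation ties on the sort key
    gen byte original = _n    // remember the incoming order
    sort x                    // tie order is undefined and may change
    list, clean
    sort x, stable            // ties keep their current relative order
    list, clean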



  • John Mullahy
    replied
    Might v17 consider adding some additional -margins- features for -gmm-?

    Specifically, given a gmm command
    Code:
    gmm (residual equation),...
    might it be possible to allow post-estimation options like
    Code:
    margins, dydx(*)
    where the derivatives are taken with respect to the x's in the residual equation identified by the variables(…) option (i.e. what is returned by e(rhs))?

    In some cases this may result in nonsense, but in others it can be informative. E.g. consider
    Code:
     gmm (y-exp({xb: x1 x2 _cons})), var(x1 x2) instr(x1 x2)
    The derivatives of r = y - exp({xb: x1 x2 _cons}) with respect to (x1,x2), where the x's appear explicitly in the residual function, i.e. (d/dx)(-exp(xb)), may conceivably be of interest.

    Note: This is beyond simply dydx(*) with respect to the linear predictor.
    Last edited by John Mullahy; 01 Jan 2021, 10:06. Reason: Edited for clarity.
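
    In the meantime, something in this spirit can sometimes be pieced together by writing the fitted function out by hand inside margins' expression() option. This is an untested sketch for the two-regressor example above; whether -margins- accepts it after -gmm- may depend on the Stata version and on specifying the variables() option:

    Code:
    gmm (y - exp({xb: x1 x2 _cons})), instruments(x1 x2) variables(x1 x2)
    margins, expression(-exp(_b[xb:x1]*x1 + _b[xb:x2]*x2 + _b[xb:_cons])) ///
        dydx(x1 x2)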



  • Manish Srivastava
    replied
    gsem is potentially a powerful tool -- it can handle a wide variety of problems very easily. But it's terribly slow, and it often fails to find initial values even for models that can be fit using alternative commands.
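
    When -gsem- stalls at finding starting values, one trick that sometimes helps is to seed it from a simpler fit via the from() option. A sketch with hypothetical variables y, x1, x2:

    Code:
    * fit the single-equation analogue first, then hand its
    * coefficients to -gsem- as starting values
    logit y x1 x2
    matrix b0 = e(b)
    gsem (y <- x1 x2, logit), from(b0, skip)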



  • John Mullahy
    replied
    Would it be possible to allow Stata's nonlinear regression procedure -nl- to accommodate factor variables?
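
    Until then, the usual workaround is to generate the indicators by hand and write them into the -nl- expression. A sketch using the auto data (rep78 level 1 is the omitted base category):

    Code:
    sysuse auto, clear
    tab rep78, gen(rep_)              // creates indicators rep_1 ... rep_5
    nl (price = {b0} + {b1}*mpg + {g2}*rep_2 + {g3}*rep_3 ///
        + {g4}*rep_4 + {g5}*rep_5)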



  • Radhouene DOGGUI
    replied
    I wish that we could have a package or command for latent transition analysis, as well as for causal mediation analysis with multiple mediators.
    Best regards,
    Radhouene

