Wishlist for Stata 16

Nick Cox replied

27 Sep 2017, 02:42
Clyde's point, I imagine, is that using variables and also an if qualifier would not be consistent with other syntax. Consider

Code:

summarize x y if foo == 42

There is no problem here because selecting on variables and selecting on observations are just different ways of selecting data and compatible with each other.

But

Code:

keep x y

Code:

keep if foo == 42

select data in different dimensions: you can keep (or drop) variables OR you can do that to observations. As far as Stata is concerned

Code:

keep x y if foo == 42

could only mean selecting in both directions at once, which at its simplest is cutting a rectangular hole in the dataset and throwing the data away, which Stata doesn't allow.
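
A quick way to check this, sketched with the auto dataset that ships with Stata. The variables price, mpg, and foreign stand in for x, y, and foo; they are assumptions for illustration, not part of the post. The return code is captured and displayed rather than assumed.

```stata
sysuse auto, clear

* Selecting variables and observations in a single keep is rejected
capture noisily keep price mpg if foreign == 1
scalar keep_rc = _rc
display "return code: " keep_rc

* The dataset is unchanged after the failed command
describe, short
```

The failed command leaves all variables and observations in place, so nothing is lost by trying it.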

I (we) understand that you want

Code:

keep x y if foo == 42

to be acceptable short-hand for

Code:

keep x y
keep if foo == 42

but I too would support StataCorp never, ever allowing that. Consider that

Code:

summarize x y
summarize if foo == 42

are not at all equivalent to

Code:

summarize x y if foo == 42

But you can program this for yourself! Something like

Code:

* not recommended by its author
program nigelkeep
    syntax [varlist] [if]
    if `"`varlist'`if'"' == "" error 198
    if "`varlist'" != "" keep `varlist'
    if `"`if'"' != "" keep `if'
end
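
Two details are worth checking before using a wrapper like this. An optional [varlist] is filled in with all variables when none are typed, so the error 198 check only fires if default=none is specified. Also, if the variables are dropped first, a variable named only in the if qualifier (foo above) no longer exists when the qualifier is applied. The variant below is a sketch under those assumptions; the name keepboth and the auto dataset are hypothetical choices for illustration.

```stata
capture program drop keepboth
program keepboth
    * default=none leaves the varlist macro empty when no variables are typed
    syntax [varlist(default=none)] [if]
    if `"`varlist'`if'"' == "" error 198
    * apply the if qualifier first, while every variable it names still exists
    if `"`if'"' != "" keep `if'
    if "`varlist'" != "" keep `varlist'
end

sysuse auto, clear
keepboth price mpg if foreign == 1
describe, short
```

Here foreign is used to select observations and is then dropped, because it is not in the varlist.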
Nigel Moore replied

27 Sep 2017, 01:08
Originally posted by Clyde Schechter

Re #20: I don't think that would be a good idea. It would make the semantics of -if- qualifiers different for just the -keep- and -drop- commands. The -if- qualifier is very specific: it identifies a subset of the observations to which the command will be applied. That's not relevant to -keep-ing and -drop-ing variables.

I'm not sure that I follow your logic, Clyde. Using the -if- qualifier with -keep- and -drop- limits those commands to the subset just like any other command.
Richard Williams replied

24 Sep 2017, 10:54
Originally posted by Bruce Weaver

Making nestreg support factor variable operators would be a nice improvement, IMO. Is there some technical reason why this is not currently supported?

Cheers,
Bruce

I agree. But I'll note that, much as I hate hate hate the old xi: prefix, you could use it with nestreg if you want to avoid creating dummies and interactions yourself.
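
One conservative way to do this is sketched below with the auto dataset. The variables price, weight, and rep78 are assumptions for illustration. Here xi is run on its own to create the indicators, which are then passed to nestreg as an ordinary block of variables, so the example does not depend on how the prefix form handles nestreg's parenthesized blocks.

```stata
sysuse auto, clear

* xi on its own creates indicator variables for rep78 (prefix _I)
xi i.rep78

* Expand the wildcard explicitly so nestreg receives plain variable names
unab dummies : _Irep78_*

* Enter weight first, then the block of rep78 indicators
nestreg: regress price (weight) (`dummies')
```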
Bruce Weaver replied

24 Sep 2017, 10:43
Making nestreg support factor variable operators would be a nice improvement, IMO. Is there some technical reason why this is not currently supported?

Cheers,
Bruce
Clyde Schechter replied

24 Sep 2017, 09:35
Re #20: I don't think that would be a good idea. It would make the semantics of -if- qualifiers different for just the -keep- and -drop- commands. The -if- qualifier is very specific: it identifies a subset of the observations to which the command will be applied. That's not relevant to -keep-ing and -drop-ing variables. Some more flexible syntax for specifying varlists beyond what is afforded by wildcards might be useful, for this and other purposes, but overloading -if- to do it would, in my view, be an invitation to trouble.

Re #21:

1) See https://www.statalist.org/forums/for...-inside-a-loop for a way around this limitation. You still have to do 50 read operations, but they do not read the entire data set each time.

2) If the subset of the using data that will be needed for any given observation can be specified by an interval to which the value of a variable belongs, then Robert Picard's -rangejoin- command will handle this, and it is much faster than -merge-. See also Sergio Correia's -ftools- package, which deals with the speed issue. (I don't know if it also deals with the memory issue.)

3) Yes, I agree. This behavior is quite unexpected, and the syntactic acceptance of a vector where only a scalar is admissible leads unwary users down the garden path. More important than the number of people who post here because of their "inexplicable" results from doing this, is the possibly larger number of people who are producing garbage results and are completely unaware of it! On the other hand, Stata has always been fairly generous in allowing people to take shortcuts in syntax that reduce typing. I suspect there is a fair amount of legacy code out there that relies on this convention and would break if this change were made. Perhaps implementing this prohibition, but allowing it under version control would be a reasonable compromise.

4) Why isn't -quietly- the solution to this?

5) Yes, that might be helpful.

6) No comment. Not relevant to my work, but I can see where this would be useful to others. On the other hand, I wonder if there are unintended consequences.
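
On point 4, a minimal sketch of what -quietly- does, using the auto dataset (the variable mpg and the deliberately nonexistent nosuchvar are assumptions for illustration). The note reporting the number of changes is suppressed, while a failing command still produces a nonzero return code.

```stata
sysuse auto, clear

* quietly hides the note reporting how many changes were made
quietly replace mpg = 0 if mpg < 15

* A failing command under quietly still fails; capture noisily is used
* here only so that this sketch runs to completion
capture noisily quietly replace nosuchvar = 0
scalar q_rc = _rc
display "return code: " q_rc
```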

Last edited by Clyde Schechter; 24 Sep 2017, 09:38.
Daniel Feenberg replied

24 Sep 2017, 07:50
I have several requests:

1) All input and output commands should allow subsetting by variable or observation. I find it frustrating that to
save a national file into 51 separate files (one for each state) I have to reread the national file 50 times.

2) Merge is very slow compared with -use-: order(s) of magnitude slower. Also, it would be nice if the user could (optionally)
specify the amount of memory required for the result, so that cases where the Stata estimate is larger than available memory,
but smaller than the true requirement, could still be run. We run into this at NBER because typically only a tiny fraction
of a very large dataset is kept.

3) An attempt to use the -if- command with a variable should return an error (or at least a warning) rather than using the
first observation.

4) Provide a way to suppress the "NNN values changed" message in do files, without suppressing actual error messages.

5) Provide a way to add the variable name to the "NNN values changed" message.

6) Allow http uploads as well as downloads. We operate a remote SAAS that calculates income tax liability. Currently
we provide an ado file (-taxsim9.ado-) that invokes ftp to upload the data, but that is a hack.
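
The behavior behind request 3 can be seen in a short sketch with the auto dataset (the variable mpg and the scalar name taken are assumptions for illustration). The -if- command evaluates one expression, so a bare variable name is read from the first observation, whereas the -if- qualifier is evaluated for every observation.

```stata
sysuse auto, clear

* The if command: mpg here is evaluated as mpg[1]
scalar taken = 0
if mpg > 20 {
    scalar taken = 1
}
display "branch taken: " taken "   mpg[1] = " mpg[1]

* The if qualifier: evaluated observation by observation
count if mpg > 20
```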
Nigel Moore replied

18 Sep 2017, 13:11
Originally posted by Jesse Wursten

It would be nice (but not essential) if the keep command had an "order" option, which orders the variables you keep in the order you specify them in the command. This is of course trivial to implement as a user-written command, but it's something I find myself doing quite regularly. Doing it manually every time is a bit inconvenient because the actual variables to keep might change regularly and then it's a hassle to change it in both. Using a local varlist seems overkill.

I recently found myself trying to use

Code:

keep varlist if exp

to avoid two -drop- commands (one to drop the unwanted variables, the other to truncate the desired variables to a subset). Being able to mix a varlist and an -if- qualifier with -keep- and -drop- would be nice.
Jesse Wursten replied

15 Sep 2017, 06:37
It would be nice (but not essential) if the keep command had an "order" option, which orders the variables you keep in the order you specify them in the command. This is of course trivial to implement as a user-written command, but it's something I find myself doing quite regularly. Doing it manually every time is a bit inconvenient because the actual variables to keep might change regularly and then it's a hassle to change it in both. Using a local varlist seems overkill.
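
A minimal sketch of such a user-written command, assuming the name keeporder (hypothetical) and using the auto dataset for the demonstration. The variables are kept and then moved to the front in the order typed.

```stata
capture program drop keeporder
program keeporder
    syntax varlist
    * keep the named variables, then arrange them in the order given
    keep `varlist'
    order `varlist'
end

sysuse auto, clear
keeporder mpg price make
describe, simple
```

Because the variable list is typed once, there is only one place to edit when the set of variables changes.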

Last edited by Jesse Wursten; 15 Sep 2017, 06:53.
Belinda Foster replied

11 Sep 2017, 07:05
In addition to the various enhancements that others have suggested for the do-file editor, I would really like to see permanent bookmarks. Every time I close Stata, or if there is a crash, all my bookmarks disappear, which is very annoying as it is very time-consuming to set them all up again. And that's assuming I can remember them all!
Nigel Moore replied

03 Sep 2017, 15:08
Originally posted by Nigel Moore

AFAIK, the two Dunnett's supported by Stata require balanced data sets (certainly Theresa Powell's does)

Confirmed, -pwcompare, mcompare(dunnett)- does require balanced datasets.

In the biological/medical sciences we may well start off with balanced designs, but we often end up with unbalanced datasets. Other software packages can deal with this. In 2017 why cannot Stata? It's pretty lame.
Cynthia Inglesias replied

02 Sep 2017, 03:54
It will be very helpful if the next version of Stata provides easy access to operating system metrics (through c-class values perhaps). For example:

Code:

- the number of monitors
- the resolution of each monitor
- an indicator about whether a Stata window is maximised
- the Stata results window dimensions
- the number of monitors that the Stata results window spans across

It would also be great if the programmer can dynamically adjust the dimensions of the Stata results window and can set where a dialog box appears on screen.
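
For context, a short sketch of a few c-class values that already exist and are closest in spirit to this request. They cover the operating system and the line and page size of the Results window, not monitors or window state.

```stata
* Existing c-class values related to the system and the Results window
display "operating system:    " c(os)
display "Results line width:  " c(linesize)
display "Results page length: " c(pagesize)

* The full list of what is currently available
creturn list
```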
Richard Williams replied

26 Aug 2017, 06:36
Cynthia Inglesias, I'll add that Stata/MP doesn't seem to speed up sem, at least not the sem models I run. Indeed, when I run sem on these monster UNIX machines, tasks with enormous data sets run much faster but sem doesn't. As I understand it, the UNIX machines have enormous amounts of memory but their processors are no faster than my desktop, maybe even slower.

So yes, it would be great if Stata were faster with large data sets. But there are other programs, such as sem, which also would be nice to speed up, presumably by better algorithms. (I repeat my request that Stata buy out MPlus or else figure out how to reverse-engineer whatever it is it does to zip past the competition -- not only Stata, but perhaps everybody else.)
Cynthia Inglesias replied

26 Aug 2017, 05:05
Richard Williams Recently, there was an interesting presentation by Sergio Correia regarding speeding up inefficient Stata commands: https://www.stata.com/meeting/baltim...17_Correia.pdf

StataCorp should finally implement these suggestions in Stata. I understand that the company has certain priorities, but the latest version was a disappointment with regard to speed improvements and Mata. Not everyone has access to Stata/MP, and datasets keep getting bigger. Asking people to buy Stata/MP for data manipulation is ridiculous.
Dario Maimone Ansaldo Patti replied

24 Aug 2017, 20:31
I would like to see more models that can be estimated using bayes. In particular, I would like to estimate Bayesian spatial models using Stata instead of being forced to use Matlab or R.
Andrew Wade replied

24 Aug 2017, 19:24
Hi,
Having just had trouble replicating results first generated in SAS, I suggest that the egen suite of commands have the option of using weights.
This may of course not be relevant for all of the egen functions, but it was for what I was doing: using the std() function.
Regards,
Andrew
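
Until something like that exists, a weighted standardization can be done by hand. The sketch below uses the auto dataset and treats weight as an analytic weight purely for illustration. Which weighting definition matches the SAS results in question is not known here, so this is an assumption rather than a replication recipe.

```stata
sysuse auto, clear

* Weighted mean and SD of mpg, with weight as an analytic weight
summarize mpg [aweight = weight]

* Standardize by hand using the stored results
generate double zmpg = (mpg - r(mean)) / r(sd)

* Check: the weighted mean and SD of zmpg should be about 0 and 1
summarize zmpg [aweight = weight]
```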
