Wishlist for Stata 17

Nick Cox replied

02 Apr 2021, 07:17
#491, #493 John Mullahy: Please post that also as a separate thread. (The duplication is, I think, justified here, as a suggestion for Stata 17 is also of interest to many people regardless of whether Stata 17 turns out to include it or they upgrade to Stata 17.)

I have other suggestions that don't depend on what StataCorp does.
John Mullahy replied

02 Apr 2021, 06:06
Re my posting #491: Using

Code:

twoway scatteri .4 0 .3 1 .2 2 .1 3, recast(bar) ysc(r(0 .4))

delivers pretty much what's desired. That said, a dedicated –twoway histi– command would still be nice to have.
Mark Davis replied

01 Apr 2021, 20:39
I thought of a few things recently I would love to see. Sorry if these have already been requested.
I am hoping for Stata to be able to use Apple silicon's "neural network", AKA matrix math coprocessor, to speed up certain commands. I think there could be massive gains in Mata and in commands that use Mata. I've taken a look at the current implementation of the neural net, and the Apple libraries allow you to send multiple linear regressions only. It doesn't look too open, but there is a lot of potential.

I also think it would be wise to create some sort of macOS version of the server software. Stata runs so much faster on Apple silicon than on x86 architecture, and the Apple chips have much better single-processor performance than typical server chips, which lower clock speeds for thermal reasons on high-core-count chips. Apple has some 8-32 performance-core versions of their chips in development. I could see institutions with servers wanting to transition to Apple silicon to get 33-100% better server performance per core. The rumor is that Apple is developing some sort of Mac Pro mini workstation.

Lastly, being able to load large CSVs in parallel. I feel like there must be an effective way of having multiple cores load a CSV file in parallel. It kills me when my eight-core license is waiting for an hour while one core loads in the data. I'm sure the development team has thought of this before and there is a good reason, but parallel CSV reading would be a $100 feature for me. I think a lot of researchers using large healthcare claims data would appreciate this as well.

Please, StataCorp, don't make Stata 17 great. I just bought a Stata 16 MP8 license to be able to use Apple silicon, and I can't afford for you to come out with any must-have features.
Last edited by Mark Davis; 01 Apr 2021, 21:21.
John Mullahy replied

01 Apr 2021, 16:20
This seems so obvious that I've quite possibly missed something equivalent that already exists, but an immediate version of –twoway histogram– would often be great to have. For example:

Code:

twoway histi 0 .4 1 .3 2 .2 3 .1, [standard twoway hist options]

would produce bars of heights .4, .3, .2, .1 at x-values 0, 1, 2, 3.
wbuchanan replied

30 Mar 2021, 06:29
Clyde Schechter
I’ll definitely take a look at it, but it seems like it would still create a noticeable bottleneck to combine the results over a larger and larger number of replications and/or a large number of replications with a large number of variables/estimates to combine. If nothing else, I think the performance gains of supporting some SQL based operations for tasks like this could be useful/helpful (particularly since SQL engines have already solved this problem effectively).
Marc Kaulisch replied

30 Mar 2021, 01:34
Also a graph-related feature wish. If a graph scheme sets an outline width (e.g. linewidth pbar), the width should be calculated in proportion to the bar height. At the moment it appears that the outline width is applied globally. In my case the outline was taller than the bar (see https://www.statalist.org/forums/for...29#post1599529). Thus I have to set the linewidth to none manually (or keep a second version of the scheme with linewidth pbar none), but when someone is not aware of this problem, the bar heights in a graph look disproportionate.
FernandoRios replied

29 Mar 2021, 19:17
Hi Mead
Thank you for that trick!
My thoughts about legend were a bit more simplistic.
1. I would keep legend using the same syntax as it has right now. The biggest difference would be to count all the subelements of a figure the same way that they are currently counted when using "twoway"
For example, say I have 2 figures:
scatter y1 y2 x, name(m1)
scatter z1 z2 x, name(m2)
I could combine them as:
graph combine m1 m2, legend(order(1 "y1" 2 "y2" 3 "z1" 4 "z2"))

Right now, for example, this is how it works with twoway:
two scatter y1 y2 x || scatter z1 z2 x, legend(order(1 "y1" 2 "y2" 3 "z1" 4 "z2"))

In the second case, of course, all dots would be on the same figure.

2. I actually managed to do something like this using a 3rd plot, that also only contains the legend, but that could be set nicely on the bottom of the figures.

webuse iris

twoway scatter seplen sepwid if iris==1 || scatter seplen sepwid if iris==2 || scatter seplen sepwid if iris==3 , legend(off) name(m1, replace)
twoway scatter petlen petwid if iris==1 || scatter petlen petwid if iris==2 || scatter petlen petwid if iris==3, legend(off) name(m2, replace)

twoway scatter petlen petwid if iris==0 || scatter petlen petwid if iris==0 || scatter petlen petwid if iris==0 , ///
xlabel(,noticks nolabel) ylabel(,noticks nolabel) plotregion(lstyle(none)) ///
legend(order(1 "setosa" 2 "versicolor" 3 " virginica") symysize(50)) name(m3, replace) ///
graphregion(margin(none)) plotregion(margin(zero)) fysize(12) xtitle("") ytitle("")
graph combine m1 m2 m3, col(1) iscale(.9) scale(.8) ysize(14) xsize(12)
Mead Over replied

29 Mar 2021, 15:35
FernandoRios , in response to your suggestion in post #470 here,

[C]ould [it] also be possible to add the option "legend" for graph combine? (so it is easy to add a single legend to figures)

I wonder what the syntax might look like for a legend option on gr combine. For complete flexibility, the syntax should be able to pull any key from any of the component panels for use in the combined legend. Perhaps the order sub-command:

Code:

order(1 "Miles per gallon" 2 "Length in inches" 3 "Price in dollars")

could be generalized to refer to specific keys in specific sub-graphs like this:

Code:

order(1.1 "Miles per gallon" 2.1 "Length in inches" 3.1 "Price in dollars")

where 1.1, 2.1 and 3.1 refer to the first keys respectively in sub-graphs 1, 2 and 3.

I’m not sure this would be a good idea. In general, I think a multi-panel graph should only be used when it makes a point that could not be made in a single graph, and then the differences across the panels should be as small as possible. A virtue of using graph…, by(varname) to construct a multi-panel graph is the guarantee that the x- and y-axes are identical across all sub-graphs. When a user needs a multi-panel layout not possible with graph…, by(varname), the objective of maximizing readability suggests combining sub-graphs which all contain the same legend keys.

A premise of the utilities grc1leg and grc1leg2 is that the legend from one of the component sub-graphs contains all the keys necessary for a legend to the combined graph. However, having spent far too much time reverse engineering grc1leg by Vince Wiggins (StataCorp), I was curious whether my generalization of his program, grc1leg2, could be tweaked to combine keys from different sub-graphs. Now, in version 1.42 of grc1leg2, I think I've found a way to do that with a newly added option, hidelegendfrom. The "trick" I use for making a combined legend for a multi-panel combined graph having K panels is to make a (K + 1)st graph containing all the keys and then use its legend for the combined graph, while hiding the (K + 1)st graph from view. (Has someone already posted this "trick"?)

Code:

sysuse auto, clear
set graph off

twoway ///
    (scatter mpg weight, mcolor(blue)), ///
    name(panel1, replace)

twoway ///
    (scatter length weight, mcolor(red)), ///
    name(panel2, replace)

twoway ///
    (scatter price weight, mcolor(green)) ///
    (lfit price weight, lcolor(green)), ///
    name(panel3, replace)

twoway /// This is the dummy graph from which we take the legend
    (scatter mpg weight, mcolor(blue)) ///
    (scatter length weight, mcolor(red)) ///
    (scatter price weight, mcolor(green)) ///
    (lfit price weight, lcolor(green)), ///
    name(panel4, replace)

set graph on

grc1leg2 panel1 panel2 panel3 panel4, ///
    title("Assemble the legend keys from different panels" "to construct the combined legend") ///
    xtob1title legendfrom(panel4) hidelegendfrom ///
    pos(4) ring(0) lxoffset(-10) lyoffset(15) ///
    name(grc1hide, replace)

Of course, one could achieve the same effect with Stata's gr combine by appropriately modifying the legend commands on the component sub-graphs and then using Stata's graph editor to hide panel4. But perhaps some users will appreciate using grc1leg2 to save some time and effort. Since these combined graphs always need to be tweaked, using grc1leg2's dialog to tweak the legend organization, position and offset options (and to apply common titles as appropriate) can be a particular time-saver. A StataCorp legend option for gr combine, with an accompanying dialog, would make this discussion moot. But the syntax diagram might be unwieldy.
Last edited by Mead Over; 29 Mar 2021, 15:45.
Clyde Schechter replied

29 Mar 2021, 11:12
Re #484. There is no official Stata command for appending frames, but Daniel Fernandes and Roger Newson have written -frameappend- to do that; it is available from SSC.

The problem of working with people who have not yet upgraded to version 16 is a substantial one; I find myself in the same position at times. While it would be convenient for the old -post- to support strLs, I suspect that StataCorp will not want to invest time into implementing that. It is clear that frames are poised to largely replace -postfile-s, and in due time, all active Stata users will catch up to version 16 or beyond. In fact, even if they were to implement this as of version 17 (which is what this wishlist is about) anyone with access to that would have access to frames.
wbuchanan replied

29 Mar 2021, 06:03
William Lisowski
The challenge with that is that you can’t yet append frames without writing them to disk as separate files (and I work with others who may not have Stata 16 yet). It isn’t clear why post isn’t able to allow strL types; in the worst case scenario, it could write all strings as strLs under the hood and then on post close it could try to optimize the storage either by collating the strLs or recasting them as a string with X characters.
William Lisowski replied

28 Mar 2021, 07:00
... post imposes a limit on the size of string variables ...

In Stata 16.1 from the output of help post we see

Note that newvarlist does not allow strL as the variable storage type. A similar utility that allows strL as a variable storage type is [P] frame post.
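
For readers who have not used it, here is a minimal sketch of what that help text points to, assuming Stata 16 or later. The frame name results and the variable names model, b and se are made up for illustration and are not from the posts above.

```stata
* Sketch only: collect estimates plus a strL column with frame post (Stata 16+).
* The names results, model, b and se are hypothetical.
sysuse auto, clear

* A frame whose first variable is strL, which postfile's newvarlist would not allow
frame create results strL model double b double se

regress mpg weight

* Post the full command line as a string, with the coefficient and its standard error
frame post results ("`e(cmdline)'") (_b[weight]) (_se[weight])

* Inspect what was collected
frame results: list
```

Because the results live in a frame rather than a postfile, nothing has to be written to disk until you choose to save the frame.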
wbuchanan replied

28 Mar 2021, 06:22
I’d like to be able to put it into a variable (as well as other information). If it was possible to pass a string as an expression (which is legal based on the help file), it wouldn’t be difficult to add a variable:

Code:

statsby model="my first model" _b _se...

It is possible to do that with post, but post imposes a limit on the size of string variables, which then makes it more difficult to store other information useful for simulations (e.g., pseudorandom number generator state, etc.).
Nick Cox replied

28 Mar 2021, 05:26
#480 On statsby

Code:

label data "`e(cmd)'"

puts the command string into the metadata; where would you like it to go?
Last edited by sladmin; 29 Mar 2021, 08:05. Reason: close CODE tag
wbuchanan replied

26 Mar 2021, 08:15
I think it would also be nice to allow -post-, and its related commands, to post strLs and for -statsby- to accept expressions that would store a string in the result (e.g., model=e(cmd) fails).
Niels Henrik Bruun replied

26 Mar 2021, 04:35
If the idea is to move code from Stata to Mata in commands, such that mainly input validation is done in Stata code and the rest in Mata, it would be nice if error handling like the try-except concept were implemented in Mata. Repeating a very old grump.