Wishlist for Stata 16

Clyde Schechter replied

16 Apr 2019, 08:16
I would like to see a more logical and consistent approach to how -collapse- interacts with string variables. Currently, you can use string variables with (count), (first), (last), (firstnm) and (lastnm). That all makes sense. And it makes sense that you can't use them with numerical operators like (sum) or (mean) etc. Then there is the issue of ordering: strings have a natural alphabetic order. So it might make sense for Stata to also provide (min) and (max) operators for string variables under -collapse-. Or you could argue that such operators are probably not useful and not worth implementing: we rarely would need those, and could emulate them by -encode-ing first. But what Stata actually does is allow (min) but not (max)--which makes no sense to me.
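For readers who need both the alphabetically first and last string per group today, here is a minimal sketch that avoids -collapse- altogether. The toy data and the variable names g, name, name_min and name_max are made up for illustration. Note that empty strings sort first, so they would be picked up as the minimum.

```stata
* Toy data: a grouping variable and a string variable (both hypothetical)
clear
input byte g str6 name
1 "pear"
1 "apple"
2 "plum"
2 "fig"
2 "kiwi"
end

* Sort the strings within each group, then take the first and last
bysort g (name): generate str6 name_min = name[1]
bysort g (name): generate str6 name_max = name[_N]

* Keep one row per group, as -collapse- would
bysort g: keep if _n == 1
drop name
list, noobs
```

This relies only on sorting within by-groups, so it behaves the same for the minimum and the maximum.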
Bill Magee replied

15 Apr 2019, 08:39
Perhaps include a history of ado-file installs, under "What's New" in Help.
Nick Cox replied

03 Apr 2019, 05:53
JanDitzen For unique read distinct! https://www.stata-journal.com/articl...article=dm0042

Allowing that would, I guess, help much less than you hope. It is what is inside the loop that matters: in particular, if there are any if qualifiers, they can slow things way down.

Check out the Picardesque trio of rangestat, rangerun, runby (all from SSC) which do help with many of these problems. They are here now, and not dependent on the caprice and timetable of StataCorp.

But I don't have suggestions for your specific problem.

Robert Picard
Maarten Buis replied

03 Apr 2019, 04:39
You can have multiple by commands one after another, that is often how I solve such problems. The bigger problem is that comtrade does not seem byable, which would make sense given what it does.
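As an illustration of chaining several by-prefixed commands instead of writing an explicit loop, here is a minimal sketch using the auto dataset shipped with Stata. The variable names mean_price, dev and rank_in_group are made up for this example.

```stata
sysuse auto, clear

* Step 1: group-level mean of price within each value of foreign
bysort foreign: egen double mean_price = mean(price)

* Step 2: deviation of each car's price from its group mean
generate double dev = price - mean_price

* Step 3: rank within group, ordered by that deviation
bysort foreign (dev): generate long rank_in_group = _n
```

Each step is a single line, but together they do the work of a multi-line loop body over the groups.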
JanDitzen replied

03 Apr 2019, 04:11
Maarten Buis, I am thinking about operations which require multiple lines of code. If I am not mistaken, by only applies to a single line. A recent example was that I wanted to download and process data from UN Comtrade. To do it automatically, I first obtained a list of all countries and then looped over their codes. Within each iteration I download the dataset and process it into the format I would like. A (simplified) example without the processing steps is the following (requires my user-written comtrade command - https://janditzen.github.io/comtrade/):

Code:

clear
cd "C:/downloads/comtrade"
** obtain list of all countries
comtrade list partner , listall
** Remove world and all as not needed
drop if value == "World" | value == "All"
** get number of countries for display use only
levelsof id, local(CtryList)
foreach ctry in `CtryList' {
    comtrade api, maxdata(500) type(C) freq(A) years(2017) reporterc(`ctry') partnerc(all) traderegime(all) hs(HS) cl(271111) append("`c(pwd)'/hs271111_`yr'.dta") nocheck
    ** more calculations here
}
Maarten Buis replied

03 Apr 2019, 03:52
JanDitzen Any time I have been tempted (the last time was quite a while ago) to loop over distinct values, I have found that the solution was not to (explicitly) loop, but to use the by prefix instead. So can you tell us more about typical tasks you want to perform with that loop? Maybe we can find a solution that does not require you to wait till the new version of Stata.
JanDitzen replied

03 Apr 2019, 03:28
Maybe it has been mentioned before, but it would be great to be able to loop over the unique values of a variable (string or double) without any prior steps. At the moment my usual approach is to transfer the variable into Mata or use levelsof. Both approaches have disadvantages: they either require additional code or can be problematic if the variable type is not known in advance (i.e. looping over strings or non-strings). What I am thinking of would be something like:

Code:

foreach lname of unique varname {
Last edited by JanDitzen; 03 Apr 2019, 03:30. Reason: changed distinct to unique
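Until something like the proposed syntax exists, one type-agnostic workaround is to map the distinct values to consecutive integers first. The sketch below uses the auto dataset; the variable origin and the names gid and ngroups are made up for illustration. It is one possible approach, not necessarily the fastest.

```stata
sysuse auto, clear

* A string variable to loop over (hypothetical, built for this example)
generate str8 origin = cond(foreign, "Foreign", "Domestic")

* Map distinct values to integers 1, 2, ...; egen's group() accepts
* string and numeric variables alike
egen long gid = group(origin)
summarize gid, meanonly
local ngroups = r(max)

* Loop over the integer codes, so no quoting rules depend on the type
forvalues i = 1/`ngroups' {
    quietly summarize price if gid == `i', meanonly
    display "group `i': mean price = " r(mean)
}
```

Because the loop runs over integer codes, the same code works whether the original variable is a string or a number.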
Weiwen Ng replied

27 Mar 2019, 11:24
Originally posted by Weiwen Ng:

Please enable the Bayesian commands to export their output in a standard format. The commands don't write to r(table), nor do they replay their estimates with the estimates store or replay commands. bayesstats summary will replay the estimation results, but it doesn't produce any output that we can capture. This appears to mean that we have to copy-paste results after running Bayesian estimation commands, which is cumbersome, error-prone, and a possible incentive for people to defect to other software packages like R.

Some discussion here.

Withdrawn. The post-estimation command bayesstats summary does write its results to a matrix called r(summary), as pointed out by Ben A. Dwamena.
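A minimal sketch of capturing that matrix, assuming Stata 15 or later for the bayes: prefix. The model, the seed and the matrix name S are arbitrary choices for illustration.

```stata
sysuse auto, clear
set seed 12345

* A small Bayesian linear regression with default priors
bayes: regress mpg weight

* Replay the posterior summary; the table is left behind in r(summary)
bayesstats summary

* Copy it to a named matrix before another command overwrites r()
matrix S = r(summary)
matrix list S
```

Once the results are in a named matrix, they can be exported with the usual matrix tools instead of being copy-pasted.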
David Radwin replied

19 Mar 2019, 18:37
My wish: quantile (percentile) estimation with design-adjusted standard errors (that is, with the svy: prefix). This can be done using epctile (Stas Kolenikov, from findit epctile) but it doesn't always yield results.
Weiwen Ng replied

07 Mar 2019, 13:43
Please enable the Bayesian commands to export their output in a standard format. The commands don't write to r(table), nor do they replay their estimates with the estimates store or replay commands. bayesstats summary will replay the estimation results, but it doesn't produce any output that we can capture. This appears to mean that we have to copy-paste results after running Bayesian estimation commands, which is cumbersome, error-prone, and a possible incentive for people to defect to other software packages like R.

Some discussion here.
Nicholas Winter replied

28 Feb 2019, 07:07
I would love to see two utility additions that would streamline programming a bit.

First: an addition to SMCL that acts like the {opt} directive, but which infers the minimum abbreviation from capitalization rather than the position of a colon. That is, I might do the following in a help file:

Code:

{opt key:words(string)}

Which looks like this when viewed:

keywords(string)

But it would be convenient to be able to code this as

Code:

{nickopt KEYwords(string)}

This would be convenient because the -syntax- statement in the program already has the minimum abbreviation indicated that way, so it would save some time in creating help files...

Second: a subcommand parser, modelled on -syntax-, that handles the abbreviation of subcommands behind the scenes. Right now, programs that allow abbreviated subcommands begin with code like this (taken from graph.ado):

Code:

gettoken do 0 : 0, parse(" ,")
local ldo = length("`do'")

if "`do'" == bsubstr("display",1,max(2,`ldo')) {    // draw/display
    gr_draw_replay `0'
    exit
}
if "`do'" == bsubstr("save",1,max(4,`ldo')) {       // save
    gr_save `0'
    exit
}
if "`do'" == bsubstr("use",1,max(3,`ldo')) {        // use
    gr_use `0'
    exit
}
if "`do'" == bsubstr("print",1,max(5,`ldo')) {      // print
    gr_print `0'
    exit
}
if "`do'" == bsubstr("dir",1,max(3,`ldo')) {        // dir
    gr_dir `0'
    exit
}
if "`do'" == bsubstr("describe",1,max(1,`ldo')) {   // describe
    gr_describe `0'
    exit
}

But wouldn't it be nice (and easier to debug) to be able to use my imagined -subcommandsyntax- command, which would return the unabbreviated subcommand in the local `subcommand'?

Code:

gettoken do 0 : 0, parse(" ,")
subcommandsyntax 0 : DIsplay save use print dir Describe ...
gr_`subcommand' `0'
Last edited by Nicholas Winter; 28 Feb 2019, 07:17.
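In the meantime, a rough user-written stand-in for such a parser is possible. The sketch below is only an illustration: the program name subcmd_resolve and its calling convention are made up, it returns the result in r(subcommand) rather than in a local of the caller, and it assumes the subcommand is typed in lowercase. As in -syntax-, the leading run of capital letters marks the minimum abbreviation, and an all-lowercase name cannot be abbreviated.

```stata
capture program drop subcmd_resolve
program define subcmd_resolve, rclass
    // usage: subcmd_resolve typed : SPEC1 SPEC2 ...
    gettoken typed 0 : 0, parse(" :")
    gettoken colon 0 : 0, parse(" :")
    local len = length("`typed'")
    local match ""
    foreach spec of local 0 {
        // minimum abbreviation = length of the leading run of capitals
        local min = 0
        local n = length("`spec'")
        forvalues i = 1/`n' {
            local ch = substr("`spec'", `i', 1)
            if "`ch'" == upper("`ch'") & `min' == `i' - 1 {
                local min = `i'
            }
        }
        if `min' == 0 local min = `n'    // all lowercase: no abbreviation
        local full = lower("`spec'")
        if `len' >= `min' & "`typed'" == substr("`full'", 1, `len') {
            local match "`full'"
            continue, break
        }
    }
    if "`match'" == "" {
        display as error "unknown subcommand `typed'"
        exit 198
    }
    return local subcommand "`match'"
end

* Example call: resolve the abbreviation "di"
subcmd_resolve di : DIsplay save use print dir Describe
display "`r(subcommand)'"
```

The first specification that matches wins, so the order of the list matters in the same way as the order of the if blocks in graph.ado.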
John Mullahy replied

20 Feb 2019, 11:31
There is at least one thread on Statalist on the topic of weighted bootstrapping, e.g.

https://www.statalist.org/forums/for...quency-weights

but I would like to put something on the wishlist. Specifically, when dealing with large samples I often find it convenient to use contract and then work subsequently with the frequency weights that are generated. The "contracted" sample along with the frequency weight variable that is generated retain all the information in the original sample. I find this can be hugely time-saving when all the variables of interest are binary, categorical, or otherwise discrete.

The wish (or maybe the query) is whether there might be more efficient approaches to doing bsample than using a repeated sequence of

Code:

contract...
preserve
expand...
bsample
contract...
[do calculations using this replication's frequency weights]
restore

Perhaps an approach along these lines is the most efficient, but I suspect not. If not, then my wish is for developers to consider whether there are more efficient approaches and, if so, to include them as options in Stata 16. If each replication could return efficiently an already-contracted sample containing a new frequency weight that would be fabulous.
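To make the sequence above concrete, here is a minimal runnable sketch using the auto dataset. The choice of variables, the frequency variable name n, the number of replications and the statistic computed are all arbitrary. It simply spells out the expand-and-recontract approach and makes no claim to being efficient.

```stata
sysuse auto, clear
set seed 2019

* One row per distinct (foreign, rep78) pattern, with its frequency in n
contract foreign rep78, freq(n)

forvalues b = 1/5 {
    preserve
    expand n                           // back to one row per original observation
    bsample                            // resample rows with replacement
    drop n                             // old frequencies no longer apply
    contract foreign rep78, freq(n)    // re-contract the bootstrap sample
    quietly summarize rep78 [fweight = n], meanonly
    display "replication `b': mean rep78 = " r(mean)
    restore
}
```

The cost is dominated by expand and the second contract, which is exactly the overhead the wish above asks StataCorp to remove.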
John Mullahy replied

15 Feb 2019, 15:46
Could an option be added to fracreg that would allow the range of the dependent variable to be [0,u] instead of [0,1]? The conditional mean in this case would be uF(xb), where F(.) is the cdf for probit or logit specifications.

There are situations where a dependent variable has all the features of a fractional outcome in the sense of having a finite range in which the lower and upper terminals and any values in-between may be realized in the data.

Being able to estimate such a model directly without first transforming the dependent variable to [0,1], then estimating using fracreg, and then back-transforming the coefficients so they represent the natural scale of the dependent variable would be valuable.

Last edited by John Mullahy; 15 Feb 2019, 15:57. Reason: Editing to add clarification.
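For reference, the manual workaround described above might look like the following minimal sketch. The data are simulated, and the upper bound u = 10 and all variable names are assumptions made for illustration. Only the fitted conditional means are put back on the [0,u] scale here.

```stata
* Simulated data: y lies in [0, u] with u = 10 (hypothetical values)
clear
set obs 500
set seed 42
local u = 10
generate double x = rnormal()
generate double y = `u' * invlogit(0.5 + x + rnormal())

* Rescale to [0,1], fit the fractional logit, then rescale the predictions
generate double y01 = y / `u'
fracreg logit y01 x
predict double m01                  // default prediction: conditional mean of y01
generate double m = `u' * m01       // conditional mean of y, i.e. u*F(xb)
```

The requested option would fold the division and the multiplication by u into the estimation command itself.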
Rich Goldstein replied

15 Feb 2019, 06:41
Recently a couple of clients have been sending me Excel files that are password protected, so I request that -import excel- be modified to handle password protection.
John Mullahy replied

15 Feb 2019, 06:38
I wonder if it is possible to add options to twoway function so that it could depict shading between two functions, like twoway rarea does for numerical values. Using

Code:

twoway function ..., recast(area)

allows shading below the pictured function down to some set base. But what I have in mind is roughly something like

Code:

twoway function y=f(x), ... area base(g(x))

where f(x) and g(x) are specific functions of x. Presumably this could be constructed such that the shading is done only when f(x)>g(x).

I realize this can be accomplished using repeated

Code:

(function..., recast(area)...)

over a user-defined range where f(x)>g(x). But being able to do this more concisely—and where the command computes the range where f(x)>g(x)—would be valuable.

(Perhaps there is an existing way to do this that I'm not aware of?)
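One workaround that does not need repeated function plots is to evaluate both functions on a grid and use twoway rarea. A minimal sketch follows, in which the two functions, the range [0, 3] and the grid size are arbitrary choices for illustration.

```stata
clear
set obs 201
range x 0 3 201                      // evenly spaced grid on [0, 3]
generate double f = sqrt(x)          // f(x), an arbitrary example
generate double g = x^2 / 3          // g(x), an arbitrary example

* Shade only where f(x) > g(x), then draw both functions on top
twoway (rarea f g x if f > g, color(gs12)) ///
       (line f x) (line g x),              ///
       legend(order(2 "f(x)" 3 "g(x)"))
```

Here f(x) > g(x) holds on a single interval. If the condition held on several separate intervals, rarea would connect them across the gaps, so each interval would need its own rarea plot.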