Wishlist for Stata 16

Joro Kolev

Join Date: Aug 2018

Posts: 3050
#196

19 Jan 2019, 05:58

There are two related annoying features of Stata that have been a pain in the neck for me on multiple occasions.

1. The degrees-of-freedom adjustments that Stata uses across estimators are a zoo: e.g., -regress, robust- uses an N/(N-K) factor, estimators with clustering use G/(G-1), etc. It would be great if StataCorp introduced, across all estimators, an option that instructs the estimator not to apply any degrees-of-freedom adjustment.

2. Some estimators report t and F statistics; others report z and Wald statistics. Some estimators have the -small- option, some don't. I don't think there is a way to instruct -regress- to report large-sample statistics. It would be nice if this were unified, so that every estimator could report small- or large-sample statistics as instructed by the user (via an option like -small- and/or -large-).

2a) It seems to me that one cannot control whether -test- reports large- or small-sample statistics; -test- appears to inherit this from the estimation command. It would be nice if -test- were rewritten so that the user can choose whether to see large- or small-sample statistics.
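For reference, here is a sketch of the two scalings mentioned in point 1, as I understand them for -regress- (x_i is the 1 x K row of regressors for observation i, e_i the OLS residual, G the number of clusters). Dropping the leading factors gives the unadjusted sandwich estimator the post asks for. To my understanding, -regress- also uses N-K residual degrees of freedom for t and F (G-1 with clustering), and many likelihood-based commands apply only G/(G-1) when clustering, so the manual entry for each command is the place to check.

```latex
\begin{align*}
\widehat{V}_{\mathrm{robust}}  &= \frac{N}{N-K}\,(X'X)^{-1}\Bigl(\sum_{i=1}^{N}\hat{e}_i^{2}\,x_i'x_i\Bigr)(X'X)^{-1} \\
\widehat{V}_{\mathrm{cluster}} &= \frac{G}{G-1}\cdot\frac{N-1}{N-K}\,(X'X)^{-1}\Bigl(\sum_{g=1}^{G}u_g'u_g\Bigr)(X'X)^{-1},
\qquad u_g = \sum_{i\in g}\hat{e}_i\,x_i
\end{align*}
```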
Nicholas Winter

Join Date: Mar 2014

Posts: 122
#197

25 Jan 2019, 07:46

I would love to see full Python integration, ideally along the lines of Java's integration with Stata. That is, I think I'm asking for a Python interpreter within Stata.

Right now I have several workflows that interact with the Qualtrics and Mechanical Turk APIs. I have Python programs that talk to the APIs: Stata calls those programs with shell commands to run Python, the programs write results to CSV files, and Stata then reads the CSVs and continues along. This works, but it's ugly, and more importantly it requires a separate Python installation on the system. That's fine for me, but it makes it much more complex to share my code with others who might be new to Python.

In theory, of course, I could reprogram the API-interaction code in Stata/Mata, but that would be a ton of work replicating routines already available for Python that create hash signatures, parse the JSON and XML return data, etc. In theory I could also use Java instead of Python for all of this. But I don't know Java... if I had to start over, maybe I would use Java instead of Python, but in my world (social science data analysis and programming) Python is used much more than Java.
Jesse Wursten

Join Date: Jan 2016

Posts: 915
#198

29 Jan 2019, 09:32

We can already assign a storage type when creating a variable:

Code:

gen double ID = _n

It would be nice if we could do the same for display formats:

Code:

gen %tq quarter = qofd(dofm(month))
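Until something like that exists, the usual two-step idiom looks like the sketch below: the storage type goes on -generate-, while the display format needs a separate -format- command. The three-month toy data are made up for illustration.

```stata
* Current idiom: storage type on -generate-, display format via -format-
* (illustrative data only: three months, one per quarter)
clear
set obs 3
generate int month = ym(2019, 1) + 3*(_n - 1)   // 2019m1, 2019m4, 2019m7
format month %tm
generate int quarter = qofd(dofm(month))        // quarter containing each month
format quarter %tq                              // the step the wish would fold into -generate-
list month quarter
```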

Nick Cox

Join Date: Mar 2014
Posts: 35664

#199

29 Jan 2019, 09:50

#198 That one, like all the others, is for StataCorp, but in passing please note the numdate package (SSC), with its three constituent commands.

Code:

. clear

. set obs 1
number of observations (_N) was 0, now 1

. gen given = "29/1/2019"

. numdate daily ddate = given, pattern(DMY)

. convdate monthly mdate = ddate

. convdate quarterly qdate = ddate

. l

     +-----------------------------------------+
     |     given       ddate    mdate    qdate |
     |-----------------------------------------|
  1. | 29/1/2019   29jan2019   2019m1   2019q1 |
     +-----------------------------------------+

In the examples above, formats are assigned by default, but there are options to do something different.


daniel klein

Join Date: Mar 2014

Posts: 3845
#200

29 Jan 2019, 10:12

For an approach along the lines suggested in #198, see also regen (SSC).

Best
Daniel
Dario Maimone Ansaldo Patti

Join Date: Aug 2014

Posts: 505
#201

30 Jan 2019, 09:55

Hi All,

I have lost track of some of the posts, so I am not sure whether this point has already been raised. Sometimes when you get an error message, you have to work out which part of your do-file contains the mistake. When you have several routines, this can be a little annoying, and I have found that it is even worse if the mistake is inside a loop: you need to check every single line. It would be useful to get an indication of the line of code where the mistake is likely to be. I find this functionality quite useful in other software such as LaTeX: when I get an error message in LaTeX, I know the type of error and the line of code where the mistake is likely to be.
Leonardo Guizzetti

Join Date: Jul 2016

Posts: 2400
#202

30 Jan 2019, 15:00

Hi Dario,

Perhaps you are looking for -set trace on-. See -help trace- and the corresponding manual entry. Turning it on will help identify the line of code that caused the error.
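A minimal sketch of how tracing can expose the failing line inside a loop. The data and the deliberately missing variable y are made up. With trace on, Stata should echo each command in the loop with its macros expanded, so the last echoed line before the error (here, the summarize of y) points at the culprit. -capture noisily- only lets the example finish; in practice you would simply run the do-file. When user-written programs are involved, -set tracedepth- can limit how deeply the trace descends.

```stata
* Illustrative only: the second pass of the loop fails because y does not exist
clear
set obs 5
generate x = _n
set trace on
capture noisily {
    foreach v in x y {
        summarize `v'
    }
}
set trace off
display "return code: " _rc    // 111 = variable not found
```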
Alexander Rodriguez

Join Date: Jul 2017

Posts: 39
#203

30 Jan 2019, 22:05

Originally posted by Mike Murphy

Not splashy, but investing time in making error messages more informative would probably have large returns for most users.

This may be difficult to achieve, but could the error message be coupled with a link to relevant Statalist threads?

Many thanks,
Alexander
(Stata v14.2 IC for Mac)
Justin Niakamal

Join Date: Aug 2017

Posts: 760
#204

31 Jan 2019, 08:03

I too would like to see MIDAS (Mixed-Data Sampling) implemented in Stata 16.

Now, for my less reasonable wish list, I would love to see:

Web scraping capability expanded in Stata, perhaps a beautifulsoup analog.

S-values (a measure of model ambiguity; sources below).

Leamer, Edward E. 2016. “S-Values: Conventional Context-Minimal Measures of the Sturdiness of Regression Coefficients.” Journal of Econometrics 193 (1): 147–61. doi:10.1016/j.jeconom.2015.10.013.

Cinelli, Carlos. 2015. “Model Ambiguity in R: The sValues Package.” https://cran.r-project.org/web/packa...es/sValues.pdf
Nick Cox

Join Date: Mar 2014

Posts: 35664
#205

31 Jan 2019, 08:12

#204 Web scraping is clearly an interesting and important area. What's not so clear is what StataCorp are expected or requested to provide here. What I've noticed is a series of community-contributed commands that address particular sites and often are made obsolete by changes in protocol.

Do you want, therefore, Stata commands/functions? Mata functions? etc.

S-values might well be directly programmable, but the topic is new to me and I've not looked at your helpful references.

In general -- and I am clearly not speaking on behalf of the company, but I do know the kind of steer that developers give at Stata meetings -- what the company wants to know about most are missing features that user-programmers cannot provide easily, meaning that what is on offer from the community comes nowhere near what is needed.
Justin Niakamal

Join Date: Aug 2017

Posts: 760
#206

31 Jan 2019, 08:29

Hi Nick,

Originally posted by Nick Cox

#204 Web scraping is clearly an interesting and important area. What's not so clear is what StataCorp are expected or requested to provide here. What I've noticed is a series of community-contributed commands that address particular sites and often are made obsolete by changes in protocol.

Do you want, therefore, Stata commands/functions? Mata functions? etc.

I would prefer Stata commands/functions (a mix of both). I referenced beautifulsoup because that Python package is one of the more popular web-scraping tools among programmers, and I think it is a good reference for thinking about how to build the functionality/structure in Stata, be it commands or functions.

Originally posted by Nick Cox

S-values might well be directly programmable, but the topic is new to me and I've not looked at your helpful references.

It would be great if someone programmed the command. Unfortunately, I'm not yet at the level where I can do it myself!
John Mullahy

Join Date: Dec 2016

Posts: 750
#207

15 Feb 2019, 06:38

I wonder whether it is possible to add options to twoway function so that it could shade the area between two functions, like twoway rarea does for two variables. Using

Code:

twoway function ..., recast(area)

allows shading below the pictured function down to some set base. But what I have in mind is roughly something like

Code:

twoway function y=f(x), ... area base(g(x))

where f(x) and g(x) are specific functions of x. Presumably this could be constructed such that the shading is done only when f(x)>g(x).

I realize this can be accomplished using repeated

Code:

(function..., recast(area)...)

over a user-defined range where f(x)>g(x). But being able to do this more concisely—and where the command computes the range where f(x)>g(x)—would be valuable.

(Perhaps there is an existing way to do this that I'm not aware of?)
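One existing route, sketched under assumptions: evaluate both functions on a grid with -range-, then let -twoway rarea- shade between f and min(f, g). Where f(x) <= g(x) the lower bound collapses onto f, so no area is drawn there, which means the grid itself locates the f(x) > g(x) region. sin(x), cos(x) and the 501-point grid are placeholders for your own functions and resolution; the shading near crossings is only as precise as the grid.

```stata
* Sketch: shade between f(x) and g(x) only where f(x) > g(x)
* f = sin(x) and g = cos(x) are stand-ins for the user's own functions
clear
range x 0 6.2832 501                 // 501 evenly spaced points on [0, 2*pi]
generate double f = sin(x)
generate double g = cos(x)
generate double lower = min(f, g)    // equals g where f > g, collapses onto f elsewhere
twoway (rarea f lower x, color(gs13)) ///
       (line f x) (line g x), legend(off)
```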
Rich Goldstein

Join Date: Mar 2014

Posts: 4459
#208

15 Feb 2019, 06:41

Recently a couple of clients have been sending me password-protected Excel files, so I request that -import excel- be modified to handle password protection.
John Mullahy

Join Date: Dec 2016

Posts: 750
#209

15 Feb 2019, 15:46

Could an option be added to fracreg that would allow the range of the dependent variable to be [0,u] instead of [0,1]? The conditional mean in this case would be uF(xb), where F(.) is the cdf for probit or logit specifications.

There are situations where a dependent variable has all the features of a fractional outcome, in the sense of having a finite range in which the lower and upper terminals and any values in between may be realized in the data.

Being able to estimate such a model directly, without first transforming the dependent variable to [0,1], then estimating using fracreg, and then back-transforming the coefficients so they represent the natural scale of the dependent variable, would be valuable.

Last edited by John Mullahy; 15 Feb 2019, 15:57. Reason: Editing to add clarification.
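Until such an option exists, here is a workaround sketch. Because E(y | x) = u F(xb) is the same statement as E(y/u | x) = F(xb), fitting -fracreg- to y/u should already estimate the index coefficients b of the [0,u] model; as far as I can tell, only the conditional mean and the marginal effects need rescaling by u, which -margins, expression()- can do. The simulated data, u = 100 and the probit link are assumptions for illustration; with -fracreg logit- replace normal() with invlogit().

```stata
* Sketch: [0,u] outcome via fracreg on y/u, effects reported on the y scale
* u = 100 and the simulated data are hypothetical
clear
set seed 2019
set obs 1000
local u = 100
generate double x = rnormal()
generate double y = `u' * normal(0.5*x + rnormal())   // y lies in (0, u)
generate double ystar = y / `u'                       // rescale to (0, 1)
fracreg probit ystar x
* marginal effect on E(y/u | x) = normal(xb)
margins, dydx(x) expression(normal(predict(xb)))
matrix ME01 = r(b)
* marginal effect on E(y | x) = u * normal(xb)
margins, dydx(x) expression(`u' * normal(predict(xb)))
matrix MEu = r(b)
```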
John Mullahy

Join Date: Dec 2016

Posts: 750
#210

20 Feb 2019, 11:31

There is at least one thread on Statalist on the topic of weighted bootstrapping, e.g.

https://www.statalist.org/forums/for...quency-weights

but I would like to put something on the wishlist. Specifically, when dealing with large samples I often find it convenient to use contract and then work subsequently with the frequency weights that are generated. The "contracted" sample along with the frequency weight variable that is generated retain all the information in the original sample. I find this can be hugely time-saving when all the variables of interest are binary, categorical, or otherwise discrete.

The wish (or maybe the query) is whether there might be more efficient approaches to doing bsample than using a repeated sequence of

Code:

contract ...
preserve
expand ...
bsample
contract ...
* [do calculations using this replication's frequency weights]
restore

Perhaps an approach along these lines is the most efficient, but I suspect not. If not, my wish is for the developers to consider whether there are more efficient approaches and, if so, to include them as options in Stata 16. It would be fabulous if each replication could efficiently return an already-contracted sample containing a new frequency weight.
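One possible route, sketched under assumptions and not an existing -bsample- option: drawing N observations with replacement from the expanded data is equivalent to drawing new counts for the K contracted rows from a multinomial distribution with N trials and probabilities w_i/N, where w_i is the original frequency. The Mata step below does this with rdiscrete(), so the data in memory never grow beyond K rows, although it still draws N values per replication. The toy data, the variable bfreq and the seed are hypothetical.

```stata
* Sketch: one bootstrap replicate drawn directly from contracted data
* (toy data: 4 contracted rows, N = 250 + 500 + 750 + 1000 = 2500)
clear
set seed 42
set obs 4
generate byte group = _n
generate long _freq = 250 * _n
generate long bfreq = .                 // hypothetical name for the new frequencies
mata:
w      = st_data(., "_freq")
N      = sum(w)
draws  = rdiscrete(N, 1, w :/ N)        // N row indices; row i drawn w.p. w_i/N
counts = J(rows(w), 1, 0)
for (i = 1; i <= N; i++) counts[draws[i]] = counts[draws[i]] + 1
st_store(., "bfreq", counts)            // replicate's counts, one per contracted row
end
summarize group [fweight = bfreq]
```

Repeating the Mata step in a loop (or wrapping it in a program for -simulate-) would produce the replications, with each statistic computed using [fweight = bfreq].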