Wishlist for Stata 16

Justin Niakamal replied

31 Jan 2019, 08:29
Hi Nick,

Originally posted by Nick Cox

#204 Web scraping is clearly an interesting and important area. What's not so clear is what StataCorp are expected or requested to provide here. What I've noticed is a series of community-contributed commands that address particular sites and often are made obsolete by changes in protocol.

Do you want, therefore, Stata commands/functions? Mata functions? etc.

I would prefer Stata commands/functions (a mix of both). I referenced beautifulsoup because that Python module is one of the more popular web scraping tools among programmers, and I think it is a good reference for thinking about how to build the functionality and structure in Stata, be it commands or functions.

Originally posted by Nick Cox

S-values might well be directly programmable, but the topic is new to me and I've not looked at your helpful references.

It would be great if someone programmed the command. Unfortunately, I’m not yet at the level where I am able to do it myself!
Nick Cox replied

31 Jan 2019, 08:12
#204 Web scraping is clearly an interesting and important area. What's not so clear is what StataCorp are expected or requested to provide here. What I've noticed is a series of community-contributed commands that address particular sites and often are made obsolete by changes in protocol.

Do you want, therefore, Stata commands/functions? Mata functions? etc.

S-values might well be directly programmable, but the topic is new to me and I've not looked at your helpful references.

In general -- and I am clearly not speaking on behalf of the company, but I do know the kind of steer that developers give at Stata meetings -- what the company wants to know about most are features that are missing which user-programmers cannot provide easily -- meaning, that what is on offer from the community comes nowhere near what is needed.
Justin Niakamal replied

31 Jan 2019, 08:03
I too would like to see MIDAS (Mixed-Data Sampling) implemented in Stata 16.

Now, for my less reasonable wish list, I would love to see:

1. Web scraping capability expanded in Stata, perhaps with a beautifulsoup analog.

2. S-values (a measure of model ambiguity; sources below).

Leamer, Edward E. 2016. “S-Values: Conventional Context-Minimal Measures of the Sturdiness of Regression Coefficients.” Journal of Econometrics 193 (1): 147–61. doi:10.1016/j.jeconom.2015.10.013.

Cinelli, Carlos. 2015. “Model Ambiguity in R: The sValues Package” https://cran.r-project.org/web/packa...es/sValues.pdf
Alexander Rodriguez replied

30 Jan 2019, 22:05
Originally posted by Mike Murphy

Not splashy, but investing time in making error messages more informative would probably have large returns for most users.

This may be difficult to achieve, but the error message could be coupled with a link to relevant Statalist threads.
Leonardo Guizzetti replied

30 Jan 2019, 15:00
Hi Dario,

Perhaps you are looking for the trace option. See the manual and output for -help trace-. Turning it on will help identify the line of code that caused the error.
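
A minimal sketch of that suggestion follows. The loop and the variable name no_such_var are made up for illustration and are not from this thread; the sketch assumes no variable of that name exists in memory.

Code:

```stata
set trace on                          // echo each command, with macros expanded, as it runs
forvalues i = 1/3 {
    display "iteration `i'"
    confirm variable no_such_var      // hypothetical failing line
}
set trace off                         // not reached if the loop stops with an error
```

The last command echoed in the trace output should be the one that raised the error. Because execution stops there, tracing stays on until -set trace off- is typed by hand.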
Dario Maimone Ansaldo Patti replied

30 Jan 2019, 09:55
Hi All,

I have missed some of the posts, so I am not sure whether this point has already been raised. When you get an error message, you have to work out which part of your do-file the mistake is in. This can be annoying, particularly when you have several routines, and I have found it is even worse if the mistake is inside a loop: you need to check every single line. It would be useful to have an indication of the line of code where the mistake is likely to be. I find this functionality quite useful in other software such as LaTeX: when I get an error message there, I know both the type of error and the line of code where the mistake is supposed to be.
daniel klein replied

29 Jan 2019, 10:12
For an approach along the lines suggested in #198, see also regen (SSC).

Best
Daniel

Nick Cox replied

29 Jan 2019, 09:50

#198 That one, like all the others, is for StataCorp, but in passing please note the numdate package (SSC) with three constituent commands.

Code:

. clear

. set obs 1
number of observations (_N) was 0, now 1

. gen given = "29/1/2019"

. numdate daily ddate = given, pattern(DMY)

. convdate monthly mdate = ddate

. convdate quarterly qdate = ddate

. l

     +-----------------------------------------+
     |     given       ddate    mdate    qdate |
     |-----------------------------------------|
  1. | 29/1/2019   29jan2019   2019m1   2019q1 |
     +-----------------------------------------+

In the examples above, formats are assigned by default, but there are options to do something different.


Jesse Wursten replied

29 Jan 2019, 09:32
We can already assign data types instantly:

Code:

gen double ID = _n

It would be nice if we could do the same for display formats:

Code:

gen %tq quarter = qofd(dofm(month))
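
For comparison, here is a sketch of the two steps this currently takes, assuming (as in the wish above) that month already holds a monthly date:

Code:

```stata
gen quarter = qofd(dofm(month))   // numeric quarterly date derived from the monthly date
format quarter %tq                // display format assigned in a second command
```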
Nicholas Winter replied

25 Jan 2019, 07:46
I would love to see full Python integration, ideally along the lines of Java's integration with Stata. I think, that is, that I'm asking for a Python interpreter within Stata.

Right now I have several workflows that interact with the Qualtrics and Mechanical Turk APIs. I have Python programs that interact with the APIs. Stata calls those programs with shell commands to run Python, the programs write results to CSV files, and Stata then reads the CSVs and continues along. This works, but it's ugly, and more importantly it requires a separate Python installation on the system. That's fine for me, but makes it much more complex to share my code with others who might be new to Python.
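
On the Stata side, that round trip might look roughly like the sketch below. The names fetch_results.py and results.csv are hypothetical, and the sketch assumes a python executable is on the system path.

Code:

```stata
shell python fetch_results.py                  // hypothetical script: calls the API and writes results.csv
import delimited using "results.csv", clear    // read what the script wrote
describe                                       // continue the analysis in Stata
```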

In theory, of course, I could reprogram the API-interaction code in Stata/Mata, but that would be a ton of work that replicates routines that are already available for Python that handle the creation of HASH signatures, parse the JSON and XML return data, etc. etc. In theory I could also use Java instead of Python for all of this. But I don't know Java...if I had to start over maybe I would use Java instead of Python, but in my world (social science data analysis and programming) there is much more use of Python than Java.
Joro Kolev replied

19 Jan 2019, 05:58
There are two related annoying features of Stata that have been a pain in the neck for me on multiple occasions.

1. The degrees-of-freedom adjustments that Stata uses across estimators are a zoo, e.g., -regress, robust- uses N-K, estimators with cluster use G-1, etc. It would be great if StataCorp introduced, across all estimators, an option that instructs the estimator not to apply any degrees-of-freedom adjustment.

2. Some estimators report t and F statistics, some estimators report z and Wald statistics. Some estimators have the -small- option, some don't. I don't think there is a way to instruct -regress- to report large-sample statistics. It would be nice if this were unified and every estimator had the option of reporting small- or large-sample statistics as instructed by the user (through an option like -small- and/or -large-).

2a) It seems to me that one cannot control whether -test- reports large- or small-sample statistics; -test- appears to inherit this from the estimation command. It would be nice if -test- were rewritten so that the user can choose which to see.
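
In the meantime, large-sample p-values can be computed by hand from the stored results. The following is only a sketch of that workaround, using the auto dataset shipped with Stata as an assumed example; it changes the reference distribution but does not undo any finite-sample adjustment already built into the standard errors.

Code:

```stata
sysuse auto, clear
regress price weight, vce(robust)
local z = _b[weight]/_se[weight]
display "t-based p-value: " 2*ttail(e(df_r), abs(`z'))   // what -regress- reports
display "z-based p-value: " 2*normal(-abs(`z'))          // large-sample version
test weight                                              // F statistic, inherited from -regress-
display "chi2 p-value:    " chi2tail(r(df), r(df)*r(F))  // Wald chi-squared version of the same test
```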
Gordon Fick replied

18 Jan 2019, 19:33
While it is possible to use Generalized Additive Models [GAMs] in Stata, I think one must use a rather dated interface involving DOS. If one is using a Mac or a Linux machine, I gather that one cannot use this older code. Is it not time for Stata to develop an implementation that is a part of the main system? One sees the use of GAMs quite often these days. GAMs are part of the arsenal of methods to address linearity issues.
Gordon Fick replied

18 Jan 2019, 19:27
I would like to suggest that Stata consider adding a new command to handle the 'Log Binomial' model using constrained optimization. Bernardo Andrade's lbreg in R is making the use of the Log Link with Bernoulli Regression viable now. Convergence issues are quite dramatically improved compared with the methods used in Stata's binreg or glm.
Leonardo Guizzetti replied

17 Jan 2019, 07:08
Originally posted by Marc Kaulisch

I am not sure how often Microsoft changes its .docx data format, but to me it looks to have been quite stable for a few years.

and

Originally posted by Clyde Schechter

While I can certainly appreciate the value of enabling detailed control over the production of .docx and .pdf files from within Stata, this might prove to be more trouble than it is worth. While I think the Adobe .pdf file format has been pretty stable for a long time, Microsoft has a habit of introducing new "features" with some frequency. And now that they are moving to a new business model where you rent the software and they automatically upgrade you at their whim, even if you would prefer to keep what you have, StataCorp might find itself having to devote excessive resources to revising Stata every time Word gets changed. In my view, that might come at the expense of being able to improve the statistical and data management functions, or result in a steep price increase for Stata.

The Office Open XML format has been standardized, and while Microsoft does not necessarily "play nice" with cross-compatibility between formats, it does have a legacy of maintaining backwards compatibility for some time. For example, the ability to read and write Word 97 format .doc files is still maintained 20+ years later. Stability of the format (or a core set of format features) is important for Microsoft to be able to keep its hold over word processing.

I agree that it would not be prudent of StataCorp to try to introduce every Word feature, but something like column and row sizing is reasonably simple and can already be done by some alternative software. It's a small thing to those who don't rely on automated report generation, but it does add a nice level of polish to a final product (and can save lots of manual intervention).

Originally posted by Marc Kaulisch

I am using Stata mainly for data management and reporting tasks. Recently, I was able to create quite a number of reports with Stata automatically, but it cost me a lot of time to put page numbers, headers and footers into the documents. These are my feature wishes for Stata 16.

Indeed this can be done, as I have recently discovered. You make use of the -putdocx append- command, which combines multiple Word files. Take note that the order of combination matters here, so you can make complicated reports, but it may require trial and error with the order. In the toy example below, I assume that I have a Word file called -myheaders.docx- which contains some text in the header, and some text and page numbers in the footer. The program takes that file and combines it with one that Stata generates. Two reports are generated, one appending the headers last and one appending them first, and you can see that only one order produces the intended result. (I think the reason for this is that styles are accumulated based on the last file appended, but I don't know for sure and haven't investigated much.)

Code:

cd "my path here" // put "myheaders.docx" file here
putdocx begin, pagesize(letter) font(Arial, 11, black)
putdocx paragraph
putdocx text ("This is a test.")
putdocx save "mybody.docx", replace
putdocx append "mybody.docx" "myheaders.docx", saving("myreport1", replace) // this works as intended
putdocx append "myheaders.docx" "mybody.docx", saving("myreport2", replace) // this does not work
Marc Kaulisch replied

17 Jan 2019, 06:38
Originally posted by Clyde Schechter

While I can certainly appreciate the value of enabling detailed control over the production of .docx and .pdf files from within Stata, this might prove to be more trouble than it is worth. While I think the Adobe .pdf file format has been pretty stable for a long time, Microsoft has a habit of introducing new "features" with some frequency. And now that they are moving to a new business model where you rent the software and they automatically upgrade you at their whim, even if you would prefer to keep what you have, StataCorp might find itself having to devote excessive resources to revising Stata every time Word gets changed. In my view, that might come at the expense of being able to improve the statistical and data management functions, or result in a steep price increase for Stata.

In saying that, I realize that my perspective is influenced by a workflow in which I keep most of my output in the form of data sets, spreadsheets (for which -export excel- does everything I need) and .smcl files and only occasionally need to present the results in Word or PDF documents. So your mileage may well vary from mine.

I am not sure how often Microsoft changes its .docx data format, but to me it looks to have been quite stable for a few years.

I am using Stata mainly for data management and reporting tasks. Recently, I was able to create quite a number of reports with Stata automatically, but it cost me a lot of time to put page numbers, headers and footers into the documents. These are my feature wishes for Stata 16.

I am not sure that these feature wishes would be such a burden to StataCorp, because they are also a market opportunity and would make Stata much more valuable for users like me. In particular, they would keep me from turning to other solutions.