Wish List for Stata 20

Yuduan Puu

Join Date: Mar 2026

Posts: 6
#121

20 Mar 2026, 01:32

Being able to integrate better with LLMs. I don't think students want to study Stata anymore if Claude + Python is easier to use than Claude + Stata.
2 likes
Nick Cox

Join Date: Mar 2014

Posts: 36074
#122

26 Mar 2026, 10:17

#121 overlaps with #118 to which #119 is a very personal answer.

I don't see StataCorp as aiming Stata at anyone who wants to run Python code and needs AI assistance to write it. Is that a foundation for good quality original research leading to advanced degrees or publications in reputable journals?

Or -- on another level -- what precisely would "integrate better" mean for Stata developers?
Daniel Schaefer

Join Date: Mar 2020

Posts: 842
#123

26 Mar 2026, 12:33

Originally posted by Nick Cox in #122: Or -- on another level -- what precisely would "integrate better" mean for Stata developers?

Just playing devil's advocate, I can see two ways Stata could be better integrated with an LLM. First, LLMs could be better at Stata. That is ultimately limited by the availability of training data, and there is simply more training data for Python, though the gap could be narrowed with fine-tuning/transfer-learning techniques. One could also build an interface between an LLM and the interpreter, allowing the interpreter to check that the syntax is valid before the predicted code is served to the user. It's a bit like handling the fact that LLMs are bad at arithmetic by giving them access to a calculator behind the scenes.

This points to a subtle hidden cost of relying on LLMs: the LLM will never be up to date on the latest features and technologies, because the body of code to train it on doesn't exist yet. One might think that a combination of emergence and retrieving the relevant documentation and supplying it as part of the prompt would solve that problem. Setting aside the magical thinking implicit in "emergence," all of that works fairly well for simple problems or syntax questions, but it doesn't scale well to more complex, synthetic code, where the LLM will likely default to older solutions. The problem is even worse when you are working in a proprietary codebase, as many professionals do. Modern AI companies seem to be getting around these issues for now by throwing resources at the problem, training larger and larger models, and focusing on languages where a ton of training data already exists, like Python. The problem is that training these models is extremely expensive, and training entirely new models as an update strategy doesn't scale well given the costs involved.

I'm not saying issues of scale will prevent these companies from continually updating their models, but scale will place some practical limits on how they do it, which may mean that LLMs continue to be fairly good at basic Python but less good at Julia or the latest version of Stata. If that prevents people from learning Stata, it is to their own detriment, because I would take Stata's library of statistical models over Python's statsmodels package for essentially every regression-related task. Stata simply has a much more extensive library of models than Python for those kinds of modeling tasks.

In addition to improving the LLM output, you could literally integrate an LLM into the do file editor. Let the LLM try to predict the next token as you type and hit tab when the LLM has made the correct guess to fill in the next piece of code. Then create a way to prompt the LLM from the do file editor if you have a question about the code or if you'd like it to write some template code, with the response inserted directly into the do file editor as a comment or as code. The first feature especially already exists in a primitive form. The do file editor will try to guess the token you are typing in based on the tokens that already exist in the do file. Next-token-prediction from an LLM would be a much more sophisticated version of that. Of course, you would essentially need to repeatedly and automatically query a huge data structure at a massive datacenter. That'll likely cost you (or someone) at least a few cents per token plus some per-query overhead, and you'd be contributing to all the negative externalities from LLMs, but I suppose that is the cost of outsourcing your intelligence to a tech company.

As I've said in the past, I don't support the introduction of these features right now, because I think we'd be better served by simpler extensions to the do file editor that have existed in other software for decades. An obvious start would be more informative syntax errors. Stata often returns "syntax error" at runtime, but I'd love to see the closest possible approximate location of the syntax error highlighted in the text of the do file, with an informative error message presented in a tooltip after a second or so on mouseover. Such a feature would also be useful to augment an LLM when the time comes. Moreover, students may not want to learn Stata, but they should still do so for a number of reasons. The biggest issue is that LLMs don't scale well to complex tasks. Anyone who is literate can prompt an LLM to get code it's seen a million times before in its training set, but once your work gets sophisticated enough, that stops working, and you'll need real expertise to move forward.
1 like
Bruce Weaver

Join Date: May 2014

Posts: 1171
#124

31 Mar 2026, 08:55

I wish graph matrix had options to include regression lines and confidence intervals for rho. Here is an example of what I have in mind (created via JASP).

--
Bruce Weaver
Version: Stata/MP 19.5 (Windows)
Crypticity belongs in crosswords, not code! 🤨
3 likes
Dave Airey

Join Date: Apr 2014

Posts: 416
#125

31 Mar 2026, 11:44

Originally posted by Daniel Schaefer in #123: Moreover, students may not want to learn Stata, but they should still do so for a number of reasons.

Daniel Schaefer I hope some of what you say is true, but my industry is very much embracing AI and tools like Claude Code, which for programming does work well with Python and R for many tasks. Is the purpose to be at the bleeding edge of coding complexity using Claude Code? In many cases it is just to accelerate productivity with an excellent coding assistant that also has excellent general knowledge. Do we know how best to use Claude Code? No, but the push is to learn immediately. I do think it possible that LLMs like Claude and Claude Code will lead to more users of R and Python and fewer of Stata and SAS, even though both Stata and SAS have excellent and deserved market positions in several fields and are well liked by patrons.
Daniel Schaefer

Join Date: Mar 2020

Posts: 842
#126

31 Mar 2026, 15:51

Dave Airey Don't misunderstand: I very much like Claude Code for many tasks, and I think agentic AI looks very plausible for large-scale rapid-prototyping tasks. It's undeniable that industry is embracing these tools, though I think it is still sometimes hard to separate the genuine use cases from the hype. I've also used these tools enough to see them make mistakes, and when they do, the output is often confident and plausible-sounding. I guarantee that right now people in industry are using LLMs to produce dashboards with plausible-looking graphics and a clear story that are nonetheless straightforwardly wrong, and no one is bothering to check. My point isn't that LLMs aren't useful; it is that students need to develop enough background knowledge and expertise to use an LLM effectively and to avoid being fooled by incorrect but compelling output.
2 likes
Dave Airey

Join Date: Apr 2014

Posts: 416
#127

01 Apr 2026, 08:57

Daniel Schaefer Well said. Validation (responsible use) of all AI artifacts and assistance should be required for sure.
2 likes
Matej Seifert

Join Date: Apr 2025

Posts: 5
#128

03 Apr 2026, 04:00

When using the Navigator in the Do-file Editor, I would appreciate highlighting of the current "chapter" in the Navigator window. Users could then quickly see which part of a larger do-file they are in at the moment.
1 like
Erik Reinbergs

Join Date: Oct 2022

Posts: 42
#129

28 Apr 2026, 07:52

Two quality-of-life improvements for etable:

1) A 'wide' option that puts stats next to each other (in separate columns) rather than below each other in one column
2) Option to remove leading zeros from p values
2 likes
Jean-Michel Galarneau

Join Date: Aug 2018

Posts: 40
#130

06 May 2026, 16:52

It would be helpful if mi estimate could fully return the theta and related statistics when using shared() with stcox.
Radion Svynarenko

Join Date: Aug 2025

Posts: 4
#131

27 May 2026, 07:59

If you try to print the Help file, the text often does not fit properly on the page. The only way to fix this is to resize the Help window. Please fix this issue in Stata 20. In pretty much any other software, if you hit a "print" button, it will fit the content to the page.
Dirk Enzmann

Join Date: Apr 2014

Posts: 617
#132

27 May 2026, 13:55

Radion Svynarenko (#131): My take on this is: What fits "properly" on a page for you personally may be different for others. I think it would be problematic to set a specific format for everyone—besides, there are different paper sizes (Legal, Letter, A4, etc.). The easiest way to adjust this is actually to resize the viewer window. Also note that help files for user-written programs use SMCL encoding differently and to varying degrees of quality, so a "proper" fit may look different depending on the help file.

For me personally, the help files in the viewer are perfectly sufficient for a quick overview—and the help from the manuals (PDF) can be printed (at least for me) in a neat format without any problems.
3 likes
Nick Cox

Join Date: Mar 2014

Posts: 36074
#133

28 May 2026, 02:59

https://www.statalist.org/forums/for...-word-retained gives further (or rather previous) discussion around the issue raised by Radion Svynarenko in #131. I tend to agree with Dirk Enzmann -- but the question is as usual aimed primarily at StataCorp developers.
Fahad Mirza

Join Date: Sep 2018

Posts: 263
#134

01 Jun 2026, 13:13

It would be nice to have an option for a transparent graph background. That would make it easy to use graphs in presentations, where one can paste the visual on top of a custom background.
5 likes
Dirk Enzmann

Join Date: Apr 2014

Posts: 617
#135

16 Jun 2026, 10:00

It would be helpful if the Stata documentation for -set seed- and -mata: rseed()- included a note that in some situations -set sortseed- may additionally be needed to guarantee reproducible results. See also the thread "Setting random seed not enough?"

Last edited by Dirk Enzmann; 16 Jun 2026, 10:05.
1 like