Stata and AI

Nick Cox

Join Date: Mar 2014

Posts: 35667
#1

23 Feb 2023, 01:17

It would be good all round if AI tools were able to solve any Stata coding problems too. I haven't tried any myself, but I stumbled across this post.

https://twitter.com/peternka/status/1598409508900196357

No one seems to have noticed that -- apart from some confusion about what is or is not a command, a function and an option -- the recommended code

Code:

summarize var1, mean sd

is illegal. If you remove the offending option sd, the results don't include the SD, and you need to consult the help to find that mean is read as an abbreviation of meanonly, a programmer's option that leaves r-class results but displays nothing in the Results window.
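
For comparison, here is a minimal sketch of legal alternatives. It assumes the auto dataset that ships with Stata and uses price as a stand-in for var1; neither comes from the linked post.

```stata
* Illustrative only: auto.dta and price are stand-ins, not from the original post
sysuse auto, clear

* Default summarize displays N, mean, SD, min and max
summarize price
display "mean = " r(mean) "  SD = " r(sd)

* detail adds percentiles, variance, skewness and kurtosis
summarize price, detail

* meanonly displays nothing but leaves r-class results behind
summarize price, meanonly
return list
display "mean = " r(mean)
```

After meanonly, r(sd) is not among the saved results, so the default call or detail is the route to the SD.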

More examples, better, as good, or worse, surely welcome.

Jared Greathouse

Join Date: Sep 2021

Posts: 2170
#2

23 Feb 2023, 05:51

Yeah, I agree. This is why I'm never really sympathetic to "the robots are gonna take over (our jobs, more specifically)". GPT and other AI sources have the potential to be fantastic outlets that can help people do all manner of things, and it's a phenomenon that excites me... but! It's no substitute for experienced human coders. As you say all the time, commands, functions, and options aren't the same, but it may not be reasonable to expect GPT to know this.

This is a small error, but imagine many small errors like this at scale (not just for Stata, but for any other language). You'd essentially have code being corrupted by lots of small errors, even though its general thrust is correct. Either way, though, having properly trained ML algorithms to help with stuff like this is awesome, and at some point I'll likely use it myself (namely for Python!)
Clyde Schechter

Join Date: Apr 2014

Posts: 30083
#3

23 Feb 2023, 09:21

I think there are different levels to this. Details like knowing the syntax of the various Stata commands and exactly what they do are, I think, well within the grasp of artificial intelligence. In fact, they are well within the grasp of lower levels of technology, such as Stata itself. If you look at modern C++ compilers, they are able to identify and alert you to all sorts of problems with your code that are well beyond just syntactic errors (e.g. memory leaks, accessing uninitialized data, races).

But there is another level I am skeptical about AI ever being able to reach. My experience here on Statalist is that a large proportion of the questions raised here arise because the user has not sufficiently specified the question. A program, in any language, transforms inputs (usually subject to some validity constraints) into outputs that relate to those inputs in some specific way. But if you can't state precisely what the relationship of the outputs to the inputs is supposed to be, you cannot write code. In most of these cases, at some intuitive level the user knows what is wanted, but has difficulty setting it out completely and in precise language. These threads are usually characterized by a series of back-and-forth posts that elicit the problem specification clearly enough to enable code writing. But that back and forth is often, in part, guided by the responder's knowledge of the kinds of things that are typically wanted, so that concrete questions like "do you mean this or do you mean that" can be posed. Although I don't want to be on record as claiming that, in principle, an AI cannot be developed that could take over this kind of interrogation function, current technology is far from being able to do that and I don't think it will get there any time soon. Suffice it to say, such an AI would need an in-depth knowledge of programming, a working familiarity with the subject matter of the application to be coded, and excellent skills at eliciting clarification from users who have difficulty communicating their intent.

It might happen eventually, but I don't expect it any time soon.

Last edited by Clyde Schechter; 23 Feb 2023, 09:25.
Tim Huegerich

Join Date: May 2016

Posts: 19
#4

23 Feb 2023, 10:43

I suspect that near-term AI tools will be much less capable with Stata than with Python and other such languages with copious amounts of high-quality, open-source code available for crawling. If that's right, this reality may prove to be a significant obstacle to growing/maintaining Stata's popularity.
Tiago Pereira

Join Date: Jan 2016

Posts: 383
#5

23 Feb 2023, 13:38

I have used ChatGPT to catch errors in my code. It's not useful for generating new code from scratch, but it helps a lot in catching errors.
Maarten Buis

Join Date: Mar 2014

Posts: 3449
#6

24 Feb 2023, 01:36

Originally posted by Tim Huegerich:

I suspect that near-term AI tools will be much less capable with Stata than with Python and other such languages with copious amounts of high-quality, open-source code available for crawling. If that's right, this reality may prove to be a significant obstacle to growing/maintaining Stata's popularity.

There is SSC to crawl for Stata. The challenge with R is that there is a lot of code available, but the quality is a real mixed bag: there is absolutely exceptional stuff there, but also complete garbage, and everything in between. For any automated system you have to keep GIGO (Garbage In --> Garbage Out) in mind.

---------------------------------
Maarten L. Buis
University of Konstanz
Department of history and sociology
box 40
78457 Konstanz
Germany
http://www.maartenbuis.nl
---------------------------------