  • Warning about ChatGPT

    When I asked a student about some bizarre code in her do-file, she said she had been using ChatGPT to help her. (I see there's a thread about a specialized Stata GPT; this isn't that.) She was running into serious memory problems and asked ChatGPT for code to reduce the amount of memory she used. It helpfully showed her how to specify the storage type when generating variables--and had her always specifying variables as doubles, even when they could have been bytes.

    We appear to be entering a world where we need to warn our students not just about the danger of grabbing potentially helpful code that someone has posted (which may or may not fit their needs), but also about the dangers of using AI-written code. I imagine the preferred solution is the same in both cases: sure, use the internet or AI to get ideas and sample code, but don't use it until you understand it. My n=1 experience of AI-written code, though, suggests it has a higher risk of flat-out doing the opposite of what you want, whereas the most common flaw with googled code is merely adding irrelevant gunk.
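
    To illustrate the storage-type point above, here is a minimal sketch (hypothetical variable names, not the student's actual do-file). A byte stores small integers in 1 byte per observation, while a double takes 8, so advising doubles across the board increases memory use rather than reducing it:

    ```stata
    * Toy comparison of storage types (illustrative only)
    clear
    set obs 1000000
    generate double flag_dbl = 1    // 8 bytes per obs: ~8 MB
    generate byte   flag_byt = 1    // 1 byte per obs:  ~1 MB
    memory                          // compare the "data" line
    * -compress- demotes over-wide variables automatically when
    * the stored values fit a smaller type:
    compress
    describe
    ```

    In practice, generating with the smallest adequate type up front (or running -compress- afterward) is the standard memory-saving advice, the opposite of what ChatGPT reportedly gave here.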

  • #2
    What to say that has not already been said in many places (and will not annoy someone mightily, as enthusiasts for AI support often accuse sceptics of negativity or worse)?

    There are several dimensions here. One is what are the expectations of the students? A former colleague (deceased as well as no longer employed in the same place) had a very simple line on one project he set over several years. All software issues are part of the assignment for students to solve: I will not (he stipulated) provide support or guidance. (He didn't use Stata himself, but Stata would have been helpful for his project.) I imagine that many assignments in many places are run similarly.

    My own line with students up to Master's is that I don't expect students to apply code that they haven't been shown in teaching. PhD students get more support if they need it. We are usually working together, after all.

    My sympathies run in all directions. I've been using Stata almost every work day for more than 30 years but I make silly mistakes all the time. Most of the time I spot them quickly, but it is still possible for me to waste hours until the problem is solved. So, I have sympathy with anyone trying to get whatever help they can.



    • #3
      Totally agree. It does a poor job even with seemingly simple problems I tried out of curiosity. I stopped asking it for Stata code when I quickly realized it was giving very convoluted and frequently made-up, non-existent methods and functions, not to mention its total ignorance of best practices. I admit it helps quite a bit with Python-related methods, but I am planning to stick to my manually learnt Stata code base, culled mostly from this forum. I hope this forum remains one with only human-contributed solutions, without inexperienced users jumping in to provide half-baked AI-generated solutions to human-posted problems.



      • #4
        Yes. It gives bad answers a lot, but it's sometimes useful. Sometimes I forget simple stuff, and it does a pretty good job with that.

        I do notice that the really simple questions aren't showing up here as much, and I presume people are using AI to solve those sorts of problems.



        • #5
          Like George, I now use ChatGPT quite often, somewhat like an advanced Google, because I find the results are presented in a more readable way. I have also encountered a few suggestions for good-sounding but nonexistent commands. However, I have a counterexample that surprised me and many colleagues a lot, in a positive sense. For a certain recurring task, I needed an ado that checks whether all variables in a dataset--except for identifier variables or other user-specifiable ones--consist of negative values within a specific range that my organization codes as missing values. So I wrote my requirements for the ado in just a few lines in ChatGPT and was excited to see the result. Copy and paste, didn't change anything, and it worked flawlessly.



          • #6
            Benno Schoenberger: For readers of this forum like me, it would be interesting to see what you wanted the program to do and the code you ended up with -- more interesting than "learning" that ChatGPT not only suggests code that is nonsense but can also produce code that is useful.



            • #7
              Dear Dirk Enzmann, dear others,

              With the prompt
              "Please write a complete, executable ado for Stata 18 that does the following: With a dataset open, the ado should check all variables for any observations where every variable has a negative value. This should be the exception to a freely definable varlist, which can be specified via an option."
              I got the following:

              Code:
              *! version 1.0.0 14aug2025
              program define checknegatives, rclass
                  version 18.0
                  syntax [, exclude(varlist)]
              
                  // Step 1: Determine all variables
                  ds
                  local allvars `r(varlist)'
              
                  // Step 2: Apply exclusion list
                  if "`exclude'" != "" {
                      local checkvars : list allvars - exclude
                  }
                  else {
                      local checkvars `allvars'
                  }
              
                  if "`checkvars'" == "" {
                      di as err "No variables left to check after exclusion list."
                      exit 198
                  }
              
                  // Step 3: Check whether all relevant variables for an observation are negative
                  tempvar allneg
                  gen byte `allneg' = 1
                  foreach v of varlist `checkvars' {
                      replace `allneg' = 0 if missing(`v') | `v' >= 0
                  }
              
                  // Step 4: Count the result
                  count if `allneg' == 1
                  local num = r(N)
              
                  if `num' > 0 {
                      di as txt "There are " as res `num' as txt " observation(s) for which all tested variables are negative."
                  }
                  else {
                      di as txt "No observation found where all tested variables are negative."
                  }
              
                  // Return to r()
                  return scalar N_all_negative = `num'
                  return local checkedvars "`checkvars'"
                  return local excludedvars "`exclude'"
              end

              I later added a few more options and slightly modified the output to suit my needs, but even the unmodified version was fully functional. That was quite surprising, but also impressive.




              • #8
                Thanks for providing a positive example of ChatGPT creating Stata code that's (almost) fit for the purpose.

                Originally posted by Benno Schoenberger:
                [...]even the unmodified version was fully functional.
                ... if you have no string variables in your dataset. The prompt may or may not be read as implying that. Anyway, except for some unnecessary but minimal overhead (such as ds to get all variable names, or checking for >=0 and missing()), the code indeed seems fine.
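
                One possible hardening, sketched here as an assumption rather than the actual revision: ask -ds- for numeric variables only via its has() option, so string variables never reach the `v' >= 0 comparison. A toy demonstration (hypothetical variable names):

                ```stata
                * Toy data with one string variable that must be skipped
                clear
                set obs 5
                generate str8 id = "case" + string(_n)   // string variable
                generate x = -_n                         // numeric, all negative
                generate y = -_n

                ds, has(type numeric)                    // returns x y, not id
                local checkvars `r(varlist)'
                display "`checkvars'"
                ```

                The missing() test in the generated ado is likewise redundant: numeric missing values sort above every nonmissing value in Stata, so `v' >= 0 is already true for them and they would set the flag to zero without the extra check.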



                • #9
                  In fact, in my current scenario, it's by definition impossible for string variables to be in the dataset. However, if I expand the application area, I will modify the ado accordingly. Thank you for this valuable tip, daniel klein!



                  • #10
                    I'll try to post some successes and failures. I have good success with things I'm terrible at, like date/time handling and regexr(). But it's also done a decent job at building Monte Carlo simulations.



                    • #11
                      It could be that AI gets to the point where it is so good that its solutions are equivalent to professional-level code, in which case you really just have to be able to prompt with a detailed enough specification to get the solution you're looking for. Arguably, AI is basically there already for run-of-the-mill problems that already have explanations all over the internet, and in that context it's faster than searching and reading webpages.

                      Alternatively, students will not know how to specify a program with enough detail to get good output from an AI, and will sometimes run into code and advice from AI that they don't understand. That code may not work, and sometimes they will catch the mistake and try to figure it out, or someone else will catch the error and they will be embarrassed. Pointing out that it is a bad idea to copy and paste code you don't understand is probably helpful for beginners, but even when you write your own code and you think you understand it, it can be very difficult to know that the code is actually correct. My point is that whether you're using AI, Stack Overflow, Statalist, or the docs, mistakes will be made. Good students and good programmers will use some best practices to understand their code, consider edge-cases, and test to make sure it works correctly, whether the code comes from AI or not. Meet the new boss. Same as the old boss.
                      Last edited by Daniel Schaefer; Today, 10:11.



                      • #12
                        ...in which case you really just have to be able to prompt with a detailed enough specification that you get the solution you're looking for...
                        My experience may be atypical, but I think the key aspect of success in my own career is precisely my ability to spell out a precise problem specification for people who cannot get beyond the vague on their own. In some instances it is a matter of asking a "client" the right questions and eliciting the necessary information--a skill I acquired not when learning computing but in medical school, when learning how to elicit a medical history from a patient. In other instances, my clients typically being clinician-researchers, it is, again, my medical background that enables me to "get in their head" and intuitively grasp what would most likely make sense to them, although they are not able to express it clearly.

                        Mind, I'm not suggesting that programmers should get medical degrees, but I think the training of people who do computing for other people (which is the bulk, though not all, of my work) would be enhanced by inclusion of a course in how to interview people to clarify their initially vague problem descriptions. And, yes, I do recognize that in some environments such as businesses producing commercial software, the problems are much more difficult than that. I'm thinking more about the consulting/collaborating process in a one-on-one or one-on-few setting.

                        Sorry for being tangential to the thread.

