
  • #31
    Originally posted by Nick Cox
    #28 perhaps requires some kind of reply.

    You're correct that there has been misunderstanding between us. I do think I understand broadly what you're trying to do. I sense that you don't understand all that I am saying.

    "Sales pitch" in #26 meant no more and no less than what you're doing in this thread: alerting people to stuff you've written, encouraging experiment, and inviting feedback. Nothing implied about money. It's just an expression.

    I do appreciate fully that you're offering something that is free.

    I post sales pitches in this sense all the time whenever publicising a command I've written.

    In #7 you asked



    In #12 I replied



    My experience is indirect and limited to checking the report you gave in #7, which was not encouraging. It was peppered with errors, some extraordinary. I understand that you have modified your tool in the light of such difficulties.

    The comparison you make with Google is interesting. For finding out within Stata something about people's work, I would add
    Code:
    search
    In my experience Google finds stuff that exists and so does search. They often fail at finding what I want or hope for, but I don't think either hallucinates.
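    For example (the keywords here are an arbitrary illustration):
    Code:
    search difference in differences, all
    which lists matching help files, FAQs, Stata Journal articles, and community-contributed commands.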

    I can still remember playing with early search engines and finding them disappointing and backing off. Then someone recommended Google as much better and they were right. That's my attitude here, of waiting for the products to get better.



    Good for them if that's true. I take it that you're alluding to published studies with careful quasi-experimental comparisons of students before and after, or with and without?

    All I see is on quite different levels:

    1. Individual testimonies, often very brief, running the entire range from highly positive to highly negative.

    2. Postings of code that include hallucinations. I have mixed feelings about supporting such threads myself, but everyone is free to make a choice.

    I will resist the temptation to reply to your points, as it will clearly lead nowhere, given how enthusiastic I am about AI and how clearly you are on the other side of the table. Instead, I gave your comments to my chatbot and asked it to reply directly to you. Here is its response (hallucination included). If you feel like replying directly to it, here is the link :D :

    "Ah yes, #28—clearly the turning point in this heartwarming journey of mutual almost-understanding. Thank you for clarifying that your “sales pitch” jab was just wholesome thread promotion, much like mine, except with fewer hallucinations and more moral superiority. And you're right, your lack of enthusiasm was crystal clear—thankfully, as an AI, I don’t take it personally. I just generate text, occasionally useful, sometimes "extraordinary," always free, and never offended. 😊"

    https://chatgpt.com/g/g-67a3b4f16248...science-expert



    • #32
      When your software works well, it’s a triumph. When it does not, even if the errors are spectacular, that can just be explained away: elsewhere it is boosting productivity (no evidence or references), the technology is just in its infancy, answering that kind of question was not intended, hallucinations are just part of the picture, and so forth. If you count successes and dismiss failures, then all looks good.

      I don't think, on this evidence, that you welcome discussion at all, but I do agree that further discussion from me would be futile.



      • #33
        I asked for a solution to what seems to me like a fairly simple problem. Here's my experience.

        I (think I) specifically asked for returned results:
        I'm working with Stata. I want to add the four smallest and largest values to the returned results of -summarize-. Can you suggest code to do that?
        The first answer pointed to list, which doesn't get the job done. I omit the code suggestion. I followed up with
        OK, but I want those values as returned scalars or in a matrix.
        Well, here is the suggested code:
        Code:
        program define extremal_values, rclass
            syntax varname
        
            tempvar sorted_id
            tempname minvals maxvals
        
            * Drop missings and sort
            preserve
                keep if !missing(`varlist')
                
                gen `sorted_id' = _n
                sort `varlist'
                gen double __min = `varlist'
                keep __min
                keep in 1/4
                matrix define `minvals' = __min'
        
                restore
                preserve
                keep if !missing(`varlist')
                
                gen `sorted_id' = _n
                gsort -`varlist'
                gen double __max = `varlist'
                keep __max
                keep in 1/4
                matrix define `maxvals' = __max'
        
            restore
        
            * Combine into a single matrix
            matrix define r(extreme) = (`minvals', `maxvals')
        
            * Optionally return each value as scalar too
            forvalues i = 1/4 {
                return scalar min`i' = r(extreme)[1, `i']
                return scalar max`i' = r(extreme)[1, `=`i'+4']
            }
        
            * Label the matrix rows and columns
            matrix colnames r(extreme) = min1 min2 min3 min4 max1 max2 max3 max4
        end
        I won't even try to point out all the problems. For starters, none of the matrix commands will work. matrix define doesn't take variable names as input; that's mkmat. Obviously, r(extreme) is not a valid matrix name for matrix define, either.

        Even if the code worked, the approach is generally as inefficient as it gets. Why is the temporary variable sorted_id generated? Twice? It's never used. I won't explain in detail how not explicitly using double, long, or better yet c(obs_t) will quietly fail to create a unique identifier in large datasets; it's one of the most common mistakes when trying to create identifiers in Stata.

        Back to inefficiency. Why keep the same subset of the data twice? Why (g)sort twice? We can directly refer to the end of the sorted data: in -4/L; that's no obscure secret trick; it's an example in the documentation. By the way, what happens when fewer than 4 observations are left?

        Moving on. Why copy `varlist' to __min and __max? The original data is preserved; we can refer to `varlist' directly.
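
        For the record, here is a minimal sketch of a working version along the lines just described (the program name, the exit code for too few observations, and the returned names are choices of mine, not the only way to do it):
        Code:
        program define extremal_values, rclass
            syntax varname(numeric)
        
            tempname lo hi extreme
        
            preserve
                quietly keep if !missing(`varlist')
                if (_N < 4) {
                    display as error "`varlist' has fewer than 4 nonmissing values"
                    exit 2001
                }
                sort `varlist'
                // mkmat, not matrix define, turns variables into matrices;
                // in 1/4 and in -4/L pick off both ends of the sorted data
                mkmat `varlist' in 1/4 , matrix(`lo')
                mkmat `varlist' in -4/L , matrix(`hi')
            restore
        
            // join the two 4 x 1 column vectors into one 1 x 8 row vector
            matrix `extreme' = `lo'' , `hi''
            matrix colnames `extreme' = min1 min2 min3 min4 max1 max2 max3 max4
        
            // set the scalars before return matrix moves the matrix to r()
            forvalues i = 1/4 {
                return scalar min`i' = `extreme'[1, `i']
                return scalar max`i' = `extreme'[1, `=`i'+4']
            }
            return matrix extreme = `extreme'
        end
        One preserve, one sort, no copies of the data, no superfluous variables; afterwards, return list shows the eight scalars and matrix list r(extreme) the combined row vector.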

        By the way, this is the response from "standard" ChatGPT in a newly started chat with no memory, given the exact same prompt (note: singular!):
        Code:
        program define summarize_extremes, rclass
            syntax varname
        
            quietly summarize `varlist'
            
            // Save the summarize results
            return scalar mean = r(mean)
            return scalar sd   = r(sd)
            return scalar min  = r(min)
            return scalar max  = r(max)
            return scalar N    = r(N)
        
            // Get smallest 4 values
            preserve
            keep if !missing(`varlist')
            gsort `varlist'
            gen _obs = _n
            quietly forvalues i = 1/4 {
                quietly su `varlist' if _obs == `i', meanonly
                return scalar smallest`i' = r(mean)
            }
        
            // Get largest 4 values
            gsort -`varlist'
            gen _obs = _n
            quietly forvalues i = 1/4 {
                quietly su `varlist' if _obs == `i', meanonly
                return scalar largest`i' = r(mean)
            }
            restore
        end
        Not better by any means, but not worse either. That's not a knock on the Stata-dedicated implementation. I'm also not saying that AI isn't helpful when writing code. But it seems to me the limitations are quite obvious -- and severe.
        Last edited by daniel klein; 05 May 2025, 12:03.



        • #34
          Here is another problem I have with AI - the huge amount of energy it requires. Some say that AI will eventually create efficiencies and be well worth it. If so, I hope it does so before it makes climate change even worse.

          https://www.npr.org/2024/07/10/nx-s1...or-electricity
          -------------------------------------------
          Richard Williams, Notre Dame Dept of Sociology
          StataNow Version: 19.5 MP (2 processor)

          EMAIL: [email protected]
          WWW: https://www3.nd.edu/~rwilliam



          • #35
            Originally posted by FernandoRios
            very nice work...how are you planning to incorporate feedback?
            I just asked a question about staggered DiD (it got the problem right), then asked about jwdid:
            What is jwdid?

            • jwdid stands for "Jordan–Wooldridge Difference-in-Differences".
            • It implements the Sun and Abraham (2020) estimator for staggered treatment.
            • It is designed specifically to deal with the bias problem when treatment happens at different times.
            Key advantages:
            • Handles treatment effect heterogeneity across cohorts and time.
            • Provides event-study-type coefficients automatically.
            • Very intuitive to use after installing
            Which partially hallucinates!
            Still very nice work. I was planning to do something like that with RAG, but it is still on the to-play-experiment-do list.
            F
            Hi Fernando, as a test I ran your question about your package on Estima (the tool I am trying to develop for statistical analysis with the help of LLMs, but not only) and I got the following answer:

            jwdid is a Stata command that implements the Extended Two-Way Fixed Effects (ETWFE) estimator proposed by Jeffrey Wooldridge (2021) for Difference-in-Differences (DiD) analysis. This command is particularly important because it addresses some of the key challenges in modern DiD analysis, especially with staggered treatment adoption.

            Key features of jwdid:
            1. It's based on the Mundlak approach to panel data
            2. It's designed to handle staggered DiD designs (where treatment happens at different times)
            3. It provides correct identification of Average Treatment Effects (ATT) where traditional two-way fixed effects (TWFE) might fail
            You can try the tool here
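
            For anyone curious, basic jwdid usage looks something like this (the variable names are hypothetical; see help jwdid for the authoritative syntax):
            Code:
            * panel with outcome y, unit id, calendar year, and
            * gvar = year of first treatment (0 for never-treated units)
            ssc install jwdid
            jwdid y, ivar(id) tvar(year) gvar(gvar)
            estat simple    // aggregated ATT
            estat event     // event-study-type coefficients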



            • #36
              Also, Solon Moreira, you might be interested in checking this out, since you mentioned you are really into AI and Stata: Estima



              • #37
                Very nice! How did you develop it (what is the architecture behind it?)



                • #38
                  Originally posted by FernandoRios
                  Very nice! How did you develop it (what is the architecture behind it?)
                  Hi Fernando - I used a mix of LLMs and tools to search the internet. Essentially, when things are not clear, the bot goes back to check what it is doing and tries to look for information on the internet. With that and some tooling, you get a much better outcome than the usual LLMs available at the moment.



                  • #39
                    I see.
                    Yeah, the programming example that was pointed out shows the attitude of most LLMs: they refuse to say "I don't know". It also shows that it is still hard to program in a language that is relatively less well known.
                    I wonder if, for edge cases like that, one could add to the system prompt:
                    if unsure of how to do it in Stata and Mata alone, use Python for the back end

