Stata-Dedicated GPT

Solon Moreira

Join Date: Apr 2025

Posts: 11
#1

28 Apr 2025, 07:30

I recently created a specialized Stata GPT designed specifically for the Stata community. It’s not a generic chatbot — it generates and reviews Stata code, analyzes data, and helps answer any Stata-related questions you throw at it, from basic regressions to more advanced models. The tool is designed to handle everything from foundational commands to advanced econometric techniques, aiming to provide accurate, Stata-specific support rather than generic advice:

https://chatgpt.com/g/g-67a3b4f16248...science-expert

If you’re curious or want to give it a try, feel free to check it out — feedback is very welcome!

3 likes
FernandoRios

Join Date: Apr 2014

Posts: 2469
#2

28 Apr 2025, 08:05

Very nice work. How are you planning to incorporate feedback?
I just asked a question about staggered DID (it got the problem right), then asked about jwdid:
What is jwdid?

jwdid stands for "Jordan–Wooldridge Difference-in-Differences".

It implements the Sun and Abraham (2020) estimator for staggered treatment.

It is designed specifically to deal with the bias problem when treatment happens at different times.

Key advantages:
Handles treatment effect heterogeneity across cohorts and time.

Provides event-study-type coefficients automatically.

Very intuitive to use after installing

Which is partially hallucinated!
Still, very nice work. I was planning to do something like that with RAG, but it is still on my to-play-experiment-do list.
F
Solon Moreira

Join Date: Apr 2025

Posts: 11
#3

28 Apr 2025, 11:32

Hi Fernando,

Thank you very much for the feedback! That’s exactly the goal — to improve it based on the experience you reported. I’ve just made adjustments to address the hallucination you described, and I believe it’s now more precise. If you have a chance to give it another try, I’d love to hear what you think.

Best,
S
Mukesh Punia

Join Date: May 2020

Posts: 101
#4

28 Apr 2025, 20:02

After getting an initially correct response on what jwdid is, I asked why it is called jwdid. The response: "The name jwdid comes from the initials of its author: Jann Wilbert (JW), combined with DiD for Difference-in-Differences."

Best regards,
Mukesh
Nick Cox

Join Date: Mar 2014

Posts: 35698
#5

29 Apr 2025, 03:17

A friend who might prefer to remain nameless asked about me as a Stata user. The report was mostly factual but included attribution to me of marginsplot, with which I have had nothing to do.
1 like
FernandoRios

Join Date: Apr 2014

Posts: 2469
#6

29 Apr 2025, 05:42

I think it's a good tool. I'm not sure how you are fine-tuning the model but, as with many things involving LLMs, I think it will just need some time and continued training.
I would be curious to see how it works in the background.
Note: I also built a very simple LLM interface that does a simple RAG using web search (Tavily), which works well. Perhaps that could help improve the outcome. So could a very strong prompt saying: if you are not sure, do not make things up.
Still, nice tool!
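
The retrieval-plus-strict-prompt idea in the note above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual interface: the function name `build_grounded_prompt`, the numbered snippet format, and the refusal wording are all hypothetical, and the retrieved snippets are passed in by hand rather than fetched from a web search service.

```python
from typing import List

# Hypothetical refusal instruction; the exact wording is illustrative only.
REFUSAL_RULE = (
    "Answer only from the sources below. "
    "If the sources do not contain the answer, reply exactly: I am not sure."
)


def build_grounded_prompt(question: str, snippets: List[str]) -> str:
    """Pair the user's question with retrieved text and a strict refusal rule.

    `snippets` stands in for whatever a retrieval step returns; no search
    API is called here.
    """
    if snippets:
        # Number each snippet so an answer can point back to its source.
        sources = "\n".join(
            f"[{i}] {s.strip()}" for i, s in enumerate(snippets, start=1)
        )
    else:
        sources = "(no sources retrieved)"
    return f"{REFUSAL_RULE}\n\nSources:\n{sources}\n\nQuestion: {question.strip()}"


if __name__ == "__main__":
    # Placeholder snippet text, not real documentation.
    print(build_grounded_prompt("What is jwdid?", ["Example text from a retrieved help file."]))
```

The resulting string would then be sent to whichever model is in use. When nothing is retrieved, the prompt says so explicitly, which gives the model a reason to fall back on "I am not sure" instead of guessing.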
Solon Moreira

Join Date: Apr 2025

Posts: 11
#7

29 Apr 2025, 07:28

Originally posted by Nick Cox

A friend who might prefer to remain nameless asked about me as a Stata user. The report was mostly factual but included attribution to me of marginsplot, with which I have had nothing to do.

Hi Nick,

Thank you! I hadn’t thought about using it that way, but it definitely works — really appreciate the tip. That said, I think your friend isn’t 100% accurate on one point 😊.

When I asked about you, I got the following general answer. Then I followed up asking for details about your contributions to Stata, and here’s what came up.

Could you take a look particularly at point 6 — marginsplot? I’m curious if that’s completely off.

XXXXXXXXXX

Nick Cox is a well-known figure in the Stata community, particularly recognized for his contributions to Stata programming, data visualization, and statistical methods. He is an Honorary Fellow in the Department of Geography at Durham University in the UK.

Here are some key points about Nick Cox:
Stata Journal Contributor: He has written extensively for The Stata Journal, contributing tutorials, discussions on statistical graphics, and technical notes.

Author of Stata Commands: He has developed many user-written Stata commands, especially in areas like graph customization, descriptive statistics, and data manipulation.

Statalist Moderator: He was a long-time active participant and moderator on Statalist, the primary user forum for Stata users worldwide.

Focus on Clarity and Teaching: His writing is valued for clarity and accessibility, making advanced Stata functionality understandable to a wide range of users.

Nick Cox is not affiliated with StataCorp but has played a crucial role in the user community’s growth and in expanding Stata’s applied capabilities through his shared expertise.

Would you like to see some of the commands or techniques he’s contributed to?

Great! Here are several notable Stata commands and packages developed or co-developed by Nick Cox, widely used in applied research and data analysis:
1. tabplot

Creates table plots — a cross between tabulations and bar charts, great for categorical data visualization.
    ssc install tabplot
    tabplot var1 var2, percent(bar) stack
2. qplot

Produces quantile plots — useful for comparing distributions or checking for normality.
    ssc install qplot
    qplot income
3. stripplot

Visualizes distributions of numeric variables using strip charts (good for small sample sizes).
    ssc install stripplot
    stripplot age, over(gender)
4. tab_chi

A formatted version of tabulate, displaying expected frequencies and residuals for chi-squared tests.
    ssc install tab_chi
    tab_chi group outcome
5. cprplot (co-authored)

Plots component-plus-residuals for visualizing linearity in regression models.
    ssc install cprplot
    regress y x1 x2
    cprplot x1
6. marginsplot Enhancements (with context)

While not the original author of marginsplot, Cox has written about enhancing visualizations using margins, predict, and graphical layering techniques.
7. lvpplot

Added-leverage and residual plots — useful for regression diagnostics.
    ssc install lvpplot
    regress y x1 x2
    lvpplot x1

These are just a few. His packages often focus on data exploration, visualization, and interpretable summaries — particularly helpful for teaching and diagnostics.

Would you like a demo with sample data using one of these commands?
Solon Moreira

Join Date: Apr 2025

Posts: 11
#8

29 Apr 2025, 07:35

Originally posted by FernandoRios

I think it's a good tool. I'm not sure how you are fine-tuning the model but, as with many things involving LLMs, I think it will just need some time and continued training.
I would be curious to see how it works in the background.
Note: I also built a very simple LLM interface that does a simple RAG using web search (Tavily), which works well. Perhaps that could help improve the outcome. So could a very strong prompt saying: if you are not sure, do not make things up.
Still, nice tool!

The first version of this chatbot was hosted on CustomGPT and operated as a pure RAG model, drawing exclusively from a carefully curated knowledge base. It required a substantial document repository and, while relatively slow, it delivered very low hallucination rates, almost zero in fact. Given its popularity, I migrated the chatbot to OpenAI’s platform, which offers a much faster response time and a more user-friendly interface. I’ve been refining it through a combination of prompt engineering, instruction tuning, and ongoing adjustments to the knowledge base. This is actually the first time I’ve received feedback regarding hallucinations about who created specific Stata commands or estimators. I suspect this is because the users in this forum are significantly more technically knowledgeable than the general audience, so the questions are naturally more detailed and precise.
Richard Williams

Join Date: Apr 2014

Posts: 4987
#9

29 Apr 2025, 07:51

I asked it a few questions about gologit2. It got a lot right but made a few major mistakes. Most seriously, its claim about how to interpret the sign of effects was the exact opposite of what it should be. AI can be very helpful with many things, but for anything important I always verify.

I once asked AI a question about the General Social Survey. The answer seemed very plausible, but I couldn’t verify it. I wrote to GSS and they confirmed that the AI answer was way off.

I would say AI can be good at getting you started, but you shouldn’t rely on it too heavily.

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
EMAIL: [email protected]
WWW: https://www3.nd.edu/~rwilliam
1 like
Richard Williams

Join Date: Apr 2014

Posts: 4987
#10

29 Apr 2025, 08:00

My biggest AI horror story: I have a reading entitled “America’s Toe Tapping Menace.” The student said it was about square dancing. It is actually about how gay men attracted possible partners. (The author was very critical about police procedures that entrapped harmless gay men.) If the student had read the first paragraph of the short article they would have known the answer was absurd, but they obviously didn’t.

There could be a lot of other abuses involving AI that are not so obvious. We need to use AI as a supplement for reading and research, not a replacement for it.

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
EMAIL: [email protected]
WWW: https://www3.nd.edu/~rwilliam
2 likes
FernandoRios

Join Date: Apr 2014

Posts: 2469
#11

29 Apr 2025, 10:06

AI is a good tool but, luckily for us, it still lacks common sense.
For some things it is so good it's scary, but when asked about very small details (like jwdid, which I know because I wrote it), it is not unusual for it to fail.
The same goes for your reading! It is probably not a common topic, so judging by the title alone, it makes sense to think it's about dancing!
Nick Cox

Join Date: Mar 2014

Posts: 35698
#12

29 Apr 2025, 10:07

Sorry, but that reply about me in #7 is more error-prone than what my friend received. In broad brush it's fair, even flattering, but the details vary between factual and fabricated.

I am not an Honorary Fellow; indeed I don't think that title applies to anyone else at my university either.

I have never been Moderator on Statalist. That title belongs only to Marcello Pagano. It's historic in practice (since 2014 as StataCorp personnel have taken over).

tabplot is on SSC, but that's out of date. The command is published through the Stata Journal.

qplot is not on SSC, and never has been. The command is published through the Stata Journal.

cprplot is an official command. I can claim no credit. It is not on SSC, and never has been.

I haven't written at length about margins or marginsplot that I can recall. If I've ever mentioned it, it is not a point that I would ever include in a summary. (The reason is not relevant here, but it's that other people have much more experience with those commands.)

lvpplot I don't recognise as a Stata command that I wrote (or that anyone else did, so far as I can find quickly). It is definitely not on SSC either as a package or as an individual command.

So, what to say? I can't add to other takes that this technology is very impressive when it gets things right and very puzzling or worse when it hallucinates. I am not sure where to go when it's evident that the people who need (or want) it most are going to be those least likely to spot its errors, except the hard way.

Thanks for your offer of more materials, but sorry, no enthusiasm.
2 likes
Solon Moreira

Join Date: Apr 2025

Posts: 11
#13

29 Apr 2025, 10:26

Originally posted by Nick Cox

Sorry, but that reply about me in #7 is more error-prone than what my friend received. In broad brush it's fair, even flattering, but the details vary between factual and fabricated.

I am not an Honorary Fellow; indeed I don't think that title applies to anyone else at my university either.

I have never been Moderator on Statalist. That title belongs only to Marcello Pagano. It's historic in practice (since 2014 as StataCorp personnel have taken over).

tabplot is on SSC, but that's out of date. The command is published through the Stata Journal.

qplot is not on SSC, and never has been. The command is published through the Stata Journal.

cprplot is an official command. I can claim no credit. It is not on SSC, and never has been.

I haven't written at length about margins or marginsplot that I can recall. If I've ever mentioned it, it is not a point that I would ever include in a summary. (The reason is not relevant here, but it's that other people have much more experience with those commands.)

lvpplot I don't recognise as a Stata command that I wrote (or that anyone else did, so far as I can find quickly). It is definitely not on SSC either as a package or as an individual command.

So, what to say? I can't add to other takes that this technology is very impressive when it gets things right and very puzzling or worse when it hallucinates. I am not sure where to go when it's evident that the people who need (or want) it most are going to be those least likely to spot its errors, except the hard way.

Thanks for your offer of more materials, but sorry, no enthusiasm.

Hi Nick,

Sorry for the confusion — I realize I wasn't specific enough earlier. Just to clarify, I’m not offering you anything! I was actually focused on a particular point. In your initial post, you specifically mentioned marginsplot (not the other commands listed by the GPT). Interestingly, it was also the only one flagged with "(with context)". I was simply trying to understand whether that was caused by some pattern or just a coincidence.
FernandoRios

Join Date: Apr 2014

Posts: 2469
#14

29 Apr 2025, 10:42

I wonder if you should try adding an additional system prompt so that the AI focuses only on commands, programming, and information regarding the authors or sources.
Whenever one moves into details about people, it will very likely hallucinate. Thus, the more you have it "refuse" to give those details, the better.
PS. I would like to see more of these and how they work. Best wishes
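
One way to make the "refuse" behavior concrete is a small check that runs before the question reaches the model. Note that this is a pre-filter, which is a different mechanism from the system prompt suggested above, and everything in it is an assumption for illustration: the patterns, the refusal text, and the `route` function are hypothetical and far from exhaustive.

```python
import re

# Hypothetical patterns for questions about individuals; illustrative only.
PERSON_PATTERNS = [
    r"\bwho (is|was|wrote|created|developed|authored)\b",
    r"\bauthor(s|ed)? of\b",
    r"\bnamed after\b",
]

# Hypothetical refusal message.
REFUSAL = (
    "I can help with Stata commands and code, "
    "but I do not answer questions about individuals."
)

# Sentinel meaning "forward this question to the model unchanged".
PASS_TO_MODEL = "PASS_TO_MODEL"


def route(question: str) -> str:
    """Return the refusal text for person-related questions, else the sentinel."""
    q = question.lower()
    if any(re.search(pattern, q) for pattern in PERSON_PATTERNS):
        return REFUSAL
    return PASS_TO_MODEL


if __name__ == "__main__":
    print(route("Who wrote jwdid?"))
    print(route("How do I cluster standard errors in a regression?"))
```

A keyword filter like this will miss rephrased questions and may block harmless ones, so it would complement a strict prompt, not replace it.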
Solon Moreira

Join Date: Apr 2025

Posts: 11
#15

29 Apr 2025, 11:00

Originally posted by FernandoRios

I wonder if you should try adding an additional system prompt so that the AI focuses only on commands, programming, and information regarding the authors or sources.
Whenever one moves into details about people, it will very likely hallucinate. Thus, the more you have it "refuse" to give those details, the better.
PS. I would like to see more of these and how they work. Best wishes

When I first created the GPT, I didn’t anticipate that users would ask about the individuals behind specific commands. But of course, in a community like this—where many contributors are also developers—it makes total sense. I debated whether to improve the prompt to better handle those questions or to disable that functionality altogether. For now, I’ve decided to go with the safer route and disable references to individual developers. I think tasks like that would likely be better handled by the general GPT.

If you get a chance to test it again, I’d really appreciate any feedback. Thanks again for the helpful suggestion!