Has anyone already compared how well different LLMs perform on Stata coding tasks, not just casually but with formal benchmarks? I'm considering building one myself, but wanted to check first whether something already exists, either in practice or in the early stages.
Here is my wishlist of things I'd like to know whether an LLM can do consistently:
- Reshaping and aggregating data (e.g. reshape, collapse, egen, merge), with correct syntax and logically sound by() groupings
- A wide range of regression tasks, including thinking clearly about standard errors and applying the syntax correctly
- Post-estimation commands (margins, estimates, predict, lincom), including extracting and interpreting results
- Looping or macro-driven routines (foreach, forvalues, local macros)
- Creating formatted tables for publication (table and collect)
- A wide range of plotting commands and techniques, including community-contributed commands and suites (e.g. schemes, palettes, stata-schemepack, coefplot, etc.)
- Writing functional, clean, reusable .do files with a modular design, not just one-off scripts
- Correct data manipulation with string functions, date handling, and factor-level processing
- Knowledge and use of community-contributed commands for newer quasi-experimental design techniques (e.g. rdrobust, csdid, ivreg2, etc.)