
  • Anyone running benchmarks across LLMs for Stata skill?

    Curious if anyone has already tried comparing how well different LLMs perform on Stata coding tasks—not just casually, but using formal benchmarks. I’m considering whether it’s worth building one myself, but wanted to check if something already exists in practice or in the early stages.

    Here's my wishlist of things I'd like to know whether an LLM can do consistently:
    • Reshaping and aggregating data (e.g. reshape, collapse, egen, merge), handling the syntax correctly and applying by() options logically
    • A wide range of regression tasks, including thinking clearly about standard errors and applying the syntax correctly
    • Post-estimation commands (margins, estimates, predict, lincom), including extracting and interpreting results
    • Looping or macro-driven routines (foreach, forvalues, local macros)
    • Creating formatted tables for publication (table and collect)
    • A wide range of plotting commands/techniques, including community-contributed commands and suites of commands (e.g. schemes, palettes, stata-schemepack, coefplot)
    • Writing functional, clean, reusable .do files that go beyond one-offs, with modular coding design
    • Correct data manipulation with string functions, date handling, and factor-level processing
    • Knowledge and use of community-contributed commands for newer quasi-experimental design techniques (e.g. rdrobust, csdid, ivreg2)
    But I'm sure there are a lot of ways to design a general-purpose test of Stata coding ability that isn't skewed toward my applied micro needs. Anyway, just wondering if there's anything out there, whether this is something people would like to have, and whether anyone has thoughts on how to design it. To make the flavor concrete, a rough sketch of one possible task item follows below.
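
    For illustration only, here is the kind of short, self-contained task item I'm imagining: one snippet per skill area, graded on whether the model produces correct syntax and sensible output. This one touches the data-management, regression, post-estimation, and looping bullets; it runs on the auto dataset that ships with Stata, and the specific variables and grading idea are placeholders, not a worked-out rubric.

    [CODE]
    * Hypothetical benchmark item (illustrative only)
    sysuse auto, clear

    * Data management: flag cars priced above their rep78 group mean
    egen mean_price = mean(price), by(rep78)
    gen  pricey     = price > mean_price if !missing(rep78)

    * Regression with heteroskedasticity-robust standard errors
    regress mpg weight i.foreign, vce(robust)

    * Post-estimation: average marginal effect of being foreign
    margins, dydx(foreign)

    * Looping/macros: report mean mpg within each repair group
    levelsof rep78, local(groups)
    foreach g of local groups {
        quietly summarize mpg if rep78 == `g'
        display "rep78 = `g': mean mpg = " %4.1f r(mean)
    }
    [/CODE]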

  • #2
    Have you checked this?

    https://www.statalist.org/forums/for...-dedicated-gpt
