  • Introducing Estima – Your AI Companion for Statistical Work

    Hi everyone,

    After six months of tweaking, testing, and trying to get it right, I’m happy to share something I’ve been working on: Estima, an AI-powered tool designed to support users of statistical software like Stata.

    Estima is not just about writing Stata scripts or translating them from Python or R. It’s about helping you think through your analysis, your models, and your assumptions. It acts like a companion during your work: whether you're debugging, planning a causal inference analysis, or deciding how to approach a problem, Estima is there to support your thinking, not just automate your typing.

    It’s free to try: I have created a free plan with 15 uses per month, and I’d love your feedback as I keep improving it. This is still an MVP, built by someone with experience using Stata, and I know there’s plenty of room to grow.

    Eventually, a native package within Stata might make more sense. But for now, I’m aiming to build something truly useful for analysts, researchers, and students who want more than just code generation — they want intelligent support.

    You can try it here: Estima

    Thanks for checking it out, and I’m all ears for feedback or suggestions.

    Best,
    Josh Zweig (See more about me in my Substack)

  • #2
    Same prompt as here:
    I'm working with Stata. I want to add the four smallest and largest values to the returned results of -summarize-. Can you suggest code to do that?
    Answer:
    For your specific request, you can use the 'extreme' option of the -summarize- command to show the extreme (smallest and largest) values. Here's how you can do it:
    Obviously, there is no extreme option for summarize, so slightly hallucinating here. Also, summarize returns minimum and maximum even without option detail (and even with the meanonly option).

    Code suggestion:
    Code:
    summarize varname, detail
    return list  // This shows what's currently stored in return memory
    
    // To store extremes in return memory
    local extremes
    forvalues i = 1/4 {
        sort varname
        local extremes `extremes' r(min`i')=`=varname[`i']'
        sort varname
        local extremes `extremes' r(max`i')=`=varname[-`i']'
    }
    
    // Now use the results
    return add `extremes'
    return list  // This will show all results, including the extremes
    Sorts twice [in each iteration!]; why? Negative indices aren't allowed in Stata. Doesn't handle situations with fewer than 4 observations. Treats missing values as valid ones. Overall, not super impressive, but somewhat better than competitors.
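
    For comparison, here is a minimal sketch of one way to address the points above. It is not a definitive implementation: the program name extremes4 and the result names r(min1)-r(min4) and r(max1)-r(max4) are hypothetical. It assumes a single numeric variable, excludes missing values via marksample, sorts once, and returns fewer than four values when fewer observations are available.

    ```stata
    * Hypothetical helper; program name and result names are assumptions
    capture program drop extremes4
    program define extremes4, rclass
        version 14
        syntax varname(numeric) [if] [in]
        marksample touse                  // drops observations with missing values

        // Run summarize and pass its own r() results through
        summarize `varlist' if `touse', detail
        local N = r(N)
        return add

        // At most 4 extremes; fewer if the sample is smaller
        local k = min(4, `N')

        preserve                          // data are restored when the program ends
        quietly keep if `touse'
        sort `varlist'                    // a single sort is enough
        forvalues i = 1/`k' {
            return scalar min`i' = `varlist'[`i']
            return scalar max`i' = `varlist'[`N' - `i' + 1]
        }
    end

    * Example call on a shipped dataset
    sysuse auto, clear
    extremes4 mpg
    return list
    ```

    The largest values are addressed as observation N - i + 1 after sorting rather than with a negative subscript.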

    Edit: I should add explicitly that this isn't to diminish the contribution; just pointing out that current limits are fairly easy to encounter.
    Last edited by daniel klein; Yesterday, 12:51.

    • #3
      As a very small twist on #2 -- not disturbing its general message -- negative indices are legal in Stata, just no help for this problem, and wrong in not giving what is wanted. (A reference to an observation with a negative index is always evaluated as missing.)
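
      A short illustration of that behavior, as a sketch using the auto dataset shipped with Stata (the choice of dataset and variable is only an assumption for the example):

      ```stata
      sysuse auto, clear
      sort mpg
      display mpg[-1]    // legal syntax, but out of range: evaluates to missing (.)
      display mpg[_N]    // last observation after sorting: the largest value here
      ```

      The second line of output is the maximum only because mpg has no missing values in this dataset; missing values would otherwise sort to the end.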

      • #4
        Thank you both for your comments. I agree there's nothing particularly impressive at this stage. That said, if someone had asked three years ago whether this sort of thing would be possible, I don’t think many of us would have believed it.

        On the point about the chatbot not being fully accurate or perfect: I understand the concern, though I personally don’t see that as a major problem. With Estima, I’m not trying to build a substitute for expertise, but rather a companion while working in Stata. The bot is quite helpful for structuring problems, checking that your code addresses all the relevant points, and suggesting ways forward. It's true that it might not get things 100% right on the first go, but we’ve seen plenty of examples where a short exchange leads users to the solution they were after.

        In that sense, I see it as similar to messaging a colleague or posting a question here on the forum, except you get an answer instantly and can iterate from there.

        Just to be clear: I’m not trying to oversell what’s possible here, and I don’t think there’s any need to.

        • #5
          Originally posted by FernandoRios
          I see.
          Yeah, the programming example that was pointed out reflects the tendency of most LLMs to refuse to say "I don’t know", and the fact that it is still hard for them to program in a language that is relatively less known.
          I wonder if, for edge cases like that, one could add to the system prompt:
          "If unsure of how to do it in Stata and Mata alone, use Python for the back end."
          Just following up on this, FernandoRios: I thought about using Python on the back end and tried it. That did not fully work, but I should try again with the ability to compile, retry, etc. Regarding Stata being less well known: that's true, and it's a limitation, but system prompts are, in a way, the method for steering the model toward the right weights. I iterated through different system prompts, and I have a few examples as well, but giving examples as specific as the one discussed here can get tricky (do you think otherwise?).
