  • How to add machine learning algorithms to Stata 14?

    Dear community,
    I am faced with the following situation: I am working on a remote server which only accepts Stata 14 code. I can add ado files if I want. My research project requires the implementation of tree algorithms (random forests, gradient boosting, AdaBoost) and, ideally, neural nets. I have little experience with Stata and I am trying to make it work. I know that newer versions of Stata contain a Python/Stata API, but unfortunately I have to stay on Stata 14 (I don't run the server and it cannot be updated; it also does not accept R or Python code). I am aware that Stata was not made for this, but unfortunately I do not have any wiggle room in terms of the programming language.

    The question is: does anyone have a good idea of how this could be done? Are there any .ado files that would allow me to move forward with implementing more machine-learning-heavy algorithms in Stata 14?


  • #2
    Hi Max,

    You can absolutely implement these algorithms in Stata's ado language. Later versions of Stata have machine learning models built in, but in Stata 14 you may need to write some of this by hand (I'm not sure). It is definitely not as easy as using TensorFlow, but it is entirely possible given enough time and energy. Remember that nearly all machine learning models (particularly ANNs) are expressible in terms of linear algebra. Bearing that in mind, maybe start by checking whether any machine learning models are available in Stata 14 "out of the box". If it turns out you need to implement some of this yourself, you're going to want to learn the -matrix- framework; see the full documentation available with your installation. In the long term, I think you are also going to want to pick up Mata if you want your code to scale. Finally, make sure to read the Stata guide for programmers. Ado is a surprisingly full-featured language, and Mata is absolutely a full-featured language with a powerful matrix interface: basically exactly what you need if you want to create machine learning models from scratch.
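    To make the linear-algebra point concrete, here is a minimal sketch, assuming a do-file run in Stata 14, of the forward pass of a tiny neural network (one hidden layer, logistic activations) written entirely in Mata. The data, weights, and names (X, W, b, v, yhat) are hypothetical and made up for illustration; nothing is estimated, and the training step (backpropagation plus gradient descent) is the part you would still have to write yourself.

    ```stata
    * Hypothetical toy example: forward pass only, no training.
    * All values below are made up for illustration.
    mata:
    // 4 observations (rows) x 2 input features (columns)
    X = (0, 0 \ 0, 1 \ 1, 0 \ 1, 1)
    // hidden-layer weights: 2 inputs -> 3 hidden units
    W = (0.5, -0.3, 0.8 \ 0.2, 0.7, -0.4)
    // one bias per hidden unit (1 x 3 row vector)
    b = (0.1, -0.1, 0.0)
    // linear step: (4 x 2)*(2 x 3) = 4 x 3; ":+" adds b to every row
    Z = X*W :+ b
    // elementwise logistic (sigmoid) activation
    H = 1 :/ (1 :+ exp(-Z))
    // output-layer weights: 3 hidden units -> 1 output (3 x 1)
    v = (1.0 \ -2.0 \ 0.5)
    // predicted probabilities, one per observation (4 x 1)
    yhat = 1 :/ (1 :+ exp(-(H*v)))
    yhat
    end
    ```

    Each layer is a matrix product followed by an elementwise function, so Mata's `*`, colon operators, and `exp()` cover the arithmetic; the real effort lies in the training loop, tuning, and validation built around it.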

    Edit: you may also want to look through the user-submitted code archives. It's possible someone has written a user-submitted command that you can use.
    Last edited by Daniel Schaefer; 03 Nov 2022, 08:00.


    • #3
      If the remote server you are using is provided by your institution, then your institution is failing to meet your research needs.

      You are attempting to screw together machine learning models from scratch - when good models already exist in other environments - and the only tool you have to tighten the screws is a hammer. If software development is where your heart lies, that's a fine challenge, but if your research focus is in a substantive area, reinventing existing software wheels is a poor use of the time you have to do your research.

      My point - and it is implicit in what Daniel wrote - is that you should not underestimate the magnitude of the task you are attempting to undertake. Time now spent searching for alternatives to the path you are setting out on could be time well spent.


      • #4
        At SSC, there are user-written routines, e.g., -lassopack-, which only requires version 13; also look at -brain-, which is also at SSC.
        Last edited by Rich Goldstein; 03 Nov 2022, 09:22.
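        As a sketch of how to try these, assuming the server can reach SSC over the internet, both packages can be installed and their help files opened from within Stata. If the server is offline, the ado and help files would have to be copied by hand into a directory on the ado-path (the `sysdir` command lists these, including PERSONAL).

        ```stata
        * Assumes the server has internet access to SSC
        ssc install lassopack
        ssc install brain
        * lassopack installs several commands (e.g. lasso2, cvlasso, rlasso)
        help lasso2
        help brain
        ```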


        • #5
          Just to add somewhat to the point made in #3: I think I got a little too fixated on whether or not this is possible in Stata, without thinking much about whether such a thing is practical. I'm speaking as someone with almost 15 years of consistent, almost daily programming experience. I've completed projects ranging from web applications, server-side APIs, and desktop applications to statistics and machine learning projects, and I've worked in around a dozen different programming languages. I've implemented my own artificial neural network in two separate classroom settings. So yes, if you have experience like mine, you could do this in Stata.

          As a practical matter, I would absolutely never try to roll my own ANN in a professional setting or on a serious long-term research project. Even if you have the right experience, the right theoretical background, and the time and energy to do such a thing, it would still be a massive waste of your time, energy, and resources. I am a big believer in using the right tool for the job, and the fact is that the industry standard for machine learning is Anaconda Python 3 with Jupyter notebooks. All of the requisite software is free. The administrator of your server should just need to install some open-source software.

          Basically, the best road forward here might be to advocate for yourself so that you can have the tools you need to be successful. If your boss, advisor, or whoever really is set on Stata, then at least advocate for a modern Stata 17 license. The license isn't expensive from an institutional point of view. Really, though, if you are doing machine learning, you should be working in Anaconda Python 3, and you should write and test your code with Jupyter notebooks. I really do think you would be best served by dropping Stata altogether, never mind trying to roll your own machine learning library.

          One final note: you may be able to find some user-defined commands on SSC that work for you, but please keep in mind that these commands can vary wildly in quality and support. Caveat emptor.


          • #6
            My thanks to Daniel Schaefer for further sharing his expertise and experience in post #5 to re-express my rant in post #3 in a more constructive fashion, and in a form that is suitable for sharing with those with whom you would need to advocate for yourself.

            My post #3 was a little too reflective of my experience, not with machine learning, but with the so-called law of the instrument, of which one statement is "if your only tool is a hammer, everything is a nail."
