  • Machine Learning setup

    How can I split my loaded dataset into a training set and a test set, fit a random forest on the training set, and calculate fit for both the training set and the test set?
    Thank you for your help!

    Stata SE/17.0, Windows 10 Enterprise

  • #2
    The keep command lets you retain only some records from a dataset. So start with the full dataset, keep the first half, and save it as your training dataset; then reopen the original, keep the second half, and save it as your test dataset.

    The commands keep and save are described both in the documentation installed with Stata and available online.
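
    A rough sketch of the keep-and-save approach described above, assuming Stata's shipped auto dataset as a stand-in for your data. The file names train_half and test_half are made up for illustration, and the split here is by position in the file, not random.

    ```stata
    * Sketch only: auto.dta stands in for your data; file names are hypothetical
    sysuse auto, clear
    * size of the first half, and the first observation of the second half
    local half = floor(_N/2)
    local start = `half' + 1

    * retain the first half and save it (written to the current working directory)
    keep in 1/`half'
    save train_half, replace

    * reopen the original, retain the second half (L means the last observation)
    sysuse auto, clear
    keep in `start'/L
    save test_half, replace
    ```

    With your own data, the two sysuse lines would presumably become use commands pointing at your file.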



    • #3
      Oh, so there's no way to sort of keep the whole dataset in memory, and assign labels to some subsets of it?

      Also, I wanted to split the data "randomly" based on a seed in a particular ratio, e.g. 70% training, 30% test...
      Thank you for your help!

      Stata SE/17.0, Windows 10 Enterprise



      • #4
        First, note that unless you have tens of thousands of observations, splitting the data is generally a bad idea.

        However, it can be done easily. Start by assigning random numbers:
        Code:
        help runiform()
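        Then sort on those numbers and assign "categories" as you wish (such as the 70:30 split you mention), and train on the 70% by adding an "if ..." qualifier to your command so that it uses only the training set. The whole dataset stays in memory; the category variable simply marks which subset each observation belongs to.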
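
        The steps above might look roughly like the following. This is a minimal sketch under stated assumptions: the auto dataset, the variables price, mpg and weight, the seed value, and the variable names u, train, yhat and sqerr are all placeholders, and regress is only a stand-in for whatever random forest command you end up using (none is named in this thread).

        ```stata
        * Sketch only: auto.dta and the variables price, mpg, weight are placeholders
        sysuse auto, clear
        * arbitrary seed so the split is reproducible
        set seed 12345
        * one uniform(0,1) draw per observation, then put the data in random order
        generate double u = runiform()
        sort u
        * first 70% in random order = training (train = 1), the rest = test (train = 0)
        generate byte train = (_n <= 0.7 * _N)

        * fit on the training observations only; regress is a stand-in for the learner
        regress price mpg weight if train == 1
        * predictions for every observation, training and test alike
        predict double yhat

        * mean squared error and observation count by subset
        generate double sqerr = (price - yhat)^2
        tabstat sqerr, by(train) statistics(mean n)
        ```

        Because train is just a variable, the same "if train == 1" or "if train == 0" qualifier can be reused on later commands without saving separate files.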
