  • split-sample internal validation logistic regression

    Hello all,

    I am attempting to perform split-sample internal validation on a logistic regression model I have created (logistic command). I understand there are limitations to this approach, but would like to perform it nonetheless.

    Here is my understanding of how to approach the problem:

    1) split the dataset into two groups: 2/3 "training group", 1/3 "validation group"
    2) create the logistic model with the "training group"
    3) predict the outcome for the "validation group" using the logistic model
    4) compare the predicted outcome with the actual outcome in the "validation group"

    I am unsure what code is needed to complete step 3, and what tests or code are required to complete step 4.

    Apologies for my lack of statistical knowledge (which I hope to improve), and apologies if this question is more of a "how-to" than Stata troubleshooting. I have searched this forum for similar questions but came up empty.

    Any help would be greatly appreciated.

    Kind regards

  • #2
    Assuming you fit the model with an if statement to identify the group (e.g., if testing == 0), you could:

    Code:
    predict yhat if !e(sample)
    ta outcome yhat
    or any other equivalent statement. After an estimation command, e(sample) identifies the estimation sample. You could also run predict without an if qualifier, which gives predictions for the entire data set, and then compare predictions with the outcome in the testing sample only.
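
    For reference, the four steps from the original question could be sketched end to end as below. This is only an illustration under assumptions: it uses Stata's example data set lbw (loaded with webuse) as a stand-in for the poster's data, and the names u, testing, yhat and yhat_class, the seed, and the 0.5 cutoff are all arbitrary choices, not anything from the thread.

    ```stata
    * Stand-in data (assumption): Stata's low-birth-weight example data set
    webuse lbw, clear

    * Step 1: random split, 2/3 training and 1/3 validation
    set seed 12345                        // arbitrary seed so the split is reproducible
    generate double u = runiform()
    sort u
    generate byte testing = _n > round(2*_N/3)   // 0 = training, 1 = validation

    * Step 2: fit the model on the training group only
    logistic low age lwt i.smoke if testing == 0

    * Step 3: predicted probabilities for the validation group only
    predict double yhat if testing == 1

    * Step 4: compare predictions with the observed outcome
    * (0.5 is an arbitrary classification cutoff, used here only for illustration)
    generate byte yhat_class = yhat >= 0.5 if testing == 1 & !missing(yhat)
    tabulate low yhat_class if testing == 1
    roctab low yhat if testing == 1       // area under the ROC curve in the validation group
    ```

    Here predict is restricted with testing == 1 rather than !e(sample); the two select the same observations only when no training observations were dropped from estimation, for example because of missing values.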



    • #3
      While I agree with William's first command, the second may not even work (yhat is a predicted probability, so depending on your N the table may have too many distinct values) and, in my opinion, will be much less useful than
      Code:
      lowess depvar yhat, addplot(function y=x, range(0 1)) legend(off)
      where you should replace "depvar" with the name of your outcome variable.
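
      A hedged sketch of how the calibration plot might be restricted to the validation group, with two simple numeric checks alongside it. The data set (lbw via webuse), the variable names testing, yhat and xb, and the seed are assumptions for illustration; the calibration-slope regression is an addition that was not suggested in the thread.

      ```stata
      * Stand-in data (assumption): Stata's low-birth-weight example data set
      webuse lbw, clear
      set seed 12345                              // arbitrary seed
      generate byte testing = runiform() > 2/3    // roughly 1/3 validation

      * Fit on the training group, then predict in the validation group
      logistic low age lwt i.smoke if testing == 0
      predict double yhat if testing == 1         // predicted probability
      predict double xb if testing == 1, xb       // linear predictor (log odds)

      * Calibration plot: the smoothed observed outcome should track the 45-degree line
      lowess low yhat if testing == 1, addplot(function y = x, range(0 1)) legend(off)

      * Mean observed outcome versus mean predicted probability
      summarize low yhat if testing == 1

      * Calibration slope: a coefficient on xb near 1 suggests good calibration
      logit low xb if testing == 1
      ```

      Because yhat is missing outside the validation group, the if qualifier on lowess is redundant here, but it makes the intended sample explicit.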
