
  • Splitting sample by dependent variable

    Does it ever make sense for one to split their sample by values of the dependent variable and then run a separate regression for each group?

  • #2
    I'm not completely sure I understand what you're asking, but if I do, the answer is no, it generally makes no sense.

    If you partition your data into subsets based on single values of the dependent variable, and then separately run a regression in each of those subsets, the dependent variable will be a constant within each subset, which means that all of the regression coefficients will be zero. (Or, if you are doing logistic regressions, you will get no output at all because the outcome doesn't vary.) So you would get no information at all from such an analysis.

    Perhaps you mean something less extreme, such as separating the data into two groups, in one of which the dependent variable takes on values above some cutpoint, and in the other it takes on values below that cutpoint. (You could generalize this to several groups and cutpoints.) In that case, assuming the cutpoint doesn't leave you with only a single value of the dependent variable in one of the groups, you will at least get some results. The problem is then whether it is possible to make any use of those results. If the purpose of your regression analyses is to attempt to predict the value of the dependent variable, then you don't have useful information, because confronted with a new case whose dependent variable value you do not know, you cannot know which of the two regression results applies to that observation.

    So it would only make sense to do this if you have no intention of ever applying the regression results to cases with unknown values of the dependent variable. It is hard to think of real-world applied situations where that would be the case. But if you have some purpose for the regressions that does not require their results to generalize outside the data used to estimate them, then it might be sensible, provided there is reason to believe that the relationships between the predictors and the dependent variable change in a way that depends on the dependent variable itself. That kind of non-linearity, however, suggests that what is really needed is to transform the dependent variable in some way that makes its relationship to the predictors uniform.
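    The first point above can be checked numerically. Here is a sketch in Python with NumPy (not Stata, purely illustrative): once a subset is defined by a single value of the dependent variable, the fitted slope is zero up to floating-point noise.

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.uniform(0, 10, 200)
y = 3.0 + 0.5 * x + rng.standard_normal(200)

# Round y so that many observations share the same outcome value,
# then "split the sample" on one of those values.
y_rounded = np.round(y)
subset = y_rounded == 5.0           # dependent variable is constant here
slope, intercept = np.polyfit(x[subset], y_rounded[subset], 1)
print(slope)  # zero up to floating-point noise
```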



    • #3
      Depending on why you want to do this, you might find quantile regression of interest; see
      Code:
      help qreg
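
      For intuition about what quantile regression does, here is a sketch in Python with NumPy (hypothetical helper names, illustration only; in Stata you would simply use qreg): quantile regression minimizes the "pinball" (check) loss, and for a fixed slope the best intercept at quantile q is the q-th quantile of the residuals, so a one-predictor median regression can be found by a simple grid search over candidate slopes.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 500)
y = 2.0 * x + rng.standard_normal(500)   # true slope 2, intercept 0

def pinball(res, q):
    # quantile-regression ("check") loss for residuals at quantile q
    return np.where(res >= 0, q * res, (q - 1) * res).sum()

def qreg_grid(x, y, q, slopes):
    # For each candidate slope, the optimal intercept is the q-th
    # quantile of the residuals; keep the (intercept, slope) pair
    # with the smallest total pinball loss.
    best = None
    for b in slopes:
        a = np.quantile(y - b * x, q)
        loss = pinball(y - a - b * x, q)
        if best is None or loss < best[0]:
            best = (loss, a, b)
    return best[1], best[2]

# Median regression (q = 0.5) over a grid of slopes around the truth
a, b = qreg_grid(x, y, 0.5, np.linspace(1.5, 2.5, 201))
print(a, b)   # close to 0 and 2
```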



      • #4
        To expand on the useful notes of Clyde and Rich, selecting or cutting based on the dependent variable automatically creates a problem with the error term - it will no longer have the necessary properties for conventional regression. For example, the highest group will tend to be observations with positive real errors. Selecting on the dv almost always creates statistical problems in addition to the more general problems Clyde notes.
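        The selection effect is easy to see in a small simulation. Here is a sketch in Python with NumPy (not Stata, illustration only): among observations kept because the dependent variable is high, the true errors are positive on average, and OLS on the truncated sample understates the slope.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
x = rng.uniform(0, 10, n)
e = rng.standard_normal(n)     # true errors, mean zero by construction
y = x + e                      # true slope is 1

keep = y > 8                   # select on the dependent variable
print(e[keep].mean())          # clearly positive in the selected group

slope_full = np.polyfit(x, y, 1)[0]
slope_trunc = np.polyfit(x[keep], y[keep], 1)[0]
print(slope_full, slope_trunc)  # truncated-sample slope is attenuated
```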
