  • Paired or unpaired t-test

    I would like to compare two samples: one includes all the observations in my dataset, and the other is a subset of those observations. I want to compare the means of a set of variables across these two samples. Would I use a paired or an unpaired t-test? If neither, what is the appropriate test?

  • #2
    Neither. What is appropriate is to compare the subset with the complementary subset. And, barring odd circumstances, that would be done with the unpaired t-test.
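
    As a rough illustration of the subset-versus-complement comparison, here is a minimal Python sketch using only the standard library. The variable `age`, the flag list `in_subset`, and the helper names `split_by_flag` and `pooled_t` are all hypothetical, and the numbers are made up. It computes the equal-variance unpaired t statistic and its degrees of freedom; getting a p-value would additionally need the t distribution from a statistics package.

```python
from math import sqrt
from statistics import mean, variance


def split_by_flag(values, in_subset):
    """Split values into the subset and its complement using a parallel list of booleans."""
    subset = [v for v, flag in zip(values, in_subset) if flag]
    complement = [v for v, flag in zip(values, in_subset) if not flag]
    return subset, complement


def pooled_t(a, b):
    """Unpaired t statistic (equal variances assumed) and its degrees of freedom."""
    n_a, n_b = len(a), len(b)
    df = n_a + n_b - 2
    # Pooled sample variance: weighted average of the two group variances.
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / df
    se = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(a) - mean(b)) / se, df


# Hypothetical data: ages of eight people, four of whom belong to the subset.
age = [34, 41, 29, 50, 38, 45, 31, 47]
in_subset = [True, False, True, False, True, False, True, False]

# No observation lands in both groups, so the two groups do not overlap.
subset, complement = split_by_flag(age, in_subset)
t_stat, df = pooled_t(subset, complement)
print(f"t = {t_stat:.3f} on {df} degrees of freedom")
```

    The point of the split is that every observation is used exactly once, which is what the unpaired test expects.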


    • #3
      Thanks, Clyde. So could one subset be an excluded sample from a regression and another subset be the included sample in a regression?

      Also, could you explain why the initial scenario is not appropriate?

      Thanks!


      • #4
        "So could one subset be an excluded sample from a regression and another subset be the included sample in a regression?"
        Yes. In fact, that would be a common situation for this.

        "Also, could you explain why the initial scenario is not appropriate?"
        The unpaired t-test assumes that the observations in the two groups being compared are independent of each other. If one group is a subset of the other, that assumption is grossly violated, because every observation in the subset also appears in the full sample. As for the paired t-test, it cannot even be applied to the whole-set versus subset situation: the sample sizes differ, so there is no way to pair the observations up.


        • #5
          Good morning, Clyde Schechter. I'd like to follow up on Tracy's questions because I've also been asked to compare an analysis subset with the larger dataset it comes from.

          The intent is to understand whether the subset is representative of the main dataset in terms of demographic distributions, and where any variables deviate. I recognize that this would violate the assumption of independent groups, because the subset consists of those treated for 6-11 months while the full dataset includes everyone, from 0 months of treatment to more than 11.

          If t-tests are not the appropriate way to compare these, what other tests could be used?


          • #6
            The answer is the same. Compare the 6-11 month subset with the complementary subset of 0-5 and 12+ months. The logic is simple. If the subset's variable distributions are similar to those of the entire set, then when you extract the subset, what is left must also have those same distributions. So the subset resembles the whole set if and only if it resembles the complementary set. If you wish to present the descriptive statistics of the whole set and the subset, that is fine. But formal testing should be based on the subset and the complementary subset.
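
            For means, the "if and only if" argument can be checked with a short identity. The notation is an assumption made for illustration: the whole set has n observations with mean \bar{x}, the subset has n_S observations with mean \bar{x}_S, and the complement has n_C = n - n_S observations with mean \bar{x}_C.

```latex
\begin{align*}
\bar{x} &= \frac{n_S\,\bar{x}_S + n_C\,\bar{x}_C}{n}, \qquad n = n_S + n_C, \\
\bar{x}_S - \bar{x}
  &= \frac{n\,\bar{x}_S - n_S\,\bar{x}_S - n_C\,\bar{x}_C}{n}
   = \frac{n_C}{n}\,\bigl(\bar{x}_S - \bar{x}_C\bigr).
\end{align*}
```

            So, as long as the complement is not empty, the subset mean equals the whole-set mean exactly when it equals the complement mean, and the subset-versus-whole difference is just a shrunken copy of the subset-versus-complement difference.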
