  • Same sample size with different dependent variables

    Hello,

    I have four different variables for four models. The models have the same independent variables because I am using the same database. How can I keep the same sample size for the four different models?
    Thank you so much in advance.

    Best,
    Luisa

  • #2
    Well, in any model estimation, the estimation sample is those observations that have no missing values on any of the model variables. So differences in sample size will arise if there are different numbers of observations with missing values for the four dependent variables. So, one simple way to assure the same sample size for all four model estimations is to restrict them all to those observations with no missing values on any of them. So if the dependent variables are dv1, dv2, dv3, and dv4 you could condition all four models with -if !missing(dv1, dv2, dv3, dv4)-.
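A minimal sketch of the condition described in #2, run on simulated data so that it is self-contained. The names dv1-dv4, x1, and x2 are placeholders, and -regress- merely stands in for whatever estimation command is actually in use; none of this comes from the original poster's data.

```stata
* Sketch only: simulated data with placeholder variable names
clear
set seed 12345
set obs 200
generate x1 = rnormal()
generate x2 = rnormal()
forvalues i = 1/4 {
    generate dv`i' = x1 + x2 + rnormal()
    * Set roughly 10% of each dependent variable to missing
    replace dv`i' = . if runiform() < 0.1
}

* Restrict every model to observations with all four outcomes observed
forvalues i = 1/4 {
    regress dv`i' x1 x2 if !missing(dv1, dv2, dv3, dv4)
    display "dv`i': N = " e(N)
}
```

Because the independent variables are identical across the four models, any missing values on them remove the same observations from every model, so conditioning on the dependent variables is enough here.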

    But excluding otherwise admissible observations is not usually a good idea. And, particularly when it is conditioned on missingness of something, it is likely to introduce bias. So you may end up with four biased samples for your models.

    So, my final response here is to ask why you want to do this? What purpose is served by having all four models estimated with the same sample size?


    • #3
      Correction to #2. In some models, even complete cases can be excluded. For example, in a conditional logistic regression with grouped data, observations that form singleton groups are dropped, as are any groups where the outcome variable is constant within the group. In ordinary logistic regressions, one can encounter perfect prediction, which will lead to the omission of some observations. So it's not even as simple as I suggested in #2. So, even more strongly now, I wonder why you want to do this? Perhaps if you explain what your real goal is, concrete advice could be given to help you accomplish that goal, which might or might not entail equalizing the sample sizes.


      • #4
        You can easily run one of the models, generate a variable used = e(sample), run the next model if used == 1, replace used = e(sample), and continue. At the end, used will flag only the observations that work in all of the models. Then use this sample for the results you care about.
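The procedure in #4 might look roughly like the following, again on simulated data. The variable names and the use of -regress- are assumptions made for illustration; the indicator is called used, as in the post.

```stata
* Sketch only: simulated data with placeholder variable names
clear
set seed 2024
set obs 200
generate x1 = rnormal()
generate x2 = rnormal()
forvalues i = 1/4 {
    generate dv`i' = x1 + x2 + rnormal()
    replace dv`i' = . if runiform() < 0.1
}

* First model defines the starting sample
regress dv1 x1 x2
generate byte used = e(sample)

* Each later model is run on the current sample, which then shrinks
* to the observations that model actually used
forvalues i = 2/4 {
    regress dv`i' x1 x2 if used == 1
    replace used = e(sample)
}

* Re-estimate all four models on the common sample
forvalues i = 1/4 {
    regress dv`i' x1 x2 if used == 1
    display "dv`i': N = " e(N)
}
```

Unlike a condition on missing values alone, e(sample) reflects whatever the estimation command actually used, so it also captures observations dropped for other reasons.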

        As Clyde points out, this can create undesirable sample selection issues. But two caveats apply. First, if you've got different samples for different estimates, then in some sense you already have some sample selection issues. Second, if you were using the same dv but comparing different explanatory models based on fit, a common sample reduces the possibility that differences in model fit reflect differences in samples rather than differences between the models.


