Dear Statalist,
I have a conceptual question regarding the dichotomization of a continuous variable within a subsample extracted from a population-based sample.
If my goal is to dichotomize the variable based on cohort distribution (e.g., upper 40% vs bottom 60%), should I base the dichotomization on the original population sample or the analytical sample?
I should note that my analytical sample is not very representative of the original sample: a cutoff that marks the bottom 60% of the original sample might capture only about 40% of my analytical sample.
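To make the discrepancy concrete, here is a minimal sketch in Python. It is purely illustrative: the data are simulated, the selection mechanism (higher values are more likely to be retained) is an assumption, and all names such as `percentile_cutoff` and `share_at_or_below` are hypothetical rather than taken from my actual data. It computes the 60th-percentile cutoff once from the original sample and once from the analytical sample, then checks what share of the analytical sample falls at or below each cutoff.

```python
import math
import random


def percentile_cutoff(values, p):
    """Nearest-rank percentile: smallest value with at least a fraction p at or below it."""
    ordered = sorted(values)
    k = max(1, math.ceil(p * len(ordered)))
    return ordered[k - 1]


def share_at_or_below(values, cut):
    """Fraction of values that fall at or below the cutoff."""
    return sum(v <= cut for v in values) / len(values)


random.seed(1)

# Simulated "original" population-based sample (assumption: standard normal values).
original = [random.gauss(0, 1) for _ in range(10000)]

# Simulated non-representative analytical sample (assumption: observations with
# higher values are more likely to be retained than those with lower values).
analytical = [x for x in original if random.random() < (0.8 if x > 0 else 0.3)]

# Option A: cutoff defined on the original sample.
cut_orig = percentile_cutoff(original, 0.60)
# Option B: cutoff defined on the analytical sample.
cut_anal = percentile_cutoff(analytical, 0.60)

# Share of the analytical sample classified as "bottom" under each option.
share_orig_cut = share_at_or_below(analytical, cut_orig)
share_anal_cut = share_at_or_below(analytical, cut_anal)

print(f"Cutoff from original sample:   {cut_orig:.3f} -> bottom share {share_orig_cut:.2f}")
print(f"Cutoff from analytical sample: {cut_anal:.3f} -> bottom share {share_anal_cut:.2f}")
```

Under these assumptions, the cutoff taken from the original sample no longer splits the analytical sample 60/40, whereas the cutoff taken from the analytical sample does so by construction. The question is which of the two definitions is the more defensible one.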
Given this discrepancy, is it meaningful to dichotomize the variable based on the original sample and apply it in the analytical sample? Can I use a rationale such as "dichotomizing based on the original population sample can better reflect population characteristics" to justify this choice?
Could you please recommend any relevant literature that discusses this problem?
Thank you.