Dear all,
I have an econometrics question/problem for you:
Assume that you estimate the impact of X1 and X2 on Y from a sample and obtain the respective coefficients. Assume further that you create sub-samples from that sample based on the distribution of X1 (25th, 50th, 75th, and 90th percentiles). Hence, you have 4 sub-samples of the initial sample. For example, one sub-sample would contain the observations with X1 <= 25th percentile, a second would contain those with X1 > 25th percentile & X1 <= 50th percentile, and so on.
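To make the setup concrete, here is a minimal sketch of the procedure in Python on simulated data. Everything in it is an assumption for illustration only: the data-generating process (true coefficients 1, 2, and -1), the sample size, the hypothetical helper `ols`, and the choice of the 25th/50th/75th percentiles as cut points so that exactly four sub-samples result. It is not taken from any particular paper.

```python
import numpy as np

# Simulated data (assumed for illustration): Y = 1 + 2*X1 - 1*X2 + noise
rng = np.random.default_rng(0)
n = 4000
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)


def ols(y, *regressors):
    """Hypothetical helper: OLS coefficients [intercept, b1, b2, ...]."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


# Full-sample estimates
full = ols(y, x1, x2)

# Split on the distribution of X1: bin 0 is X1 <= p25,
# bin 1 is p25 < X1 <= p50, bin 2 is p50 < X1 <= p75, bin 3 is X1 > p75
cuts = np.percentile(x1, [25, 50, 75])
bins = np.digitize(x1, cuts, right=True)

# Re-estimate the same regression within each sub-sample
sub = {}
for k in range(4):
    mask = bins == k
    sub[k] = ols(y[mask], x1[mask], x2[mask])

print("full sample:", np.round(full, 3))
for k, beta in sub.items():
    print(f"sub-sample {k} (n={np.sum(bins == k)}):", np.round(beta, 3))
```

Comparing the printed sub-sample coefficients with the full-sample ones is the comparison I am asking about.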
I believe that this methodology and the coefficients it produces are problematic, primarily because the sample selection (that is, the construction of the sub-samples) is non-random, as it is based on the distribution of X1. However, I have seen results based on such methods published in “decent” journals. That said, I cannot provide the theoretical justification or statistical intuition for why the coefficients obtained with this method are problematic or inferior to the estimates obtained from the entire sample.
Can anyone educate me on the disadvantages or consequences of using such non-random sub-samples? Also, are there any journal articles or books discussing this issue that I could cite?
Thanks in advance,
Rishav