Dear All,
I have a stratified survey sample of 1000 observations, with 8-10 observations per stratum. I am certain that there are outliers in the data, and I am also certain that most of these outliers are not part of the population.
I would like to estimate population and subpopulation parameters using the survey weights, such as the mean and standard deviation of each variable, and I would also like to carry out a regression analysis on the data. I can clearly see that the outliers have a large impact on these estimates. Since I am certain that most of the outliers are not part of the population, I want their impact removed from the estimates.
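To make the weighted estimates concrete, here is a minimal sketch of a survey-weighted mean and a plug-in weighted standard deviation, showing how one extreme value with a non-trivial weight shifts both. The data and weights are hypothetical, and the weights are assumed to be inverse inclusion probabilities; the function names are illustrative, not from any particular package.

```python
import math


def weighted_mean(values, weights):
    """Weighted mean: sum(w * y) / sum(w)."""
    total_w = sum(weights)
    return sum(w * y for w, y in zip(weights, values)) / total_w


def weighted_sd(values, weights):
    """Plug-in weighted SD: sqrt(sum(w * (y - mean)^2) / sum(w))."""
    m = weighted_mean(values, weights)
    total_w = sum(weights)
    var = sum(w * (y - m) ** 2 for w, y in zip(weights, values)) / total_w
    return math.sqrt(var)


# Hypothetical values and survey weights, for illustration only.
clean_y = [10.0, 12.0, 14.0]
clean_w = [1.0, 1.0, 2.0]

# The same data plus one extreme value that also carries a weight of 2.
outlier_y = clean_y + [100.0]
outlier_w = clean_w + [2.0]

print(weighted_mean(clean_y, clean_w), weighted_sd(clean_y, clean_w))
print(weighted_mean(outlier_y, outlier_w), weighted_sd(outlier_y, outlier_w))
```

The single added point moves the weighted mean from 12.5 to about 41.7. These are point estimates only; standard errors that respect the stratified design also need the stratum information, which this sketch ignores.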
I understand that there are outlier detection techniques that could address some of these issues; however, as far as I understand, those methods largely rely on the assumption of a simple random sample. This sample is a stratified sample, designed to cover all strata of the population under analysis. Within a single stratum, sampling was close to random, but a stratum sample size of 8-10 is too small for conventional outlier detection methods.
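For illustration, this is roughly what applying one conventional detector stratum by stratum looks like. It is a minimal sketch that assumes the Iglewicz-Hoaglin modified z-score (0.6745 × (x − median) / MAD, with their suggested cutoff of 3.5) as the "conventional" method, and uses hypothetical data. With only 8-10 points per stratum, the median and MAD are themselves noisy, which is exactly the limitation described above.

```python
import math
from collections import defaultdict
from statistics import median


def modified_z_scores(values):
    """Modified z-score: 0.6745 * (x - median) / MAD."""
    med = median(values)
    mad = median([abs(x - med) for x in values])
    if mad == 0:
        # MAD of zero: any point off the median is infinitely far in MAD units.
        return [0.0 if x == med else math.copysign(math.inf, x - med) for x in values]
    return [0.6745 * (x - med) / mad for x in values]


def flag_by_stratum(rows, threshold=3.5):
    """rows: list of (stratum_id, value). Returns indices flagged within their own stratum."""
    by_stratum = defaultdict(list)
    for idx, (stratum, y) in enumerate(rows):
        by_stratum[stratum].append((idx, y))
    flagged = []
    for items in by_stratum.values():
        vals = [y for _, y in items]
        for (idx, _), z in zip(items, modified_z_scores(vals)):
            if abs(z) > threshold:
                flagged.append(idx)
    return sorted(flagged)


# Hypothetical strata of 8 observations each.
rows = [("A", y) for y in [10, 11, 12, 11, 10, 12, 11, 50]] + \
       [("B", y) for y in [100, 102, 98, 101, 99, 100, 103, 97]]
print(flag_by_stratum(rows))
```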
I am also aware of robust regression techniques that can reduce the influence of outliers; however, as far as I know, it is not well established how these should be applied in a weighted (survey) regression context.
Can you suggest a systematic way to detect the outliers in the sample, or to remove their influence on the estimates?
Thank you.