Test for normality in a sub-sample of a complex survey design

Lara Guedes

Join Date: Mar 2023

Posts: 10
#1

30 Mar 2023, 08:16

Hi folks.

How can I test for normality in a sub-sample inside my main survey sample?
I ran svy: swilk varoutcome (not even restricting to the sub-sample) and got the error "swilk is not supported by svy with vce(linearized)".
On the other hand, I can test for normality in the sub-sample with swilk varoutcome if subgroup == 1, but that ignores the survey design...

How can I do both? My only option right now is to test for normality in my sub-sample without weighting for the complex sample design, but that might be biased, right?!
Nick Cox

Join Date: Mar 2014

Posts: 35793
#2

31 Mar 2023, 02:43

The stated problem is that you can't do this in practice.

I want to suggest that it makes little sense in principle, which is my guess at why this isn't implemented. The implied sample size is typically massive and the implied spikiness of distribution considerable. Would you ever fail to reject a null in these circumstances?

I am all for looking at marginal distributions as context in any project, which may often indicate skewness and/or outliers that need thought, or imply the usefulness of transformed scales or particular link functions. However, testing for normality is highly over-rated. Looking at marginal distributions means, most usefully, looking at graphs and (circumspectly) summary statistics.
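
A minimal Stata sketch of that kind of look at the sub-sample, for illustration only and not taken from the posts. It assumes the data have already been svyset, reuses varoutcome and subgroup from post #1 (with subgroup coded 0/1), and uses wt as a hypothetical name for the sampling-weight variable.

```stata
* Assumes svyset has already been run for this survey design.
* varoutcome and subgroup are the names from post #1; wt is a hypothetical
* name for the sampling-weight variable.

* Design-based mean of the sub-sample, then its estimated standard deviation
svy, subpop(subgroup): mean varoutcome
estat sd

* Weighted quantiles in the sub-sample (_pctile accepts pweights);
* the results are left in r(r1) ... r(r5)
_pctile varoutcome if subgroup == 1 [pweight = wt], percentiles(5 25 50 75 95)
return list

* Weighted density estimate; kdensity does not accept pweights, so
* aweights are used here to get the shape only
kdensity varoutcome if subgroup == 1 [aweight = wt]
```

The subpop() option keeps the full design in the variance calculation while restricting estimation to the sub-sample, which is why it is used for the mean instead of an if qualifier.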
1 like
Lara Guedes

Join Date: Mar 2023

Posts: 10
#3

11 Apr 2023, 03:36

Hi Nick. Thanks for the reply!

My sample has ~5000 observations and my sub-sample ~2300. I'm afraid they could diverge in distribution (considering weighting too), but maybe it's just that I'm not so experienced with this kind of data treatment/analysis, which makes me more conservative, haha. Would you feel comfortable going on with these numbers? My sample is not normal, so I'm treating my sub-sample the same way, which is now causing me trouble, because my best-fitting model (gamma) has an AIC of ~20 000.
Nick Cox

Join Date: Mar 2014

Posts: 35793
#4

11 Apr 2023, 04:25

If you're confident that your data are not normal then with that sample size a Shapiro-Wilk test is pointless, akin to asking an expert to establish that what you know is a giraffe is not an elephant.

A comparison of one subsample with its complement will still likely be highly relevant for the rest of your analysis. Quite how you do this is up to you. I tend to prefer quantile plots but histograms may work fine.
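
One possible way to draw such a comparison in Stata, as a rough sketch and not necessarily what is meant above. It reuses varoutcome and subgroup from post #1 and assumes subgroup is coded 0/1. These plots are unweighted, so they describe the sample as observed, not the survey population.

```stata
* varoutcome and subgroup are the names from post #1; subgroup is assumed
* to be coded 0 (complement) and 1 (sub-sample). No weights are applied.

* Side-by-side histograms of the complement and the sub-sample
histogram varoutcome, by(subgroup)

* Quantile-quantile plot of the two groups: separate creates
* varoutcome0 and varoutcome1, which qqplot then compares
separate varoutcome, by(subgroup)
qqplot varoutcome0 varoutcome1
```

Points lying close to the reference line in the qqplot would suggest the two groups have similar distributions; systematic departures show where they differ.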
2 likes