Can clustering replace the need to stratify in a sub analysis to account for high variance in the sample?

Nicole Fountas

Join Date: Feb 2018

Posts: 3
#1


19 Feb 2018, 20:37

Hello all -

I have data from the Gallup World Poll and am doing research on food insecurity (FI) and migrants in the Arab region. After my initial analysis and some surprising results, it seems that the relationships between FI and the covariates differ, and sometimes run in opposite directions, depending on where migrants live. I am trying to do a sub-analysis in which I stratify my population into two groups: Gulf migrants and Other migrants. There are cultural and economic differences between these two groups (and enough similarities within them) to justify analyzing them separately.

However, when I stratify the sample, some cells in the Other migrant population have fewer than 5 observations (for nominal variables such as type of employment tabulated against level of FI, etc.). When I ran the stratified analysis, the SEs were very large and the CIs very wide, which I assume is due to the small cell sizes. So now I am trying to figure out how to refine the standard error estimation.

I am wondering whether it is appropriate to use the vce(cluster -) option and cluster on the Gulf/Other variable rather than stratify. I have tried it in the analysis and am seeing much smaller SEs and narrower CIs. I've reviewed other posts and have gathered that it's atypical to cluster on just two groups, and I also read that I could apply the vce(cluster -) option to the stratified analysis.
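For concreteness, here is a minimal sketch of the two specifications being compared, assuming a multinomial logit fit with mlogit. The variable names (fi, gulf, employ, age) are hypothetical placeholders, not the actual Gallup World Poll variables, and the covariate list is illustrative only.

```stata
* Hypothetical variables:
*   fi     : 3-level outcome (1 = FS, 2 = ModFI, 3 = SevFI)
*   gulf   : 1 = Gulf migrant, 0 = Other migrant
*   employ : nominal type of employment
*   age    : continuous covariate

* (a) Stratified: a separate model for each migrant group
bysort gulf: mlogit fi i.employ age, rrr baseoutcome(1)

* (b) Pooled: one model for the whole sample, with cluster-robust
*     standard errors on the two-level Gulf/Other variable
*     (note that this gives only two clusters)
mlogit fi i.employ age, rrr baseoutcome(1) vce(cluster gulf)
```

Specification (b) returns a single set of coefficients for the whole sample, whereas (a) returns one set per group.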

If vce(cluster -) is not a valid way to account for the significant differences in how determinants relate to FI between Gulf and Other migrants, is there another way to examine or adjust for the variance in the sample? With this option, it seems I would get a single set of results from the whole sample that accounts for the variance, rather than separate sets of factors associated with levels of FI for Gulf migrants and for Other migrants.

The DV is categorical with three levels: Food Secure (FS), Moderately Food Insecure (ModFI), and Severely Food Insecure (SevFI). Because the proportional odds assumption is violated for some covariates, my initial analysis used mlogit, and I identified the relative risk ratio (rrr) of being more or less FI for each IV. I have been reading Richard Williams's writings about the gologit2 command, which may be more suitable and simpler to interpret, but I would still face the same problem of how to avoid reducing my sample too much by stratifying. I am unfamiliar with nonparametric options, so my goal is to find a suitable way to account for the clustered nature of the data.
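As a rough sketch of the two estimators mentioned here, fit to the pooled sample: the variable names (fi, employed, age) are hypothetical, and gologit2 is a user-written command that has to be installed first. The autofit option is assumed here as one way to relax the proportional odds constraint only for the covariates that appear to violate it.

```stata
* Hypothetical variables: fi (1 = FS, 2 = ModFI, 3 = SevFI),
* employed (0/1 indicator), age (continuous)

* Multinomial logit, reporting relative risk ratios
mlogit fi employed age, rrr baseoutcome(1)

* Generalized ordered logit (user-written; install once from SSC)
ssc install gologit2
* autofit tests each covariate and keeps the parallel-lines
* constraint only where it is not rejected
gologit2 fi employed age, autofit
```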

I can include more specific details from my analysis if that's helpful. Thank you!

Last edited by Nicole Fountas; 19 Feb 2018, 20:43.
Tags: cluster, gologit2, mlogit, stratify, variance
