I'm conducting an exploratory analysis of the effect that several sociodemographic factors have on food purchasing. I'm using logistic regression to assess whether or not participants in my sample purchased a specific food (fruits and vegetables, let's say). The sociodemographic predictors I'm using are well supported in the literature, so I'd like to include as many as possible in the model. There is also support in the literature for an interaction between education and income, so I'm including that as well.
The problem I'm having concerns multicollinearity among some of the predictor variables. For instance, household participation in SNAP (food stamps) becomes more likely as household income falls. It is also more likely for certain race categories in this sample. I've attempted to address this in two ways. First, I ran two separate regressions: one for households in SNAP and one for those not in SNAP. The effects of income and race did not differ between these two analyses. Second, I ran the analyses with interaction terms for SNAP*Income and SNAP*Race (I did not include both interactions in the same model). Again, neither interaction term was significant.
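To make the collinearity concern concrete, one common diagnostic is the variance inflation factor (VIF), which measures how well each predictor is explained by the others. The following is only a minimal sketch on simulated data, not my actual sample: the variable names, sample size, and the strength of the income/SNAP relationship are all made up for illustration, and the `vif` helper is my own, not a library function.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X.

    VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from an ordinary
    least-squares regression of column j on the remaining columns
    plus an intercept.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - resid.var() / y.var()
        out[j] = 1.0 / (1.0 - r2)
    return out

# Simulated (hypothetical) data: SNAP participation is more likely
# at lower household income; age is unrelated to both.
rng = np.random.default_rng(0)
n = 2000
income = rng.normal(50, 15, n)                 # income in $1000s (made up)
p_snap = 1 / (1 + np.exp((income - 35) / 5))   # higher at low income
snap = rng.binomial(1, p_snap)
age = rng.normal(45, 12, n)

names = ["income", "snap", "age"]
vifs = vif(np.column_stack([income, snap, age]))
for name, value in zip(names, vifs):
    print(f"{name}: VIF = {value:.2f}")
```

A VIF near 1 means a predictor is nearly uncorrelated with the others. Cutoffs of 5 or 10 are often quoted as rules of thumb for problematic collinearity, though they are conventions rather than strict tests.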
I'm wondering if both the stratified analyses and the interaction terms are enough to ensure that I'm assessing the effects of the predictors accurately.
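For reference, the interaction check I describe can be framed as a likelihood-ratio test between a model without and with the SNAP*Income term. The sketch below is an illustration under stated assumptions, not the code I actually ran: the data are simulated with no true interaction, income is standardized, the other predictors are omitted, and `fit_logit` is a hypothetical helper that fits the logistic model by Newton-Raphson.

```python
import math
import numpy as np

def fit_logit(X, y, n_iter=25):
    """Fit a logistic regression by Newton-Raphson.

    Returns the coefficient vector and the maximized log-likelihood.
    X must already contain an intercept column.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1 / (1 + np.exp(-(X @ beta)))
        w = p * (1 - p)
        grad = X.T @ (y - p)
        hess = X.T @ (X * w[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    p = 1 / (1 + np.exp(-(X @ beta)))
    loglik = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    return beta, loglik

# Simulated (hypothetical) data with NO true SNAP x income interaction.
rng = np.random.default_rng(1)
n = 2000
income_z = rng.normal(0, 1, n)                              # standardized income
snap = rng.binomial(1, 1 / (1 + np.exp(1.5 * income_z + 1.0)))
eta = 0.3 + 0.5 * income_z - 0.4 * snap
y = rng.binomial(1, 1 / (1 + np.exp(-eta)))                 # purchased: 1 = yes

X_reduced = np.column_stack([np.ones(n), income_z, snap])
X_full = np.column_stack([X_reduced, income_z * snap])      # adds the interaction

_, ll_reduced = fit_logit(X_reduced, y)
_, ll_full = fit_logit(X_full, y)

# Likelihood-ratio statistic; one extra parameter, so 1 degree of freedom.
lr_stat = max(2 * (ll_full - ll_reduced), 0.0)
# Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(lr_stat / 2))
print(f"LR statistic = {lr_stat:.3f}, p-value = {p_value:.3f}")
```

A non-significant interaction only indicates that the income slope does not detectably differ by SNAP status. It does not by itself say anything about how strongly income and SNAP are correlated with each other.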
The predictors in my model are as follows:
sex of the primary respondent (coded as male=0 and female=1);
age of the primary respondent (continuous);
race of the primary respondent (coded as White=1, Black=2, Asian=3 and Other=4);
ethnicity of the primary respondent (coded as non-Hispanic=0 and Hispanic=1);
education of the primary respondent (coded as less than a college education=0 and college education and above=1);
household income (continuous);
children younger than 18 in the household (continuous);
household members older than 64 (continuous);
household participation in SNAP (food stamps) in the last 12 months (coded as no=0 and yes=1);
interaction term education*income.
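Written out, the model would look roughly like the following. This assumes race enters as three indicator variables with White as the reference category, since the codes 1 to 4 are labels rather than a scale. The variable names and coefficient numbering are mine, chosen only for illustration.

```latex
\begin{aligned}
\operatorname{logit}\,\Pr(Y = 1) ={}& \beta_0
  + \beta_1\,\text{Female}
  + \beta_2\,\text{Age}
  + \beta_3\,\text{Black}
  + \beta_4\,\text{Asian}
  + \beta_5\,\text{Other} \\
& + \beta_6\,\text{Hispanic}
  + \beta_7\,\text{College}
  + \beta_8\,\text{Income}
  + \beta_9\,\text{Children}
  + \beta_{10}\,\text{Seniors} \\
& + \beta_{11}\,\text{SNAP}
  + \beta_{12}\,(\text{College} \times \text{Income})
\end{aligned}
```

Here Y = 1 means the household purchased the food in question. With the interaction included, \beta_8 is the income slope for respondents without a college education, and \beta_8 + \beta_{12} is the slope for those with one.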
Thank you for your help,
Ryan