  • Multicollinearity in Logistic Regression

    I'm conducting an exploratory analysis of the effect that several sociodemographic factors have on food purchasing. I'm using logistic regression to assess whether participants in my sample purchased a specific food or not (fruits and vegetables, let's say). The sociodemographic predictors that I'm using are well supported in the literature, so I'd like to include as many as possible in the model. There is also support in the literature for the interaction between education and income, so I'm including that.

    The problem I'm having is in regard to the multicollinearity of some of the predictor variables. For instance, household participation in SNAP (food stamps) is more likely the lower the household income. It is also more likely for certain race categories in this sample. I've attempted to resolve these issues in two ways. First, I ran two separate regressions: one for those in SNAP and one for those not in SNAP. Neither the effect of income nor the effect of race differed between these two analyses. Second, I ran the analyses with interaction terms to look at SNAP*Income and SNAP*Race (I did not use both interactions in the same analysis). Again, neither interaction term was significant.

    I'm wondering if both the stratified analyses and the interaction terms are enough to ensure that I'm assessing the effects of the predictors accurately.


    The predictors in my model are as follows:
    sex of the primary respondent (coded as male=0 and female=1);
    age of the primary respondent (continuous);
    race of the primary respondent (coded as White=1, Black=2, Asian=3 and Other=4);
    ethnicity of the primary respondent (coded as non-Hispanic=0 and Hispanic=1);
    education of the primary respondent (coded as less than a college education=0 and college education and above=1);
    household income (continuous);
    children younger than 18 in the household (continuous);
    household members older than 64 (continuous);
    household participation in SNAP (food stamps) in the last 12 months (coded as no=0 and yes=1);
    interaction term education*income

    Thank you for your help,
    Ryan

  • #2
    What is your question? The title of your post refers to multicollinearity, as does the first sentence of your second paragraph. Then in your second and third sentences you describe what is known as confounding. And the rest of the post describes your attempts to detect moderation (also known as effect modification or interaction). These are three entirely separate things, and each can exist in the presence or absence of either or both of the others.

    So what exactly is it you are concerned about?


    • #3
      Let me add a couple of small points to Clyde's answer.

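      If you want to use collinearity diagnostics, you can always run a linear regression with the same predictors. Since collinearity is a property of the right-hand-side variables, not of the outcome or the link function, the diagnostics from linear regression should apply pretty much unchanged to logit.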
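
      As a rough illustration of why the outcome is not needed, here is a minimal Python sketch that computes variance inflation factors (VIFs) from the predictors alone, using the identity that VIF_j = 1/(1 - R_j^2) equals the j-th diagonal element of the inverse correlation matrix. The function names and the toy data are made up for this example and do not come from the poster's dataset; a categorical predictor such as race would first need to be dummy-coded.

```python
# Sketch only: VIFs computed from the right-hand-side variables, no outcome involved.
# All names and data here are hypothetical, chosen for illustration.

def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length numeric columns."""
    n = len(columns[0])
    means = [sum(c) / n for c in columns]
    centered = [[v - m for v in c] for c, m in zip(columns, means)]
    norms = [sum(v * v for v in c) ** 0.5 for c in centered]
    k = len(columns)
    return [
        [
            sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
            for j in range(k)
        ]
        for i in range(k)
    ]


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    k = len(matrix)
    aug = [
        list(row) + [1.0 if i == j else 0.0 for j in range(k)]
        for i, row in enumerate(matrix)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def vifs(columns):
    """Return one VIF per column: the diagonal of the inverse correlation matrix."""
    inv = invert(correlation_matrix(columns))
    return [inv[j][j] for j in range(len(columns))]


if __name__ == "__main__":
    # Made-up toy data: household income (thousands), SNAP (0/1), age (years).
    income = [12, 18, 25, 31, 40, 52, 60, 75]
    snap = [1, 1, 1, 0, 1, 0, 0, 0]
    age = [34, 51, 29, 45, 62, 38, 57, 41]
    for name, v in zip(["income", "snap", "age"], vifs([income, snap, age])):
        print(f"{name}: VIF = {v:.2f}")
```

      A VIF of 1 means a predictor is uncorrelated with the others; larger values mean more of its variance is explained by the other predictors. Any cutoff for "too large" is a rule of thumb rather than a test.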

      There is also a literature on the interpretation of interactions in non-linear models which claims that many of the tests that work for linear models are misleading in non-linear models. You might look at William Greene, "Testing hypotheses about interaction terms in nonlinear models," Economics Letters, 2010.
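
      One way to see why the probability scale complicates things (a general calculus point, not a summary of Greene's paper): in a logit with two continuous predictors, the cross-partial derivative of the predicted probability is not just the interaction coefficient. The sketch below uses hypothetical coefficients b0, b1, b2, b12 and evaluates that cross-partial at two points with b12 set to zero.

```python
import math

# Sketch only: coefficients and evaluation points are hypothetical.
# Model: P(y=1) = logistic(b0 + b1*x1 + b2*x2 + b12*x1*x2), with x1 and x2 continuous.


def logistic(u):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-u))


def prob(x1, x2, b0, b1, b2, b12):
    """Predicted probability under the logit model above."""
    return logistic(b0 + b1 * x1 + b2 * x2 + b12 * x1 * x2)


def cross_partial(x1, x2, b0, b1, b2, b12):
    """Analytic d^2 P / (dx1 dx2) on the probability scale."""
    p = prob(x1, x2, b0, b1, b2, b12)
    first = p * (1.0 - p)              # first derivative of the logistic function
    second = first * (1.0 - 2.0 * p)   # second derivative of the logistic function
    return b12 * first + (b1 + b12 * x2) * (b2 + b12 * x1) * second


if __name__ == "__main__":
    # With no product term in the index (b12 = 0), the cross-partial is still nonzero
    # and its sign depends on where it is evaluated.
    for x1, x2 in [(-1.0, -1.0), (1.0, 1.0)]:
        value = cross_partial(x1, x2, b0=0.0, b1=1.0, b2=1.0, b12=0.0)
        print(f"x1={x1}, x2={x2}: cross-partial = {value:.4f}")
```

      So the coefficient on the product term and the interaction on the probability scale are different quantities, which is why a test on one need not answer a question about the other. For binary predictors such as SNAP, the analogous quantity is a difference of differences in predicted probabilities rather than a derivative.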
