  • Highly Correlated variables

    Hi,

    I am struggling to decide whether I should include two highly correlated variables (correlation 0.75) in my regressions. Let's call them variable A and variable B. Variable A is theoretically identified as an important variable for my work. When I run a regression using just variable A along with the other independent variables, it is strongly significant; however, when I also include variable B, variable A turns insignificant.

    Hence my question: can I argue that, since variable A and variable B are highly correlated, I am excluding variable B from my model because it introduces multicollinearity?

    Regards,
    Naveed

  • #2
    Naveed:
    yes, it makes sense.
    However, as multicollinearity affects at least two variables, the same justification would support the opposite approach (that is, including B rather than A).
    Hence, the whole matter boils down to the relevance of A and B in your regression model according to the literature in your research field.
    Kind regards,
    Carlo
    (Stata 19.0)


    • #3
      There is way too little information to give good advice here, but you should think about the following.

      First, it is hard to judge whether collinearity is a problem in a multiple regression model based on a bivariate correlation. Second, I would not say that a correlation of 0.75 is very (or even too) high per se, and I would not expect it to cause any estimation problems, given an appropriate sample size. Third, you need to inspect carefully why the coefficient of the first variable turns insignificant. Is it because the standard error goes up? If so, this might be an indicator of collinearity. If, however, the coefficient becomes insignificant because it drops towards zero, this might indicate a mediation effect of the second variable, or it might indicate that the second variable is a confounder. Omitting the variable in the latter case will lead to a biased estimate of the effect of your first variable. That coefficient will be meaningless (which is not necessarily a problem). In the case of a mediation process, what to do depends on your exact research question (actually, this applies to all cases).
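
To make the "drops towards zero" case concrete, here is a minimal arithmetic sketch in Python. It assumes a deliberately simplified setting that is not in the original question: only two standardized predictors A and B, no other covariates, and hypothetical full-model coefficients. Under those assumptions, the coefficient on A when B is left out equals A's own coefficient plus B's coefficient times the correlation between A and B.

```python
# Illustrative sketch only. Assumptions (not from the thread): two
# standardized predictors A and B, no other covariates, and made-up
# full-model coefficients.

def short_regression_slope(beta_a, beta_b, r):
    """Population slope on A when B is omitted from the model.

    With standardized predictors, regressing B on A has slope r, so the
    omitted-variable formula gives beta_a + beta_b * r.
    """
    return beta_a + beta_b * r


r = 0.75                    # bivariate correlation reported in the question
beta_a, beta_b = 0.1, 0.4   # hypothetical coefficients in the model with both

slope_without_b = short_regression_slope(beta_a, beta_b, r)
print(f"coefficient on A without B: {slope_without_b:.2f}")
print(f"coefficient on A with B:    {beta_a:.2f}")
```

With these made-up numbers, A looks four times as strong when B is left out. Whether the larger or the smaller coefficient is the one of interest depends on whether B is a confounder or a mediator, which the arithmetic cannot tell you.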

      [Edit]
      By the way, what would help a lot here:
      • Say more about the substantive questions you are trying to answer. What is A? What is B?
      • Show what you typed in Stata and show us what Stata reported back. In this case, show the regression output of the two models in question.
      • Use (arguably more) appropriate methods to diagnose collinearity, e.g. estat vif
      [/Edit]
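
As a rough check on how much a correlation of 0.75 could matter, the variance inflation factor can be worked out by hand. The sketch below is plain Python arithmetic, not Stata output, and it assumes A and B are the only two predictors. With further covariates in the model, the R-squared from regressing A on all the other predictors can only be larger, so the value computed here is a lower bound on the actual VIF.

```python
import math

# Illustrative sketch only. Assumption (not from the thread): A and B are
# the only predictors, so the R-squared from regressing A on the other
# predictors equals the squared bivariate correlation.

def vif_from_r2(r_squared):
    """Variance inflation factor 1 / (1 - R^2) for one predictor."""
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("r_squared must lie in [0, 1)")
    return 1.0 / (1.0 - r_squared)


r = 0.75                      # bivariate correlation reported in the question
vif = vif_from_r2(r ** 2)     # variance inflation for the coefficient on A
se_inflation = math.sqrt(vif) # corresponding standard-error inflation

print(f"VIF: {vif:.2f}")
print(f"standard error inflated by a factor of about {se_inflation:.2f}")
```

Under this simplification the standard error of A's coefficient is inflated by roughly 50 percent relative to the case of uncorrelated predictors, which is noticeable but well below the levels usually treated as alarming.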

      Best
      Daniel
      Last edited by daniel klein; 16 Nov 2016, 08:44.


      • #4
        I am grateful for both replies. They have helped me understand my problem better.
