  • The direction and significance of an independent variable changed in Multilevel Logistic Regression

    Dear Statalist community,

    I am seeking help with an issue that I ran into while running a multilevel logistic regression. As seen in the screenshot below, which shows the coefficients as odds ratios, log_ingo is positive and significant when it is added on its own. However, adding ranking or log_highperc separately makes log_ingo non-significant and, most confusingly and concerning for me, changes its direction. When I add both ranking and log_highperc, log_ingo becomes significantly negative. I checked the correlations among these three variables and did not find high correlation or multicollinearity issues (using VIF). What could cause log_ingo to change direction and even significance? I am confused because, theoretically, the effect of log_ingo should not be negative.

    I would really appreciate any insights on this. Thanks!
    [Screenshot: statalist.png]

  • #2
    There is nothing surprising here. Whenever you add or remove variables from a regression model, the coefficients (or statistics derived from them, such as odds ratios) can change by any amount and in any direction. There is literally no reason to expect them to remain the same as, or even similar in any recognizable way to, what they were before. In fact, the purpose of including covariates in a model is typically to remove omitted variable bias. If adding covariates could not produce drastic changes in the other coefficients, it could not serve this important purpose, since the bias resulting from omitted variables can be of any magnitude and in either direction.

    Moreover, VIF does not provide relevant information about this possibility. While it is true that there must be some non-zero correlation between the variable being added (or removed) and the variable whose coefficient changes, it can be a tiny correlation that VIF will designate as unremarkable. The reason is that what matters is not the gross correlation between the added variable and the variable whose coefficient changes, but rather the part of that correlation that is due to variance that both of those variables share with the regression outcome variable, and the latter is not captured by VIF.

    The conclusion you can draw from what you have observed is that the apparent positive association of log_ingo with rua_highbi is an artifact that arises from log_ingo serving as a proxy for the omitted effects of ranking and log_highperc in model (2). When those effects are properly accounted for in model (5), the independent effect of log_ingo is seen to, in fact, be negative (and, as it happens, of similar magnitude).
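
    To make the proxy argument concrete, here is a minimal simulation sketch. Everything in it is an assumption for illustration: the data are simulated rather than taken from the original post, x1 and x2 are hypothetical stand-ins for log_ingo and an omitted covariate such as ranking, the true coefficients (-0.5 and 2.0) are arbitrary, and the model is a plain single-level logit fitted with a hand-written Newton-Raphson routine instead of a multilevel model. It shows the coefficient on x1 switching sign once x2 enters the model, even though the VIF implied by their correlation is small.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def solve(A, b):
    # Gaussian elimination with partial pivoting for a small dense system.
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = s / M[r][r]
    return x


def fit_logit(X, y, iters=15):
    # Newton-Raphson for logistic regression; each row of X starts with 1.0
    # for the intercept. Returns coefficients on the log-odds scale.
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for xi, yi in zip(X, y):
            p = sigmoid(sum(b * v for b, v in zip(beta, xi)))
            w = p * (1.0 - p)
            for a in range(k):
                grad[a] += (yi - p) * xi[a]
                for c in range(k):
                    hess[a][c] += w * xi[a] * xi[c]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta


# Simulated data (hypothetical): x2 is the omitted covariate and x1 is
# moderately correlated with it. The true effect of x1 is negative.
random.seed(12345)
n = 4000
x2 = [random.gauss(0, 1) for _ in range(n)]
x1 = [0.5 * b + random.gauss(0, 1) for b in x2]
y = [1 if random.random() < sigmoid(-0.5 * a + 2.0 * b) else 0
     for a, b in zip(x1, x2)]

short = fit_logit([[1.0, a] for a in x1], y)                # y on x1 only
full = fit_logit([[1.0, a, b] for a, b in zip(x1, x2)], y)  # y on x1 and x2

# With two predictors, VIF = 1 / (1 - r^2), r being their sample correlation.
m1, m2 = sum(x1) / n, sum(x2) / n
cov = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
r = cov / math.sqrt(sum((a - m1) ** 2 for a in x1) *
                    sum((b - m2) ** 2 for b in x2))
vif = 1.0 / (1.0 - r * r)

print("odds ratio for x1, x2 omitted: ", math.exp(short[1]))
print("odds ratio for x1, x2 included:", math.exp(full[1]))
print("VIF:", vif)
```

    In this setup x1 on its own picks up part of the strong positive effect of x2, so its odds ratio lands above 1. Once x2 is in the model, the odds ratio for x1 falls below 1, which is its true direction by construction. The VIF stays far below any conventional warning threshold throughout.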

    As for your observation that the effect of log_ingo should not, in theory, be negative, you have to consider the generic possibilities for this kind of situation:
    1. Your data are incorrect.
    2. Due to missing values for some observations of some variables leading to exclusion of the affected observations, the estimation sample is biased.
    3. If either the positive or negative value of the binary outcome rua_highbi is very rare, then even a sample size of ~10,000 may be too small.
    4. The sample from which you obtained your data is not drawn from a population to which the theory applies.
    5. There are still other important variables in the real world that are not accounted for in your model, so omitted variable bias persists.
    6. The added variables ranking or log_highperc may be colliders rather than confounders of the log_ingo:rua_highbi relationship, so that including them in the model has added, rather than reduced, bias.
    7. The theory itself is incomplete or incorrect in ways that cause it to badly predict the results of your study.
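
    Possibility 6 can be illustrated with a short worked derivation. The setup is a hypothetical one, not something taken from the original data: x stands in for log_ingo, y for the binary outcome, and c for a collider that is caused by both. Assume the true model is logit P(y=1 | x) = beta*x with beta > 0, and that c = x + gamma*y + e, where e is standard normal noise independent of x and y, with density phi. Applying Bayes' rule to condition on c gives:

```latex
\begin{aligned}
\operatorname{logit} P(y = 1 \mid x, c)
  &= \operatorname{logit} P(y = 1 \mid x)
     + \log \frac{\phi(c - x - \gamma)}{\phi(c - x)} \\
  &= \beta x + \gamma (c - x) - \tfrac{1}{2}\gamma^{2} \\
  &= (\beta - \gamma)\, x + \gamma\, c - \tfrac{1}{2}\gamma^{2}
\end{aligned}
```

    Under these assumptions, the coefficient on x in a model that also includes c is beta - gamma, which is negative whenever gamma > beta, even though the true effect of x on y is positive. Whether ranking or log_highperc actually plays such a role is a substantive question that the data alone cannot settle.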
