  • Multicollinearity problem?


    Dear colleagues, I would appreciate your help: when I add the variable "average_lp_2" (the school's average score in Portuguese), the estimates for "sase_esc", which were positive before, become negative. Can someone help me?
    Attached Files

  • #2
    Can you be more specific about what kind of help you are looking for?

    You might also want to explain what the variables that change sign mean, and how the meaning is related to the meaning of the variable that you are adding.



    • #3

      Good evening Joro Kolev. In the first model above, the estimates for each category of sase_esc are positive. This variable identifies the socioeconomic level of the school, so schools in category 6 have better infrastructure than those in lower categories. When I add the variable "media_lp_2" in the second model, the coefficients of sase_esc become negative, implying, for example, that schools in group 5 score 1.77 points lower on the exam than schools in group 1 (the reference). That is, the negative estimates in the second model make no sense, given that the reference group is group 1.



      • #4
        So this is very similar to Lord's "paradox," which is a continuous version of Simpson's "paradox."

        Both the sase_esc and media_lp_2 variables are school-level variables. It also appears that med_lp_2 comes close to being an identifier of individual schools, because its inclusion has, in addition to changing the signs of most of the sase_esc coefficients, soaked up nearly all of the variance of the random effect at the school level. As such, the socio-economic status information conveyed by sase_esc is now confined to the information conditional on med_lp_2. So it is no surprise that it can be very different from, even opposite in sign to, the information sase_esc would convey in the absence of the med_lp_2 variable. This conforms to the general observation that the socioeconomic status of the student body and school performance in almost anything tend to be highly correlated with each other.
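
For readers who want to see the sign flip in isolation, here is a minimal simulated sketch in Python. It is not the poster's data or model: the variable names, the effect sizes, and the use of plain least squares in place of a mixed model are all assumptions made for illustration. The socioeconomic indicator is built to raise the school mean, and the outcome depends positively on the school mean and slightly negatively on the indicator once the mean is held fixed.

```python
import numpy as np

rng = np.random.default_rng(0)
n_schools = 2000

# Hypothetical school-level data (all numbers are made up for illustration).
ses = rng.integers(0, 2, size=n_schools).astype(float)      # 0 = low SES, 1 = high SES
school_mean = 2.0 * ses + rng.normal(0.0, 1.0, n_schools)   # strongly related to SES
score = 1.0 * school_mean - 0.5 * ses + rng.normal(0.0, 1.0, n_schools)


def ols(y, *cols):
    """Least-squares coefficients for y on an intercept plus the given columns."""
    X = np.column_stack([np.ones_like(y), *cols])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


b_without = ols(score, ses)               # SES only
b_with = ols(score, ses, school_mean)     # SES conditional on the school mean

print("SES coefficient without school_mean:", round(b_without[1], 2))
print("SES coefficient with school_mean:   ", round(b_with[1], 2))
```

In this setup the first coefficient is positive and the second is negative, even though both are correct answers to their own questions: one describes SES overall, the other describes SES among schools with the same mean score.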

        So, yes, this is a species of multicollinearity. The question is whether it is a problem or just something you can ignore. If your research goals include estimating the effects of socioeconomic status or med_lp_2 on pt_ex_9mat, then you have been checkmated. It cannot be done with this data set. You will either need to get a much larger data set to disentangle these confounded effects, or you will need an entirely different sampling design that breaks the relationship between socioeconomic status and med_lp_2 by oversampling low-SES, high-med_lp_2 schools and high-SES, low-med_lp_2 schools. If, however, these variables are included only to control for their confounding effects on the relationships you are really interested in studying, then you can just ignore this and move on.
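
One rough way to gauge how entangled two school-level predictors are is a variance inflation factor, VIF = 1 / (1 - R²), where R² comes from regressing one predictor on the other. The sketch below uses simulated data with made-up names and numbers, not the poster's data set, and assumes a single binary socioeconomic indicator for simplicity.

```python
import numpy as np

rng = np.random.default_rng(42)
n_schools = 2000

# Hypothetical predictors: the school mean is almost determined by SES.
ses_high = rng.integers(0, 2, size=n_schools).astype(float)
school_mean = 2.0 * ses_high + rng.normal(0.0, 0.3, size=n_schools)

# Regress one predictor on the other and compute R^2.
X = np.column_stack([np.ones(n_schools), ses_high])
beta, *_ = np.linalg.lstsq(X, school_mean, rcond=None)
resid = school_mean - X @ beta
r_squared = 1.0 - resid.var() / school_mean.var()

# Variance inflation factor for school_mean given ses_high.
vif = 1.0 / (1.0 - r_squared)
print(f"R^2 = {r_squared:.3f}, VIF = {vif:.1f}")
```

A large VIF says only that the two coefficients are hard to estimate separately with the data at hand. It does not say whether that matters for the research question, which is the point made above.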

        Another thing you might do is draw a directed acyclic graph representing the assumed causal relationships among all the variables in the model. You may find that one or both of socioeconomic status and med_lp_2 are colliders on the path of something whose causal effect you are trying to estimate. Then you would know that the model excluding that one would be preferred.
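
To make the collider point concrete, here is a small simulated sketch. The variables x, y, and c are hypothetical and not taken from the poster's model: x and y are generated independently, and c is caused by both. Adjusting for c then manufactures an association between x and y that is not there.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000

x = rng.normal(size=n)                       # exposure of interest
y = rng.normal(size=n)                       # outcome, independent of x by construction
c = x + y + rng.normal(scale=0.5, size=n)    # collider: caused by both x and y


def coef_on_first(outcome, *cols):
    """Least-squares coefficient on the first listed column (intercept included)."""
    X = np.column_stack([np.ones_like(outcome), *cols])
    beta, *_ = np.linalg.lstsq(X, outcome, rcond=None)
    return beta[1]


b_plain = coef_on_first(y, x)         # close to zero, as it should be
b_adjusted = coef_on_first(y, x, c)   # clearly negative once c is "controlled for"

print("x coefficient, c excluded:", round(b_plain, 2))
print("x coefficient, c included:", round(b_adjusted, 2))
```

Here the model that leaves c out gives the right answer, which is why drawing the graph before choosing covariates is worthwhile.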
