  • multiple linear regression, individual and group level predictors

    Hi everyone,

    I'm investigating how several independent variables (IVs), measured at the district level, the party-in-a-district level, and the individual level, influence the individual-level campaign behaviour of ordinary candidates (an index of survey items asking about campaign activities) by means of multiple linear regression. I know MLM would probably be better, but the scope of my research task (a BA thesis) limits me to simple models. Data: N = 1850 candidates. At the moment I am proceeding in three steps using OLS regression:

    1. model: y (campaign focus: 0-10; 10 for candidate, treated as metric) = a + β1-6 (individual-level variables such as subjective chance of winning a seat (hierarchically structured binary variable: 0-5), gender, incumbency, etc.) --> checking assumptions etc.

    2. model: adding context variables: list length (metric, party-in-a-district level), seats the party in a district is expected to win, and district magnitude (winnable seats in the district) --> checking assumptions etc. --> correcting for non-linearity --> squaring list length and seats

    3. model: adding party dummies: the violation of the linearity assumption now seems to be gone for seats and list length, yet their effects keep the same direction and remain significant

    However, my fear is that I cannot make reasonable inferences from the context variables because the effects of the party-in-a-district level predictors are confounded with the effects of the party dummies. In fact, the weak multicollinearity for seats (VIF almost 5) seems to suggest that.

    I was thinking about some sort of clustering of the standard errors; could this potentially be a solution? Also, I am dealing with heteroskedasticity in all of my models, most likely due to the skewed nature of my dependent variable. Using robust standard errors does not seem to change a lot in my results. Is it better to transform my dependent variable instead (log seems to be the solution)? If so, do I need to transform all my predictors as well?

    Also, my entire research design is rather exploratory, as there is no research in the field for the country I am investigating. Right now, I am trying to save what I have without ending up with a completely meaningless regression.

    Kind regards

  • #2
    Welcome to this forum.
    As per FAQ, please note that your chances of getting helpful replies are conditional on posting what you typed and what Stata gave you back (via CODE delimiters). Thanks.
    Kind regards,
    (Stata 15.1 SE)


    • #3
      To expand on Carlo's comment, you'll increase your chances of a useful answer by following the FAQ on asking questions - provide Stata code in code delimiters, readable Stata output, and sample data using dataex. Also, try to simplify your posting to the critical issues.

      To try to help on the substance, a VIF of 5 seldom deserves attention. Whether something is "confounded" depends on understanding the specification and situation in detail. It might be that your effects are not just the single parameter itself but a sum or difference of parameters. For example, the effect of party in a given area might be a general effect of party plus the effect in that area (which estimates the difference from the general value).
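
      To put the VIF of almost 5 from the first post in perspective, the standard definition VIF = 1 / (1 - R²) can be inverted, where R² comes from regressing that predictor on all the other predictors. The sketch below is only an illustration in Python (not Stata output); the function names are made up for this example, and the value 5.0 is the poster's "almost 5" rounded up.

```python
import math

def r_squared_from_vif(vif: float) -> float:
    """R-squared of the auxiliary regression (predictor on all other predictors)."""
    if vif < 1:
        raise ValueError("a VIF cannot be smaller than 1")
    return 1.0 - 1.0 / vif

def se_inflation_from_vif(vif: float) -> float:
    """Factor by which the coefficient's standard error is inflated
    compared with a predictor that is uncorrelated with the others."""
    if vif < 1:
        raise ValueError("a VIF cannot be smaller than 1")
    return math.sqrt(vif)

vif_seats = 5.0  # assumption: "VIF almost 5" from the first post, rounded up

print(round(r_squared_from_vif(vif_seats), 2))     # 0.8
print(round(se_inflation_from_vif(vif_seats), 2))  # 2.24
```

      So roughly 80% of the variance in seats is shared with the other predictors, and its standard error is a bit more than doubled. With N = 1850 that loss of precision is often tolerable, which fits the observation that seats stays significant once the party dummies are added.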

      If you have heteroskedasticity, then clustered standard errors make sense. While some transform DVs to reduce such issues, you are estimating a different model when you do so. I think applied folks differ on how advisable this is, particularly if the problem is something robust standard errors can fix.
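
      One way to see why a log-transformed DV is a different model: a coefficient b in a regression of log(y) on x describes a multiplicative change in y, about 100 * (exp(b) - 1) percent per one-unit increase in x, rather than an additive change in index points. The predictors do not have to be transformed for this reading to hold. The sketch below is a Python illustration only; the function name and the coefficient value 0.10 are hypothetical and not taken from any model in this thread.

```python
import math

def pct_change_from_log_coef(b: float) -> float:
    """Approximate percent change in y per one-unit increase in x
    when the outcome is log(y) and x is left untransformed."""
    return 100.0 * (math.exp(b) - 1.0)

b_hypothetical = 0.10  # hypothetical coefficient, for illustration only

print(round(pct_change_from_log_coef(b_hypothetical), 2))  # 10.52
```

      Note also that the DV described in the first post runs from 0 to 10, and the log of 0 is undefined, so any zeros would have to be handled before a log transformation could be applied at all.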