Hi everyone,
I'm investigating the influence of several independent variables (IVs), measured at the district level, the party-in-district level and the individual level, on the individual-level campaign behaviour of ordinary candidates (an index of survey items asking about campaign activities), using multiple linear regression. (I know multilevel modelling (MLM) would probably be better, but the scope of my research task, a BA thesis, limits me to simple models.) Data: N = 1850 candidates. At the moment I am proceeding in three steps using OLS regression:
1. Model 1: y (campaign focus, 0-10, where 10 = fully candidate-focused; treated as metric) = a + β1-β6 (individual-level variables such as subjective chance of winning a seat (hierarchically structured binary variable: 0-5), gender, incumbency, etc.) --> checking assumptions etc.
2. Model 2: adding context variables: list length (metric, party-in-district level), the number of seats the party is expected to win in the district, and district magnitude (winnable seats in the district) --> checking assumptions etc. --> to correct for non-linearity, I add squared terms for list length and seats
3. Model 3: adding party dummies: the violation of the linearity assumption now seems to be gone for seats and list length, yet both keep the same direction of effect and remain significant
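A minimal sketch of the three nested specifications, using only NumPy and simulated data. All variable names, the number of parties and the data-generating process are hypothetical assumptions, not the actual survey data. It shows how the squared terms and the party dummies (one party left out as the reference category) enter the design matrix, and that R² can only stay the same or rise as predictors are added to nested OLS models.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1850  # same N as in the post; the data below are simulated, not real

# Hypothetical variables (names and distributions are illustrative only)
chance = rng.integers(0, 6, n)            # subjective chance of winning, 0-5
female = rng.integers(0, 2, n)            # gender dummy
incumbent = rng.integers(0, 2, n)         # incumbency dummy
party = rng.integers(0, 4, n)             # 4 hypothetical parties
list_length = rng.integers(2, 30, n).astype(float)
seats = rng.poisson(2, n).astype(float)
magnitude = rng.integers(2, 20, n).astype(float)
y = np.clip(2 + 0.5 * chance + 0.8 * incumbent + 0.1 * list_length
            - 0.002 * list_length**2 + rng.normal(0, 2, n), 0, 10)

def ols(y, X):
    """Fit OLS with an intercept; return coefficients and R^2."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2 = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return beta, r2

# Model 1: individual-level predictors only
X1 = np.column_stack([chance, female, incumbent])
# Model 2: add context variables plus squared terms for list length and seats
X2 = np.column_stack([X1, list_length, list_length**2, seats, seats**2, magnitude])
# Model 3: add party dummies (party 0 omitted as the reference category)
party_dummies = (party[:, None] == np.arange(1, 4)).astype(float)
X3 = np.column_stack([X2, party_dummies])

models = {name: ols(y, X) for name, X in [("model1", X1), ("model2", X2), ("model3", X3)]}
for name, (beta, r2) in models.items():
    print(name, "R^2 =", round(r2, 3))
```

With real data, the statsmodels formula interface (for example `C(party)` for dummies and `I(list_length**2)` for squared terms) would handle the same bookkeeping.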
However, I fear that I cannot draw reasonable inferences about the context variables, because the effects of the party-in-district predictors are confounded with the effects of the party dummies. Indeed, the multicollinearity for seats (VIF almost 5) seems to suggest this.
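The VIF concern can be made concrete: a predictor's VIF is 1 / (1 - R²) from regressing it on all other predictors, so if seats are largely determined by party, adding party dummies pushes its VIF up. A minimal NumPy sketch, assuming (purely hypothetically) that seats depend strongly on party membership:

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (X has no intercept column)."""
    X = np.asarray(X, dtype=float)
    out = []
    for j in range(X.shape[1]):
        target = X[:, j]
        # Regress column j on an intercept plus all other columns
        others = np.column_stack([np.ones(len(target)), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ beta
        r2 = 1 - resid @ resid / np.sum((target - target.mean()) ** 2)
        out.append(1 / (1 - r2))
    return np.array(out)

rng = np.random.default_rng(1)
n = 1850
party = rng.integers(0, 4, n)                      # 4 hypothetical parties
list_length = rng.integers(2, 30, n).astype(float)
seats = 1.5 * party + rng.normal(0, 1, n)          # assumption: seats partly driven by party
dummies = (party[:, None] == np.arange(1, 4)).astype(float)

vif_without = vif(np.column_stack([seats, list_length]))[0]
vif_with = vif(np.column_stack([seats, list_length, dummies]))[0]
print(round(vif_without, 2), round(vif_with, 2))
```

A variable and its own square are usually highly collinear as well; centring a variable before squaring it reduces that, which is worth keeping in mind when reading the VIFs of model 2.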
I was thinking about clustering the standard errors in some way; could this be a solution? Also, I am dealing with heteroskedasticity in all of my models, most likely due to the skewed distribution of my dependent variable. Using robust standard errors does not change my results much. Would it be better to transform my dependent variable instead (a log transformation seems to be the usual solution), and would I then need to transform all my predictors as well?
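To illustrate the difference between heteroskedasticity-robust and cluster-robust standard errors, here is a NumPy-only sketch of the sandwich estimators (classical, HC1, and CR1 clustered). The cluster identifier (a hypothetical party-in-district id), the cluster-level shock and all numbers are simulated assumptions. In statsmodels the same options exist via `OLS(y, X).fit(cov_type="HC1")` and `fit(cov_type="cluster", cov_kwds={"groups": cluster})`. Note that cluster-robust SEs need a reasonably large number of clusters; clustering on a handful of parties would be unreliable.

```python
import numpy as np

def ols_se(y, X, clusters=None):
    """OLS coefficients with classical, HC1 and (optionally) CR1 cluster-robust SEs."""
    X = np.column_stack([np.ones(len(y)), X])
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    u = y - X @ beta
    # Classical (homoskedastic) standard errors
    se_classic = np.sqrt(np.diag(XtX_inv) * (u @ u) / (n - k))
    # HC1: heteroskedasticity-robust sandwich with n / (n - k) correction
    meat = (X * u[:, None] ** 2).T @ X
    se_hc1 = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv) * n / (n - k))
    result = {"beta": beta, "se_classic": se_classic, "se_hc1": se_hc1}
    if clusters is not None:
        groups = np.unique(clusters)
        g = len(groups)
        meat_c = np.zeros((k, k))
        for c in groups:
            mask = clusters == c
            s = X[mask].T @ u[mask]                  # scores summed within the cluster
            meat_c += np.outer(s, s)
        adj = g / (g - 1) * (n - 1) / (n - k)        # CR1 small-sample correction
        result["se_cluster"] = np.sqrt(np.diag(XtX_inv @ meat_c @ XtX_inv) * adj)
    return result

rng = np.random.default_rng(2)
n_clusters, n = 150, 1850
cluster = rng.integers(0, n_clusters, n)             # hypothetical party-in-district id
list_len_c = rng.integers(2, 30, n_clusters).astype(float)
shock_c = rng.normal(0, 1.5, n_clusters)             # shock shared within each cluster
list_length = list_len_c[cluster]                    # predictor constant within cluster
y = np.clip(3 + 0.1 * list_length + shock_c[cluster] + rng.normal(0, 1, n), 0, 10)

fit = ols_se(y, list_length, clusters=cluster)
fit_log = ols_se(np.log1p(y), list_length, clusters=cluster)  # log(1 + y), since y can be 0
for name in ("se_classic", "se_hc1", "se_cluster"):
    print(name, fit[name].round(4))
```

In this simulation the clustered SE for list length should come out well above the classical and HC1 ones, because both the predictor and part of the error are shared within clusters, which HC1 does not account for. `np.log1p` is used because a 0-10 index can contain zeros and log(0) is undefined.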
Also, my entire research design is rather exploratory, as there is no prior research in this field on the country I am investigating. Right now, I am trying to salvage what I have without ending up with a completely meaningless regression.
Kind regards