Dear Stata community, I'm working on my Master's thesis and I (might) have a problem with multicollinearity.
I work with a TSCS (time-series cross-section) data set:
203 cases, each observed at up to 12 points in time.
I estimate a PCSE (panel-corrected standard errors) model.
Now I want to test for VIF (variance inflation factor).
I read in an online script about ordinary regression that testing for VIF takes two steps:
1. run a regression
2. enter the command: "vif"
A Mean VIF of more than 10 -> you have a serious problem with multicollinearity.
My question is: Can I do so with my PCSE model? I tried:
1. Load data set
2. Tell Stata that it is dealing with TSCS data:
tsset
panel variable: number (strongly balanced)
time variable: year, 2000 to 2011
delta: 1 year
Looks good.
3. Now I run my PCSE model:
. xtpcse aV_Transform uV1_GDPgrowth uV2_GDPpC uV3_GDPpC_percentageEUaverage uV4_unemployment uV5_cohesion_gap_pC if aV_n30>0, correlation(ar1) pairwise
Results as always, nothing special.
4. Now I run vif:
vif
not appropriate after regress, nocons;
use option uncentered to get uncentered VIFs
r(301);
:-(
My idea:
Run a regular regression AFTER telling Stata that it's dealing with a TSCS data set:
1. tsset
panel variable: number (strongly balanced)
time variable: year, 2000 to 2011
delta: 1 year
2. regress aV_Transform uV1_GDPgrowth uV2_GDPpC uV3_GDPpC_percentageEUaverage uV4_unemployment uV5_cohesion_gap_pC if aV_n30<1
(results are fine)
3. vif
    Variable |       VIF       1/VIF
-------------+----------------------
uV3_GDPpC_~e |     16.62    0.060155
   uV2_GDPpC |     16.07    0.062230
uV5_cohesi~C |      1.29    0.774455
uV4_unempl~t |      1.21    0.825685
uV1_GDPgro~h |      1.01    0.987903
-------------+----------------------
    Mean VIF |      7.24
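For orientation, each entry in the VIF column is 1/(1 - R^2), where R^2 comes from regressing that variable on all the other regressors. The lines below are only a sketch of that check for uV3. They assume the data set is in memory, that uV4_unemployment is the full name of the unemployment regressor, and that they are run from a do-file (because of the /// line continuation). ols_sample is a made-up marker variable, not something from my data.

```stata
* Sketch only: refit the pooled OLS model and mark the observations it used.
quietly regress aV_Transform uV1_GDPgrowth uV2_GDPpC uV3_GDPpC_percentageEUaverage ///
    uV4_unemployment uV5_cohesion_gap_pC if aV_n30<1
generate byte ols_sample = e(sample)   // hypothetical marker variable

* Auxiliary regression: one regressor on all the others, same observations.
quietly regress uV3_GDPpC_percentageEUaverage uV1_GDPgrowth uV2_GDPpC ///
    uV4_unemployment uV5_cohesion_gap_pC if ols_sample

* VIF = 1/(1 - R^2) of the auxiliary regression.
display "VIF for uV3: " 1/(1 - e(r2))
```

If the sample is the same, the displayed value should correspond to the VIF that vif reports for uV3_GDPpC_percentageEUaverage.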
My first question:
Now I get a result, but is it meaningful?
Is my method okay, or do I throw away the TSCS information by using regress?
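One way to keep the OLS check comparable with the PCSE fit would be to run it on exactly the observations that xtpcse used. This is a sketch under assumptions, not a tested recipe: the data are loaded and tsset, uV4_unemployment is the full name of the unemployment regressor, the code is run from a do-file, and pcse_sample is a made-up marker variable. VIFs are computed from the regressors alone, so the dependent variable and the error structure do not enter. The Prais-Winsten transformation that xtpcse applies with correlation(ar1) is not reflected in this check. Note that my regress call uses if aV_n30<1 while my xtpcse call uses if aV_n30>0; the sketch follows the xtpcse condition.

```stata
* Sketch only: fit the PCSE model and mark its estimation sample.
quietly xtpcse aV_Transform uV1_GDPgrowth uV2_GDPpC uV3_GDPpC_percentageEUaverage ///
    uV4_unemployment uV5_cohesion_gap_pC if aV_n30>0, correlation(ar1) pairwise

* pcse_sample is a hypothetical marker for the observations xtpcse actually used.
generate byte pcse_sample = e(sample)

* Pooled OLS on the same observations, then the centered VIFs.
quietly regress aV_Transform uV1_GDPgrowth uV2_GDPpC uV3_GDPpC_percentageEUaverage ///
    uV4_unemployment uV5_cohesion_gap_pC if pcse_sample
estat vif
```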
Second question/problem:
Why is the Mean VIF lower than 10 when I know from other tests that there is a strong correlation of about 0.9 between uV2 and uV3?
Is it important to look at the individual numbers within the table? 16.62 and 16.07 are very similar and well above 10. Since Mean VIF seems to be what its name says (a mean), one could conclude that if I ran VIF with just uV2 and uV3, the Mean VIF would be about 16.3 -> obviously there would then be a strong problem with (multi)collinearity.
So is it myopic to look only at the Mean VIF result?
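The arithmetic behind these numbers can be checked directly. This is a small sketch that needs no data in memory; the scalar names are made up for illustration. It averages the VIFs from the table, and it uses the fact that with only two regressors each VIF equals 1/(1 - r^2), where r is their pairwise correlation.

```stata
* Arithmetic check on the numbers in the VIF table; no data set required.
scalar mean_all = (16.62 + 16.07 + 1.29 + 1.21 + 1.01)/5   // all five VIFs
scalar mean_top = (16.62 + 16.07)/2                        // only uV3 and uV2
display "Mean of all five VIFs:   " %6.2f mean_all
display "Mean of the two largest: " %6.3f mean_top

* With only two regressors, each VIF equals 1/(1 - r^2).
scalar vif_pair = 1/(1 - 0.9^2)
display "VIF implied by r = 0.9:  " %6.2f vif_pair

* Multiple R implied by 1/VIF = 0.060155 for uV3 (against all other regressors).
scalar r_uV3 = sqrt(1 - 0.060155)
display "Implied multiple R, uV3: " %6.3f r_uV3
```

The three small VIFs pull the mean of all five down to 7.24 even though two individual values are above 16. A pairwise correlation of 0.9 alone would imply a VIF of only about 5.3, while the reported VIF for uV3 corresponds to a multiple R of roughly 0.97 against all other regressors together.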
I would be grateful for any support on this.
Best wishes from Germany, TU Darmstadt
Rainer Müller
EDIT:
Further research showed that my approach does not work: time series cannot be analysed with VIF. One workaround is to split the time series into single years, so that you can run the test on each year, i.e. on a simple regression model.
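A minimal sketch of that year-by-year check, assuming the data are in memory with the time variable year running from 2000 to 2011, that uV4_unemployment is the full name of the unemployment regressor, and that the code is run from a do-file. A year with too few usable observations would make regress fail and stop the loop.

```stata
* Sketch only: one cross-section regression per year, each followed by its VIFs.
forvalues y = 2000/2011 {
    display as text _newline "Year `y'"
    quietly regress aV_Transform uV1_GDPgrowth uV2_GDPpC uV3_GDPpC_percentageEUaverage ///
        uV4_unemployment uV5_cohesion_gap_pC if year == `y' & aV_n30<1
    estat vif
}
```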
If anyone has a better idea, please tell me. I am very curious.