Hi all,
I am new to panel data regression analysis and Stata, so forgive me if my questions are too elementary. I have some questions about choosing between random- and fixed-effects models and about using the correct estimators.
Data characteristics:
- Panel data
- Balanced
- T = 15 years (2009-2023) and N = 17 European countries (T < N)
- Dependent variable is Self-rated bad or very bad health (SPHBVB)
- Independent variables are: temporary employment (TEMPEMPL), part-time employment (PARTIME), self-employment without employees (SELFEMPLnoEMPLs) and unemployment (UNEMPLOYMENT). All variables are in %.
Aim of analysis:
- To obtain regression estimates that are consistent and efficient and that hold up under robustness checks
Method:
Estimate pooled OLS, FE and RE models, or a mixed model
1. First I checked the correlations among all variables
[Stata output image not available]
Comment: The dependent variable is most strongly correlated with SELFEMPLnoEMPLs and PARTIME. Among the independent variables, the highest correlation is between SELFEMPLnoEMPLs and PARTIME.
2. Then I ran fixed- and random-effects regressions
In both cases only the variable SELFEMPLnoEMPLs was statistically significant.
3. Then I ran the Breusch-Pagan LM test (xttest0), which gave Prob > chibar2 = 0.0000, so RE is preferred over pooled OLS.
[Stata output image not available]
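A minimal sketch of what steps 2 and 3 might look like in Stata. The panel identifier CODE is taken from the xtoverid output further down; the time variable name YEAR is an assumption, and CODE is assumed to be numeric.

```stata
* Declare the panel structure (YEAR is an assumed variable name)
xtset CODE YEAR

* Fixed-effects and random-effects estimates
xtreg SPHBVB TEMPEMPL PARTIME SELFEMPLnoEMPLs UNEMPLOYMENT, fe
estimates store fe
xtreg SPHBVB TEMPEMPL PARTIME SELFEMPLnoEMPLs UNEMPLOYMENT, re
estimates store re

* Breusch-Pagan LM test for random effects (RE vs pooled OLS);
* it uses the results of the xtreg, re fit just above
xttest0
```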
4. Tests: I checked my data for autocorrelation using xtserial and for heteroscedasticity using xttest2, which show that my data suffer from both problems.
a. xtserial SPHBVB TEMPEMPL PARTIME SELFEMPLnoEMPLs UNEMPLOYMENT
Wooldridge test for autocorrelation in panel data
H0: no first order autocorrelation
F( 1, 16) = 17.066
Prob > F = 0.0008
b. xttest2
Breusch-Pagan LM test of independence: chi2(136) = 216.828, Pr = 0.0000
Based on 15 complete observations over panel units
c. The VIF and 1/VIF values were also satisfactory
[Stata output image not available]
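The checks in step 4 could be reproduced roughly as follows. This is a sketch, not necessarily the exact sequence that was run; xtserial and xttest2 are user-written commands that have to be installed first.

```stata
* xtserial and xttest2 are user-written; use -search- to locate them if needed

* Wooldridge test for first-order serial correlation in panel data
xtserial SPHBVB TEMPEMPL PARTIME SELFEMPLnoEMPLs UNEMPLOYMENT

* xttest2 is run after a fixed-effects fit; its output is labelled a
* Breusch-Pagan LM test of independence across panel units
xtreg SPHBVB TEMPEMPL PARTIME SELFEMPLnoEMPLs UNEMPLOYMENT, fe
xttest2

* Variance inflation factors come from a pooled OLS fit
regress SPHBVB TEMPEMPL PARTIME SELFEMPLnoEMPLs UNEMPLOYMENT
estat vif
```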
5. After reading other questions on this site, I found that autocorrelation and heteroscedasticity can be dealt with by clustering the standard errors. So I ran the RE regression with clustered standard errors. Only the variable SELFEMPLnoEMPLs turns out to be significant.
[Stata output image not available]
xtoverid
Test of overidentifying restrictions: fixed vs random effects
Cross-section time-series model: xtreg re robust cluster(id)
Sargan-Hansen statistic 3.062 Chi-sq(4) P-value = 0.5475
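A sketch of the commands that step 5 appears to correspond to. The cluster variable is written as CODE here; the output above shows cluster(id), so substitute whichever name identifies the countries in the dataset. xtoverid is user-written.

```stata
* RE model with standard errors clustered on the panel identifier
xtreg SPHBVB TEMPEMPL PARTIME SELFEMPLnoEMPLs UNEMPLOYMENT, re vce(cluster CODE)

* Cluster-robust test of fixed vs random effects, run directly after the fit
* (user-written: ssc install xtoverid)
xtoverid
```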
6. I then dropped variables, and after several attempts I found that only PARTIME and SELFEMPLnoEMPLs are significant, although R-squared decreased after omitting TEMPEMPL and then UNEMPLOYMENT
[Stata output image not available]
xtoverid
Test of overidentifying restrictions: fixed vs random effects
Cross-section time-series model: xtreg re robust cluster(CODE)
Sargan-Hansen statistic 0.406 Chi-sq(2) P-value = 0.8161
So the xtoverid output suggests that the RE model is the way to go.
Questions
I. Should I accept the last findings (6) as the most appropriate and also consistent with theoretical models, or should I discuss and accept (5)?
To finish, two last questions that are crucial for my research.
In many similar studies I found that social scientists use a fixed-effects model ad hoc, or at least present it for comparison, even when the RE model turns out to be the appropriate one.
IIa. I would appreciate any comments on the finding below (the variable Welfare3 refers to three groups of countries: group 1 includes 12 countries, group 2 includes 4 countries, and group 3 includes only one country)
[Stata output image not available]
IIb. Is it more appropriate to use a mixed model, e.g.
[Stata output image not available]
or
[Stata output image not available]
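Purely as an illustration of what two alternative mixed specifications could look like (not necessarily the ones in the screenshots above), here is a sketch with a random intercept only, and then with an added random slope. The choice of SELFEMPLnoEMPLs for the random slope is an assumption made for the example.

```stata
* Random intercept for each country
mixed SPHBVB TEMPEMPL PARTIME SELFEMPLnoEMPLs UNEMPLOYMENT || CODE:

* Random intercept plus a random slope on SELFEMPLnoEMPLs (illustrative choice)
mixed SPHBVB TEMPEMPL PARTIME SELFEMPLnoEMPLs UNEMPLOYMENT || CODE: SELFEMPLnoEMPLs, covariance(unstructured)
```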
Thank you,
Maria