Hi StataList community,
I am currently working on a research paper on gender wage discrimination that includes an Oaxaca-Blinder decomposition. I use panel data with roughly 50,000 observations and 10 time periods (survey years); the panel is unbalanced. For a first estimation I used pooled OLS and clustered by person identifier (id). Stata wouldn't allow me to cluster at the household level, as households differ depending on the survey year while the id remains constant.
Fixed-effects is not possible in my case since the main explanatory variable "sex" is constant over time and would be eliminated by using this model (including an interaction with a time-variant variable has obvious disadvantages in terms of interpretation).
Since I am using panel data, I have assumed that the unobserved heterogeneity is uncorrelated with my independent variables. To back this assumption I included a large number of time-fixed control variables, which should account for a large share of the unobserved heterogeneity. If this assumption holds, my estimator should at least be consistent. However, due to possibly serially correlated errors it might still not be efficient. To correct for this I would use either POLS with clustered SEs or a random-effects model.
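To make the comparison concrete, this is roughly what the two specifications look like. It is only a sketch: it runs on Stata's nlswork example panel rather than my own data, and race stands in for a time-constant regressor such as sex, so the variable names are not the ones in my model.

```stata
* Sketch only: nlswork is Stata's example panel, not my data.
* race is a stand-in for a time-constant regressor such as sex.
webuse nlswork, clear
xtset idcode year

* Pooled OLS with standard errors clustered on the person identifier
regress ln_wage i.race age tenure, vce(cluster idcode)
estimates store pols

* Random effects (GLS), also with person-clustered standard errors
xtreg ln_wage i.race age tenure, re vce(cluster idcode)
estimates store re

* Coefficients and standard errors side by side
estimates table pols re, b(%9.4f) se(%9.4f)
```

Both commands keep the time-constant regressor in the model; they differ in how the observations are weighted, which is where I would expect any efficiency difference to come from.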
As far as I understand, both would account for the potential serial correlation in the error term that arises with panel data. Do clustered SEs generally suffice to deal with autocorrelation? Under what conditions is it advisable to use the RE model instead of just clustering the SEs? Is there an efficiency gain from using RE compared to clustering?
Regards and thanks in advance,
Jorge