  • Clustered SE vs. Random Effects

    Hi StataList community,

    I am currently working on a research paper on gender wage discrimination, including an Oaxaca-Blinder decomposition, using panel data with roughly 50,000 observations and 10 time periods (survey years); the panel is unbalanced. For a first estimation I used pooled OLS and clustered by the person identifier (id). Stata wouldn't allow me to cluster at the household level, as households differ depending on the survey year, whereas the id remains constant.

    Fixed effects estimation is not an option in my case, since the main explanatory variable "sex" is constant over time and would be eliminated by this model (including an interaction with a time-varying variable has obvious disadvantages in terms of interpretation).

    Since I am using panel data, I now make the assumption that my unobserved heterogeneity is uncorrelated with my independent variables. To back this assumption I included a large number of time-invariant control variables, which should account for a large share of the unobserved heterogeneity. If this assumption holds, my estimator should at least be consistent. However, because the errors may be serially correlated, it might still not be efficient. To correct for this I would use either POLS with clustered SEs or a random effects model.
    As far as I understand, both would account for the potential serial correlation in the error term that arises with panel data. Do clustered SEs generally suffice to deal with autocorrelation? Under what conditions is it advised to use the RE model instead of just clustering the SEs? Is there an efficiency gain from using RE compared to clustering?

    Regards and thanks in Advance

    Jorge
    Last edited by Lost Student; 18 Jun 2017, 03:14.
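
    For reference, the setup described in the question can be written down as follows. This is the textbook random effects notation, not something taken from the poster's paper; the symbols $c_i$, $u_{it}$ and $v_{it}$ are labels chosen here for illustration.

    ```latex
    % Unobserved-effects model: person i, survey year t
    y_{it} = x_{it}\beta + c_i + u_{it}, \qquad v_{it} = c_i + u_{it}

    % Random effects assumption stated in the question
    E[c_i \mid x_{i1}, \dots, x_{iT}] = 0

    % If u_{it} is homoskedastic, serially uncorrelated and uncorrelated with c_i,
    % the composite error is equicorrelated within person:
    \operatorname{Corr}(v_{it}, v_{is}) = \frac{\sigma_c^2}{\sigma_c^2 + \sigma_u^2}, \qquad t \neq s
    ```

    Under the assumption above, pooled OLS is consistent, and clustering on id makes its standard errors valid for arbitrary within-person correlation of $v_{it}$. Clustering does not change the point estimates, so any efficiency gain comes from the RE (FGLS) estimator using the equicorrelation structure, and only to the extent that this structure is a reasonable description of the errors.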

  • #2
    Lost Student Jorge: please note the preference on this forum for a real family name, too (and the re-registering procedure provided by the FAQ). Thanks.
    You gave no excerpt of your data (see -search dataex-), so the following remarks will be unavoidably broad:
    - it is not clear why you did not consider using -xtreg- instead of a pooled OLS;
    - under -xtreg-, heteroskedasticity and/or autocorrelation are dealt with via robust/clustered standard errors;
    - it is not clear why you did not compare the -fe- vs -re- specifications via the user-written command -xtoverid- (type -search xtoverid- from within Stata to install it; please consider that it needs the old-fashioned -xi- prefix, as it does not support -fvvarlist- notation).
    Kind regards,
    Carlo
    (Stata 19.0)
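
    The commands named in this reply can be sketched as below. This is only an illustration on a stand-in dataset, not the poster's data: -nlswork- contains women only, so the time-invariant indicator -black- (created here) plays the role of -sex-, and the regressor list is an arbitrary assumption.

    ```stata
    * Stand-in panel data shipped with Stata (NOT the poster's data)
    webuse nlswork, clear
    xtset idcode year

    * Hypothetical time-invariant regressor standing in for -sex-
    generate byte black = (race == 2) if !missing(race)

    * Random effects with standard errors clustered on the person identifier
    xtreg ln_wage black age tenure ttl_exp, re vce(cluster idcode)

    * Cluster-robust test of -fe- vs -re- (user-written; install it first,
    * e.g. with -ssc install xtoverid-)
    xtoverid
    ```

    Factor-variable terms such as -i.race- are avoided on purpose here, since -xtoverid- is reported above to need the -xi- prefix instead.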

    • #3
      Thanks Carlo for the quick reply,

      Originally posted by Carlo Lazzaro
      You gave no excerpt of your data (see -search dataex-), so the following remarks will be unavoidably broad:
      - I do not own the data and I am not allowed to share excerpts of it. Sorry about that. It is totally clear to me that the discussion will thus remain quite general. The question is a theoretical rather than a practical one.


      Originally posted by Carlo Lazzaro
      it is not clear why you did not compare -fe- vs -re- specification via the user-written command -xtoverid- (type -search xtoverid- from within Stata to install it; please consider that it needs the old-fashioned -xi- prefix to support -fvvarlist- notation).
      - As I mentioned above, our variable of interest is time-invariant. Using an FE model would simply eliminate my 'sex' variable, which is my main variable of interest. Stata does not even allow me to conduct a Hausman test. That is why I ended up assuming that my unobserved heterogeneity is uncorrelated with my x.

      Originally posted by Carlo Lazzaro
      it is not clear why you did not consider using -xtreg- instead of a pooled OLS
      - I used xtreg and computed an FGLS random effects estimation. The main question is: is random effects estimation even necessary, or would pooled OLS with clustered SEs already be efficient?

      Kind Regards
      Jorge Brinkmann
      Last edited by Lost Student; 18 Jun 2017, 04:21.
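
      The comparison asked about here can be set up side by side. The sketch below again uses the -nlswork- stand-in data with a hypothetical time-invariant indicator -black- in place of -sex-; the variable names and regressor list are assumptions for illustration only.

      ```stata
      * Stand-in panel data shipped with Stata (NOT the poster's data)
      webuse nlswork, clear
      xtset idcode year
      generate byte black = (race == 2) if !missing(race)

      * Pooled OLS, standard errors clustered on the person identifier
      regress ln_wage black age tenure ttl_exp, vce(cluster idcode)
      estimates store pols

      * Random effects (FGLS), same clustering
      xtreg ln_wage black age tenure ttl_exp, re vce(cluster idcode)
      estimates store re

      * Coefficients and standard errors next to each other
      estimates table pols re, b(%9.4f) se(%9.4f)
      ```

      Clustering leaves the pooled OLS coefficients unchanged and only adjusts their standard errors, whereas -xtreg, re- re-weights the data, so the two columns can differ in both coefficients and standard errors.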

      • #4
        Jorge:
        OK, let's stay general:
        I'm not clear about your statement
        ...Stata would not even allow a Hausman test...
        In general, -xtreg, re- outperforms pooled OLS if random individual effects do exist.
        Please note, however, that the underlying assumption of no correlation between the individual effects and the vector of regressors is often unrealistic (that is, both -fe- and -re- specifications have their own weaknesses).
        Kind regards,
        Carlo
        (Stata 19.0)
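
        Whether random individual effects "do exist" can be checked with the Breusch-Pagan Lagrange multiplier test, which Stata provides as -xttest0- after -xtreg, re-. A minimal sketch on the -nlswork- stand-in data (not the poster's data; -black- is a hypothetical time-invariant regressor created here):

        ```stata
        * Stand-in panel data shipped with Stata (NOT the poster's data)
        webuse nlswork, clear
        xtset idcode year
        generate byte black = (race == 2) if !missing(race)

        * Random effects with default standard errors, then the LM test
        * of H0: Var(u) = 0, i.e. no individual-level variance component
        quietly xtreg ln_wage black age tenure ttl_exp, re
        xttest0
        ```

        Rejecting the null points to a nonzero individual-level variance component, which is the case in which -xtreg, re- is expected to improve on pooled OLS.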
