
  • Clustering standard errors and including fixed effects when the regressor is cluster-invariant

    Dear Statalisters,

    I would appreciate your help regarding both the appropriate level of clustering standard errors and including fixed effects in my model.

    I have a repeated cross-section of individuals nested in regions (provinces, counties, lands etc., depending on the country), which are in turn nested in countries. There are 15 years and, say, 42 countries, but each country is surveyed at most 7 times (and usually fewer). I’m interested in estimating the relationship between income inequality (measured at the country level; the estimates come from external data) and individual life satisfaction. Since life satisfaction is an ordinal variable, ranging from 1 (completely dissatisfied) to 10 (completely satisfied), I use the following model:

    Code:
    oprobit life_satisfaction Gini log_GDP_per_capita individual_characteristics i.year i.country, cluster(?)
    (individual_characteristics is a set of variables like age, education etc.)

    1) Is including both country and year fixed effects necessary?
    I know that people usually follow this path, but including country fixed effects renders my results insignificant, and I wonder whether this has something to do with the fact that my regressor of interest (the Gini) is invariant across all individuals in country c at time t. Would it be correct to keep only year fixed effects in this case? (I’m aware that fixed effects in ordered probit are tricky, but I have a large number of observations within each country-year group [1].)
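
    A minimal sketch of how one could check how much variation in the Gini remains within countries over time, which is the only variation left to identify its coefficient once i.country is included. It assumes that country is a numeric identifier and that the Gini is constant within each country-year, as described above:

    Code:
    * work on a temporary country-year copy of the data
    preserve
    collapse (mean) Gini, by(country year)
    * declare the country-year panel and decompose the variation
    xtset country year
    xtsum Gini
    restore

    If the within standard deviation reported by xtsum is small relative to the between one, the country dummies absorb most of the variation in the Gini, which would be one possible reason for the loss of significance.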

    2) What is the appropriate level of clustering standard errors in this case?
    Cameron and Miller (2015) say “The consensus is to be conservative and avoid bias and to use bigger and more aggregate clusters when possible, up to and including the point at which there is concern about having too few clusters” [2]. So I would be inclined to cluster at the country level. But the literature using this particular dataset is inconsistent: some researchers cluster at the regional level (regions nested in countries), some at the survey level (close to the year level, but not exactly), and some do not cluster at all. I have seen the recently updated paper by Abadie, Athey, Imbens and Wooldridge (2017, updated 2022), but I’m not sure how it applies to my case [3]. I feel that clustering at the country level is unnecessarily conservative (my results turn insignificant compared to clustering at the regional level), but what is the rationale behind clustering at the regional level? As far as I know, the sampling design took some form of stratified random sampling with regions being the first stage (within countries), but is this enough reason for cluster(region) if there is no variation in the Gini between regions within a country?
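
    For concreteness, the two clustering choices could be compared along the following lines. This is only a sketch: region is an assumed variable name, age and education are hypothetical stand-ins for the individual characteristics, and region_id is a new identifier built so that regions are unique across countries.

    Code:
    * hypothetical placeholder list for the individual characteristics
    local indiv age education
    * regions are nested in countries, so build an id that is unique across countries
    egen region_id = group(country region)

    * standard errors clustered at the regional level
    oprobit life_satisfaction Gini log_GDP_per_capita `indiv' i.year i.country, vce(cluster region_id)
    display "SE(Gini), clustered by region:  " _se[Gini]

    * standard errors clustered at the country level
    oprobit life_satisfaction Gini log_GDP_per_capita `indiv' i.year i.country, vce(cluster country)
    display "SE(Gini), clustered by country: " _se[Gini]

    The point estimates are identical in both runs; only the standard errors, and the number of clusters reported in the output header, differ.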


    Any thoughts would be very helpful to me. Thank you in advance.

    [1] https://www.stata.com/statalist/arch.../msg00103.html

    [2] http://jhr.uwpress.org/content/50/2/317.short

    [3] https://arxiv.org/abs/1710.02926
    Last edited by Katarzyna Salach; 15 Nov 2022, 08:45.