
  • Robust standard errors vs clustered standard errors

    Dear all,

    I am doing an analysis of the pollution haven effect in the German manufacturing industry. I use an IV approach with time, country, and industry fixed effects.

    In a first specification, I use robust standard errors because I have heteroscedasticity. My estimates are negative, as expected, but insignificant.
    Code:
    xi: ivreg2 lnGermanFDIs lnInfrastructureIndex lnQualityofPublicSchools lnCapitalLaborRatios lnOrganizedCrimeIndex lnDistancekm Commonlanguage  i.Country i.Year i.Industry (lnGDP lnEnvPolicyIndex lnTariffRate lnIPRP = Tractorsagriculturalworker Landagriculturalworker Regionalcapitallaborratios RegionalOrganizedCrime Regionalpublicschoolquality Regionalinfrastructurequality Regionaltractorsagriculturalwo Regionallandagriculturalworker), robust endog(lnGDP lnEnvPolicyIndex lnTariffRate lnIPRP)
    However, when I use clustered standard errors, my estimates become significant. The significance, of course, depends on whether I use
    Code:
    xi: ivreg2 lnGermanFDIs lnInfrastructureIndex lnQualityofPublicSchools lnCapitalLaborRatios lnOrganizedCrimeIndex lnDistancekm Commonlanguage  i.Country i.Year i.Industry (lnGDP lnEnvPolicyIndex lnTariffRate lnIPRP = Tractorsagriculturalworker Landagriculturalworker Regionalcapitallaborratios RegionalOrganizedCrime Regionalpublicschoolquality Regionalinfrastructurequality Regionaltractorsagriculturalwo Regionallandagriculturalworker), robust cluster(Countryj Year Industryi) endog(lnGDP lnEnvPolicyIndex lnTariffRate lnIPRP)
    or only a single clustering variable, e.g.
    Code:
    xi: ivreg2 lnGermanFDIs lnInfrastructureIndex lnQualityofPublicSchools lnCapitalLaborRatios lnOrganizedCrimeIndex lnDistancekm Commonlanguage  i.Country i.Year i.Industry (lnGDP lnEnvPolicyIndex lnTariffRate lnIPRP = Tractorsagriculturalworker Landagriculturalworker Regionalcapitallaborratios RegionalOrganizedCrime Regionalpublicschoolquality Regionalinfrastructurequality Regionaltractorsagriculturalwo Regionallandagriculturalworker), robust cluster(Countryj) endog(lnGDP lnEnvPolicyIndex lnTariffRate lnIPRP)
    I know that I should use clustered standard errors if the disturbances are correlated within groups. Is there a test to decide which variables I need to cluster on? Or do I have to rely on economic theory to decide whether to use clustered SEs at all?




  • #2
    Dear Teresa,

    There are indeed tests for this. Jeff Wooldridge proposes one in his book, and I have also done something similar using quantile regression (see -qreg2- and the references therein). However, these tests are not designed to be used in an IV context. One thing you could do is estimate the reduced form of lnGermanFDIs and apply the tests to that model (if you use -qreg2-, you will estimate the model by median regression, which will not give you the usual reduced form, but the test should nonetheless be informative).

    Jeff Wooldridge often contributes to the forum, so hopefully he will give you a more authoritative answer.

    All the best,

    Joao



    • #3
      A few thoughts:

      1. (least important) ivreg2 supports at most 2-way clustering. The syntax is a bit too forgiving if you ask for 3-way clustering; it simply ignores the 3rd clustering variable. (We should probably fix this.)

      2. You are doing fixed effects estimation. The standard heteroskedasticity-robust covariance estimator is not consistent when you have a large number of FEs (i.e., when the asymptotics send the number of FEs to infinity). This is why xtreg with the robust option actually reports cluster-robust and not standard robust SEs. The reference for this is Stock and Watson (Econometrica, 2008). The cluster-robust covariance estimator is still consistent in this setting, though. So you should probably use cluster-robust of some flavour and not standard het-robust.

      3. Whether or not you should use 2-way cluster-robust SEs depends partly on your setting. One pitfall is that you might have only a small number of clusters in one of the dimensions (year?). The usual 2-way cluster-robust justification relies on asymptotics that send the number of clusters to infinity in both dimensions. How many countries and years of data do you have? If you want to do 2-way clustering using countries and years, you'll want a decent number of both.
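For reference, here is the two-way estimator usually cited for this setting (Cameron, Gelbach and Miller). As I understand it, this is the form that ivreg2's two-way cluster option follows. It combines three one-way cluster-robust matrices. This is a sketch for OLS, writing $\hat V_{C}$ for the one-way sandwich clustered on dimension $C$, with $X_c$ and $\hat u_c$ the regressors and residuals of cluster $c$:

```latex
\hat V_{C} = (X'X)^{-1}\Big(\sum_{c \in C} X_c'\,\hat u_c\,\hat u_c'\,X_c\Big)(X'X)^{-1}

\hat V_{\text{2-way}} = \hat V_{\text{country}} + \hat V_{\text{year}} - \hat V_{\text{country}\times\text{year}}
```

Each of the first two terms is only reliable when its own number of clusters is large. That is why a small number of years (or industries) undermines the whole two-way estimate. For 2SLS the structure is the same, but the projected regressors replace $X$ in the bread and in the scores.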



      • #4
        I have 34 countries, 5 years, and 4 industries, so I do not have an infinite number of clusters in my dimensions.
        What happens if I use one-way clustering? Do I need an infinite number of clusters, as in the case of 2-way clustering?

        What I still don’t really understand is when to use clusters. If I want, I can always find an argument for clustering, right? For example, if I cluster on country in my case, I can argue that there is an external shock that affects only certain countries. Or I cluster on industry and argue that there might be an increase in oil prices that affects some industries more than others, etc.

        And with reference to point 2: what do you mean by “use cluster-robust of some flavour and not standard het-robust”?



        • #5
          These covariance estimators have an asymptotic justification, so basically you want to be able to say "N is far enough on the way to infinity" for the justification to be plausible. 34 is getting there (there's some discussion of this in the Angrist and Pischke "Mostly Harmless Econometrics" book) but 5 and 4 ... not really. So one-way clustering looks like the option to consider.
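Having very few clusters has one concrete consequence beyond a poor asymptotic approximation. The cluster-robust "meat" matrix is built from only G cluster score vectors. Because OLS residuals are orthogonal to the regressors, those vectors sum to zero, so the meat has rank at most G - 1. The sketch below shows this in Python/numpy for OLS on simulated data. The 4 clusters mirror the 4 industries in this thread; the 6 regressors and all data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
G, m, k = 4, 30, 6  # hypothetical: 4 clusters of 30 observations, 6 regressors
n = G * m
groups = np.repeat(np.arange(G), m)

# Simulated data: constant plus k - 1 random regressors
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.ones(k) + rng.normal(size=n)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

# One score vector X_g' e_g per cluster; since X'e = 0 they sum to (numerically) zero
scores = np.array([X[groups == g].T @ resid[groups == g] for g in range(G)])
meat = scores.T @ scores  # k x k "meat" of the cluster-robust sandwich

rank = np.linalg.matrix_rank(meat)
print("rank of meat:", rank, "of", k)  # bounded by G - 1
```

With 4 industry clusters, the estimated covariance matrix is therefore singular whenever there are more than 3 coefficients. Joint Wald tests of more than 3 restrictions cannot be computed from it, even though individual SEs are still reported.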

          There was a related discussion on Statalist just a few days ago with some references that you might want to look at:

          http://www.statalist.org/forums/foru...in-the-country

          The idea behind clustering is that it is a way of dealing with a failure of the assumption that observations are independent, an assumption that the classical and standard het-robust covariance estimators rely on. They fail for the same reason that, when you use OLS (say) and have serial correlation, the classical and het-robust SEs will probably be wrong. The intuition behind cluster-robustness is similar: in an NT panel data setting, if you cluster on the N panels, you allow for arbitrary serial correlation within panels, so you are relaxing the independence assumption. But you still need independence across panels for the cluster-robust covariance estimator.
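The comparison between het-robust and cluster-robust SEs can be made concrete with a minimal Python/numpy sketch for OLS on simulated data. It is not ivreg2's implementation, which also handles IV and other options. Both the regressor and the error share a cluster-level component, so observations are not independent within clusters. The het-robust formula treats every observation as independent. The one-way cluster-robust formula instead sums the scores within each cluster before forming the meat. All numbers (50 clusters of 20, the coefficients) are made up for illustration.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients, residuals and (X'X)^-1."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    return beta, y - X @ beta, XtX_inv

def robust_se(X, resid, XtX_inv):
    """HC1 het-robust SEs: assumes observations are independent."""
    n, k = X.shape
    meat = (X * resid[:, None] ** 2).T @ X
    V = XtX_inv @ meat @ XtX_inv * n / (n - k)
    return np.sqrt(np.diag(V))

def cluster_se(X, resid, XtX_inv, groups):
    """One-way cluster-robust SEs: arbitrary correlation within groups, independence across groups."""
    n, k = X.shape
    labels = np.unique(groups)
    G = len(labels)
    meat = np.zeros((k, k))
    for g in labels:
        s = X[groups == g].T @ resid[groups == g]  # score summed within cluster g
        meat += np.outer(s, s)
    # Finite-sample factor of the form Stata's regress uses with vce(cluster)
    c = G / (G - 1) * (n - 1) / (n - k)
    V = c * XtX_inv @ meat @ XtX_inv
    return np.sqrt(np.diag(V))

# Simulated clustered data (hypothetical parameters)
rng = np.random.default_rng(0)
G, m = 50, 20
groups = np.repeat(np.arange(G), m)
x = rng.normal(size=G)[groups] + rng.normal(size=G * m)  # regressor correlated within cluster
u = rng.normal(size=G)[groups] + rng.normal(size=G * m)  # error correlated within cluster
y = 1.0 + 0.5 * x + u
X = np.column_stack([np.ones_like(x), x])

beta, resid, XtX_inv = ols(X, y)
se_rob = robust_se(X, resid, XtX_inv)
se_cl = cluster_se(X, resid, XtX_inv, groups)
print("het-robust SE of slope:    ", se_rob[1])
print("cluster-robust SE of slope:", se_cl[1])
```

In this thread, `groups` would be the country identifier for one-way clustering on country. Passing each observation as its own cluster (`groups = np.arange(n)`) reproduces the HC1 het-robust result exactly. This shows that het-robust is the special case with no within-cluster correlation.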

          But this isn't the only way to deal with serial correlation. Instead of allowing for arbitrary within-panel serial correlation, you can model the shocks (or whatever else) that could be generating the serial correlation, e.g., using a dummy for the "external shock that affects only certain countries".

