Clustered st.errors

Tomas Houska

Join Date: Oct 2016

Posts: 11
#1

15 Nov 2016, 04:16

Dear Statalist,

I am running a simple regression model and have questions about when to use clustered standard errors, how to test whether clustering is needed, and how clustering standard errors differs from including interaction terms.

Model / setup:
I have a regression model estimating how various factors influence the shipping price of a specific good transported by sea. My data contain information on contracts signed by a large customer for shipping services (e.g. a large producer of shirts buying shipping services from China to Europe/US/AUS from various shipping companies). Each observation records which shipper is responsible for the shipment, the route, and the shipping price. The goods are always the same, and the cost of transporting them is constant across time and contracts. The objective of my exercise is to analyse the impact of an industry shock that occurred at one point in time and has persisted since (e.g. a merger between two large competitors leading to increased prices). This is modelled with a dummy variable (DVbreak) equal to 1 for the period after the industry shock and 0 before.
In its simplified form, my model looks like this

where price is being explained by demand and cost drivers, fixed effects and dummy for industry shock.
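
A sketch of a specification matching this description, with notation assumed here rather than taken from the original post (c indexes carriers, r routes, t time; x collects the demand and cost drivers; alpha and delta are carrier and route fixed effects):

```latex
\[
\text{price}_{crt} = \beta_0 + \beta_1\,\text{DVbreak}_t
  + \mathbf{x}_{crt}'\boldsymbol{\gamma}
  + \alpha_c + \delta_r + \varepsilon_{crt}
\]
```

Under this reading, the coefficient on DVbreak is the average price change after the industry shock, and the clustering question concerns the correlation structure of the error term within a carrier or route.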

Question:
My expectation is that the error terms will have a different variance for each carrier, and maybe even for each route. I tested for heteroscedasticity using the Breusch-Pagan test, which rejected homoscedasticity. I also suspect that the error terms might be correlated within the same carrier and/or route. In that case clustering the standard errors would be necessary.
How can I test for correlation of the error terms within a specific group? Is there a rule of thumb for the level of correlation at which clustering becomes appropriate/necessary?
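
One rough diagnostic for this question, sketched in Python with only the standard library, is the intraclass correlation of the regression residuals by group. The function names and the input layout (one residual and one group label per observation) are assumptions made for illustration; this is not a formal test. The second function is the approximate variance inflation factor discussed in the Cameron & Miller guide cited later in this thread, and it suggests why no fixed threshold exists: even a small within-cluster correlation can matter when clusters are large.

```python
from collections import defaultdict


def intraclass_correlation(residuals, groups):
    """One-way ANOVA estimate of the within-group correlation of residuals."""
    by_group = defaultdict(list)
    for e, g in zip(residuals, groups):
        by_group[g].append(e)
    n, k = len(residuals), len(by_group)
    grand_mean = sum(residuals) / n
    ss_between = ss_within = 0.0
    for values in by_group.values():
        mean_g = sum(values) / len(values)
        ss_between += len(values) * (mean_g - grand_mean) ** 2
        ss_within += sum((e - mean_g) ** 2 for e in values)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    # "Average" group size that allows for unequal group sizes.
    n0 = (n - sum(len(v) ** 2 for v in by_group.values()) / n) / (k - 1)
    return (ms_between - ms_within) / (ms_between + (n0 - 1) * ms_within)


def moulton_inflation(rho_x, rho_u, avg_cluster_size):
    """Approximate factor by which the default OLS variance is understated.

    rho_x: within-cluster correlation of the regressor
    rho_u: within-cluster correlation of the errors
    """
    return 1 + rho_x * rho_u * (avg_cluster_size - 1)


# Hypothetical residuals for two carriers, perfectly correlated within carrier.
rho = intraclass_correlation([1, 1, 1, -1, -1, -1], ["a", "a", "a", "b", "b", "b"])
```

With a regressor that is constant within clusters (rho_x = 1), an error correlation of 0.1 and 81 observations per cluster, `moulton_inflation(1.0, 0.1, 81)` gives a factor of about 9, i.e. default standard errors roughly three times too small.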

What is the difference between clustering standard errors and including an interaction term in the model? E.g. if the errors were clustered by carrier, I could include the interaction term “DVbreak * FEcarrier”. Would that lead to the same results as clustering by carrier?

Thank you for your help
Jesse Wursten

Join Date: Jan 2016

Posts: 915
#2

15 Nov 2016, 04:28

You might be interested in Cameron & Miller's guide to cluster-robust inference, available from the first author's webpage and from the JHR; google "A Practitioner’s Guide to Cluster-Robust Inference".
Tomas Houska

Join Date: Oct 2016

Posts: 11
#3

15 Nov 2016, 08:48

Hi Jesse,

That's very useful reading, thank you. It also led me here: http://ageconsearch.umn.edu/bitstrea...art_st0039.pdf (testing for serial correlation in panel data). That practically covers my Q1 (but any other input on it is appreciated).

Could anyone advise on Q2?
Thanks
Tomas Houska

Join Date: Oct 2016

Posts: 11
#4

16 Nov 2016, 08:41

Hi,

OK, I've made some progress and found a really interesting article. In summary, I am trying to estimate an effect across groups of different sizes, and I would like to obtain a population average weighted by group size. This can be done with weighted least squares (WLS) instead of OLS.

However, according to the article (see link below), weighting is not appropriate if the error terms are clustered within groups. In that case the estimator is inconsistent and produces biased standard errors.
See the article summary here (my case is “Identifying average partial effects”): http://blogs.worldbank.org/impacteva...sample-weights

My question is:
If I use clustered standard errors in Stata, will that overcome the problem of the inconsistent estimator and give me correct standard errors? The article mentions the option of including interaction terms, but I don’t have enough variation within each group to estimate an interaction term (only the average effect across all groups).

Thanks
Jesse Wursten

Join Date: Jan 2016

Posts: 915
#5

16 Nov 2016, 10:19

If you believe your linear model is correctly specified (i.e. there are no omitted variables and so on), then clustered standard errors will be correct as long as the error terms are not correlated across clusters. Within a cluster they can be correlated, including serially correlated.
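
To make the mechanics concrete, here is a minimal Python sketch (standard library only) of the cluster-robust sandwich variance for the slope in a regression with one regressor and an intercept. The function name, the toy data and the choice of finite-sample factor G/(G-1) * (N-1)/(N-K) are assumptions made for illustration; in practice you would rely on your software's built-in option rather than code like this.

```python
import math
from collections import defaultdict


def slope_with_cluster_se(x, y, clusters, small_sample=True):
    """Return (slope, conventional SE, cluster-robust SE) for y = a + b*x + e."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    x_dev = [xi - x_bar for xi in x]
    sxx = sum(d * d for d in x_dev)
    slope = sum(d * yi for d, yi in zip(x_dev, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]

    # Conventional variance: assumes independent, homoscedastic errors.
    conventional_var = sum(e * e for e in resid) / (n - 2) / sxx

    # Cluster-robust variance: sum the scores (x deviation times residual)
    # within each cluster first, then square, so that any within-cluster
    # correlation of the errors is allowed for.
    scores = defaultdict(float)
    for d, e, g in zip(x_dev, resid, clusters):
        scores[g] += d * e
    cluster_var = sum(s * s for s in scores.values()) / sxx ** 2
    if small_sample:
        g_count = len(scores)
        # Assumed finite-sample factor with K = 2 estimated coefficients.
        cluster_var *= g_count / (g_count - 1) * (n - 1) / (n - 2)
    return slope, math.sqrt(conventional_var), math.sqrt(cluster_var)


# Hypothetical toy data: four shipments from two carriers.
slope, se_conventional, se_cluster = slope_with_cluster_se(
    [0, 0, 1, 1], [1, 1, 3, 1], ["a", "b", "a", "b"]
)
```

The point estimate is the same whichever standard error is reported; clustering changes only the estimated precision. An interaction term, by contrast, changes the model itself and therefore the coefficients.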