
  • Clustered standard errors

    Dear Statalist,

    I am running a simple regression model and I have three questions: when to use clustered standard errors, how to test whether clustering is needed, and what the difference is between clustering standard errors and using interaction terms.

    Model / setup:
    I have a regression model estimating how various factors influence the shipping price of a specific good transported by sea. My data contain information on contracts signed by a large customer for shipping services (e.g. a large producer of shirts buying shipping services from China to Europe/US/AUS from various shipping companies). Each observation records which shipper is responsible for the shipment, the route, and the shipping price. The goods are always the same, and the cost of transporting them is constant across time and contracts. The objective of my exercise is to analyse the impact of an industry shock that occurred at one point in time and has persisted until today (e.g. a merger between two large competitors leading to increased prices). This is modelled with a dummy variable (DVbreak) equal to 1 for the period after the industry shock and 0 before.
    In its simplified form, my model looks like this:

    [Image: Capture.JPG, the model equation]

    where price is explained by demand and cost drivers, fixed effects, and a dummy for the industry shock.

    Question:
    I expect the error terms to have a different variance for each carrier, and maybe even for each route. I tested for heteroscedasticity using the Breusch-Pagan test, which confirmed it. I also suspect that the error terms might be correlated within the same carrier and/or route. In that case, clustering the standard errors would be necessary.
    1. How can I test for correlation of the error terms within a specific group? Is there a rule of thumb for the level of correlation at which clustering becomes appropriate/necessary?
    2. What is the difference between clustering standard errors and including an interaction term in the model? E.g. if errors were clustered by carrier, I could include the interaction term “DVbreak * FEcarrier”. Would that lead to the same results as clustering by carrier?
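
    For Q1, one informal way to look at within-group correlation is the pooled lag-1 autocorrelation of the regression residuals, computed separately inside each carrier so that no pair of residuals straddles two carriers. The sketch below is only a descriptive check, not a formal test, and it is written in Python purely to stay self-contained. The carrier names and residual values are made up for illustration, and the residuals are assumed to be sorted by time within each carrier.

```python
def within_group_lag1_autocorr(residuals_by_group):
    """Pooled lag-1 autocorrelation of residuals, using only within-group pairs.

    residuals_by_group maps a group label (e.g. a carrier) to its residuals
    in time order. The result is the no-intercept slope from regressing each
    residual on the previous residual of the same group.
    """
    numerator = 0.0
    denominator = 0.0
    for residuals in residuals_by_group.values():
        # zip pairs e_{t-1} with e_t; pairs never cross group boundaries
        for previous, current in zip(residuals, residuals[1:]):
            numerator += previous * current
            denominator += previous * previous
    if denominator == 0.0:
        raise ValueError("need at least one non-zero lagged residual")
    return numerator / denominator


# Hypothetical residuals for two carriers (illustrative values only)
example_residuals = {
    "carrier_A": [0.8, 0.6, 0.7, 0.5],
    "carrier_B": [-0.4, -0.6, -0.5, -0.7],
}
print(within_group_lag1_autocorr(example_residuals))
```

    A value far from zero suggests that errors for the same carrier move together over time, which is the situation clustered standard errors are meant to handle.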
    Thank you for your help

  • #2
    You might be interested in Cameron & Miller's guide on cluster-robust inference, available from the first author's webpage and JHR, google "A Practitioner’s Guide to Cluster-Robust Inference".


    • #3
      Hi Jesse,

      that's very useful reading. Thank you. It also led me here: http://ageconsearch.umn.edu/bitstrea...art_st0039.pdf (testing for serial correlation in panel data). That practically covers my Q1 (but any other input on that is appreciated).

      Could anyone advise on Q2?
      Thanks


      • #4
        Hi,

        OK, I've made some progress and found a really interesting article. In summary, I am trying to estimate an effect across groups of different sizes, and I would like to obtain a population average weighted by group size. This can be done with weighted regression (WLS) instead of OLS.

        However, according to the article (see link below), weighting is not appropriate if the error terms are clustered within groups. In that case the estimator is inconsistent and produces biased standard errors.
        See the article summary here (my case is the “Identifying average partial effects”): http://blogs.worldbank.org/impacteva...sample-weights

        My question is:
        If I use clustered errors in Stata, will this overcome the inconsistency of the estimator and give me correct standard errors? The article mentions the option of including interaction terms, but I don’t have enough variation in the data within each group to estimate an interaction term (only the average effect across all groups).

        Thanks


        • #5
          If you believe your linear model is correctly specified (i.e. there are no omitted variables and so on), then clustered standard errors will be correct as long as the error terms are not correlated across clusters. They can be correlated within a cluster, including serially correlated.
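
          On the difference between clustering and interaction terms (Q2): clustering leaves the coefficient estimates untouched and changes only their standard errors, whereas an interaction such as DVbreak * FEcarrier changes the model itself. A minimal sketch, assuming a regression of price on a single dummy plus an intercept, shows the first half of that statement. It is written in Python only to be self-contained; the data and carrier labels are invented, and no small-sample correction is applied, so the numbers will not match what statistical software reports with its own adjustments.

```python
from collections import defaultdict
from math import sqrt


def slope_and_standard_errors(x, y, cluster):
    """OLS slope of y on x (with intercept) and two standard errors for it.

    Returns (slope, classical_se, cluster_robust_se). The cluster-robust
    value is the plain sandwich estimator without finite-sample scaling.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    x_dev = [xi - x_bar for xi in x]
    sxx = sum(d * d for d in x_dev)

    slope = sum(d * (yi - y_bar) for d, yi in zip(x_dev, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]

    # Classical SE: assumes independent errors with one common variance
    sigma2 = sum(e * e for e in resid) / (n - 2)
    classical_se = sqrt(sigma2 / sxx)

    # Cluster-robust SE: sum x_dev * resid within each cluster first,
    # so correlation inside a cluster is allowed to matter
    cluster_score = defaultdict(float)
    for d, e, g in zip(x_dev, resid, cluster):
        cluster_score[g] += d * e
    cluster_se = sqrt(sum(s * s for s in cluster_score.values())) / sxx

    return slope, classical_se, cluster_se


# Hypothetical data: x plays the role of DVbreak, y the shipping price
dv_break = [0, 0, 0, 0, 1, 1, 1, 1]
price = [10.0, 11.0, 14.0, 15.0, 12.0, 13.0, 17.0, 18.0]
carrier = ["A", "A", "B", "B", "A", "A", "B", "B"]

slope, se_classical, se_cluster = slope_and_standard_errors(dv_break, price, carrier)
print(slope, se_classical, se_cluster)
```

          The slope is computed once and is the same whichever standard error is reported next to it, so clustering by itself cannot repair an inconsistent estimator. In Stata the corresponding option is vce(cluster clustvar) on the estimation command.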
