
  • Why does clustering account for multiple appearances of the same observation?


    Hi all,

    I am new to the forum. I have recently been reading some papers, and I am a little confused by a claim that appears in two of them. I hope someone can help; thanks in advance!

    Deshpande & Li (2019) use "closing zips" as control groups, and these control groups appear multiple times in the dataset. On page 224, they write:

    ".... Note that our strategy of using future closings as controls for current closings will result in the same zip code appearing multiple times in the data. Clustering at the closing level accounts for the repeated appearance of zip codes since zip codes are fully nested within closings."

    Similarly, Fetter & Lockwood (2018) analyze counties that lie on state borders. Because some counties border two or more other states, such a county appears in the data as many times as the number of states it borders. On page 2187, they write:

    ".... Since our policy of interest varies at the state level, we cluster standard errors at the state level. This level of clustering also accounts for the duplication of observations in counties lying on multiple state boundaries."


    Both papers state that when observations appear multiple times in the dataset, clustering standard errors at the appropriate level (closings in one paper, states in the other) accounts for the duplication. Could someone explain why this is the case? Thanks!


    References:
    Deshpande, M., & Li, Y. (2019). Who is screened out? Application costs and the targeting of disability programs. American Economic Journal: Economic Policy, 11(4), 213-248.
    Fetter, D. K., & Lockwood, L. M. (2018). Government old-age support and labor supply: Evidence from the Old Age Assistance program. American Economic Review, 108(8), 2174-2211.







  • #2
    Liu:
    Welcome to this forum.
    The reasoning behind that methodological approach rests on the reasonable suspicion that these observations share something unobserved (that is, some of their features are not fully independent).
    Therefore, their standard errors should account for that via -vce(cluster clusterid)-.
    Kind regards,
    Carlo
    (Stata 19.0)
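    A minimal sketch of why clustering can absorb duplication, assuming the simplest possible model (estimating a mean, i.e. a regression on a constant only) and hypothetical data. The cluster-robust variance sums residuals within each cluster before squaring, so observations in the same cluster, including exact copies of one another, may be arbitrarily correlated; only different clusters are treated as independent. The function name, unit ids and "states" below are illustrative. The sketch is written in Python rather than Stata and omits the finite-sample correction that Stata's -vce(cluster)- applies, so its numbers will not match Stata's output exactly.

```python
import math


def mean_with_cluster_se(values, clusters):
    """Mean of `values` and its cluster-robust standard error.

    Sandwich formula for a regression on a constant only, with no
    finite-sample correction:
        Var(mean) = sum over clusters g of (sum of residuals in g)^2 / N^2
    Residuals within a cluster may be arbitrarily correlated (including
    exact copies); different clusters are assumed independent.
    """
    n = len(values)
    mean = sum(values) / n
    resid_sum_by_cluster = {}
    for y, g in zip(values, clusters):
        resid_sum_by_cluster[g] = resid_sum_by_cluster.get(g, 0.0) + (y - mean)
    variance = sum(s * s for s in resid_sum_by_cluster.values()) / n ** 2
    return mean, math.sqrt(variance)


# Hypothetical data: four independent units, each belonging to one "state".
units = ["a", "b", "c", "d"]
states = ["s1", "s1", "s2", "s2"]
y = [1.0, 4.0, 2.0, 7.0]

# Duplicate every observation, as when each unit serves in two comparisons.
# Each copy keeps its own unit id and its own state.
units_dup = units * 2
states_dup = states * 2
y_dup = y * 2

# 1) Original data, each observation its own cluster (heteroskedasticity-robust).
_, se_original = mean_with_cluster_se(y, units)
# 2) Duplicated data with every row treated as independent.
_, se_naive = mean_with_cluster_se(y_dup, range(len(y_dup)))
# 3) Duplicated data clustered by unit: both copies share a cluster.
_, se_unit = mean_with_cluster_se(y_dup, units_dup)
# 4) Clustering at a coarser level (state) that nests the units.
_, se_state_original = mean_with_cluster_se(y, states)
_, se_state_dup = mean_with_cluster_se(y_dup, states_dup)

print(f"original data, robust:          {se_original:.4f}")
print(f"duplicated, rows independent:   {se_naive:.4f}")
print(f"duplicated, clustered by unit:  {se_unit:.4f}")
print(f"original, clustered by state:   {se_state_original:.4f}")
print(f"duplicated, clustered by state: {se_state_dup:.4f}")
```

    In this toy case, every unit is duplicated the same number of times and all copies of a unit fall in the same cluster. Clustering the duplicated data by unit, or by a coarser level such as state that nests the units, then reproduces the standard error obtained from the original, non-duplicated data. Treating the copies as independent rows shrinks it by a factor of sqrt(2). When copies of a unit land in different clusters, this argument no longer applies.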



    • #3
      Thanks for your clarification, Carlo. However, I am still a little confused. Consider the following four rows, in which A1 and A2 belong to group 1, and A1 and A3 belong to group 2. A1 appears twice and serves as a control. When we cluster at the group level, we assume that observations within a group (A1 and A2, or A1 and A3) share some unobserved characteristics, but that observations in group 1 are independent of those in group 2. However, since A1 is duplicated, observations in group 1 and group 2 are clearly not independent. So why does clustering at the group level solve the duplication problem, as these two papers state? I am not sure whether my understanding is correct, and I hope I can get further help. Many thanks!

      Group   ID
      1       A1
      1       A2
      2       A1
      2       A3
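      Whether the papers' argument applies hinges on the nesting condition in the Deshpande & Li quote ("zip codes are fully nested within closings"): every copy of a duplicated unit must fall in the same cluster. A small Python sketch, assuming the data arrive as (cluster, unit) pairs, that lists units appearing in more than one cluster. The function name and the `rows_nested` layout are hypothetical illustrations.

```python
from collections import defaultdict


def units_spanning_clusters(rows):
    """Return {unit: sorted cluster ids} for units found in more than one cluster.

    `rows` is an iterable of (cluster_id, unit_id) pairs. An empty result
    means every unit is nested within a single cluster, so all copies of a
    duplicated unit share one cluster.
    """
    clusters_by_unit = defaultdict(set)
    for cluster, unit in rows:
        clusters_by_unit[unit].add(cluster)
    return {u: sorted(cs) for u, cs in clusters_by_unit.items() if len(cs) > 1}


# The four rows from the table above: (Group, ID).
rows = [(1, "A1"), (1, "A2"), (2, "A1"), (2, "A3")]
print(units_spanning_clusters(rows))  # {'A1': [1, 2]}

# Hypothetical alternative clustering in which both copies of A1 share a cluster.
rows_nested = [("s1", "A1"), ("s1", "A2"), ("s1", "A1"), ("s2", "A3")]
print(units_spanning_clusters(rows_nested))  # {}
```

      For the table above, A1 spans groups 1 and 2, so clustering by Group alone does not put both copies of A1 in one cluster and the nesting condition fails. In the hypothetical `rows_nested` layout both copies share a cluster, so the check passes.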



      • #4
        Liu:
        Footnote 13 on page 224 of https://pubs.aeaweb.org/doi/pdfplus/...7/pol.20180076 points the reader to a pivotal contribution on this topic (http://jhr.uwpress.org/content/50/2/317.refs, also available as a working paper at http://cameron.econ.ucdavis.edu/rese...5_February.pdf).
        Kind regards,
        Carlo
        (Stata 19.0)



        • #5
          Thanks so much for your help, Carlo!

