  • Should I cluster, and at what level, when treatment is assigned using both time and space?

    As in other posts, my question is whether to cluster and at what level. And again, the question comes up after reading the Abadie et al. paper.

    In my exercise, I have individual level data from two waves of a survey (2 cross-sections). Let's forget for a moment about the sampling reasons for clustering and focus on the assignment reasons.

    My treatment (W) is allocated across individuals based on their province and month/year of birth. Hence an individual born in province X in September of 1962 might have W=1 while another individual born in province X in October of 1950 might have W=0. I have information on the assignment for five birth years (e.g. 1960-1965), so I keep only those individuals in the survey who were born within those five years and drop the rest.

    My first impulse was to cluster standard errors by province of birth. But the assignment is actually carried out at the level of the province*year*month combination. Hence I'm now more inclined to create a variable as in

    egen cluster=group(birth_prov birth_month birth_year)

    and use that variable for clustering. However, when I do this, I'm left with 1 observation for most province*year*month combinations, at which point I don't see the point of clustering anymore.
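    A minimal sketch of how the cell sizes could be checked, assuming the variable names from the egen line above. The names cell, cell_n and cell_tag are made up for illustration.

    ```stata
    * Illustrative only: cell, cell_n and cell_tag are hypothetical names.
    * Build one identifier per province*year*month assignment cell.
    egen cell = group(birth_prov birth_year birth_month)

    * Number of observations in each cell.
    bysort cell: generate cell_n = _N

    * Flag one observation per cell so each cell is counted once.
    egen cell_tag = tag(cell)

    * Distribution of cell sizes across cells.
    tabulate cell_n if cell_tag
    ```

    If nearly all cells show cell_n equal to 1, clustering on cell should give standard errors close to the ordinary heteroskedasticity-robust ones.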

    Now my questions are:

    1- What do you think about this?
    2- Would it make a difference if I still had many observations for each province*year*month combination?

  • #2
    Welcome to Statalist. You didn't get a quick answer. You'll increase your chances of a useful answer by following the FAQ on asking questions - code in code delimiters, Stata results and data using dataex. Knowing the exact format of your data is essential to helping you. We want your code partially to see that you've made an effort to solve your problem before asking Statalist.

    You refer to exercise. We don't help with homework.

    The question of how to cluster is more a matter of substance than stats, and you have to be the substance expert. If you have only one observation per province, year, and month, then you can't have panels at that level. This sounds like panel data, but you have not mentioned what models you plan to estimate.


    • #3
      Thanks for your reply Phil Bromiley !
      You are correct, my question is more theoretical and that's why I didn't include any Stata material. Following the Abadie et al. paper, my question is whether, if treatment is randomized across locations and dates of birth, we should consider clustering at this combination (location * dob).

      Will follow your advice for next time.
      Thanks again!


      • #4
        In general, you want to cluster at a fairly high level--at whatever level you think there could be (substantial) correlation in errors across observations. In this case, I would cluster at the province level (so accounting for correlation in assignment and outcomes within provinces--for example, if the whole province enters a recession, we would expect outcomes and unobservables of people in that province to move together). Clustering at province * year * month is probably too fine a level to be helpful, as your small cluster size indicates.

        The Abadie paper isn't very relevant here, because your assignment to treatment is done at the province level. The Abadie paper is cautioning against clustering in the case where you do NOT have clustering in the assignment to treatment--but in this case, you do have clustered assignment, so you should cluster standard errors.
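
        As a rough sketch of the two options discussed in this thread, the same regression can be run with each clustering level and the standard errors compared side by side. Here y is a hypothetical outcome variable and the plain regression of y on W is only a placeholder, since the thread does not state the model. W and the birth_* variables are taken from the original post, and assign_cell is an illustrative name.

        ```stata
        * Illustrative only: y and assign_cell are hypothetical names.
        * Cluster at the province level.
        regress y W, vce(cluster birth_prov)
        estimates store by_prov

        * Cluster at the province*year*month assignment cell.
        egen assign_cell = group(birth_prov birth_year birth_month)
        regress y W, vce(cluster assign_cell)
        estimates store by_cell

        * Same coefficients, different standard errors.
        estimates table by_prov by_cell, se
        ```

        The point estimates are identical across the two columns; only the standard errors change with the clustering level.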
