Similar to other posts, my question is about whether to cluster and at what level. And again, the question comes up after reading the Abadie et al. paper.
In my exercise, I have individual-level data from two waves of a survey (2 cross-sections). Let's set aside for a moment the sampling reasons for clustering and focus on the assignment reasons.
My treatment (W) is assigned to individuals based on their province and month/year of birth. Hence an individual born in province X in September of 1962 might have W=1, while another individual born in province X in October of 1950 might have W=0. I have information on the assignment for five birth years (e.g. 1960-1965), so I keep only the survey respondents born within those years and drop the rest.
Now, my first impulse was to cluster standard errors by province of birth. But the assignment is actually carried out at the province*year*month level. Hence I'm now more inclined to create a variable as in
egen cluster=group(birth_prov birth_month birth_year)
and use that variable for clustering. However, when I do this, I'm left with 1 observation for most province*year*month combinations, at which point I no longer see the point of clustering.
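To make the issue concrete, here is a minimal sketch of how the cell sizes could be checked and the candidate clustering levels compared. The outcome name y is hypothetical (it is not named above), and the regression is only a placeholder specification, not my actual model.

```stata
* Assumption: y is a placeholder outcome; W is the treatment indicator.
egen cluster = group(birth_prov birth_month birth_year)

* Number of observations in each province*year*month cell
bysort cluster: gen n_in_cluster = _N

* Tabulate cell sizes, counting each cell only once
egen byte first_in_cluster = tag(cluster)
tabulate n_in_cluster if first_in_cluster

* Compare standard errors under the candidate choices
regress y W, vce(robust)
regress y W, vce(cluster cluster)
regress y W, vce(cluster birth_prov)
```

When most cells contain a single observation, I would expect the second regression to give standard errors very close to the heteroskedasticity-robust ones in the first, apart from the finite-sample correction.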
Now my questions are:
1- What do you think about this?
2- Would it make a difference if I still had many observations for each province*year*month combination?