Hi everyone,
This is more a question about statistical approach than about Stata coding.
I am using a dataset that was collected by randomly selecting communities (primary sampling units) from all communities defined in the national census of a certain country. Households (secondary sampling units) were then randomly selected within the selected communities. All household heads were surveyed to capture household-level variables and variables specific to the household head (for example, age and gender), while consenting individuals from each household were further surveyed for additional individual-level variables.
I am trying to estimate how household income predicts household-level health safety measures, controlling for other household and individual variables and instrumenting income with district-level shocks to income (the district is the largest geographical aggregation in the data). My question is: at what level should I cluster my standard errors? Would clustering at the primary sampling unit (community) suffice, or do I need to cluster at the household or district level?
Thank you!