Hello,
I am using the HCES 2022–23 survey on household consumption in India, and I intend to conduct analysis at the household level. However, I'm uncertain about how to properly specify the survey design using Stata's `svyset` command.
According to the official documentation:
The HCES 2022–23 employed a multi-stage stratified sampling design. Villages/urban blocks (or their sub-units) were treated as the First Stage Units (FSUs), while households were the Ultimate Stage Units (USUs). Both FSUs and USUs were selected using Simple Random Sampling Without Replacement (SRSWOR). Within each FSU, 18 sample households were selected.
Furthermore:
To ensure adequate representation of households from different economic categories, all households in a selected FSU were grouped into three strata based on (i) land possessed (in rural areas) and (ii) ownership of a car (in urban areas) as of the survey date. From these groups, 18 households were selected with proportional representation. Detailed methodology is provided in Appendix B of the report.
Based on this, I have tried specifying the survey design in Stata using:
svyset fsu_serial_no [pweight=weights], strata(stratum)
However, I'm not entirely confident whether this is correct.
I also have the following questions:
1. Is it always necessary to use `svyset` to reflect the survey design, or can I instead just use weights via [iw=weights] in each command, since I understand these are household-level weights?
2. For example, assuming that only 40% of bottom-quintile households should be receiving benefits from a government health insurance scheme (`is_hhmem_pmjay`, where 1 = beneficiary and 0 = non-beneficiary), I attempted to calculate the distribution of beneficiaries across MPCE quintiles using the following command:
tab mpce_quintile if is_hhmem_pmjay == 1 [iw=weights]
This produced:
5 quantiles |
    of mpce |        Freq.    Percent       Cum.
------------+-----------------------------------
          1 | 19,720,093.9      27.51      27.51
          2 | 16,968,310.1      23.67      51.19
          3 | 14,707,596.8      20.52      71.71
          4 | 12,273,319.2      17.12      88.83
          5 |  8,007,204.0      11.17     100.00
------------+-----------------------------------
      Total | 71,676,524.2     100.00
Is this an appropriate use of the weights?
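For comparison, here is a minimal sketch of the survey-aware version of the same tabulation. It assumes the `svyset` declaration above is acceptable, and in particular that `stratum` identifies the first-stage strata and `fsu_serial_no` uniquely identifies FSUs, neither of which I have confirmed. It uses `subpop()` rather than `if` so that non-beneficiary households still contribute to the variance estimation:

```stata
* Declare the design (same specification as above; the roles of
* fsu_serial_no and stratum are my assumption, not confirmed)
svyset fsu_serial_no [pweight=weights], strata(stratum)

* Distribution of PMJAY beneficiary households across MPCE quintiles,
* with design-based standard errors and confidence intervals
svy, subpop(if is_hhmem_pmjay == 1): tabulate mpce_quintile, percent se ci
```

I would expect the percentages to match the `[iw=weights]` table, with only the standard errors depending on the design declaration.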
Thank you in advance for your help.