Dear all,
I would like to declare my survey design to Stata (version 13) using svyset.
I have a sample of 495 firms collected in 3 industries across 3 countries. Firms were selected based on their size and exporting status.
The table below shows an overview of the sampling design for one country*industry combination. It also shows how I calculated the sampling weights and the finite population correction, following the formula sqrt((N-n)/(N-1)).
[Table: sampling design, weights and fpc for one country*industry combination; not reproduced here]
I constructed weights to make my sample more representative of the universe of firms per country-sector. That’s also how I calculated the finite population correction (fpc): per country-sector.
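Written out for a single cell with N firms in the population and n firms sampled, the two quantities look as follows. The weight formula is an assumption here (the usual inverse sampling fraction), and the numbers in the comments are made up for illustration, not taken from my data:

```latex
w = \frac{N}{n}, \qquad
\mathrm{fpc} = \sqrt{\frac{N-n}{N-1}}
% Illustration with hypothetical N = 200, n = 20:
% w = 200/20 = 10, \quad \mathrm{fpc} = \sqrt{180/199} \approx 0.951
```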
I want to tell Stata that my data is structured as such. After reading about svyset, I believe that my Primary Sampling Unit (PSU), or my cluster, is country*sector, while I believe my strata to be Size*Export. Furthermore, from reading this page, I believe my data is best classified as a one-stage clustered design with stratification.
In Stata, I write:
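Code:
* cluster = country*sector combination (9 groups)
egen cluster = group(country sector)
* strata = size*export status combination (6 groups)
egen strata = group(size expdummy)
* declare the design: clusters as PSUs, stratified by size*export
svyset cluster [pweight=surveyweight], vce(linearized) strata(strata) singleunit(missing) fpc(fpc)
svydes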
Output of svydes:
Stratum | #Units | #Obs
------- | ------ | ----
1 | 9 | 258
2 | 9 | 82
3 | 9 | 43
4 | 7 | 46
5 | 5 | 25
6 | 5 | 41
------- | ------ | ----
Total (6 strata) | 44 | 495
But I do not think this matches my sampling design. I believe #Units refers to the number of PSUs (clusters), of which I have only 9, not 44. svyset seems to want each cluster to belong to exactly one stratum (i.e. each country*sector assigned to a single group such as small exporting firms), but that is not how my data are organized: within each cluster/PSU, I selected a number of firms based on their size and exporting status. In addition, if I type
Code:
svy: mean sales
Stata tells me "fpc for all observations within a stratum must be the same", r(461). However, in my case the fpc depends not only on the stratum but also on the cluster (country*sector ID) in question.
What am I doing wrong? Is my understanding of PSU/cluster and strata wrong? Should I add a second stage to svyset? Should I set my strata and clusters differently (e.g. identify country and sector as strata as well)? Or should I describe my sampling design differently altogether, e.g. as a simple cluster sample or as stratified random sampling? Currently, I am leaning towards identifying country and sector as strata as well, and having no PSU variable, but that does not seem to fit my conceptual understanding of a PSU.
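To make that last option concrete, here is a minimal, untested sketch of what I have in mind. The variable name stratum_all is one I made up for illustration; country, sector, size, expdummy, surveyweight and fpc are the variables from my code above. It assumes that each firm is its own sampling unit and that fpc is constant within each country*sector*size*export cell.

```stata
* Sketch only: treat every country*sector*size*export cell as a stratum
* (stratum_all is a hypothetical variable name)
egen stratum_all = group(country sector size expdummy)

* _n declares each observation (firm) as its own sampling unit, so no cluster variable
* Caveat (my reading of the svyset help): fpc() expects a population size or a
* sampling rate n/N per stratum, so my fpc variable may need to be recomputed
svyset _n [pweight=surveyweight], strata(stratum_all) fpc(fpc) vce(linearized) singleunit(missing)

* Check how many strata and units per stratum Stata now sees
svydes
```

With this declaration I would expect svydes to report the firms themselves as units within each stratum, rather than the 9 country*sector groups, but I am not sure this is the right way to describe my design.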
Any advice would be much appreciated, of course.
Many thanks in advance,
Loe