Hi,
I'm working with the American Community Survey (ACS), a dataset with more than 3 million observations. I often have to tabulate two variables I've created, naics2017 and skill (both stored as floats). I have a fine-grained version of naics2017 with over 400 unique values and a broader version with only 20 unique values. My skill variable always takes just four values. All values are integers, but they can be negative.
When using my 20-value naics2017 variable, I can easily use my default ACS svyset and svy commands:
Code:
svyset cluster [pweight = perwt], strata(strata)
svy: tab naics2017 skill, row
It takes my computer a full two minutes to process, but that's fine. When I try to do the same thing with my 400+ value naics2017 variable, the calculation never finishes and Stata eventually just freezes.
I think I've found a decent workaround that lets me use the same code for both versions of my naics2017 variable. The following code (without the svyset command) runs without putting my computer into a coma:
Code:
tab naics2017 skill [iweight = perwt], row
But the thing is: I have never used the iweight option before, and I don't quite understand what calculation is behind it when I use it with a two-way tabulation in this way. The help file for iweight says that "any command that supports iweights will define exactly how they are treated." However, under help for 'tabulate twoway' I can't find anything about iweights.
Can anybody tell me where to look to learn how tabulate twoway uses iweights? Or can somebody tell me whether I am using the iweight option correctly? I am optimistic because, with my 20-value naics2017 variable, the iweight tabulation gives me the same results as my standard pweight approach with svyset. But I want to understand the iweight option for the tabulate command, and would be grateful if somebody could point me toward a useful explanation.
The ACS person weight variable, by the way, is perwt:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input int perwt
161
128
71
148
159
53
131
85
312
77
96
40
48
142
7
250
59
73
89
58
174
74
86
69
133
12
66
31
36
31
227
111
125
226
46
159
74
59
97
121
55
198
17
193
22
125
63
147
94
54
110
49
105
122
45
233
189
103
44
50
173
118
44
16
174
79
60
18
102
92
124
195
116
53
62
33
60
111
113
39
161
38
67
172
133
236
34
101
137
270
31
74
65
26
199
30
85
61
163
107
end
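To make the question concrete, here is a minimal sketch of one way to rebuild the row percentages by hand. It rests on an assumption that the tabulate help file does not confirm: that iweights are simply summed within each cell, and that row percentages are those cell sums divided by the row total. The names cell_wt, row_wt and row_pct are made up for illustration. If the assumption holds, row_pct should match the iweight table. It says nothing about standard errors, which only the svy version reports.
Code:
* Sketch only: assumes the full ACS extract with naics2017, skill and perwt is in memory.
preserve
* Sum the person weights within each naics2017-by-skill cell,
* skipping missing categories as -tabulate- does by default.
collapse (sum) cell_wt = perwt if !missing(naics2017, skill), by(naics2017 skill)
* Total weight per naics2017 row, then each cell's share of its row.
bysort naics2017: egen row_wt = total(cell_wt)
generate row_pct = 100 * cell_wt / row_wt
list naics2017 skill cell_wt row_pct, sepby(naics2017)
restore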